[Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....

David Bateman David.Bateman at motorola.com
Tue Sep 9 11:10:49 CDT 2008


Ben Abbott wrote:
> On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <David.Bateman at motorola.com> wrote:
>   
>> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
>> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
>> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
>> "(?<=[a-z]{0,MAXLENGTH})" but a variable length lookbehind is not ok
>> either, even when the length is bounded. What I'd have to do is replace
>> it with
>>
>> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
>>
>> which uses the alternation operator and MAXLENGTH+1 copies of the
>> lookbehind expression to get the effect. This seems to be a ridiculous
>> amount of extra crap in the pattern space to get this functionality. Is
>> it worth supporting arbitrary length lookbehind expressions like
>> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
>> it worth supporting it with a limit on the maximum length, and printing
>> a warning? If so, what value should the limit be?
>>
>> Frankly I wonder how mathworks got this to work as they appear to be 
>> using the Boost regex library which also doesn't support arbitrary 
>> length lookbehind expressions....
>>
>> D.
>>     
>
> David,
>
> Have you tried the example in Matlab?
>
> Using 2007b, it does *not* work for me. My 2008a/b is busy running some simulations, so I can't try it there until later.
>
>   
> >> g='x^(-1)+y(-1)+z(-1)=0';
> >> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
>
> ans =
> x^_minus1+y_minus1+z_minus1=0
>
> If I understand correctly the result should be 
>
> ans =
> x^(-1)+y_minus1+z_minus1=0
>
> Correct?
>
> Ben
>
>
>
>   

The message

http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff

seems to imply that MathWorks has its own regexp engine and that
lookbehind is inefficient. I therefore don't consider it that much of an
issue to duplicate the lookbehind pattern in the pattern space, and so
propose the attached changeset, which replaces "(?<=[a-z]*)" with
"((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
it. It also issues a warning about the maximum string length whenever the
lookbehind might be affected. So the limitation is that "+" then
represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind
expression. This limitation doesn't apply to lookaheads, etc.
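
The rewrite described above can be tried outside Octave. What follows is
a minimal Python sketch of the idea, not the attached changeset: Python's
re module, like PCRE, rejects variable length lookbehind, so it runs into
the same problem. The names expand_lookbehind and MAX_LENGTH are
hypothetical, and the sketch assumes the whole lookbehind body is a
single character class, escape, or character followed by "*" or "+".

```python
import re
import warnings

MAX_LENGTH = 10  # cap taken from the message: "*" is 0..10, "+" is 1..10

# Hypothetical matcher for the simple case only: a lookbehind whose entire
# body is one atom (character class, escaped char, word char, or ".")
# followed by "*" or "+". Anything more complex is left untouched.
_LOOKBEHIND = re.compile(r"\(\?<=(\[[^\]]+\]|\\.|\w|\.)([*+])\)")


def expand_lookbehind(pattern, max_length=MAX_LENGTH):
    """Replace (?<=X*) or (?<=X+) by an alternation of fixed-length lookbehinds."""

    def repl(match):
        atom, quantifier = match.group(1), match.group(2)
        low = 0 if quantifier == "*" else 1
        warnings.warn(
            "lookbehind limited to at most %d characters" % max_length
        )
        alternatives = "|".join(
            "(?<=%s{%d})" % (atom, n) for n in range(low, max_length + 1)
        )
        return "(?:" + alternatives + ")"

    return _LOOKBEHIND.sub(repl, pattern)


g = "x^(-1)+y(-1)+z(-1)=0"

# "*" allows zero preceding letters, so every "(-1)" is replaced.
star = expand_lookbehind(r"(?<=[a-z]*)\(\-[1-9]*\)")
print(re.sub(star, "_minus1", g))  # x^_minus1+y_minus1+z_minus1=0

# "+" requires at least one preceding letter, so "x^(-1)" is left alone.
plus = expand_lookbehind(r"(?<=[a-z]+)\(\-[1-9]*\)")
print(re.sub(plus, "_minus1", g))  # x^(-1)+y_minus1+z_minus1=0
```

With "*" the zero-length alternative always succeeds, so the lookbehind
constrains nothing and the result matches what Ben saw in 2007b. With "+"
the result is the one he expected.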

D.

-- 
David Bateman                                David.Bateman at motorola.com
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob) 
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax) 


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch8502
Url: https://www-old.cae.wisc.edu/pipermail/help-octave/attachments/20080909/a9446cfe/attachment-0001.ksh 

