[Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....

David Bateman David.Bateman at motorola.com
Wed Sep 10 06:49:15 CDT 2008


John W. Eaton wrote:
> On  9-Sep-2008, David Bateman wrote:
>
> | Ben Abbott wrote:
> | > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <David.Bateman at motorola.com> wrote:
> | >   
> | >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary-length 
> | >> lookahead, but not arbitrary-length lookbehind. Thus "(?=[a-z]*)" is OK 
> | >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with 
> | >> "(?<=[a-z]{0,MAXLENGTH})", but a variable (even if bounded) length is 
> | >> not OK either. What I'd have to do is replace it with
> | >>
> | >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
> | >>
> | >> which uses the alternation operator and MAXLENGTH+1 copies of the 
> | >> lookbehind expression to get the effect. This seems a ridiculous 
> | >> amount of extra crap in the pattern space to get this functionality. Is 
> | >> it worth supporting arbitrary-length lookbehind expressions like 
> | >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is 
> | >> it worth supporting them with a limit on the maximum length, and 
> | >> printing a warning? If so, what value should the limit be?
> | >>
> | >> Frankly, I wonder how MathWorks got this to work, as they appear to be 
> | >> using the Boost regex library, which also doesn't support 
> | >> arbitrary-length lookbehind expressions....
> | >>
> | >> D.
> | >>     
> | >
> | > David,
> | >
> | > Have you tried the example in Matlab?
> | >
> | > Using 2007b, it does *not* work for me. My 2008a/b is busy running some simulations, so I can't try it there until later.
> | >
> | >   
> | >>> g='x^(-1)+y(-1)+z(-1)=0';
> | >>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
> | >>>       
> | > ans =
> | > x^_minus1+y_minus1+z_minus1=0
> | >
> | > If I understand correctly, the result should be 
> | >
> | > ans =
> | > x^(-1)+y_minus1+z_minus1=0
> | >
> | > Correct?
> | >
> | > Ben
> | >
> | >
> | >
> | >   
> | 
> | The message
> | 
> | http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
> | 
> | seems to imply that MathWorks has its own regexp engine and that 
> | lookbehind is inefficient in it. I therefore don't consider it much of an 
> | issue to duplicate the lookbehind pattern in the pattern space, and so 
> | propose the attached changeset, which replaces "(?<=[a-z]*)" with 
> | "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on 
> | it. It also issues a warning about the maximum match length whenever the 
> | lookbehind might be affected. So the limitation is that "+" then 
> | represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind 
> | expression. This limitation doesn't apply to lookaheads, etc.
>
> Is the bug report
>
>   http://bugs.exim.org/show_bug.cgi?id=547
>
> the same problem?  Note the comment
>
>   I can't see an efficient way of doing this with the current
>   implementation.  Note that Perl is even more restrictive - all
>   alternatives in the lookbehind have to be the same length in Perl.
>   
Well, I implemented this as alternative lookbehinds rather than as 
alternatives within the lookbehind expression itself. However, yes, it is the same issue.
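As a rough illustration of the pattern-space expansion discussed above, here is a minimal Python sketch. It is not the changeset's code. Python's `re` module, like PCRE, rejects variable-width lookbehind, so it shows the same failure and the same workaround. The helper `expand_lookbehind` is hypothetical. The cap `MAXLENGTH = 10` is an assumption mirroring the 10-character limit mentioned in the thread. Because the `{0}` alternative is zero-width, it always succeeds, so the `*` form places no constraint at all. That would explain why every occurrence in Ben's example is replaced; requiring at least one letter (the `+` form) leaves `x^(-1)` alone.

```python
import re

# Assumed cap, mirroring the 10-character limit described in the thread.
MAXLENGTH = 10


def expand_lookbehind(atom: str, min_len: int, max_len: int = MAXLENGTH) -> str:
    """Hypothetical helper: emulate the variable-width (?<=atom{min_len,max_len})
    with one fixed-width lookbehind per length, joined by alternation."""
    alternatives = [f"(?<={atom}{{{n}}})" for n in range(min_len, max_len + 1)]
    return "(?:" + "|".join(alternatives) + ")"


g = "x^(-1)+y(-1)+z(-1)=0"
target = r"\(-[1-9]*\)"

# Like PCRE, Python's re refuses the variable-width lookbehind outright.
try:
    re.compile(r"(?<=[a-z]*)" + target)
except re.error as exc:
    print("rejected:", exc)

star = expand_lookbehind("[a-z]", 0)  # stands in for (?<=[a-z]*)
plus = expand_lookbehind("[a-z]", 1)  # stands in for (?<=[a-z]+)

print(re.sub(star + target, "_minus1", g))  # x^_minus1+y_minus1+z_minus1=0
print(re.sub(plus + target, "_minus1", g))  # x^(-1)+y_minus1+z_minus1=0
```

The sketch wraps the alternatives in a non-capturing group `(?:...)` so that the extra parentheses do not shift the numbering of capture groups elsewhere in the pattern.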


> I guess it might be worth asking whether there is a way to get this
> feature, even if it is not efficient.
>   
The inefficient way of doing it is essentially to do the pattern-space 
expansion I did, but inside PCRE itself. However, it can be more efficient in 
PCRE, as PCRE knows how far back it has to extend the search. There 
are also cases like "(?<=Nov(ember)?)" to consider, which match both "Nov" and 
"November" and so need to be expanded as "((?<=Nov)|(?<=November))"; these 
I haven't taken into account. Maybe this is what the bug report is 
talking about regarding PCRE handling alternatives in lookbehind 
expressions.
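To illustrate the "Nov(ember)?" case, here is a small Python sketch. It again uses Python's `re` as a stand-in for PCRE, and the sample string is made up. In Python, both the optional-group form and a lookbehind whose alternatives differ in length are rejected, which matches the Perl restriction quoted from the bug report. One fixed-width lookbehind per alternative is accepted. PCRE itself is somewhat more permissive: it accepts top-level alternatives of different fixed lengths, such as `(?<=Nov|November)`, but not `(?<=Nov(ember)?)`.

```python
import re

text = "Nov 5, November 12, Dec 3"  # made-up sample input

# Python's re rejects both of these at compile time: the lookbehind
# must have a single fixed width, even across alternatives.
variable_width = [r"(?<=Nov(ember)?) (\d+)", r"(?<=Nov|November) (\d+)"]
rejected = []
for pattern in variable_width:
    try:
        re.compile(pattern)
    except re.error:
        rejected.append(pattern)
print(len(rejected), "of", len(variable_width), "patterns rejected")

# One fixed-width lookbehind per alternative, joined by alternation,
# compiles fine and matches numbers preceded by either spelling.
expanded = r"(?:(?<=Nov)|(?<=November)) (\d+)"
print(re.findall(expanded, text))  # ['5', '12']
```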

Yes, it would be better if PCRE handled this internally rather than leaving 
us to do it by modifying the pattern.

Cheers
David

> Meanwhile, I've applied your changeset.
>
> Thanks,
>
> jwe
>
>   


-- 
David Bateman                                David.Bateman at motorola.com
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob) 
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax) 

The information contained in this communication has been classified as: 

[x] General Business Information 
[ ] Motorola Internal Use Only 
[ ] Motorola Confidential Proprietary



More information about the Help-octave mailing list