[Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....
David Bateman
David.Bateman at motorola.com
Tue Sep 9 11:10:49 CDT 2008
Ben Abbott wrote:
> On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <David.Bateman at motorola.com> wrote:
>
>> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
>> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
>> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
>> "(?<=[a-z]{0,MAXLENGTH})", but a lookbehind of variable length is not
>> ok either, even when that length is bounded. What I'd have to do is
>> replace it with
>>
>> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
>>
>> which uses the alternation operator and MAXLENGTH+1 copies of the
>> lookbehind expression to get the effect. This seems to be a ridiculous
>> amount of extra crap in the pattern space to get this functionality. Is
>> it worth supporting arbitrary length lookbehind expressions like
>> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
>> it worth supporting it but limiting the maximum length, and printing a
>> warning? If so, what value should the limit be?
>>
>> Frankly I wonder how mathworks got this to work as they appear to be
>> using the Boost regex library which also doesn't support arbitrary
>> length lookbehind expressions....
>>
>> D.
>>
>
> David,
>
> Have you tried the example in Matlab?
>
> Using 2007b, it does *not* work for me. My 2008a/b is busy running some simulations, so I can't try it there until later.
>
>
>>> g='x^(-1)+y(-1)+z(-1)=0';
>>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
>>>
> ans =
> x^_minus1+y_minus1+z_minus1=0
>
> If I understand correctly the result should be
>
> ans =
> x^(-1)+y_minus1+z_minus1=0
>
> Correct?
>
> Ben
>
>
>
>
The message
http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
seems to imply that MathWorks have their own regexp engine and that
lookbehind is inefficient. I therefore don't consider it that much of an
issue to duplicate the lookbehind pattern in the pattern space, and so
propose the attached changeset that replaces "(?<=[a-z]*)" with
"((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
it. It also issues a warning about the maximum string length if the
lookbehind might be an issue. So the limitation is that "+" then
represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind
expression. This limitation doesn't apply to lookaheads, etc.
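
To make the rewrite concrete, here is a minimal sketch of the expansion in
Python, whose "re" module has a similar fixed-width restriction on
lookbehind. It is not the code in the attached changeset: the helper name
expand_lookbehind, its arguments, and the use of a non-capturing group are
assumptions made for illustration only.

```python
import re

MAX_LENGTH = 10  # the limit quoted in the message above


def expand_lookbehind(atom, quantifier, max_length=MAX_LENGTH):
    """Hypothetical helper: rewrite (?<=atom*) or (?<=atom+) as an
    alternation of fixed-length lookbehinds, one per allowed length."""
    if quantifier not in ("*", "+"):
        raise ValueError("this sketch only handles '*' and '+'")
    start = 0 if quantifier == "*" else 1
    alternatives = [
        "(?<=%s{%d})" % (atom, n) for n in range(start, max_length + 1)
    ]
    # Non-capturing group (an assumption) so token numbering is unchanged.
    return "(?:" + "|".join(alternatives) + ")"


# Python's re rejects the unbounded lookbehind outright.
try:
    re.compile(r"(?<=[a-z]+)\(-[1-9]*\)")
    unbounded_ok = True
except re.error:
    unbounded_ok = False

# The expanded form compiles and behaves like a bounded "+" lookbehind.
pattern = expand_lookbehind("[a-z]", "+") + r"\(-[1-9]*\)"
result = re.sub(pattern, "_minus1", "x^(-1)+y(-1)+z(-1)=0")
print(unbounded_ok, result)
```

With "+" the first "(-1)" is left alone because it follows "^" rather than a
letter. With "*" the zero-length alternative always succeeds, so the
lookbehind never rejects a match.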
D.
--
David Bateman David.Bateman at motorola.com
Motorola Labs - Paris +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE +33 1 69 35 77 01 (Fax)
The information contained in this communication has been classified as:
[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch8502
Url: https://www-old.cae.wisc.edu/pipermail/help-octave/attachments/20080909/a9446cfe/attachment-0001.ksh