Aw: Re: regexp: matching expressions b4 and after ....
David Bateman
David.Bateman at motorola.com
Tue Sep 9 08:41:54 CDT 2008
David Bateman wrote:
> Ok, forget it.. I figured it out.. The issue is that matlab uses a
> different syntax for named tokens than PCRE, so we are obliged to look
> for named tokens like "(?<name>)" and replace them with the PCRE
> compatible "(?P<name>)". The test in Octave to do this was trapping
> "(?<=...)" and "(?<!...)" as a syntax error for a matlab named token.
> The other lookaround operators "(?=...)" and "(?!...)" seem to work
> pretty much as expected.
>
> One issue is that PCRE does not accept arbitrary length lookaround
> expressions and so "(?<=[a-z]*)" is not legal with PCRE. Though
> maximum length lookarounds are acceptable, so you can write instead
> "(?<=[a-z]{10})" for example.
>
> I have a changeset to address this, but wonder if I should look for
> lookaround operators with "*" or "+" and replace with "{0,MAX_LENGTH}"
> and "{1,MAX_LENGTH}" respectively, with a warning about this
> limitation. Should I do this before submitting the changeset?
>
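To make the named-token rewrite described above concrete, here is a
minimal sketch in Python (not the actual Octave source, which is C++).
The function name to_pcre_named_tokens is made up for illustration. It
only rewrites "(?<" when an identifier and a closing ">" follow, so the
lookbehind operators "(?<=...)" and "(?<!...)" pass through untouched.
It makes no attempt to handle an escaped "\(" in front of "?<name>".

```python
import re

# Matches "(?<name>" where name is an identifier. The character after "<"
# must be a letter or underscore, so "(?<=" and "(?<!" never match here.
_NAMED_TOKEN = re.compile(r"\(\?<([A-Za-z_]\w*)>")


def to_pcre_named_tokens(pattern):
    """Rewrite Matlab-style (?<name>...) as PCRE-style (?P<name>...).

    Hypothetical helper for illustration only.
    """
    return _NAMED_TOKEN.sub(r"(?P<\1>", pattern)


# Named tokens are rewritten; lookbehind operators are left alone.
converted = to_pcre_named_tokens(r"(?<year>\d+)-(?<mon>\d+)")
untouched = to_pcre_named_tokens(r"(?<=foo)bar(?<!x)")
```

Python's re module happens to use the same "(?P<name>...)" spelling, so
the converted pattern can be tried out directly with re.compile.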
Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
"(?<=[a-z]{0,MAXLENGTH})", but a variable-length lookbehind is not ok
either, even when its length is bounded. What I'd have to do is replace
it with
((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
which uses the alternation operator and MAXLENGTH+1 copies of the
lookbehind expression to get the effect. This seems to be a ridiculous
amount of extra crap in the pattern space to get this functionality. Is
it worth supporting arbitrary length lookbehind expressions like
"(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
it worth supporting them but limiting the maximum length and printing a
warning? If so, what value should the limit be?
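The expansion above can be generated mechanically. Here is a rough
sketch in Python, whose re module also rejects variable-length
lookbehind; using it as a stand-in for PCRE is an assumption of this
example, not a claim that the two behave identically. The helper name
expand_lookbehind and its prefix argument are hypothetical; the prefix
is only there so that the example is not trivially true. It wraps the
alternatives in a non-capturing group "(?:...)" rather than plain
parentheses so that token numbering is not disturbed.

```python
import re


def expand_lookbehind(atom, min_length, max_length, prefix=""):
    """Return an alternation of fixed-length lookbehinds.

    Emulates (?<=prefix atom{min_length,max_length}) by emitting one
    fixed-length lookbehind per repeat count. Hypothetical helper.
    """
    alternatives = [
        "(?<=%s%s{%d})" % (prefix, atom, n)
        for n in range(min_length, max_length + 1)
    ]
    # Non-capturing group so existing token numbers are not shifted.
    return "(?:" + "|".join(alternatives) + ")"


# A "." that is preceded by "$" and one to three digits.
pattern = expand_lookbehind(r"\d", 1, 3, prefix=r"\$") + r"\."
```

With this pattern, re.search finds the "." in "cost $12.50" and finds
nothing in "cost 12.50". The generated pattern grows linearly with
max_length, which is exactly the bloat complained about above.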
Frankly I wonder how mathworks got this to work as they appear to be
using the Boost regex library which also doesn't support arbitrary
length lookbehind expressions....
D.
--
David Bateman David.Bateman at motorola.com
Motorola Labs - Paris +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE +33 1 69 35 77 01 (Fax)