Aw: Re: regexp: matching expressions b4 and after ....

David Bateman David.Bateman at motorola.com
Tue Sep 9 08:41:54 CDT 2008


David Bateman wrote:
> Ok, forget it.. I figured it out.. The issue is that matlab uses a 
> different syntax for named tokens than PCRE, so we are obliged to look 
> for named tokens like "(?<name>)" and replace them with the PCRE 
> compatible "(?P<name>)". The test in Octave to do this was trapping 
> "(?<=...)" and "(?<!...)" as a syntax error for a matlab named token. 
> The other lookaround operators "(?=...)" and "(?!...)" seem to work 
> pretty much as expected.
>
> One issue is that PCRE does not accept arbitrary length lookaround 
> expressions and so  "(?<=[a-z]*)" is not legal with PCRE. Though 
> maximum length lookarounds are acceptable, so you can write instead 
> "(?<=[a-z]{10})" for example.
>
> I have a changeset to address this, but wonder if I should look for 
> lookaround operators with "*" or "+" and replace with "{0,MAX_LENGTH}" 
> and "{1,MAX_LENGTH}" respectively, with a warning about this 
> limitation. Should I do this before submitting the changeset?
>
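
As a rough illustration of the named-token rewrite described in the quoted text, here is a minimal sketch in Python, whose re module also spells named groups "(?P<name>...)". This is not the Octave changeset: the names NAMED_TOKEN and to_p_style are made up for the example, and it assumes a token name starts with a letter or underscore, which is what keeps "(?<=" and "(?<!" from being mistaken for named tokens.

```python
import re

# Matches the opening of a MATLAB-style named token, "(?<name>".
# A name must start with a letter or underscore, so the lookbehind
# openers "(?<=" and "(?<!" never match this expression.
NAMED_TOKEN = re.compile(r"\(\?<([A-Za-z_]\w*)>")


def to_p_style(pattern):
    """Rewrite "(?<name>" as "(?P<name>", leaving lookbehinds alone.

    Simplification: escaped parentheses and character classes are not
    treated specially, which a real implementation would need to do.
    """
    return NAMED_TOKEN.sub(r"(?P<\1>", pattern)


matlab_style = r"(?<=foo)(?<num>\d+)(?<!9)"
converted = to_p_style(matlab_style)
print(converted)

# The converted pattern compiles, and the named token is still reachable.
match = re.search(converted, "foo42")
print(match.group("num"))
```

Only the "(?<num>" opener is rewritten here; the lookbehind and negative lookbehind pass through unchanged.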

Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length 
lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok 
but "(?<=[a-z]*)" isn't. I'd hoped to replace this with 
"(?<=[a-z]{0,MAXLENGTH})", but a variable length lookbehind is not ok 
either, even when the length is bounded. What I'd have to do is replace 
it with

((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))

which uses the alternation operator and MAXLENGTH+1 copies of the 
lookbehind expression to get the effect. This seems to be a ridiculous 
amount of extra crap in the pattern space to get this functionality. Is 
it worth supporting arbitrary length lookbehind expressions like 
"(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is 
it worth supporting it with a limit on the maximum length, and printing 
a warning? If so, what value should the limit be?
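
To make the size of that expansion concrete, here is a small sketch in Python. Python's re module is only a stand-in: it is not PCRE, but it has a similar restriction that a lookbehind must have a fixed width, so it rejects both the "*" form and the bounded "{0,N}" form. The helper names expand_lookbehind and compiles, the "#" prefix, and the MAX_LENGTH value of 3 are all assumptions made for the example. It uses a non-capturing "(?:...)" wrapper rather than plain parentheses so the token numbering of the surrounding pattern is not disturbed.

```python
import re

MAX_LENGTH = 3  # hypothetical limit, kept small so the output is readable


def expand_lookbehind(prefix, atom, max_length=MAX_LENGTH):
    """Emulate "(?<=prefix atom*)" using only fixed-width lookbehinds.

    Builds max_length + 1 alternatives, one per repetition count.
    """
    alternatives = [
        "(?<=%s%s{%d})" % (prefix, atom, count)
        for count in range(max_length + 1)
    ]
    return "(?:" + "|".join(alternatives) + ")"


def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Unbounded and bounded variable-width lookbehinds are both rejected.
print(compiles(r"(?<=#[a-z]*)\d"))
print(compiles(r"(?<=#[a-z]{0,3})\d"))

# The expanded form is accepted: a digit preceded by "#" and at most
# MAX_LENGTH lowercase letters.
pattern = expand_lookbehind("#", "[a-z]") + r"\d"
print(pattern)
print(re.findall(pattern, "#ab1 #abcd2 x3 #4"))
```

Even with a limit of 3 the pattern carries four copies of the lookbehind body, and anything longer than the limit (the "#abcd2" case) silently fails to match, which is why a warning would be needed.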

Frankly, I wonder how MathWorks got this to work, as they appear to be 
using the Boost regex library, which also doesn't support arbitrary 
length lookbehind expressions....

D.


-- 
David Bateman                                David.Bateman at motorola.com
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob) 
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax) 



