Overlapping Regexps

Bill Denney bill at denney.ws
Mon Mar 31 16:50:41 CDT 2008


Kim Hansen wrote:
> On Sun, Mar 30, 2008 at 6:26 PM, Bill Denney <bill at denney.ws> wrote:
>   
>> When running the following,
>>
>>  frag = {"MGTGGR" "R" "GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR" "NNLTR" ...
>>         "LHELENCSVIEGHLQILLMFK" "TRPEDFR" "DLSFPK" "LIMITDYLLLFR" ...
>>         "VYGLESLK" "DLFPNLTVIR"};
>>  seq = strcat (frag{:});
>>  cuts = regexp (seq, '[KR][^P]');
>>
>>  The result is
>>  cuts = [6 41 46 67 74 80 92 100],
>>  but I expected cuts to also include 7.  In other words, I expected
>>  cuts = [6 7 41 46 67 74 80 92 100].
>>
>>  On a related note, if there is overlap in matches, is there a way to
>>  make regexp return the overlapping matches?  For example:
>>
>>  a = "ababababab"
>>  b = regexp (a, "aba")
>>
>>  returns b = [1 5] when I would like for it to return b = [1 3 5 7].
>>
>>  Is this a bug in my understanding of regexp or in regexp?
>>     
>
> What you need is the "zero-width positive look-ahead assertion". It is
> documented for Perl in "man perlre". I have just tested it in Octave
> and it works there too (Octave uses libpcre for regexps).
>
> Your first regexp should be: "[KR](?=[^P])"   or "[KR](?!P)"
>
> The second: "a(?=ba)"

Thanks, that was just what I was looking for.  I didn't know about
look-ahead assertions (and I thought that I knew regexps; there is
apparently always more to know about them).
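
For anyone finding this thread later, here is a minimal sketch of the difference, assuming an Octave build whose regexp is backed by PCRE as Kim describes. The variable names (b_consuming, b_lookahead, cuts_pos, cuts_neg) are only illustrative. A plain pattern consumes every character it matches, so the next search starts after the match; a look-ahead checks the following characters without consuming them, so adjacent or overlapping hits are still found.

```octave
% Second example: overlapping occurrences of "aba".
a = "ababababab";
b_consuming = regexp (a, "aba");      % consumes "aba"          -> [1 5]
b_lookahead = regexp (a, "a(?=ba)");  % consumes only the "a"   -> [1 3 5 7]

% First example: K or R not followed by P.
frag = {"MGTGGR" "R" "GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR" "NNLTR" ...
        "LHELENCSVIEGHLQILLMFK" "TRPEDFR" "DLSFPK" "LIMITDYLLLFR" ...
        "VYGLESLK" "DLFPNLTVIR"};
seq = strcat (frag{:});

% Consuming form: the match at 6 swallows the R at 7, so 7 is skipped.
cuts_plain = regexp (seq, '[KR][^P]');     % -> [6 41 46 67 74 80 92 100]

% Positive look-ahead: some character other than P must follow.
cuts_pos = regexp (seq, '[KR](?=[^P])');   % -> [6 7 41 46 67 74 80 92 100]

% Negative look-ahead: only requires that P does not follow, which is
% also true at the end of the string, so the final R (110) matches too.
cuts_neg = regexp (seq, '[KR](?!P)');      % -> [6 7 41 46 67 74 80 92 100 110]
```

Note that the two suggested patterns are not quite interchangeable: they differ only when the string ends in K or R, as this one does. Use the positive form if the last residue should not be reported, and the negative form if it should.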

Have a good day,

Bill

