segfault after regexp

Thomas Weber thomas.weber.mail at gmail.com
Sat Oct 11 09:39:53 CDT 2008


On Sat, Oct 11, 2008 at 09:09:24AM -0400, John W. Eaton wrote:
> On 11-Oct-2008, Thomas Weber wrote:
> 
> | On Sat, Oct 04, 2008 at 11:40:04AM +0200, Thomas Weber wrote:
> | > Well, quoting pcrestack's man page:
> | > "As a very rough rule of thumb, you should reckon on about 500 bytes per
> | > recursion. Thus, if you want to limit your stack usage to 8Mb, you
> | > should set the limit at 16000 recursions. A 64Mb stack, on the other
> | > hand, can support around 128000 recursions. The pcretest test program
> | > has a command line option (-S) that can be used to increase the size of
> | > its stack."
> | > 
> | > So we have some estimates; with a safety factor of (say) 2, we should
> | > be all right.
> | >
> | > This doesn't address the important question though: what kind of memory
> | > limit do we impose on the stack?
> | 
> | Patch attached. I assume a maximum of 500MB on the stack (if there's no
> | hard limit), with a safety factor of 2.
> 
> I don't think getrlimit and setrlimit are portable, so at a minimum,
> you'll need a configure check and only use this method if these
> functions are available.

I actually thought they were in POSIX. Can people on other systems
comment? For that matter, does the original crash happen on Windows or
Mac?
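
To make the arithmetic concrete, here is a rough sketch of the sizing
rule, written in Python for brevity. It is not the attached patch. The
500 bytes per recursion is the man page's rough figure, the safety
factor of 2 and the 500MB cap are the values mentioned above, and the
names recursion_limit and usable_stack_bytes are made up for
illustration. Treating 500MB as 500 * 1000 * 1000 bytes and reading the
hard (rather than soft) limit are assumptions as well.

```python
import resource  # Unix only; wraps getrlimit/setrlimit

BYTES_PER_RECURSION = 500                 # rough estimate from pcrestack's man page
SAFETY_FACTOR = 2                         # margin suggested in this thread
FALLBACK_STACK_BYTES = 500 * 1000 * 1000  # assumed cap when the hard limit is unlimited


def recursion_limit(stack_bytes, safety_factor=SAFETY_FACTOR):
    """Estimate how many PCRE recursions fit into stack_bytes."""
    return stack_bytes // (BYTES_PER_RECURSION * safety_factor)


def usable_stack_bytes():
    """Return the hard stack limit, or the fallback cap if it is unlimited."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if hard == resource.RLIM_INFINITY:
        return FALLBACK_STACK_BYTES
    return hard


if __name__ == "__main__":
    print(recursion_limit(usable_stack_bytes()))
```

With the man page's own numbers and no safety factor,
recursion_limit(8_000_000, 1) gives 16000 and
recursion_limit(64_000_000, 1) gives 128000, which matches the quoted
text.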
 
> But is this really the right place for the fix?  Or even the right
> approach to take?  Octave is not the only program using PCRE that
> might run into this problem.  

Eh, yes:
PHP:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=476419

PCRE itself:
http://bugs.exim.org/show_bug.cgi?id=704


> It seems to me that it would be better to fix it in PCRE itself,
> preferably by using a different algorithm that doesn't suffer from
> these problems.  

There is a different algorithm already implemented, pcre_dfa_exec().
It's not Perl compatible, though.

Reading through its documentation, we would just hit a different problem,
though: it needs a caller-supplied workspace to keep track of the
possible matches it is following in parallel. So we would have to choose
how many of those we want to track (man pcreapi for details).

> Modifying the stack limit does not seem like a real fix to the actual
> problem.  Instead, you are just hiding it.  The problem still exists,
> and will still bite for larger problems or more complex data.

Sorry, but with enough data, your RAM won't handle that either. There's
a limit to how much we can cater for:
1) When compiling PCRE, the user has chosen a far too large recursion
limit.
2) The soft limit on stack usage in his shell is too low for the value
from 1).
3) Then along comes Octave, simply using what it is told to use, and
fails with it. But now Octave should overcome 1) and 2)? I'd say if it
were trivial to overcome, PCRE would handle it itself.
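
To put rough numbers on 1) and 2), a small sketch in Python. It assumes
the man page's 500 bytes per recursion, a recursion limit of 10 million
(which, as far as I know, is what PCRE uses unless told otherwise at
build time; please check your own build), and an 8MB soft stack limit
as a typical shell setting. The function names are hypothetical.

```python
BYTES_PER_RECURSION = 500  # rough estimate from pcrestack's man page


def stack_needed(recursion_limit):
    """Worst-case stack use in bytes if the recursion limit is reached."""
    return recursion_limit * BYTES_PER_RECURSION


def fits(recursion_limit, soft_stack_bytes):
    """True if the worst case still fits below the soft stack limit."""
    return stack_needed(recursion_limit) <= soft_stack_bytes


if __name__ == "__main__":
    assumed_limit = 10_000_000       # assumed PCRE recursion limit
    assumed_soft = 8 * 1024 * 1024   # assumed soft stack limit (8MB)
    print(stack_needed(assumed_limit), fits(assumed_limit, assumed_soft))
```

Under those assumptions the worst case is about 5GB of stack against an
8MB limit, so PCRE's own limit never gets a chance to trigger before
the stack overflows.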

PCRE's default usage means aggressive recursion; how should we change
that?

	Thomas

