segfault after regexp

Thomas Weber thomas.weber.mail at gmail.com
Sat Oct 4 04:40:04 CDT 2008


On Tue, Sep 30, 2008 at 04:16:29PM -0400, John W. Eaton wrote:
> On 30-Sep-2008, Thomas Weber wrote:
> There is already a handler installed for SIGSEGV, but I think it fails
> in this instance.  I'm not certain why that happens, but my guess is
> that calling the signal handler fails if there is no more stack space.

Yes, according to sigaltstack(2), that's the problem: once the normal
stack is exhausted, the handler needs an alternate stack to run on.
(I'm mentioning sigaltstack here mostly for reference, so I don't have
to search the net again.)
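
For reference, a minimal sketch of what sigaltstack(2) makes possible,
assuming a POSIX system. The names on_segv and
install_segv_handler_on_alt_stack and the 64 kB stack size are
hypothetical; this is not what Octave's existing handler does.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstdlib>

// Hypothetical handler: only async-signal-safe calls are allowed here.
extern "C" void on_segv (int)
{
  static const char msg[] = "caught SIGSEGV on alternate stack\n";
  ssize_t ignored = write (STDERR_FILENO, msg, sizeof (msg) - 1);
  (void) ignored;
  _exit (EXIT_FAILURE);
}

// Install on_segv so that it runs on its own stack.
// Returns true on success.
bool install_segv_handler_on_alt_stack ()
{
  // Assumed size; the block must stay allocated while the handler
  // is installed, so it is intentionally never freed.
  const std::size_t alt_size = 64 * 1024;

  stack_t ss;
  ss.ss_sp = std::malloc (alt_size);
  if (! ss.ss_sp)
    return false;
  ss.ss_size = alt_size;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, nullptr) != 0)
    return false;

  struct sigaction sa = {};
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = on_segv;
  sa.sa_flags = SA_ONSTACK;   // run the handler on the alternate stack
  return sigaction (SIGSEGV, &sa, nullptr) == 0;
}
```

Without SA_ONSTACK the handler would be invoked on the already
exhausted stack, which matches the failure described above.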

I fear, however, that this will turn into a very system-specific solution.

> 
> | > | It might be possible to change regexp.cc to set the soft limit for stack
> | > | recursion (the equivalent of the above 'ulimit -s' command) to the hard
> | > | limit. 
> | > | 
> | > | I don't know however what kind of consequences this has for the system
> | > | in question.
> | > 
> | > I think we should first find out why this is going into an apparently
> | > infinite recursion.  If it is an error in the way that we are using
> | > the PCRE functions, then maybe we can fix it.  Otherwise, I think the
> | > bug should be fixed in PCRE.
> | 
> | I don't think it's an infinite recursion (it works when given enough
> | space, so it's definitely finite). 
> 
> OK, but it seems like a very large number of recursive calls for what
> seems to be a relatively simple regexp operating on what also seems to
> be a small amount of data.

According to pcrestack(3), that's a problem that can happen with
regexps containing nested, unlimited repeats.
 
> | There are already several options in the PCRE library, including a
> | different implementation for regexps like this.
> 
> So should we be trying to recognize the characteristics of the regexp
> and set some options before calling PCRE?  

I don't think we will have much luck in recognizing the characteristics
of a regexp. If the data is trivial, even the most complicated regexp
will work; conversely, with enough data, even simple regexps might run
into this.


> should be handled by PCRE itself.  We are just users of the library.
> How are we supposed to know what kinds of regexps will cause trouble?

Well, quoting pcrestack's man page:
"As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other
hand, can support around 128000 recursions. The pcretest test program
has a command line option (-S) that can be used to increase the size of
its stack."

So we have some estimates. With a safety factor of (say) 2, we should
be all right.
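
The arithmetic can be sketched as follows. The function name is
hypothetical; the 500 bytes per recursion comes from the man page
quote, and the factor of 2 is only the "say" value suggested here.

```cpp
#include <cstddef>

// Estimate how many PCRE recursions fit into stack_bytes.
// bytes_per_recursion: rough per-call cost (pcrestack says about 500).
// safety_factor: extra margin on top of that estimate.
unsigned long
recursion_limit_for_stack (std::size_t stack_bytes,
                           std::size_t bytes_per_recursion = 500,
                           std::size_t safety_factor = 2)
{
  if (bytes_per_recursion == 0 || safety_factor == 0)
    return 0;   // nonsensical input: allow no recursion at all
  return stack_bytes / (bytes_per_recursion * safety_factor);
}
```

For a stack of 8,000,000 bytes this gives 8000 recursions, half of the
man page's 16000. In PCRE such a value would go into the
match_limit_recursion field of a pcre_extra block, with
PCRE_EXTRA_MATCH_LIMIT_RECURSION set in its flags, so that pcre_exec()
returns PCRE_ERROR_RECURSIONLIMIT instead of overflowing the stack.
Whether regexp.cc should do that is an open assumption.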

This doesn't address the important question though: what kind of memory
limit do we impose on the stack?
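
One possible starting point, assuming a POSIX system, is to ask for the
soft limit the process already has (what 'ulimit -s' reports). The
function name and the 8 MB fallback are hypothetical.

```cpp
#include <sys/resource.h>
#include <cstddef>

// Return the soft stack limit in bytes, or fallback if it cannot be
// determined or is unlimited.
std::size_t
current_stack_limit (std::size_t fallback = 8 * 1024 * 1024)
{
  struct rlimit rl;
  if (getrlimit (RLIMIT_STACK, &rl) != 0
      || rl.rlim_cur == RLIM_INFINITY)
    return fallback;
  return static_cast<std::size_t> (rl.rlim_cur);
}
```

This only reports the limit for the main thread, and it says nothing
about how much of the stack is already in use when the regexp is
called, so any margin still has to be chosen by hand.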

	Thomas


More information about the Bug-octave mailing list