Occasional seg fault in make check at dispatch.cc

John W. Eaton jwe at octave.org
Fri Feb 6 11:11:44 CST 2009


On  6-Feb-2009, Michael D. Godfrey wrote:

| >
| > Does the following change fix the problem for you (at least, does it
| > avoid the segfault)?
| >
| >   http://hg.savannah.gnu.org/hgweb/octave/rev/7838271ee25c
| I did a hg pull
|            hg update
| 
| after your email, which I believe applied this patch.  The resulting
| system has been run several times on 64-bit and on 32-bit, using
| your instructions for running under gdb.  Most of the time
| dispatch.cc produces 2 FAILs, but never a seg fault so far.  The
| output in fntests.log when there were 2 FAILs is:
| 
|  >>>>> processing /d/src/octave/hg/octave/src/DLD-FUNCTIONS/dispatch.cc
|   ***** test # replace base m-file
|  echo_to_file ('function a=dispatch_x(a)', "dispatch_x.m");
|  dispatch('dispatch_x','length','string')
|  assert(dispatch_x(3),3)
|  assert(dispatch_x("a"),1)
|  sleep (2);
|  echo_to_file ('function a=dispatch_x(a),++a;', "dispatch_x.m");
|  rehash();
|  assert(dispatch_x(3),4)
|  assert(dispatch_x("a"),1)
| !!!!! test failed
| `dispatch_x' undefined near line 5 column 9
|   ***** test # replace dispatch m-file
|  echo_to_file ('function a=dispatch_y(a)', "dispatch_y.m");
|  dispatch('hello','dispatch_y','complex scalar')
|  assert(hello(3i),3i)
|  sleep (2);
|  echo_to_file ('function a=dispatch_y(a),++a;', "dispatch_y.m");
|  rehash();
|  assert(hello(3i),1+3i)
| !!!!! test failed
| `hello' undefined near line 5 column 9
|  >>>>> processing /d/src/octave/hg/octave/src/DLD-FUNCTIONS/dlmread.cc
| 
| =================================================
| So, at least, this seems to have cured the seg faults.  It would be nice 
| to know why...

I'm not sure.  As far as I can see, doing something like

  #include <iostream>

  int foo (int i)
  {
    static bool ok_to_recurse = true;
    std::cerr << "foo" << std::endl;
    if (i)
      return i;
    int retval = 0;
    if (ok_to_recurse)
      {
        ok_to_recurse = false;
        retval = foo (1);
      }
    ok_to_recurse = true;
    return retval;
  }

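  int main (void)
  {
    // Call foo before printing so the output order does not depend on
    // the order in which the operands of << are evaluated.
    int r1 = foo (0);
    std::cerr << "foo: " << r1 << std::endl;
    int r2 = foo (0);
    std::cerr << "foo: " << r2 << std::endl;
  }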

should print

  foo
  foo
  foo: 1
  foo
  foo
  foo: 1

and this is, I think, essentially what the
symbol_table::fcn_info::fcn_info_rep::find function was doing.  Though
it is more complex overall, this was the method the find function used
to recurse at most one time.  But since it seems clearer to write

  #include <iostream>

  int bar (int i)
  {
    std::cerr << "bar" << std::endl;
    if (i)
      return i;

    return 0;
  }

  int foo (int i)
  {
    int retval = bar (i);
    if (! retval)
      retval = bar (1);

    return retval;
  }

  int main (void)
  {
    std::cerr << "foo: " << foo (0) << std::endl;
    std::cerr << "foo: " << foo (0) << std::endl;
  }

I don't mind changing it.
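
As a quick check that the two shapes above behave the same, here is a
minimal sketch that replaces the printing with call counters.  The
names foo_guarded, foo_split, guarded_calls and split_calls are made up
for this illustration and do not appear in the Octave sources; it only
mirrors the toy examples, not the real find function.

```cpp
// Hypothetical names throughout; this mirrors the two toy examples only.
static int guarded_calls = 0;
static int split_calls = 0;

// Version 1: a static flag limits the function to one level of recursion.
int foo_guarded (int i)
{
  static bool ok_to_recurse = true;
  ++guarded_calls;
  if (i)
    return i;
  int retval = 0;
  if (ok_to_recurse)
    {
      ok_to_recurse = false;
      retval = foo_guarded (1);
    }
  ok_to_recurse = true;
  return retval;
}

// Version 2: the work lives in a helper that is simply called twice.
int bar (int i)
{
  ++split_calls;
  return i ? i : 0;
}

int foo_split (int i)
{
  int retval = bar (i);
  if (! retval)
    retval = bar (1);
  return retval;
}
```

Called with 0, each version returns 1 after two calls in total (the
outer call plus one retry); called with a nonzero argument, each
returns that argument after a single call.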

| At least once on 32 bit it ran with no FAILS.

I still see some random failures.  They seem less frequent after
inserting delays, like this:

  %!function echo_to_file (str, name, pre_delay, post_delay)
  %!  sleep (pre_delay);
  %!  fid = fopen (name, 'w');
  %!  if (fid != -1)
  %!    fprintf (fid, str);
  %!    fprintf (fid, '\n');
  %!    fclose (fid);
  %!    sleep (post_delay);
  %!  endif

  %!test # replace base m-file
  %! echo_to_file ('function a=dispatch_x(a)', "dispatch_x.m", 1, 2);
  %! unwind_protect
  %!   rehash();
  %!   dispatch('dispatch_x','length','string')
  %!   assert(dispatch_x(3),3)
  %!   assert(dispatch_x("a"),1)
  %!   echo_to_file ('function a=dispatch_x(a),++a;', "dispatch_x.m", 1, 2);
  %!   rehash();
  %!   assert(dispatch_x(3),4)
  %!   assert(dispatch_x("a"),1)
  %! unwind_protect_cleanup
  %!   unlink("dispatch_x.m");
  %! end_unwind_protect

  %!test # replace dispatch m-file
  %! echo_to_file ('function a=dispatch_y(a)', "dispatch_y.m", 1, 2);
  %! unwind_protect
  %!   rehash();
  %!   dispatch('hello','dispatch_y','complex scalar')
  %!   assert(hello(3i),3i)
  %!   echo_to_file ('function a=dispatch_y(a),++a;', "dispatch_y.m", 1, 2);
  %!   rehash();
  %!   assert(hello(3i),1+3i)
  %! unwind_protect_cleanup
  %!   unlink("dispatch_y.m");
  %! end_unwind_protect

but I don't see why the newly created files are not found immediately.
I can see that there might be a race condition if we were using a
subprocess to write the files, but we are simply opening a file,
writing to it, and closing it, then looking for it, all from the same
process.  Oops, except that we are relying on file and directory time
stamps to decide whether a function has changed or a directory has a
new file, and on many systems those have a resolution of only one
second.  But the experiments I've tried so far have not confirmed that
this is exactly where the check is failing, so I really don't know yet
if there is any way to make this more reliable.  I guess the good thing
is that we don't really expect people to be writing scripts that
generate .m files, so this test is a somewhat unusual case.
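
To make the time stamp concern concrete, here is a minimal sketch of
how a one-second resolution can hide a rewrite.  It is a model only:
stored_mtime and looks_changed are hypothetical helpers, not Octave's
actual load-path check, and as noted above it is not confirmed that
this is where the test fails.

```cpp
#include <cmath>

// Hypothetical model: a file system that stores modification times
// truncated to whole seconds.
long stored_mtime (double write_time_s)
{
  return static_cast<long> (std::floor (write_time_s));
}

// Hypothetical change check: a file counts as changed only if its
// stored time stamp is newer than the one remembered from the last load.
bool looks_changed (long remembered_mtime, long current_mtime)
{
  return current_mtime > remembered_mtime;
}
```

In this model, a file written at 100.2 s and rewritten at 100.9 s has
the stored time stamp 100 both times, so looks_changed reports false
and the new contents would be missed.  If the rewrite happens at
102.2 s instead, the stored time stamp becomes 102 and the change is
seen, which would be consistent with the delays making the failures
less frequent.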

Unfortunately, I also see that if I make the delays all 0 in the
modified test case above, the test still hangs on my system (and so
might crash on yours), and I also don't see why that is happening.

jwe

