functionend ignored

David Grundberg individ at acc.umu.se
Thu Jul 16 17:34:42 CDT 2009


John W. Eaton wrote:
> On 16-Jul-2009, David Grundberg wrote:
>
> | > OK, what is happening is that endfunction tokens (and end tokens that
> | > should match function keywords) are ignored by the lexer.  I think
> | > this was done as a semi-tricky way to handle subfunctions.  I guess we
> | > should fix that in some other way so that the parser does see end
> | > keywords instead of just EOF.  I'll try to take a look at this part of
> | > the problem, but I can't guarantee that I will get to it any time
> | > soon.
> |
> | Looked further into the lexer yesterday and pretty much came to the same 
> | conclusion (special cases for parsing subfunctions). I think this is an 
> | interesting problem (and such a challenging tokenizer/parser) so I'll 
> | look at it more.
>
> My thoughts were to change the lexer/parser to do the following
>
>   if parsing a function file (not a script) and we are looking at
>   the second "function" keyword in the file then:
>
>     if we are not expecting an "end" token (or, perhaps
>     equivalently, we are not currently parsing a function) then we
>     are looking at a subfunction.
>   

Yes, but I think this only works when function and endfunction keywords
are correctly balanced.

>     Otherwise, we are looking at a nested function (which Octave
>     doesn't currently handle, but without some changes like this,
>     we won't be able to, so I think this change is needed anyway).
>
> jwe
>   
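The proposed rule can be modeled in a few lines. This is a hypothetical Python sketch, not Octave's lexer: it assumes the file has already been reduced to a list of the strings "function" and "end" (only ends that close functions), and it approximates "expecting an end token" by counting functions whose end has not been seen yet. For balanced files it gives the right answer, but for a file with no end keywords at all it labels every later function as nested, which is exactly the balance problem.

```python
def classify(tokens):
    """Label each 'function' keyword as primary, subfunction, or nested.

    `tokens` is a hypothetical pre-lexed list containing only the
    keywords 'function' and 'end' (an end that closes a function).
    """
    labels = []
    open_functions = 0  # functions still waiting for their 'end'
    for tok in tokens:
        if tok == "function":
            if not labels:
                labels.append("primary")      # first function in the file
            elif open_functions == 0:
                labels.append("subfunction")  # not inside any function
            else:
                labels.append("nested")       # still inside a function
            open_functions += 1
        elif tok == "end" and open_functions > 0:
            open_functions -= 1
    return labels


print(classify(["function", "end", "function", "end"]))  # balanced siblings
print(classify(["function", "function"]))                # no ends at all
```

The second call reports the second function as nested, although Matlab treats a file without any ends as a list of subfunctions.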

In well-formed cases, function files come in basically two forms:

cat > foo.m << EOF
%foo.m
function foo ()
  a = 'not shared';
  siblingfoo ();
  disp (a); % prints 'not shared'
end
function siblingfoo ()
  a = 'shared';
end
EOF

and

cat > bar.m << EOF
%bar.m
function bar ()
  a = 'not shared';
  subbar ();
  disp (a); % prints 'shared'
  function subbar ()
    a = 'shared';
  end
end
EOF

The problem is with missing ends (first with no ends):

cat > foobar.m << EOF
%foobar.m
function foobar ()
  a = 'not shared';
  siblingfoo ();
  disp (a); % prints 'not shared'
function siblingfoo ()
  a = 'shared';
EOF

And a second with one end included and one missing:

cat > foobar2.m << EOF
%foobar2.m
% The function "subbar" was closed
% with an 'end', but at least one other function definition was not.
% To avoid confusion when using nested functions,
% it is illegal to use both conventions in the same file.
function foobar ()
  a = 'not shared';
  subbar ();
  disp (a);
function subbar ()
  a = 'shared';
end
EOF

Notice foobar.m: it should be interpreted the same way as foo.m. This
gives a very peculiar behavior: until we reach the very first endfunction
(or EOF), we can't tell whether we are parsing subfunctions or nested
functions.

I think we need these rules:

If we see a function definition (quux) while we parse a function (baz),
we add it to baz's function list. The parent pointer must be moved to
quux.

If we see EOF before any endfunction, all functions in baz's function
list are really siblings of baz, and the function hierarchy must be
rearranged. I think we should also give a warning saying that all
functions should be properly terminated. (Matlab doesn't do that.)

If we see an endfunction, then the current function is closed. Since the
file contains an endfunction, we no longer allow unbalanced ends (as in
the illegal foobar2.m). The parent pointer must be moved back to the
ended function's parent.

If we reach EOF while still defining a function, and the all-siblings
interpretation is impossible (i.e. we have seen an endfunction), we need
to raise an error (as Matlab does with foobar2.m).
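The four rules can be modeled as a small state machine. The Python below is a hypothetical sketch, not Octave's parser: it assumes the lexer has already reduced the file to ("function", name) and ("endfunction",) tokens, and the names `Func` and `parse` and the warning text are made up for illustration. The tree is built on the nested assumption and is only flattened if EOF arrives before any endfunction.

```python
import warnings


class Func:
    """One function definition in a hypothetical parse tree."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.functions = []  # functions defined while this one was open


def parse(tokens):
    """Apply the rules to tokens ("function", name) / ("endfunction",)."""
    toplevel = Func("<toplevel>", None)
    current = toplevel          # the "parent pointer"
    requires_balanced = False   # becomes True at the first endfunction
    seen = []                   # every function, in definition order

    for tok in tokens:
        if tok[0] == "function":
            f = Func(tok[1], current)
            current.functions.append(f)  # provisionally nested
            seen.append(f)
            current = f
        elif tok[0] == "endfunction":
            if current is toplevel:
                raise SyntaxError("endfunction without a matching function")
            requires_balanced = True
            current = current.parent     # back to the ended function's parent

    # EOF handling.
    if current is not toplevel:
        if requires_balanced:
            # Like foobar2.m: some functions are ended, others are not.
            raise SyntaxError("EOF inside a function: missing endfunction")
        # No endfunction at all: every function is a sibling at top level.
        warnings.warn("functions should be terminated with endfunction")
        for f in seen:
            f.parent = toplevel
            f.functions = []
        toplevel.functions = seen
    return toplevel
```

In this model the tokens of case A produce the chain toplevel, foo, bar, baz; the tokens of case B produce foo, bar, and baz as siblings of toplevel plus one warning; and the tokens of foobar2.m raise SyntaxError.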

Now for two examples, cases A and B. Suppose we see this file (case A):

1. function foo()
2. function bar()
3. function baz()
4. endfunction
5. endfunction
6. endfunction

Initial state:
  requires_balanced = false
  parent pointer = toplevel (implicit start level)
  toplevel.functions = {}

1. function foo()

  parent pointer = foo
  toplevel.functions = {foo}

2. function bar()

  parent pointer = bar
  foo.functions = {bar}

3. function baz()

  parent pointer = baz
  bar.functions = {baz}

4. endfunction

  parent pointer = bar (baz.parent)
  requires_balanced = true (because we've seen an endfunction)

5. endfunction

  parent pointer = foo (bar.parent)

6. endfunction

  parent pointer = toplevel (foo.parent)


Or if we have a file that looks like this (case B):

1. function foo()
2. function bar()
3. function baz()
4. EOF

Then after line 3, we'll have exactly the same state as in case A:

  toplevel.functions = {foo}
  foo.functions = {bar}
  bar.functions = {baz}

But the EOF will require that we reevaluate the relation between the 
functions.

4. EOF

The function tree must be rearranged (oh boy) to

  toplevel.functions = {foo, bar, baz}
  foo.functions = {}
  bar.functions = {}
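The rearrangement itself can be done with a pre-order walk of the tree built so far. A function's definition starts after its parent's and before any later sibling of its parent, so pre-order visits functions in source order. This is a hypothetical Python sketch; the `Func` type and `flatten` function are assumptions for illustration, not Octave code.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    name: str
    functions: list = field(default_factory=list)  # nested definitions


def flatten(toplevel):
    """Make every function a direct child of toplevel, in source order."""
    ordered = []

    def visit(f):
        for child in f.functions:
            ordered.append(child)  # pre-order matches definition order
            visit(child)

    visit(toplevel)
    for f in ordered:
        f.functions = []           # nothing is nested any more
    toplevel.functions = ordered
    return toplevel


# Case B after line 3: toplevel -> foo -> bar -> baz.
baz = Func("baz")
bar = Func("bar", [baz])
foo = Func("foo", [bar])
top = flatten(Func("<toplevel>", [foo]))
print([f.name for f in top.functions])
```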

As stated previously, I think a file that is missing all of its
endfunctions should produce a warning or some other diagnostic.

Notice that if we had read an endfunction or two before the EOF in case
B, we would issue an error instead (not enough endfunctions).

And last, a question:

Correct me if I'm wrong, but I think the lexer shouldn't need to know
what nesting level we are parsing at. Why has the lexer been written
this way, with end_tokens_expected? To me, the only situation where the
lexer needs to mark 'end' as a non-keyword is in object indexing.
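As a rough illustration of that last point, here is a simplified Python sketch (not Octave's lexer) that decides what each `end` means purely from bracket context: an `end` with an unclosed `(` or `{` around it is an index, and any other `end` is a block keyword. It needs no count of expected end tokens. It ignores strings, comments, transpose quotes, and continuation lines, so it is only a model of the idea; `classify_ends` is a hypothetical name.

```python
import re

# Identifiers, single bracket characters, or any other single non-space char.
TOKEN = re.compile(r"[A-Za-z_]\w*|[(){}\[\]]|\S")


def classify_ends(src):
    """Return 'index' or 'keyword' for each 'end' identifier in src."""
    stack = []   # currently open brackets
    result = []
    for tok in TOKEN.findall(src):
        if tok in ("(", "{", "["):
            stack.append(tok)
        elif tok in (")", "}", "]"):
            if stack:
                stack.pop()
        elif tok == "end":
            inside_index = "(" in stack or "{" in stack
            result.append("index" if inside_index else "keyword")
    return result


print(classify_ends("if a(end) > 0, b = c{end-1}; end"))
```

Note that `endfunction` is lexed as a single identifier here, so it is never confused with `end`.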

Best regards,
David

