[Sbp-users] Indentation (tree) parsing

Adam Megacz megacz at cs.berkeley.edu
Sun May 2 16:58:45 EDT 2010


Adam Megacz <megacz at cs.berkeley.edu> writes:
> I believe that variable arity parsing (where the arity is determined
> from the indentation, like in Haskell) is possible, but I haven't had
> time to work it out yet.

FWIW, there are at least two subtle issues here.

First, how to handle a "partial" indentation decrease.  For example:

...red
........blue
.....green

There really are only two choices.  Either allow this (and say that
blue and green are siblings with red their common parent), or declare
this to be an error and reject it.

The other issue is trickier.

...red
.....blue
........green
.....purple
.....white

Are purple and white siblings, or are they a single node that spans
two lines?  The knee-jerk reaction is to say that they are siblings,
but then you won't be able to parse this:

...module foo
.....import fred
........bob
.....if 1=2 then yes
.....else no

because the "if 1=2 then yes" and "else no" will end up as siblings in
the parse tree, where any sane grammar will expect "else no" to be a
child of "if 1=2".

The choices here are more difficult:

  1. Always assume they are siblings

  2. Always assume that the largest possible contiguous sequence of
     identically-indented lines is a single node

  3. Allow both cases, creating an ambiguity

The problem with (1) is described above.  (2) isn't really workable
either, because it would clearly reject indentation patterns that are
widely used in Haskell, Python, YAML, etc.

(3) is interesting, and it shows off the unique advantages you get
when combining GLR parsing with intersection grammars: the indentation
grammar by itself is ambiguous, but if you mix it with the grammar for
the non-whitespace part of a language the intersection of the two is
(hopefully) unambiguous.

The problem comes from that "hopefully" part... perhaps my biggest
frustration with SBP so far is how difficult it is to know if your
grammar is ambiguous, and how hard it can be to figure out the source
(and proper resolution) of an ambiguity.  So I'm not sure if I want to
encourage more ambiguity, especially if it's being created by one part
of the grammar and resolved by a very distant part.

  - a




More information about the Sbp-users mailing list