[Sbp-users] Parsing Expression Grammar (PEG) vs. SBP (scannerless GLR)

Adam Megacz megacz at cs.berkeley.edu
Sat Apr 4 01:28:40 EDT 2009


Bernhard Damberger <bernied at gmail.com> writes:
> I am curious as to what you think about the current rage in the
> parsing world with PEGs (and Packrat parsing)?

PEGs are quite nice.  Keep in mind, however, that in order to use them
the grammar writer must prioritize all choices; this can lead to very
subtle errors.  Terence Parr puts it quite well:

  http://article.gmane.org/gmane.comp.parsers.peg.general/2
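
The classic pitfall is an ordered choice whose later alternative is a
prefix-extension of an earlier one.  Here is a minimal Python sketch of
PEG-style ordered choice (purely illustrative -- not SBP's or any real
PEG tool's API) showing how a rule like  A <- "a" / "ab"  silently never
matches "ab":

```python
def parse_A(s, pos=0):
    """Ordered choice: try alternatives in turn, commit to the first match."""
    for alt in ("a", "ab"):
        if s.startswith(alt, pos):
            return pos + len(alt)  # committed -- later alternatives never tried
    return None  # failure

def accepts(s):
    """Accept only if A consumes the entire input."""
    return parse_A(s) == len(s)

print(accepts("a"))    # True
print(accepts("ab"))   # False: "a" matched first, the trailing "b" is left over
```

A CFG with the same two alternatives would happily derive both strings;
under prioritized choice the second alternative is dead code, and
nothing warns you.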

In contrast, advanced GLR parsers give you both prioritized choice and
nondeterministic choice.  The latter results in a runtime error
(ambiguity) when both alternatives match the string.

Adding exclusive choice -- wherein the parser generator must prove at
grammar compilation time that no string matches both alternatives --
would turn this runtime error into a compile-time error.  It's
undecidable in general, but tractable in many common cases.  Someday
I'd like to add it to SBP.

Bryan Ford gives an example of a CFG that PEGs cannot parse in his
ICFP'02 paper:

  S = "x" S "x" | "x"
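
Under PEG semantics that rule commits to an alternative and never
backtracks into it once it succeeds, so it rejects some odd-length
strings the CFG accepts.  A small Python sketch of memoized ordered
choice (illustrative only, not SBP or any particular packrat
implementation) makes the failure concrete:

```python
from functools import lru_cache

def peg_S(s):
    """PEG-style parse of  S <- "x" S "x" / "x"  over a string of x's."""
    @lru_cache(maxsize=None)
    def S(pos):
        # First alternative: "x" S "x" -- commit to it if it succeeds.
        if pos < len(s) and s[pos] == "x":
            mid = S(pos + 1)
            if mid is not None and mid < len(s) and s[mid] == "x":
                return mid + 1
        # Second alternative: a single "x".
        if pos < len(s) and s[pos] == "x":
            return pos + 1
        return None  # failure

    return S(0) == len(s)  # success only if the whole input is consumed

for n in (1, 3, 5, 7):
    print(n, peg_S("x" * n))   # 1 True, 3 True, 5 False(!), 7 True
```

The CFG generates every odd-length run of x's, but the PEG version
rejects "xxxxx": the inner S greedily commits to the wrong parse and
cannot be re-asked for a shorter one.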


> I know PEGs are linear (in both time and memory).
> I noticed that SBP is O(n^3 log n) in time.

Only for pathological grammars.  If you give SBP an LALR grammar, it
will run in linear time.  I believe this is also true of any PEG
grammar fed to SBP, although I haven't thought it through carefully.  I
would be very interested in hearing if anybody has found a
PEG-parseable grammar that causes superlinear behavior in SBP.

One longer-term goal I have is to provide analysis tools that help
grammar writers figure out if their grammars are likely to produce
ambiguities or exhibit significantly superlinear behavior.


> How about in memory?

This is the nice part about (G)LR parsers: memory consumption is
proportional to the nesting depth of the productions used.  In many
domains this is "effectively O(1)".

For example, when parsing something like C/C++ with an LR parser the
memory usage winds up being proportional to (basically) the nesting
depth of your expressions, rather than the length of your input file.
For human-generated source code in most programming languages that
turns out to be "effectively constant" -- people tend to actively seek
out ways to break up deeply-nested expressions in their code.
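
A toy shift-reduce loop illustrates this.  The sketch below (purely
illustrative, not SBP's implementation) parses the left-recursive rule
E = E "+" "a" | "a" and tracks the peak stack depth; because each
reduction fires as soon as its right-hand side sits on top of the
stack, the depth stays constant no matter how long the input gets:

```python
def max_stack_depth(tokens):
    """Shift-reduce parse of  E = E "+" "a" | "a"; return peak stack depth."""
    stack, depth = [], 0
    for tok in tokens:
        stack.append(tok)                             # shift
        depth = max(depth, len(stack))
        if stack[-1] == "a":
            if len(stack) >= 3 and stack[-3] == "E" and stack[-2] == "+":
                stack[-3:] = ["E"]                    # reduce  E "+" "a" -> E
            else:
                stack[-1] = "E"                       # reduce  "a" -> E
    assert stack == ["E"], "parse error"
    return depth

for n in (1, 10, 1000):
    tokens = ["a"] + ["+", "a"] * n
    print(len(tokens), max_stack_depth(tokens))       # depth stays at 3 throughout
```

Only nesting -- say, parenthesized subexpressions -- would defer the
reductions and make the stack grow.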

SBP currently builds the entire parse tree, so you won't get O(1)
memory consumption out of it at the moment.  Parsers that use semantic
actions (like Elkhound) or that produce incremental parse trees
(perhaps in SBP's future) do, however.

  - a
