Sunday, March 17, 2013

Parsing expression grammar (PEG)

Parsing expression grammar or PEG is a type of analytic formal grammar. It was first proposed by Bryan Ford in 2004. It is related to the top-down parsing model.

It looks similar to CFG, but different in that it uses "/" instead of "|" to specify the derivation choices of a non-terminal. In doing so it also specifies that the rules delimited by "/" are chosen in the order of appearance. This in effect adds a priority order to the rules involved, and avoids relevant ambiguity. Say in a CFG state you have two rules in shift/reduce or reduce/reduce conflict, which in CFG needs extra priority/precedence specification, now in PEG one just choose the one with high priority. This way it claims that: unlike CFG, PEG cannot be ambiguous.

Any PEG can be parsed in linear time by using a packrat parser.

Advantages of PEG include: no ambiguity; more powerful than regular expressions (this does not sound a very special advantage though, since LL/LR are also more powerful than regular expressions); does not need a separate tokenization step (such as by lex), tokenization rules in PEG can be written the same way as other grammar rules.

Disadvantages of PEG include: large memory consumption; cannot use left-recursion (same as LL); may contain subtle grammar errors and the author needs to tweak the grammar a bit; cannot recognize some unambiguous non-deterministic grammars, for example: S ← 'x' S 'x' | 'x'. Note that LL and LR also cannot recognize this, but CYK algorithm can.

Reference:
[1] http://en.wikipedia.org/wiki/Parsing_expression_grammar

No comments:

Blog Archive

Followers