On July 22nd, Guido van Rossum, the father of Python, published his first blog post on Medium, “PEG Parsers”. In it, he says he is considering replacing CPython’s existing LL(1)-like parser generator (named pgen) with a PEG parser. His reasoning is that pgen restricts how the Python grammar can be written: some constructs are difficult to express, and the workarounds leave the grammar and the resulting parse tree less clean than they should be, so they do not fully reflect the designer’s intent.
What is the difference between a PEG parser and the existing LL(1) parser? Put simply, a PEG parser keeps all of the input available while parsing, so it can look ahead as far as it needs, and back up when an alternative fails, before deciding how to interpret a construct. The current LL(1) parser looks only one token ahead to decide which grammar rule applies, so some constructs are ambiguous from its point of view, and this limits how the Python grammar can be defined. Of course, keeping the whole input in memory means that a PEG parser needs more memory to run.
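To make the difference concrete, here is a minimal sketch of a PEG-style parser with packrat memoization. It is not Guido’s prototype and not CPython code: the class ToyPegParser, the helper parse, and the tiny grammar (statement <- assignment / expr) are all hypothetical, chosen only for illustration. Both alternatives start with a name, so a parser that sees only one token cannot choose between them without rewriting the grammar. The PEG approach simply tries the first alternative and falls back to the second if it fails.

```python
class ToyPegParser:
    """Hypothetical toy PEG parser for: statement <- assignment / expr."""

    def __init__(self, tokens):
        self.tokens = list(tokens)  # the whole input stays in memory
        self.memo = {}              # packrat cache: (rule name, position) -> result

    def _apply(self, rule, pos):
        # Each rule is evaluated at most once per position, then reused.
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.memo[key] = rule(pos)
        return self.memo[key]

    def _peek(self, pos):
        return self.tokens[pos] if pos < len(self.tokens) else None

    def name(self, pos):
        tok = self._peek(pos)
        if tok is not None and tok.isidentifier():
            return ("name", tok), pos + 1
        return None

    def expr(self, pos):
        # expr <- name ('+' name)*
        result = self._apply(self.name, pos)
        if result is None:
            return None
        node, pos = result
        while self._peek(pos) == "+":
            right = self._apply(self.name, pos + 1)
            if right is None:
                break
            node = ("add", node, right[0])
            pos = right[1]
        return node, pos

    def assignment(self, pos):
        # assignment <- name '=' expr
        target = self._apply(self.name, pos)
        if target is None or self._peek(target[1]) != "=":
            return None
        value = self._apply(self.expr, target[1] + 1)
        if value is None:
            return None
        return ("assign", target[0], value[0]), value[1]

    def statement(self, pos=0):
        # Ordered choice: try each alternative in turn, backtracking on failure.
        for rule in (self.assignment, self.expr):
            result = self._apply(rule, pos)
            if result is not None and result[1] == len(self.tokens):
                return result[0]
        return None


def parse(text):
    # Assumption: tokens are separated by whitespace, e.g. "x = y + z".
    return ToyPegParser(text.split()).statement()
```

Calling parse("x = y + z") yields an assignment node and parse("x + y") yields an addition node, even though both inputs begin with the same kind of token. The memo dictionary is what makes this a packrat parser: when the assignment alternative fails and expr is tried, the name already parsed at position 0 is looked up instead of being parsed again. That cache, together with the fully buffered token list, is also where the extra memory goes.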
Guido writes:
“My idea now, putting these things together, is to see if we can create a new parser for CPython that uses PEG and packrat parsing to construct the AST directly during parsing, thereby skipping the intermediate parse tree construction, possibly saving memory despite using an infinite lookahead buffer. I’m not there yet, but I have a prototype that can compile a subset of Python into an AST at about the same speed as CPython’s current parser. It uses more memory, however, and I expect that extending the subset to the full language will slow down the PEG parser. But I also haven’t done anything to optimize it, so I am hopeful.”