CSc 453
University of Arizona

========================================================
Intro to Parsing with Derivatives
========================================================
10/11/16

Modifications we have had to make to the grammar.
    - Precedence and associativity.
    - Removing left recursion for LL(1).
    - While doing an LL(1) parse, we have to add left associativity back in.

Ideally
    - Specify the grammar and have a parser generator create the parser for you.
    - Options: LALR
        - N is the number of tokens in the program
        - O(N) parsing
        - Shift-reduce conflicts

Parsing with Derivatives
    - Punchline: Provide a grammar, ambiguous or not, as data. Given an
      input, the output of a PWD parser is the forest of possible parse trees.

    - Let’s start with recognizers. A recognizer takes a grammar and an
      input and indicates whether the input belongs to the language.

    - First we need a test suite: inputs that are in and not in each language.
        - Grammar A, Start = Stm:  Stm -> print NUM
        - Grammar B, Start = Stm:  Stm -> print NUM | read ID
        - Grammar C, Start = SL:   SL -> Stm SL | epsilon, Stm as above

    - Recognizing with Derivatives, python-like pseudocode

        G[0] = grammar for language
        input = list of tokens
        count = 1
        for each token in the input:
            G[count] = Deriv( G[count-1], token )
            count = count + 1
        # count is now one past the last derivative computed
        if nullable(G[count-1]) then accept else reject

    - Deriv Examples: Taking the derivative of a grammar wrt a token.
        - Input: grammar, token
        - Output: new grammar that describes the strings of the given
          grammar that begin with the given token, with that token removed
          from the front of each string
        - Examples
            - Input: Grammar A, print token
            - Output: Derivative of Grammar A wrt the print token
                - D_print_Stm -> NUM
            - Have students do other examples.

    - Deriv Algorithm:
        empty (empty set) denotes the empty language, IOW the language with no strings
        epsilon denotes the empty string
        x and t are variables representing tokens
        L1 and L2 represent grammars.

        D_t(empty)   = empty
        D_t(epsilon) = empty
        D_t(x)       = if x==t then epsilon else empty
        D_t(L1 | L2) = D_t(L1) | D_t(L2)
        D_t(L1 L2)   = D_t(L1) L2 | Delta(L1) D_t(L2)
        Delta(L1)    = if nullable(L1) then epsilon else empty

    - Apply this algorithm to the Grammar A, B, and C examples.
        - What happens when we have epsilon concatenated with L?
        - What happens when we have the empty set concatenated with L?
        - What happens with Grammar C?

Some PWD History
    - 1964, Brzozowski, "Derivatives of Regular Expressions"
    - 2009, Owens, Reppy, and Turon, "Regular-expression derivatives re-examined"
    - October 2010, Matthew Might and David Darais post the "Yacc is dead" paper on arXiv
        - Lots of interaction ensued. See the blog post,
          http://matt.might.net/articles/parsing-with-derivatives/.
    - 2016, PLDI paper, "On the Complexity and Performance of Parsing with
      Derivatives" by Michael D. Adams, Celeste Hollenbeck, Matthew Might
      http://conf.researchr.org/event/pldi-2016/pldi-2016-papers-on-the-complexity-and-performance-of-parsing-with-derivatives

Limitations
    - O(N^3) parsing, and this is after optimizations like compaction are incorporated
    - Determining whether a grammar is ambiguous is undecidable. Will need to
      catch the case where the parser returns more than one parse tree and
      treat it as an error. Then heuristics will need to be added to select
      between the different parse trees. Probably want to incorporate this
      into the parse process itself. Would the reductions they have in the
      PWD paper work?

------------------------
mstrout@cs.arizona.edu, 10/11/16
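
Appendix sketch: the Deriv Algorithm rules and the recognizer pseudocode above can be tried out in a few lines of Python. This is only a minimal sketch, not the implementation from the PWD papers. The tuple encoding and the names tok, cat, alt, ref, nullable, deriv, and recognize are made up for illustration. Two assumptions are baked in: nonterminals are looked up by name only when needed, and the concatenation rule drops the Delta(L1) D_t(L2) term when L1 is not nullable (since empty concatenated with anything is empty). Under those assumptions Grammar C terminates, but a left-recursive grammar would still recurse forever; handling that case takes the laziness, memoization, and fixed points described in the PWD papers.

```python
# Languages are encoded as tuples (a hypothetical encoding for this sketch).
EMPTY = ("empty",)   # the empty language: no strings at all
EPS = ("eps",)       # the language containing only the empty string

def tok(x): return ("tok", x)          # a single token
def cat(a, b): return ("cat", a, b)    # concatenation L1 L2
def alt(a, b): return ("alt", a, b)    # alternation L1 | L2
def ref(name): return ("ref", name)    # reference to a nonterminal

def nullable(lang, rules):
    """True if the language contains the empty string."""
    kind = lang[0]
    if kind == "eps":
        return True
    if kind in ("empty", "tok"):
        return False
    if kind == "alt":
        return nullable(lang[1], rules) or nullable(lang[2], rules)
    if kind == "cat":
        # 'and' short-circuits, so L2 is not visited when L1 is not nullable.
        return nullable(lang[1], rules) and nullable(lang[2], rules)
    return nullable(rules[lang[1]], rules)   # ref: look up the nonterminal

def deriv(lang, t, rules):
    """D_t(lang): derivative of the language with respect to token t."""
    kind = lang[0]
    if kind in ("empty", "eps"):
        return EMPTY
    if kind == "tok":
        return EPS if lang[1] == t else EMPTY
    if kind == "alt":
        return alt(deriv(lang[1], t, rules), deriv(lang[2], t, rules))
    if kind == "cat":
        first = cat(deriv(lang[1], t, rules), lang[2])
        if nullable(lang[1], rules):
            # Delta(L1) = epsilon, so the second term is just D_t(L2).
            return alt(first, deriv(lang[2], t, rules))
        # Delta(L1) = empty, so the second term vanishes.
        return first
    return deriv(rules[lang[1]], t, rules)   # ref: look up the nonterminal

def recognize(rules, start, tokens):
    """Take one derivative per token, then test nullability."""
    lang = ref(start)
    for t in tokens:
        lang = deriv(lang, t, rules)
    return nullable(lang, rules)

# The three test grammars from the notes.
STM_A = cat(tok("print"), tok("NUM"))
STM_B = alt(cat(tok("print"), tok("NUM")), cat(tok("read"), tok("ID")))
GRAMMAR_A = {"Stm": STM_A}
GRAMMAR_B = {"Stm": STM_B}
GRAMMAR_C = {"Stm": STM_B, "SL": alt(cat(ref("Stm"), ref("SL")), EPS)}

print(recognize(GRAMMAR_A, "Stm", ["print", "NUM"]))                # True
print(recognize(GRAMMAR_B, "Stm", ["read", "NUM"]))                 # False
print(recognize(GRAMMAR_C, "SL", ["print", "NUM", "read", "ID"]))   # True
```

Note that the derivative is never simplified here, so dead branches such as empty concatenated with NUM pile up as more tokens are consumed. That growth is what compaction is meant to address.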