CSc 453 University of Arizona
========================================================
Intro to Parsing with Derivatives
========================================================
10/11/16


Modifications we have had to make to the grammar.
    - Precedence and associativity.
    - Removing left recursion for LL(1).
    - When doing an LL(1) parse, we have to add left associativity back in.
    
Ideally
    - Specify the grammar and have a parser generator create the parser for you.
    - Options: LALR
        - O(N) parsing, where N is the number of tokens in the program
        - Shift-reduce conflicts
        
Parsing with Derivatives
    - Punchline: Provide a grammar, ambiguous or not, as data.
    Given an input, the output of a PWD parser is the forest
    of possible parse trees.
    
    - Let’s start with recognizers.  A recognizer takes a grammar and an
    input and indicates whether the input belongs to the grammar's language.

    - First we need a test suite: inputs that are in and not in each language.

        - Grammar A, Start = Stm: Stm -> print NUM

        - Grammar B, Start = Stm: Stm -> print NUM | read ID

        - Grammar C, Start = SL: SL -> Stm SL | epsilon, Stm as above
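
    A minimal sketch of such a test suite in Python. The representation is
    an assumption of this sketch: each token is a plain string naming its
    kind, and TESTS and run_suite are hypothetical names, not part of any
    library.

```python
# Hypothetical test suite for Grammars A, B, and C.
# "in" lists token sequences in the language, "out" lists sequences not in it.
TESTS = {
    "A": {
        "in":  [["print", "NUM"]],
        "out": [[], ["print"], ["NUM"], ["print", "NUM", "NUM"], ["read", "ID"]],
    },
    "B": {
        "in":  [["print", "NUM"], ["read", "ID"]],
        "out": [[], ["print", "ID"], ["read", "NUM"],
                ["print", "NUM", "read", "ID"]],
    },
    "C": {
        "in":  [[], ["print", "NUM"], ["print", "NUM", "read", "ID"],
                ["read", "ID", "read", "ID", "print", "NUM"]],
        "out": [["print"], ["NUM", "print"], ["print", "NUM", "read"]],
    },
}


def run_suite(name, recognize):
    """Return the test inputs on which recognize(tokens) gives the wrong answer."""
    failures = []
    for tokens in TESTS[name]["in"]:
        if not recognize(tokens):      # should have been accepted
            failures.append(tokens)
    for tokens in TESTS[name]["out"]:
        if recognize(tokens):          # should have been rejected
            failures.append(tokens)
    return failures
```

    Any recognizer for a grammar can then be checked with
    run_suite("A", my_recognizer), where an empty result means every case
    passed.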

    - Recognizing with Derivatives, python-like pseudocode
        G[0] = grammar for language
        input = list of tokens
        count = 0
        for each token in the input:
            count = count + 1
            G[count] = Deriv( G[count-1],  token )
            
        # G[count] is the last grammar computed; nullable means its
        # language contains the empty string
        if nullable(G[count]) then accept else reject
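
    The loop can be tried out before any Deriv algorithm on grammars
    exists. The sketch below assumes the language is finite and is given
    directly as a Python set of token tuples, which works for Grammars A
    and B but not for Grammar C. The set representation and the name
    GRAMMAR_B are assumptions of this sketch.

```python
def deriv(language, token):
    """Derivative of a finite language given as a set of token tuples:
    keep the strings that start with token and drop that first token."""
    return {s[1:] for s in language if s and s[0] == token}


def nullable(language):
    """A language is nullable when it contains the empty string."""
    return () in language


def recognize(language, tokens):
    g = language                  # plays the role of G[0]
    for token in tokens:
        g = deriv(g, token)       # G[i] = Deriv(G[i-1], token)
    return nullable(g)            # test the last language computed


# Language of Grammar B written out as an explicit set of strings.
GRAMMAR_B = {("print", "NUM"), ("read", "ID")}

print(recognize(GRAMMAR_B, ["print", "NUM"]))
print(recognize(GRAMMAR_B, ["print", "ID"]))
```

    Only the final language is kept here, since the earlier G[i] are never
    consulted again by the recognizer.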
        
    - Deriv Examples: Taking the derivative of a grammar wrt a token.
        - Input: grammar, token
        
        - Output: new grammar that describes the strings in the given
        grammar's language that begin with the given token, with that
        token removed from the beginning of each string
        
        - Examples
            - Input: Grammar A, print token
            - Output: Derivative of Grammar A wrt the print token
                - D_print_Stm -> NUM
            - Have students do other examples.

    - Deriv Algorithm:

        empty (empty set) denotes the empty language, i.e., the language with no strings
        epsilon denotes the language containing only the empty string
        
        x and t are variables representing tokens
        L1 and L2 represent grammars.
        
        D_t(empty) = empty

        D_t(epsilon) = empty
        
        D_t(x) = if x==t then epsilon else empty
        
        D_t(L1 | L2) = D_t(L1) | D_t(L2)
        
        D_t(L1 L2) = D_t(L1) L2 | Delta(L1) D_t(L2)
        
        Delta(L1) = if nullable(L1) then epsilon else empty
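
    A direct transcription of these rules into Python, as a sketch only.
    It assumes grammars without recursion (so Grammars A and B, not C) that
    are represented as small expression trees. The class names Empty, Eps,
    Tok, Alt, and Cat are hypothetical. The results are deliberately not
    simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:            # the empty language
    pass


@dataclass(frozen=True)
class Eps:              # the language containing only the empty string
    pass


@dataclass(frozen=True)
class Tok:              # a single token x
    kind: str


@dataclass(frozen=True)
class Alt:              # L1 | L2
    left: object
    right: object


@dataclass(frozen=True)
class Cat:              # L1 L2
    first: object
    second: object


def nullable(g):
    if isinstance(g, Eps):
        return True
    if isinstance(g, (Empty, Tok)):
        return False
    if isinstance(g, Alt):
        return nullable(g.left) or nullable(g.right)
    if isinstance(g, Cat):
        return nullable(g.first) and nullable(g.second)
    raise TypeError(f"not a grammar node: {g!r}")


def delta(g):
    return Eps() if nullable(g) else Empty()


def deriv(g, t):
    if isinstance(g, (Empty, Eps)):          # D_t(empty), D_t(epsilon)
        return Empty()
    if isinstance(g, Tok):                   # D_t(x)
        return Eps() if g.kind == t else Empty()
    if isinstance(g, Alt):                   # D_t(L1 | L2)
        return Alt(deriv(g.left, t), deriv(g.right, t))
    if isinstance(g, Cat):                   # D_t(L1 L2)
        return Alt(Cat(deriv(g.first, t), g.second),
                   Cat(delta(g.first), deriv(g.second, t)))
    raise TypeError(f"not a grammar node: {g!r}")


def recognize(g, tokens):
    for t in tokens:
        g = deriv(g, t)
    return nullable(g)


GRAMMAR_A = Cat(Tok("print"), Tok("NUM"))
GRAMMAR_B = Alt(Cat(Tok("print"), Tok("NUM")), Cat(Tok("read"), Tok("ID")))

print(deriv(GRAMMAR_A, "print"))
```

    The printed derivative is (epsilon NUM) | (empty empty) in the notation
    above: correct, but cluttered with epsilon and empty terms that could
    be simplified away.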
    
    - Apply this algorithm to the Grammar A, B, and C examples.
    
    - What happens when we have epsilon concatenated with L?
    
    - What happens when we have the empty set concatenated with L?
    
    - What happens with Grammar C?
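
    One possible way to explore these three questions in code, offered as
    a sketch under assumptions and not as the method of the PWD papers.
    The hypothetical helpers cat and alt simplify epsilon L to L and
    empty L to empty. Grammar C is recursive, so a naive Deriv would call
    itself forever on SL. This sketch assumes named nonterminals (Ref),
    memoizes each derived nonterminal under the made-up name
    ("D", t, name), and computes nullable as a least fixed point.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty: pass                      # the empty language
@dataclass(frozen=True)
class Eps: pass                        # only the empty string
@dataclass(frozen=True)
class Tok:
    kind: str
@dataclass(frozen=True)
class Ref:                             # reference to a nonterminal by name
    name: object
@dataclass(frozen=True)
class Alt:
    left: object
    right: object
@dataclass(frozen=True)
class Cat:
    first: object
    second: object

def cat(a, b):
    """empty L = L empty = empty; epsilon L = L; L epsilon = L."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Cat(a, b)

def alt(a, b):
    """empty | L = L | empty = L."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Alt(a, b)

def nullable(g, known):
    """Nullability of g, given the set of nonterminals known to be nullable."""
    if isinstance(g, Eps):
        return True
    if isinstance(g, (Empty, Tok)):
        return False
    if isinstance(g, Ref):
        return g.name in known
    if isinstance(g, Alt):
        return nullable(g.left, known) or nullable(g.right, known)
    return nullable(g.first, known) and nullable(g.second, known)   # Cat

def nullable_names(rules):
    """Least fixed point: grow the nullable set until nothing changes."""
    known, changed = set(), True
    while changed:
        changed = False
        for name, body in rules.items():
            if name not in known and nullable(body, known):
                known.add(name)
                changed = True
    return known

def deriv(g, t, rules, known):
    if isinstance(g, (Empty, Eps)):
        return Empty()
    if isinstance(g, Tok):
        return Eps() if g.kind == t else Empty()
    if isinstance(g, Ref):
        new_name = ("D", t, g.name)
        if new_name not in rules:
            rules[new_name] = Empty()  # placeholder breaks the recursive cycle
            rules[new_name] = deriv(rules[g.name], t, rules, known)
        return Ref(new_name)
    if isinstance(g, Alt):
        return alt(deriv(g.left, t, rules, known), deriv(g.right, t, rules, known))
    d_first = cat(deriv(g.first, t, rules, known), g.second)
    if not nullable(g.first, known):   # Delta(L1) is empty, so that term vanishes
        return d_first
    return alt(d_first, deriv(g.second, t, rules, known))

def recognize(rules, start, tokens):
    rules = dict(rules)                # derived nonterminals go into a copy
    g = Ref(start)
    for t in tokens:
        g = deriv(g, t, rules, nullable_names(rules))
    return nullable(g, nullable_names(rules))

GRAMMAR_C = {
    "Stm": Alt(Cat(Tok("print"), Tok("NUM")), Cat(Tok("read"), Tok("ID"))),
    "SL": Alt(Cat(Ref("Stm"), Ref("SL")), Eps()),
}
print(recognize(GRAMMAR_C, "SL", ["print", "NUM", "read", "ID"]))
```

    Recomputing nullable_names for every token is wasteful. It keeps the
    sketch short but says nothing about how an efficient PWD implementation
    would do it.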
 
    
Some PWD History
    - 1964, Brzozowski, "Derivatives of Regular Expressions"
    - 2009, Owens, Reppy, and Turon, "Regular-expression derivatives re-examined"
    - October 2010, Matthew Might and David Darais post "Yacc is dead"
        paper on arXiv
    - Lots of interaction ensued.  See blog post,
        http://matt.might.net/articles/parsing-with-derivatives/.
    - 2016, PLDI paper, "On the Complexity and Performance of Parsing with
        Derivatives" by Michael D. Adams, Celeste Hollenbeck, Matthew Might
        http://conf.researchr.org/event/pldi-2016/pldi-2016-papers-on-the-complexity-and-performance-of-parsing-with-derivatives

Limitations
    - Worst-case O(N^3), even after optimizations like compaction are
    incorporated.
    - Determining whether a grammar is ambiguous is undecidable.  We will
    need to treat the case where the parser returns more than one parse
    tree as an error.  Heuristics will then need to be added to select
    between the different parse trees.  We probably want to incorporate
    this into the parse process itself.  Open question: would the
    reductions used in the PWD paper work for this?
         
------------------------
mstrout@cs.arizona.edu, 10/11/16