CSc 453 University of Arizona
========================================================
Left Associativity and LL(1) Parsing
========================================================
10/11/16


One group came to office hours yesterday and brought up an important
issue.  When we remove the left recursion from an expression grammar for
a left associative operator, the bottom up creation of the AST no longer
reflects the correct left associativity.
 
Assume the grammar that can handle left associativity is as follows:
E -> E - Q | Q
Q -> ID
 
We would eliminate the left recursion with the following:
E -> Q E'
E' -> - Q E' | epsilon
 
Both grammars still generate the same set of strings, BUT the second
grammar will result in a parse tree where the subtrees are always to the
right.
   a - b - c
E ==> Q E'  ==> ID(a) E' ==> ID(a) - Q E' ==> ID(a) - ID(b) E'
  ==> ID(a) - ID(b) - Q E' ==> ID(a) - ID(b) - ID(c) E'
  ==> ID(a) - ID(b) - ID(c)
 
If you draw the parse tree, the second subtraction will end up as the
right subtree of the first subtraction.  The straightforward creation of
the AST from this parse will result in a right associative subtraction
(a-(b-c)) and incorrect code being generated.
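
To make the problem concrete, here is a minimal sketch of what the
straightforward construction might look like.  The names naiveE and eval,
and the simplification of Q to integer literals, are assumptions made for
illustration; they are not part of the assignment.  The tree for
everything after the '-' becomes the right operand, so 5 - 4 - 3 is built
as 5 - (4 - 3) and evaluates to 4 instead of the expected -2.

```haskell
module Main where

data Token
    = TokenMinus
    | TokenNum Int
    deriving (Show, Eq)

data AST
    = MinusExp AST AST
    | IntLiteral Int
    deriving (Show, Eq)

-- Hypothetical naive parser: whatever follows the '-' is parsed as a
-- whole expression and hung on the RIGHT, mirroring the parse tree shape.
naiveE :: [Token] -> (AST, [Token])
naiveE (TokenNum n : TokenMinus : ts) =
    let (rest, ts1) = naiveE ts
    in  (MinusExp (IntLiteral n) rest, ts1)
naiveE (TokenNum n : ts) = (IntLiteral n, ts)
naiveE ts = error ("naiveE: unexpected input " ++ show ts)

-- Evaluate the AST so the grouping shows up in the result.
eval :: AST -> Int
eval (IntLiteral n) = n
eval (MinusExp l r) = eval l - eval r

main :: IO ()
main = do
    let toks = [TokenNum 5, TokenMinus, TokenNum 4, TokenMinus, TokenNum 3]
        (ast, _) = naiveE toks
    print ast         -- MinusExp (IntLiteral 5) (MinusExp (IntLiteral 4) (IntLiteral 3))
    print (eval ast)  -- 4, because the tree means 5 - (4 - 3)
```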

I drew on this department's compiler braintrust (Dr. Sethi, Dr.
Proebsting, Dr. Collberg, and Dr. Debray) to help figure out how to
correctly implement left associativity with a recursive descent parser.
Immediately below I show how you can add one more parameter to the
expression parsing function at each level of precedence to create the
correct AST.  Below that I include Dr. Sethi's description.  We will also
discuss this in class on Tuesday.
 
-- This function would have the same type signature, but ...
--     * when calling parseE' would pass the AST constructed for parsing Q
--     * and would return the AST created in parseE'.
parseE :: [Token] -> (AST,[Token])
parseE (x:ts)
    | inFIRST_Q x =
        let
            (q_ast,ts1) = parseQ (x:ts)
        in
            parseE' ts1 q_ast
 
-- This function will need to take the given AST and put it as a left 
-- subtree in the AST.
-- It will also have to recursively call itself with the AST it just created.
parseE' :: [Token] -> AST -> (AST,[Token])
parseE' (TokenMinus:ts) x =
    let
        (q_ast,ts1) = parseQ ts
    in
        parseE' ts1 (MinusExp x q_ast)
-- E' -> epsilon: the next token is not '-', so return the AST built so far.
parseE' ts x = (x,ts)
 
Groups should figure out the pattern and make this work for all other operators.
 
Why isn't this approach needed for right associative expressions?
 
====================
Dr. Sethi's email
 
Michelle,
 
Yes, the usual LL(1) grammar for expressions is right recursive.  The
short answer is that we cover construction of syntax trees (the usual
left-recursive ones) on pages 318-321 of the purple dragon book.

The longer answer is as follows.  The usual grammar for expressions has
productions of the form
 
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> number
 
Here is a version suitable for top-down parsing:
 
E -> T A
A -> + T A | - T A | <empty>
T -> F B
B -> * F B | / F B | <empty>
F -> number
 
With respect to E -> T A, suppose you have a subtree for T and want to
build the subtree for A.  For clarity, let's use digits, as in T1 and T2
to distinguish between instances of a nonterminal such as T.  The
production can therefore be rewritten as E -> T1 A1.  As an example, if
the input is 9-5-3, suppose T1 generates 9 and A1 generates -5-3.  Build
the subtree for T1 -- that's a leaf for 9.  Then, pass this subtree as a
parameter to the function for parsing A.

Now consider A1 -> - T2 A2, where again the subscripts distinguish
between instances of nonterminals.  A1 generates -5-3, T2 generates 5,
and A2 generates -3.  Recall from the last paragraph that the function
gets (inherits) the subtree for 9 as a parameter.  When parsing - T2 A2,
call the function for T as usual to build the subtree 5 for T2.

Now, before calling the function for A2, construct the subtree for 9-5.
We inherited the subtree for 9 and just built the subtree for 5, so we
can build the node for 9-5.

At this point, continue parsing the right side.  We've already processed
T2 in -T2 A2, so the next call is the function for A.  Pass it the
subtree we just built for 9-5 as a parameter.
 
In effect the recursive descent pseudo code for nonterminal A is as follows
 
function A(x)  # parameter x is a subtree
    if next token is '-'
        match('-')
        x = minusnode(x, T())
        return A(x)
    else  # A -> <empty>
        return x
 
Hope this helps.  I'd be happy to go over it with you.
 
Ravi
 
P.S.  The formal explanation is in terms of inherited and synthesized
attributes.  The parameter for function A is an inherited attribute.  It
returns a synthesized attribute.  The code is beguilingly simple.
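
Dr. Sethi's pseudocode carries over to Haskell fairly directly.  What
follows is a sketch of his top-down grammar (E, A, T, B, F), not a
definitive solution.  The token constructors (TPlus, TMinus, TTimes,
TDiv, TNum), the AST constructors (Add, Sub, Mul, Div, Lit), and eval are
assumptions introduced for this example.  Each of parseA and parseB takes
the tree built so far as an extra argument (the inherited attribute) and
returns the finished tree (the synthesized attribute).

```haskell
module Main where

data Token = TPlus | TMinus | TTimes | TDiv | TNum Int
    deriving (Show, Eq)

data AST
    = Add AST AST
    | Sub AST AST
    | Mul AST AST
    | Div AST AST
    | Lit Int
    deriving (Show, Eq)

-- E -> T A
parseE :: [Token] -> (AST, [Token])
parseE ts =
    let (t, ts1) = parseT ts
    in  parseA ts1 t

-- A -> + T A | - T A | <empty>
-- x is the inherited attribute: the tree for everything parsed so far.
parseA :: [Token] -> AST -> (AST, [Token])
parseA (TPlus : ts)  x = let (t, ts1) = parseT ts in parseA ts1 (Add x t)
parseA (TMinus : ts) x = let (t, ts1) = parseT ts in parseA ts1 (Sub x t)
parseA ts            x = (x, ts)  -- <empty>: return the tree unchanged

-- T -> F B
parseT :: [Token] -> (AST, [Token])
parseT ts =
    let (f, ts1) = parseF ts
    in  parseB ts1 f

-- B -> * F B | / F B | <empty>
parseB :: [Token] -> AST -> (AST, [Token])
parseB (TTimes : ts) x = let (f, ts1) = parseF ts in parseB ts1 (Mul x f)
parseB (TDiv : ts)   x = let (f, ts1) = parseF ts in parseB ts1 (Div x f)
parseB ts            x = (x, ts)  -- <empty>

-- F -> number
parseF :: [Token] -> (AST, [Token])
parseF (TNum n : ts) = (Lit n, ts)
parseF ts            = error ("parseF: expected a number, got " ++ show ts)

-- Evaluate the AST so the grouping shows up in the result.
eval :: AST -> Int
eval (Lit n)   = n
eval (Add l r) = eval l + eval r
eval (Sub l r) = eval l - eval r
eval (Mul l r) = eval l * eval r
eval (Div l r) = eval l `div` eval r

main :: IO ()
main = do
    -- 9 - 5 - 3 should group as (9 - 5) - 3
    let (ast, _) = parseE [TNum 9, TMinus, TNum 5, TMinus, TNum 3]
    print ast         -- Sub (Sub (Lit 9) (Lit 5)) (Lit 3)
    print (eval ast)  -- 1
```

Because the accumulated tree is wrapped before the recursive call, each
new operator node ends up above the previous one, which is what makes the
result lean to the left.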


==========================
Student post on Piazza
Code that compiles and works for MinusExp

module Main where

import System.IO

data Token
    = TokenMinus
    | TokenNum Int
    deriving (Show,Eq)
    
data AST
    = MinusExp AST AST
    | IntLiteral Int
    deriving (Show)

main :: IO ()
main = do
    let toks = [TokenNum 5, TokenMinus, TokenNum 4, TokenMinus, TokenNum 3]
        (ast,_)  = parseE toks
    print ast

-- This function would have the same type signature, but ...
--     when calling parseE' would pass the AST constructed for parsing Q
--     and would return the AST created in parseE'.
parseE :: [Token] -> (AST,[Token])
parseE ((TokenNum n):ts) =
        let
            (q_ast,ts1) = parseQ ((TokenNum n):ts)
        in
            parseE' ts1 q_ast
 
-- This function will need to take the given AST and put it as a left 
-- subtree in the AST.
-- It will also have to recursively call itself with the AST it just created.
parseE' :: [Token] -> AST -> (AST,[Token])
parseE' (TokenMinus:ts) x =
    let
        (q_ast,ts1) = parseQ ts
    in
        parseE' ts1 (MinusExp x q_ast)
        
-- E' -> epsilon: return the AST built so far and leave any remaining tokens.
parseE' ts ast = (ast,ts)
        
parseQ :: [Token] -> (AST,[Token])
parseQ ((TokenNum n):ts) = (IntLiteral n, ts)

         
------------------------
mstrout@cs.arizona.edu, 10/11/16