CSc 453
University of Arizona

========================================================
Left Associativity and LL(1) Parsing
========================================================
10/11/16

One group came to office hours yesterday and brought up an important
issue. When we remove the left recursion from an expression grammar for
a left associative operator, the bottom-up creation of the AST no longer
reflects the correct left associativity.

Assume the grammar that can handle left associativity is as follows:

    E -> E - Q | Q
    Q -> ID

We would eliminate the left recursion with the following:

    E  -> Q E'
    E' -> - Q E' | epsilon

Both grammars generate the same set of strings, BUT the second grammar
results in a parse tree where the subtrees always hang to the right.

    a - b - c

    E ==> Q E'
      ==> ID(a) E'
      ==> ID(a) - Q E'
      ==> ID(a) - ID(b) E'
      ==> ID(a) - ID(b) - Q E'
      ==> ID(a) - ID(b) - ID(c)

If you draw the parse tree, the second subtraction ends up as the right
subtree of the first subtraction. The straightforward creation of the
AST from this parse results in a right associative subtraction
(a-(b-c)), and incorrect code is generated.

I drew on this department's compiler brain trust (Dr. Sethi,
Dr. Proebsting, Dr. Collberg, and Dr. Debray) to help figure out how to
correctly implement left associativity with a recursive descent parser.

Immediately below I show how you can add one more parameter to your
expression parsing function for each level of precedence to create the
correct AST. Below that I include Dr. Sethi's description. We will also
discuss this in class on Tuesday.

-- This function would have the same type signature, but ...
--   * when calling parseE' would pass the AST constructed for parsing Q
--   * and would return the AST created in parseE'.
parseE :: [Token] -> (AST,[Token])
parseE (x:ts)
    | inFIRST_Q x = let (q_ast,ts1) = parseQ (x:ts)
                    in parseE' ts1 q_ast

-- This function will need to take the given AST and put it as a left
-- subtree in the AST.
-- It will also have to recursively call itself with the AST it just created.
parseE' :: [Token] -> AST -> (AST,[Token])
parseE' (TokenMinus:ts) x = let (q_ast,ts1) = parseQ ts
                            in parseE' ts1 (MinusExp x q_ast)
-- E' -> epsilon: no more operators, so return the AST built so far.
parseE' ts x = (x,ts)

Groups should figure out the pattern and make this work for all other
operators. Why isn't this approach needed for right associative
expressions?

====================
Dr. Sethi's email

Michelle,

Yes, the usual LL(1) grammar for expressions is right recursive. The
short answer is that we cover construction of syntax trees (the usual
left-recursive ones) on pages 318-321 of the purple dragon book.

The longer answer is as follows. The usual grammar for expressions has
productions of the form

    E -> E + T | E - T | T
    T -> T * F | T / F | F
    F -> number

Here is a version suitable for top-down parsing:

    E -> T A
    A -> + T A | - T A | epsilon
    T -> F B
    B -> * F B | / F B | epsilon
    F -> number

With respect to E -> T A, suppose you have a subtree for T and want to
build the subtree for A. For clarity, let's use digits, as in T1 and T2,
to distinguish between instances of a nonterminal such as T. The
production can therefore be rewritten as E -> T1 A1. As an example, if
the input is 9-5-3, suppose T1 generates 9 and A1 generates -5-3. Build
the subtree for T1 -- that's a leaf for 9. Then, pass this subtree as a
parameter to the function for parsing A.

Now consider A1 -> - T2 A2, where again the subscripts distinguish
between instances of nonterminals. A1 generates -5-3, T2 generates 5,
and A2 generates -3. Recall from the last paragraph that the function
gets (inherits) the subtree for 9 as a parameter. When parsing - T2 A2,
call the function for T as usual to build the subtree 5 for T2. Now,
before calling the function for A2, construct the subtree for 9-5. We
inherited the subtree for 9 and just built the subtree for 5, so we can
build the node for 9-5.

At this point, continue parsing the right side. We've already processed
T2 in - T2 A2, so the next call is the function for A.
Pass it the subtree we just built for 9-5 as a parameter.

In effect the recursive descent pseudo code for nonterminal A is as
follows

    function A(x)               # parameter x is a subtree
        If next token is '-'
            match('-')
            x = minusnode(x, T())
            return A(x)
        return x                # A -> epsilon: nothing more to attach

Hope this helps. I'd be happy to go over it with you.

Ravi

P.S. The formal explanation is in terms of inherited and synthesized
attributes. The parameter for function A is an inherited attribute. It
returns a synthesized attribute. The code is beguilingly simple.

==========================
Student post on Piazza

Code that compiles and works for MinusExp

module Main where

import System.IO

data Token = TokenMinus | TokenNum Int
    deriving (Show,Eq)

data AST = MinusExp AST AST
         | IntLiteral Int
    deriving (Show)

main = do
    let toks = [TokenNum 5, TokenMinus, TokenNum 4, TokenMinus, TokenNum 3]
        (ast,_) = parseE toks
    -- print already applies show, so pass the AST directly
    print ast

-- This function would have the same type signature, but ...
--   * when calling parseE' would pass the AST constructed for parsing Q
--   * and would return the AST created in parseE'.
parseE :: [Token] -> (AST,[Token])
parseE ((TokenNum n):ts) = let (q_ast,ts1) = parseQ ((TokenNum n):ts)
                           in parseE' ts1 q_ast

-- This function will need to take the given AST and put it as a left
-- subtree in the AST.
-- It will also have to recursively call itself with the AST it just created.
parseE' :: [Token] -> AST -> (AST,[Token])
parseE' (TokenMinus:ts) x = let (q_ast,ts1) = parseQ ts
                            in parseE' ts1 (MinusExp x q_ast)
-- E' -> epsilon at the end of the input
parseE' [] ast = (ast,[])

parseQ :: [Token] -> (AST,[Token])
parseQ ((TokenNum n):ts) = (IntLiteral n, ts)

------------------------
mstrout@cs.arizona.edu, 10/11/16
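
==========================
Sketch: the same pattern for two precedence levels

The inherited-parameter pattern from Dr. Sethi's email can be sketched
for his full top-down grammar (E -> T A, T -> F B). This is only a
minimal illustration under stated assumptions, not a reference solution.
The names TokPlus, TokMinus, TokTimes, TokDiv, TokNum, BinExp, parseA,
parseB, and parseF are made up for this sketch, and operators are stored
as a Char purely for brevity.

```haskell
-- Hypothetical token and AST types for this sketch only.
data Token = TokPlus | TokMinus | TokTimes | TokDiv | TokNum Int
    deriving (Show, Eq)

data AST = BinExp Char AST AST   -- operator, left subtree, right subtree
         | IntLiteral Int
    deriving (Show, Eq)

-- E -> T A : parse a T, then hand its AST to A as the inherited left operand.
parseE :: [Token] -> (AST, [Token])
parseE ts = let (t, ts1) = parseT ts
            in parseA ts1 t

-- A -> + T A | - T A | epsilon
-- 'left' is the AST built so far; it becomes the LEFT child of the new node.
parseA :: [Token] -> AST -> (AST, [Token])
parseA (TokPlus : ts) left  = let (t, ts1) = parseT ts
                              in parseA ts1 (BinExp '+' left t)
parseA (TokMinus : ts) left = let (t, ts1) = parseT ts
                              in parseA ts1 (BinExp '-' left t)
parseA ts left              = (left, ts)          -- epsilon

-- T -> F B : same shape, one precedence level down.
parseT :: [Token] -> (AST, [Token])
parseT ts = let (f, ts1) = parseF ts
            in parseB ts1 f

-- B -> * F B | / F B | epsilon
parseB :: [Token] -> AST -> (AST, [Token])
parseB (TokTimes : ts) left = let (f, ts1) = parseF ts
                              in parseB ts1 (BinExp '*' left f)
parseB (TokDiv : ts) left   = let (f, ts1) = parseF ts
                              in parseB ts1 (BinExp '/' left f)
parseB ts left              = (left, ts)          -- epsilon

-- F -> number
parseF :: [Token] -> (AST, [Token])
parseF (TokNum n : ts) = (IntLiteral n, ts)
parseF ts              = error ("expected a number, got " ++ show (take 1 ts))
```

In GHCi, fst (parseE [TokNum 9, TokMinus, TokNum 5, TokMinus, TokNum 3])
should evaluate to
BinExp '-' (BinExp '-' (IntLiteral 9) (IntLiteral 5)) (IntLiteral 3),
that is, (9-5)-3. Each precedence level gets its own pair of functions,
and only the "tail" function (parseA, parseB) takes the extra AST
parameter.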