|
|
1.1 ! root 1: ! 2: ML-YACC, version 1.0 ! 3: ! 4: Preliminary Documentation ! 5: for Preliminary Version ! 6: ! 7: David R. Tarditi ! 8: Andrew W. Appel ! 9: ! 10: Department of Computer Science ! 11: Princeton University ! 12: Princeton, NJ 08544 ! 13: ! 14: February 1, 1989 ! 15: ! 16: (c) 1989 Andrew W. Appel, David R. Tarditi ! 17: This software comes with ABSOLUTELY NO WARRANTY. ! 18: This software is subject only to the PRINCETON STANDARD ML SOFTWARE LIBRARY ! 19: COPYRIGHT NOTICE, LICENSE AND DISCLAIMER, (in the file "COPYRIGHT", ! 20: distributed with this software). You may copy and distribute this software; ! 21: see the COPYRIGHT NOTICE for details and restrictions. ! 22: ! 23: Description ! 24: ----------- ! 25: ! 26: This is a preliminary guide to using ML-Yacc. It is not complete documentation. ! 27: It tells how to invoke ML-Yacc, and the syntax of an ML-Yacc specification. ! 28: The syntax description assumes the user knows how to use Yacc, and notes the ! 29: differences between ML-Yacc and Yacc. ! 30: ! 31: Note that version 2.0 will be released at the end of 1989; it has a slightly ! 32: different interface and much improved error recovery. ! 33: ! 34: ML-Yacc is Yacc-like parser generator for Standard ML. It generates ! 35: parsers for LALR languages, like Yacc, and its syntax is very similar to ! 36: that of Yacc. ! 37: ! 38: It handles syntax errors differently from Yacc. The parser ! 39: generated by ML-Yacc will attempt to automatically recover from a ! 40: syntax error by making a single token insertion, deletion, or substitution. ! 41: At the moment, only tokens without values can be inserted or substituted. ! 42: A future version will also insert tokens with values, by providing a ! 43: mechanism for specifying code to be evaluated for each token that will ! 44: provided a dummy value. ! 45: ! 46: Syntax Error Recovery Method. ! 47: ----------------------------- ! 48: ! 49: The method used is described in ! 50: ! 51: 'A Practical Method for LR and LL Syntactic Error Diagnosis and ! 52: Recovery', by M. Burke and G. Fisher, ACM Transactions on ! 53: Programming Languages and Systems, Vol. 9, No. 2, April 1987, ! 54: pp. 164-197. ! 55: ! 56: The partial, deferred method discussed in the article has been ! 57: implemented. ! 58: ! 59: This method defers reductions for some number of shifted tokens. The ! 60: deferred reductions are kept in a queue, and when an element is pulled ! 61: from the queue, the reductions are then applied to the value stack. ! 62: ! 63: When a syntax error is encountered, the parser reads some number of tokens ! 64: ahead. It then uses the queue of deferred actions to check possible error ! 65: recoveries. IMPORTANT: Since the lexer is several tokens ahead of the ! 66: execution of semantic actions, the semantic routines CANNOT influence the ! 67: action of the lexer in any significant way. Example 1: it is hard to imagine ! 68: how to compile "typedef" in C. Example 2: the "infix" operators of ML must be ! 69: treated as ordinary identifiers and handled specially by the semantic actions. ! 70: ! 71: ! 72: Invoking ML-Yacc ! 73: ---------------- ! 74: ! 75: Use "mlyacc.sml"; this will create a structure ParseGen. The function ! 76: ParseGen.parseGen creates a program for a parser from an input ! 77: specification. It takes a string argument -- the name of the file ! 78: containing the input specification. The output file name is ! 79: determined by appending ".sml" to the input file name. ! 80: ! 81: Using the parser created by ML-Yacc ! 82: ----------------------------------- ! 83: ! 84: Invoking the parser ! 85: ------------------- ! 86: ! 87: After following the commands above, a structure whose name was ! 88: specified by the user will exist. Three of the components are relevent to ! 89: the user: ! 90: ! 91: HDR: a structure containing values declared in the "user routines" ! 92: section of the ML-Yacc specification. ! 93: ! 94: LexValue: a structure containing constructors for returning tokens ! 95: and their values to the parser from the lexical analyzer. ! 96: ! 97: parse: a function which takes a lexing function and a pair of integers ! 98: and parses the input from the lexing functions. ! 99: The lexing function must returns values of type LexValue.V ! 100: The pair of integers specifies the defer and lookahead ! 101: parameters for the error-recovery algorithm. ! 102: For interactive parsers, (0,0) is suggested; for ! 103: good error recovery, (10,5) is suggested. ! 104: ! 105: Creating the lex function ! 106: ------------------------- ! 107: ! 108: The lexing function must have the following form: ! 109: ! 110: lexer : unit -> LexValue.V ! 111: ! 112: This lexical-analyzer interface is exactly that provided by ML-Lex, an ML ! 113: version of Lex. The user should include the statement ! 114: 'open {structure name}.LexValue' ! 115: in the user routines section of his ML-Lex specification. This will ! 116: make the components of HDR and the constructors for terminals available to ! 117: the lex actions. The lex actions should all return these constructors as ! 118: values. ! 119: ! 120: The names of the constructors are the same as the terminal names used ! 121: in the ML-Yacc specification. The constructors for those terminals ! 122: that have values take only values with the same type as the terminal, ! 123: of course. Those for terminals with no values are nullary constructors. ! 124: ! 125: A sample ML-Lex specification is given at the end of this document. ! 126: ! 127: Tying the lexer and parser together ! 128: ----------------------------------- ! 129: ! 130: The use of ML-Lex is suggested but not required. ! 131: The user should create the lexing function using makeLexer, and then ! 132: pass this function to the function C.parse. Here is some code which ! 133: does this. ! 134: ! 135: fun parse(infile) = ! 136: let val in_str = open_in infile ! 137: val lexer = mlex.makeLexer(input in_str) ! 138: val p = C.parse lexer (5,5) ! 139: in (close_in in_str; p) ! 140: end ! 141: ! 142: Grammar specifications ! 143: ---------------------- ! 144: ! 145: The ML-Yacc specification is very similar to a Yacc specification. The ! 146: specification has the following form: ! 147: ! 148: {user routines} ! 149: %% ! 150: {declarations} ! 151: %% ! 152: {rules} ! 153: ! 154: The declarations section contains the following declarations: ! 155: ! 156: %term %eof %left %nonassoc %verbose %subst %default ! 157: %nonterm %start %right %structure %prefer %keyword ! 158: ! 159: User routines ! 160: ------------- ! 161: ! 162: The user routines section must contain the following: ! 163: ! 164: type Lineno = ... ! 165: val lineno : Lineno ref = ... ! 166: val error : string -> Lineno -> unit ! 167: which is used to print error messages. ! 168: ! 169: The Lineno type is some representation of a position in the input file, so that ! 170: error messages can be keyed to the line number (or character position, etc.) ! 171: of the token that "caused" the error. If this is not necessary, Lineno can be ! 172: just "unit". ! 173: ! 174: %verbose ! 175: -------- ! 176: ! 177: Generate a y.output file, which gives a verbose description of the ! 178: tables created from the grammar. ! 179: ! 180: %structure ! 181: ---------- ! 182: ! 183: The name of the structure in which to place the parser should be specifed ! 184: using the statement %structure {structure name} ! 185: ! 186: Terminals and nonterminals ! 187: -------------------------- ! 188: ! 189: All terminals must be declared in the %term statement. All ! 190: nonterminals must be declared in the %nonterm statement. The ! 191: statements both have a form similar to that of an ML constructor ! 192: statement. ! 193: ! 194: Each symbol may be followed by the phrase "of <type>". ! 195: Multiple symbols must be separated by a bar: '|'. At least one ! 196: symbol must appear in each statement. ! 197: ! 198: The user is cautioned not to use ML reserved words for terminal or ! 199: nonterminal names. The program produced by ML-Yacc will not load correctly. ! 200: The <type> may be any valid ML type expression. ! 201: ! 202: A terminal name for the eof terminal must also be supplied. ! 203: ! 204: The start nonterminal and the eof terminal ! 205: ------------------------------------------ ! 206: ! 207: The eof terminal must be named in the %eof statement. Suppose the ! 208: eof terminal were EOF. Then the %eof statement would be ! 209: ! 210: %eof EOF ! 211: ! 212: The start terminal should be named in the %start statement. If one ! 213: is not supplied, the lhs of the first rule will be used. ! 214: ! 215: Precedence ! 216: ---------- ! 217: ! 218: Precedence is specified in the same manner as yacc. The terminals ! 219: are listed after the %left, %right, or %nonassoc statement. The ! 220: statements are in order of ascending (tighter binding) precedence. ! 221: ! 222: Precedence operates like it does in yacc, except for %nonassoc. ! 223: Like YACC, each rule is assigned the precedence of its rightmost terminal. ! 224: In the case of a shift/reduce conflict the precedence of the terminal ! 225: in the shift and the precedence of the rule in the reduce are compared. ! 226: If the terminal has a higher precedence, the shift is performed. If the ! 227: rule has a higher precedence, the reduce is performed. If the terminal ! 228: and the rule have the same precedence, the associativity of the terminal ! 229: is used to resolve the conflict. If the terminal is left associative, ! 230: we reduce. If the terminal is right associative, we shift. If the ! 231: terminal is nonassociative we print an error message and shift. ! 232: ! 233: Thus %nonassoc does not produce a fatal error if the associativity of a ! 234: nonassociative terminal is used to resolve a conflict. A warning message is ! 235: printed, though, about this, and a shift is planted. The shift causes the ! 236: nonassociative terminal to default to right associativity. ! 237: ! 238: Reduce/reduce conflicts are fatal in ML-Yacc, unlike in yacc. There is ! 239: no resolution of reduce/reduce conflicts based on rule ordering. ! 240: ! 241: The %prec tag may be used to alter the precedence for a rule. It is ! 242: described below under the Rules section. ! 243: ! 244: Information for an error correction algorithm. ! 245: ---------------------------------------------- ! 246: ! 247: The following keywords allow the user to specify information that may improve ! 248: recovery: ! 249: ! 250: %keyword - a list of tokens which are keywords ! 251: %prefer - a list of tokens which are preferred ! 252: for insertion ! 253: %subst - a list of preferred substitions for ! 254: certain tokens. ! 255: %default - value to be used for inserted tokens ! 256: that carry values. (not yet implemented) ! 257: ! 258: %keyword and %prefer should each be followed by a list of tokens. ! 259: ! 260: %subst has the following syntax: ! 261: ! 262: %subst {token} for {token} | ... | {token} for {token} ! 263: ! 264: Rules ! 265: ----- ! 266: ! 267: The rule section consists of a list of rules. ! 268: ! 269: Each rule has the syntax: ! 270: ! 271: {lhs nonterminal} : {rhs symbol list} ({value of same type as nonterminal, ! 272: or any type if the nonterminal has ! 273: none }) ! 274: ! 275: or ! 276: ! 277: {lhs nonterminal} : {rhs symbol list} %prec {terminal} ( ... value ...) ! 278: ! 279: The second form gives the rule the same precedence as the terminal. ! 280: ! 281: The | may be used to separate multiple rhs's with the same lhs. ! 282: ! 283: A null rhs may be specified by simply by having an empty rhs symbol list. ! 284: ! 285: The ( ... value ... ) part is not optional. It may be empty, though, if ! 286: the lhs nonterminal has no type associated with it. ! 287: ! 288: Values ! 289: ------ ! 290: ! 291: The value for a symbol on the rhs of a rule is in a variable ! 292: {symbol}N. N is the number of occurrences of the symbol in the rhs, ! 293: up to and including the symbol. Suppose we had a specification of ! 294: the form: ! 295: ! 296: %term ... PLUS ... ! 297: %nonterm ... EXP of int ... ! 298: ... ! 299: %% ! 300: ! 301: EXP : EXP PLUS EXP ! 302: ! 303: Then the action could contain (EXP1 + EXP2). EXP1 is the value of the ! 304: first occurrence of EXP on the rhs, while EXP2 is the value of the second ! 305: occurrence on the rhs. ! 306: ! 307: If a symbol has no type associated with it, it has no value associated ! 308: with it. Attempting to use PLUS1, in the above example, would result ! 309: later in a compilation error. ! 310: ! 311: Any value returned by a rule whose left hand side has no type associated with ! 312: it is ignored. The ML code associated with such rules may return any kind of ! 313: value; it will be executed for possible side-effects. ! 314: ! 315: If a terminal or nonterminal occurs only once on the rhs of a rule, its ! 316: value is also in {symbol}, as well as in {symbol}1 ! 317: ! 318: Bugs ! 319: ---- ! 320: There should be a better way for semantic-action routines to print out errors ! 321: and get the appropriate range of line numbers in the message automatically. ! 322: ! 323: Functors should be used to make the lexer more independent of the parser. ! 324: Speed should be improved in future versions. ! 325: ! 326: Sample specification ! 327: -------------------- ! 328: (* sample.grm *) ! 329: type Lineno = int ! 330: val lineno = ref 1 ! 331: fun error s l = output std_out ! 332: ("line " ^ makestring (l:int) ^ ":" ^ s ^ "\n") ! 333: fun lookup s = ordof(s,0) - ord("a")+1 ! 334: %% ! 335: %structure Calc ! 336: %eof EOF ! 337: %start START ! 338: ! 339: %left SUB PLUS ! 340: %left TIMES DIV ! 341: %right CARAT ! 342: ! 343: %term ID of string | NUM of int | PLUS | TIMES | PRINT | EOS | EOF | ! 344: CARAT | DIV | SUB ! 345: %nonterm EXP of int | START | STMT | STMT_LIST ! 346: ! 347: %% ! 348: START : STMT_LIST () ! 349: STMT_LIST : STMT_LIST STMT EOS () ! 350: STMT_LIST : () ! 351: STMT : PRINT EXP (print EXP; print "\n"; flush_out std_out) ! 352: STMT : EXP () ! 353: EXP : NUM (NUM) ! 354: | ID (lookup ID) ! 355: | EXP PLUS EXP (EXP1+EXP2) ! 356: | EXP TIMES EXP (EXP1*EXP2) ! 357: | EXP DIV EXP (EXP1 div EXP2) ! 358: | EXP SUB EXP (EXP1-EXP2) ! 359: | EXP CARAT EXP (let fun e (m,0) = 1 ! 360: | e (m,i) = m*e(m,i-1) ! 361: in e (EXP1,EXP2) ! 362: end) ! 363: ! 364: Sample ML-Lex specification ! 365: --------------------------- ! 366: (* sample.lex *) ! 367: open Calc.LexValue ! 368: type lexresult=V ! 369: val eof = fn () => EOF ! 370: %% ! 371: alpha=[A-Za-z]; ! 372: digit=[0-9]; ! 373: ws = [\ \t\n]; ! 374: %% ! 375: {ws}+ => (lex()); ! 376: {digit}+ => (NUM (revfold (fn (a,r) => ord(a)-ord("0")+10*r) (explode yytext) 0)); ! 377: "+" => (PLUS); ! 378: "*" => (TIMES); ! 379: ";" => (EOS); ! 380: {alpha}+ => (if yytext="print" then PRINT else ID yytext); ! 381: "-" => (SUB); ! 382: "^" => (CARAT); ! 383: "/" => (DIV); ! 384: . => (Calc.HDR.error ("ignoring bad character " ^ yytext); lex()); ! 385: ! 386: ! 387: Sample "main" module ! 388: -------------------- ! 389: (* sample.sml *) ! 390: structure Sample = ! 391: struct ! 392: fun run filename = ! 393: (* more suitable for non-interactive use *) ! 394: let val in_str = open_in filename ! 395: val lexer = Mlex.makeLexer (input in_str) ! 396: val p = (Calc.HDR.lineno := 0; ! 397: Calc.parse lexer (5,5)) ! 398: in (close_in in_str; p) ! 399: end ! 400: ! 401: fun run_std_in () = ! 402: (* more suitable for interactive use *) ! 403: let val lexer = Mlex.makeLexer (fn _ => input_line std_in) ! 404: val p = (Calc.HDR.lineno := 0; ! 405: Calc.parse lexer (0,0)) ! 406: in p ! 407: end ! 408: end ! 409: ! 410: Sample input ! 411: ------------ ! 412: (* sample.input, contains an intentional syntax error *) ! 413: print 4+2; ! 414: a+b ! 415: print b*c; ! 416: ! 417: ! 418: How to try the sample program ! 419: ---------------------------- ! 420: ! 421: % sml ! 422: - use "mlyacc.sml"; (* load the parser generator *) ! 423: - use "lexgen.sml"; (* load the lexical analyzer generator *) ! 424: - open LexGen ParseGen; ! 425: - exportML "yacclex"; (* save the image for later use *) ! 426: ! 427: - lexGen "sample.lex"; (* creates the file sample.lex.sml *) ! 428: - parseGen "sample.grm"; (* creates the file sample.grm.sml *) ! 429: - ^D (* exiting here is optional, of course *) ! 430: ! 431: % sml ! 432: - use "sample.grm.sml"; (* compile the parser *) ! 433: - use "sample.lex.sml"; (* compile the lexer *) ! 434: - use "sample.sml"; (* compile the main module *) ! 435: - Sample.run "sample.input"; (* run the sample program *) ! 436:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.