researchv10no/cmd/sml/lib/mlyacc/mlyacc.doc - annotate

Return to mlyacc.doc CVS log
Up to [Research Unix] / researchv10no / cmd / sml / lib / mlyacc
Annotation of researchv10no/cmd/sml/lib/mlyacc/mlyacc.doc, revision 1.1

1.1     ! root        1: 
        !             2:                         ML-YACC, version 1.0
        !             3: 
        !             4:                         Preliminary Documentation
        !             5:                         for Preliminary Version
        !             6: 
        !             7:                         David R. Tarditi
        !             8:                         Andrew W. Appel
        !             9: 
        !            10:                         Department of Computer Science
        !            11:                         Princeton University
        !            12:                         Princeton, NJ 08544
        !            13: 
        !            14:                         February 1, 1989
        !            15: 
        !            16: (c) 1989 Andrew W. Appel, David R. Tarditi
        !            17: This software comes with ABSOLUTELY NO WARRANTY.
        !            18: This software is subject only to the PRINCETON STANDARD ML SOFTWARE LIBRARY
        !            19: COPYRIGHT NOTICE, LICENSE AND DISCLAIMER, (in the file "COPYRIGHT",
        !            20: distributed with this software). You may copy and distribute this software;
        !            21: see the COPYRIGHT NOTICE for details and restrictions.
        !            22: 
        !            23: Description
        !            24: -----------
        !            25: 
        !            26: This is a preliminary guide to using ML-Yacc. It is not complete documentation.
        !            27: It tells how to invoke ML-Yacc, and the syntax of an ML-Yacc specification.
        !            28: The syntax description assumes the user knows how to use Yacc, and notes the
        !            29: differences between ML-Yacc and Yacc.
        !            30: 
        !            31: Note that version 2.0 will be released at the end of 1989; it has a slightly
        !            32: different interface and much improved error recovery.
        !            33: 
        !            34: ML-Yacc is Yacc-like parser generator for Standard ML.  It generates
        !            35: parsers for LALR languages, like Yacc, and its syntax is very similar to
        !            36: that of Yacc.
        !            37: 
        !            38: It handles syntax errors differently from Yacc.  The parser
        !            39: generated by ML-Yacc will attempt to automatically recover from a
        !            40: syntax error by making a single token insertion, deletion, or substitution.
        !            41: At the moment, only tokens without values can be inserted or substituted.
        !            42: A future version will also insert tokens with values, by providing a
        !            43: mechanism for specifying code to be evaluated for each token that will
        !            44: provided a dummy value.
        !            45: 
        !            46: Syntax Error Recovery Method.
        !            47: -----------------------------
        !            48: 
        !            49: The method used is described in 
        !            50: 
        !            51:         'A Practical Method for LR and LL Syntactic Error Diagnosis and
        !            52:          Recovery', by M. Burke and G. Fisher, ACM Transactions on
        !            53:          Programming Languages and Systems, Vol. 9, No. 2, April 1987,
        !            54:          pp. 164-197.
        !            55: 
        !            56: The partial, deferred method discussed in the article has been
        !            57: implemented.  
        !            58: 
        !            59: This method defers reductions for some number of shifted tokens.  The
        !            60: deferred reductions are kept in a queue, and when an element is pulled
        !            61: from the queue, the reductions are then applied to the value stack.
        !            62: 
        !            63: When a syntax error is encountered, the parser reads some number of tokens
        !            64: ahead.  It then uses the queue of deferred actions to check possible error
        !            65: recoveries.  IMPORTANT:  Since the lexer is several tokens ahead of the
        !            66: execution of semantic actions, the semantic routines CANNOT influence the
        !            67: action of the lexer in any significant way.  Example 1:  it is hard to imagine
        !            68: how to compile "typedef" in C.  Example 2:  the "infix" operators of ML must be
        !            69: treated as ordinary identifiers and handled specially by the semantic actions.
        !            70: 
        !            71: 
        !            72: Invoking ML-Yacc
        !            73: ----------------
        !            74: 
        !            75: Use "mlyacc.sml"; this will create a structure ParseGen.  The function
        !            76: ParseGen.parseGen creates a program for a parser from an input
        !            77: specification.  It takes a string argument -- the name of the file
        !            78: containing the input specification.  The output file name is 
        !            79: determined by appending ".sml" to the input file name.
        !            80: 
        !            81: Using the parser created by ML-Yacc
        !            82: -----------------------------------
        !            83: 
        !            84: Invoking the parser
        !            85: -------------------
        !            86: 
        !            87: After following the commands above, a structure whose name was
        !            88: specified by the user will exist.  Three of the components are relevent to
        !            89: the user:
        !            90: 
        !            91:         HDR: a structure containing values declared in the "user routines"
        !            92:              section of the ML-Yacc specification.
        !            93: 
        !            94:         LexValue: a structure containing constructors for returning tokens
        !            95:                   and their values to the parser from the lexical analyzer.
        !            96: 
        !            97:         parse: a function which takes a lexing function and a pair of integers
        !            98:                and parses the input from the lexing functions.
        !            99:                The lexing function must returns values of type LexValue.V
        !           100:                The pair of integers specifies the defer and lookahead
        !           101:                parameters for the error-recovery algorithm.
        !           102:                For interactive parsers, (0,0) is suggested; for
        !           103:               good error recovery, (10,5) is suggested.
        !           104: 
        !           105: Creating the lex function
        !           106: -------------------------
        !           107: 
        !           108: The lexing function must have the following form:
        !           109: 
        !           110: lexer : unit -> LexValue.V
        !           111: 
        !           112: This lexical-analyzer interface is exactly that provided by ML-Lex, an ML
        !           113: version of Lex.  The user should include the statement 
        !           114: 'open {structure name}.LexValue'
        !           115: in the user routines section of his ML-Lex specification.  This will
        !           116: make the components of HDR and the constructors for terminals available to
        !           117: the lex actions.  The lex actions should all return these constructors as
        !           118: values.
        !           119: 
        !           120: The names of the constructors are the same as the terminal names used
        !           121: in the ML-Yacc specification.   The constructors for those terminals
        !           122: that have values take only values with the same type as the terminal,
        !           123: of course.  Those for terminals with no values are nullary constructors.
        !           124: 
        !           125: A sample ML-Lex specification is given at the end of this document.
        !           126: 
        !           127: Tying the lexer and parser together
        !           128: -----------------------------------
        !           129: 
        !           130: The use of ML-Lex is suggested but not required.
        !           131: The user should create the lexing function using makeLexer, and then
        !           132: pass this function to the function C.parse.   Here is some code which
        !           133: does this.
        !           134: 
        !           135: fun parse(infile) = 
        !           136:      let val in_str = open_in infile
        !           137:          val lexer = mlex.makeLexer(input in_str)
        !           138:          val p = C.parse lexer (5,5)
        !           139:      in (close_in in_str; p)
        !           140:      end
        !           141: 
        !           142: Grammar specifications
        !           143: ----------------------
        !           144: 
        !           145: The ML-Yacc specification is very similar to a Yacc specification.  The
        !           146: specification has the following form:
        !           147: 
        !           148: {user routines}
        !           149: %%
        !           150: {declarations}
        !           151: %%
        !           152: {rules}
        !           153: 
        !           154: The declarations section contains the following declarations:
        !           155: 
        !           156: %term           %eof    %left   %nonassoc   %verbose    %subst      %default
        !           157: %nonterm        %start  %right  %structure  %prefer     %keyword
        !           158: 
        !           159: User routines
        !           160: -------------
        !           161: 
        !           162: The user routines section must contain the following:
        !           163: 
        !           164:         type Lineno = ...
        !           165:         val  lineno : Lineno ref = ...
        !           166:         val error :  string -> Lineno -> unit
        !           167:              which is used to print error messages.
        !           168: 
        !           169: The Lineno type is some representation of a position in the input file, so that
        !           170: error messages can be keyed to the line number (or character position, etc.)
        !           171: of the token that "caused" the error.  If this is not necessary, Lineno can be
        !           172: just "unit".
        !           173: 
        !           174: %verbose
        !           175: --------
        !           176: 
        !           177: Generate a y.output file, which gives a verbose description of the
        !           178: tables created from the grammar.
        !           179: 
        !           180: %structure
        !           181: ----------
        !           182: 
        !           183: The name of the structure in which to place the parser should be specifed
        !           184: using the statement %structure {structure name}
        !           185: 
        !           186: Terminals and nonterminals
        !           187: --------------------------
        !           188: 
        !           189: All terminals must be declared in the %term statement.  All
        !           190: nonterminals must be declared in the %nonterm statement.  The
        !           191: statements both have a form similar to that of an ML constructor
        !           192: statement.
        !           193: 
        !           194: Each symbol may be followed by the phrase "of <type>".
        !           195: Multiple symbols must be separated by a bar: '|'.  At least one
        !           196: symbol must appear in each statement.
        !           197: 
        !           198: The user is cautioned not to use ML reserved words for terminal or
        !           199: nonterminal names.  The program produced by ML-Yacc will not load correctly.
        !           200: The <type> may be any valid ML type expression.
        !           201: 
        !           202: A terminal name for the eof terminal must also be supplied.
        !           203: 
        !           204: The start nonterminal and the eof terminal
        !           205: ------------------------------------------
        !           206: 
        !           207: The eof terminal must be named in the %eof statement.  Suppose the
        !           208: eof terminal were EOF.  Then the %eof statement would be
        !           209: 
        !           210: %eof EOF
        !           211: 
        !           212: The start terminal should be named in the %start statement.  If one
        !           213: is not supplied, the lhs of the first rule will be used.
        !           214: 
        !           215: Precedence
        !           216: ----------
        !           217: 
        !           218: Precedence is specified in the same manner as yacc.  The terminals
        !           219: are listed after the %left, %right, or %nonassoc statement.  The
        !           220: statements are in order of ascending (tighter binding) precedence.
        !           221: 
        !           222: Precedence operates like it does in yacc, except for %nonassoc.
        !           223: Like YACC, each rule is assigned the precedence of its rightmost terminal.
        !           224: In the case of a shift/reduce conflict the precedence of the terminal
        !           225: in the shift and the precedence of the rule in the reduce are compared.
        !           226: If the terminal has a higher precedence, the shift is performed.  If the
        !           227: rule has a higher precedence, the reduce is performed.  If the terminal
        !           228: and the rule have the same precedence, the associativity of the terminal
        !           229: is used to resolve the conflict.  If the terminal is left associative,
        !           230: we reduce.  If the terminal is right associative, we shift.  If the
        !           231: terminal is nonassociative we print an error message and shift.
        !           232: 
        !           233: Thus %nonassoc does not produce a fatal error if the associativity of a
        !           234: nonassociative terminal is used to resolve a conflict.  A warning message is
        !           235: printed, though, about this, and a shift is planted.  The shift causes the
        !           236: nonassociative terminal to default to right associativity.
        !           237: 
        !           238: Reduce/reduce conflicts are fatal in ML-Yacc, unlike in yacc.  There is
        !           239: no resolution of reduce/reduce conflicts based on rule ordering.
        !           240: 
        !           241: The %prec tag may be used to alter the precedence for a rule.  It is
        !           242: described below under the Rules section.
        !           243: 
        !           244: Information for an error correction algorithm.
        !           245: ----------------------------------------------
        !           246: 
        !           247: The following keywords allow the user to specify information that may improve
        !           248: recovery:
        !           249: 
        !           250:         %keyword                - a list of tokens which are keywords
        !           251:         %prefer                 - a list of tokens which are preferred
        !           252:                                   for insertion
        !           253:         %subst                  - a list of preferred substitions for
        !           254:                                   certain tokens.
        !           255:         %default                - value to be used for inserted tokens
        !           256:                                     that carry values. (not yet implemented)
        !           257: 
        !           258: %keyword and %prefer should each be followed by a list of tokens.
        !           259: 
        !           260: %subst has the following syntax: 
        !           261: 
        !           262:    %subst {token} for {token} |  ...   | {token} for {token}
        !           263: 
        !           264: Rules
        !           265: -----
        !           266: 
        !           267: The rule section consists of a list of rules.
        !           268: 
        !           269: Each rule has the syntax:
        !           270: 
        !           271: {lhs nonterminal} : {rhs symbol list}   ({value of same type as nonterminal,
        !           272:                                          or any type if the nonterminal has
        !           273:                                          none }) 
        !           274: 
        !           275:  or
        !           276: 
        !           277:  {lhs nonterminal} : {rhs symbol list} %prec {terminal} ( ... value ...)
        !           278: 
        !           279: The second form gives the rule the same precedence as the terminal.
        !           280: 
        !           281: The | may be used to separate multiple rhs's with the same lhs.
        !           282: 
        !           283: A null rhs may be specified by simply by having an empty rhs symbol list.
        !           284: 
        !           285: The ( ... value ... ) part is not optional.  It may be empty, though, if
        !           286: the lhs nonterminal has no type associated with it.
        !           287: 
        !           288: Values
        !           289: ------
        !           290: 
        !           291: The value for a symbol on the rhs of a rule is in a variable
        !           292: {symbol}N.  N is the number of occurrences of the symbol in the rhs,
        !           293: up to and including the symbol.  Suppose we had a specification of
        !           294: the form:
        !           295: 
        !           296: %term ... PLUS ...
        !           297: %nonterm ... EXP of int ...
        !           298: ...
        !           299: %%
        !           300: 
        !           301: EXP : EXP PLUS EXP 
        !           302: 
        !           303: Then the action could contain (EXP1 + EXP2).  EXP1 is the value of the
        !           304: first occurrence of EXP on the rhs, while EXP2 is the value of the second
        !           305: occurrence on the rhs.
        !           306: 
        !           307: If a symbol has no type associated with it, it has no value associated
        !           308: with it.  Attempting to use PLUS1, in the above example, would result
        !           309: later in a compilation error.
        !           310: 
        !           311: Any value returned by a rule whose left hand side has no type associated with
        !           312: it is ignored.  The ML code associated with such rules may return any kind of
        !           313: value; it will be executed for possible side-effects.
        !           314: 
        !           315: If a terminal or nonterminal occurs only once on the rhs of a rule, its
        !           316: value is also in {symbol}, as well as in {symbol}1
        !           317: 
        !           318: Bugs
        !           319: ----
        !           320: There should be a better way for semantic-action routines to print out errors
        !           321: and get the appropriate range of line numbers in the message automatically.
        !           322: 
        !           323: Functors should be used to make the lexer more independent of the parser.
        !           324: Speed should be improved in future versions.
        !           325: 
        !           326: Sample specification
        !           327: --------------------
        !           328: (* sample.grm *)
        !           329: type Lineno = int
        !           330: val lineno = ref 1
        !           331: fun error s l = output std_out 
        !           332:         ("line " ^ makestring (l:int) ^ ":" ^ s ^ "\n")
        !           333: fun lookup s = ordof(s,0) - ord("a")+1
        !           334: %%
        !           335: %structure Calc
        !           336: %eof   EOF
        !           337: %start START
        !           338: 
        !           339: %left SUB PLUS
        !           340: %left TIMES DIV
        !           341: %right CARAT
        !           342: 
        !           343: %term ID of string | NUM of int | PLUS | TIMES | PRINT | EOS | EOF |
        !           344:         CARAT | DIV | SUB
        !           345: %nonterm EXP of int | START | STMT | STMT_LIST
        !           346: 
        !           347: %%
        !           348:   START : STMT_LIST     ()
        !           349:   STMT_LIST : STMT_LIST STMT EOS ()
        !           350:   STMT_LIST : ()
        !           351:   STMT : PRINT EXP      (print EXP; print "\n"; flush_out std_out)
        !           352:   STMT : EXP            ()
        !           353:   EXP : NUM             (NUM)
        !           354:       | ID              (lookup ID)
        !           355:       | EXP PLUS EXP    (EXP1+EXP2)
        !           356:       | EXP TIMES EXP   (EXP1*EXP2)
        !           357:       | EXP DIV EXP     (EXP1 div EXP2)
        !           358:       | EXP SUB EXP     (EXP1-EXP2)
        !           359:       | EXP CARAT EXP   (let fun e (m,0) = 1
        !           360:                                 | e (m,i) = m*e(m,i-1)
        !           361:                          in e (EXP1,EXP2)       
        !           362:                          end)
        !           363: 
        !           364: Sample ML-Lex specification
        !           365: ---------------------------
        !           366: (* sample.lex *)
        !           367: open Calc.LexValue
        !           368: type lexresult=V
        !           369: val eof = fn () => EOF
        !           370: %%
        !           371: alpha=[A-Za-z];
        !           372: digit=[0-9];
        !           373: ws = [\ \t\n];
        !           374: %%
        !           375: {ws}+    => (lex());
        !           376: {digit}+ => (NUM (revfold (fn (a,r) => ord(a)-ord("0")+10*r) (explode yytext) 0));
        !           377: "+"      => (PLUS);
        !           378: "*"      => (TIMES);
        !           379: ";"      => (EOS);
        !           380: {alpha}+ => (if yytext="print" then PRINT else ID yytext);
        !           381: "-"      => (SUB);
        !           382: "^"      => (CARAT);
        !           383: "/"      => (DIV);
        !           384: .        => (Calc.HDR.error ("ignoring bad character " ^ yytext); lex());
        !           385: 
        !           386: 
        !           387: Sample "main" module
        !           388: --------------------
        !           389: (* sample.sml *)
        !           390: structure Sample =
        !           391:    struct
        !           392:       fun run filename = 
        !           393:             (* more suitable for non-interactive use *)
        !           394:           let val in_str = open_in filename
        !           395:               val lexer =  Mlex.makeLexer (input in_str)
        !           396:               val p = (Calc.HDR.lineno := 0;
        !           397:                        Calc.parse lexer (5,5))
        !           398:            in (close_in in_str; p)
        !           399:            end
        !           400:     
        !           401:       fun run_std_in () =
        !           402:             (* more suitable for interactive use *)
        !           403:         let val lexer = Mlex.makeLexer (fn _ => input_line std_in)
        !           404:             val p = (Calc.HDR.lineno := 0;
        !           405:                        Calc.parse lexer (0,0))
        !           406:          in p
        !           407:         end
        !           408:    end
        !           409: 
        !           410: Sample input
        !           411: ------------
        !           412: (* sample.input, contains an intentional syntax error *)
        !           413: print 4+2;
        !           414: a+b
        !           415: print b*c;
        !           416: 
        !           417: 
        !           418: How to try the sample program
        !           419: ----------------------------
        !           420: 
        !           421: % sml
        !           422: - use "mlyacc.sml";  (* load the parser generator *)
        !           423: - use "lexgen.sml";  (* load the lexical analyzer generator *)
        !           424: - open LexGen ParseGen;
        !           425: - exportML "yacclex"; (* save the image for later use *)
        !           426: 
        !           427: - lexGen "sample.lex";  (* creates the file sample.lex.sml *)
        !           428: - parseGen "sample.grm"; (* creates the file sample.grm.sml *)
        !           429: - ^D            (* exiting here is optional, of course *)
        !           430: 
        !           431: % sml
        !           432: - use "sample.grm.sml";     (* compile the parser *)
        !           433: - use "sample.lex.sml";     (* compile the lexer *)
        !           434: - use "sample.sml";         (* compile the main module *)
        !           435: - Sample.run "sample.input"; (* run the sample program *)
        !           436:
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.