GNUtools/bison/bison.texinfo - annotate

Return to bison.texinfo CVS log
Up to [Apple XNU] / GNUtools / bison
Annotation of GNUtools/bison/bison.texinfo, revision 1.1.1.1

1.1       root        1: \input texinfo @c -*-texinfo-*-
                      2: @comment %**start of header
                      3: @setfilename bison.info
                      4: @settitle Bison 1.20
                      5: @setchapternewpage odd
                      6: 
                      7: @c SMALL BOOK version   
                      8: @c This edition has been formatted so that you can format and print it in
                      9: @c the smallbook format. 
                     10: @c @smallbook
                     11: 
                     12: @c next time, consider using @set for edition number, etc...
                     13: 
                     14: @c Set following if you have the new `shorttitlepage' command
                     15: @c @clear shorttitlepage-enabled
                     16: @c @set shorttitlepage-enabled
                     17: 
                     18: @c ISPELL CHECK: done, 14 Jan 1993 --bob
                     19: 
                     20: @c Check COPYRIGHT dates.  should be updated in the titlepage, ifinfo
                     21: @c titlepage; should NOT be changed in the GPL.  --mew
                     22: 
                     23: @iftex
                     24: @syncodeindex fn cp
                     25: @syncodeindex vr cp
                     26: @syncodeindex tp cp
                     27: @end iftex
                     28: @ifinfo
                     29: @synindex fn cp
                     30: @synindex vr cp
                     31: @synindex tp cp
                     32: @end ifinfo
                     33: @comment %**end of header
                     34: 
                     35: @ifinfo
                     36: This file documents the Bison parser generator.
                     37: 
                     38: Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
                     39: 
                     40: Permission is granted to make and distribute verbatim copies of
                     41: this manual provided the copyright notice and this permission notice
                     42: are preserved on all copies.
                     43: 
                     44: @ignore
                     45: Permission is granted to process this file through Tex and print the
                     46: results, provided the printed document carries copying permission
                     47: notice identical to this one except for the removal of this paragraph
                     48: (this paragraph not being relevant to the printed manual).
                     49: 
                     50: @end ignore
                     51: Permission is granted to copy and distribute modified versions of this
                     52: manual under the conditions for verbatim copying, provided also that the
                     53: sections entitled ``GNU General Public License'' and ``Conditions for
                     54: Using Bison'' are included exactly as in the original, and provided that
                     55: the entire resulting derived work is distributed under the terms of a
                     56: permission notice identical to this one.
                     57: 
                     58: Permission is granted to copy and distribute translations of this manual
                     59: into another language, under the above conditions for modified versions,
                     60: except that the sections entitled ``GNU General Public License'',
                     61: ``Conditions for Using Bison'' and this permission notice may be
                     62: included in translations approved by the Free Software Foundation
                     63: instead of in the original English.
                     64: @end ifinfo
                     65: 
                     66: @ifset shorttitlepage-enabled
                     67: @shorttitlepage Bison
                     68: @end ifset
                     69: @titlepage
                     70: @title Bison
                     71: @subtitle The YACC-compatible Parser Generator
                     72: @subtitle December 1992, Bison Version 1.20
                     73: 
                     74: @author by Charles Donnelly and Richard Stallman
                     75: 
                     76: @page
                     77: @vskip 0pt plus 1filll
                     78: Copyright @copyright{} 1988, 1989, 1990, 1991, 1992 Free Software
                     79: Foundation 
                     80: 
                     81: @sp 2
                     82: Published by the Free Software Foundation @*
                     83: 675 Massachusetts Avenue @*
                     84: Cambridge, MA 02139 USA @*
                     85: Printed copies are available for $15 each.@*
                     86: ISBN-1-882114-30-2
                     87: 
                     88: Permission is granted to make and distribute verbatim copies of
                     89: this manual provided the copyright notice and this permission notice
                     90: are preserved on all copies.
                     91: 
                     92: @ignore
                     93: Permission is granted to process this file through TeX and print the
                     94: results, provided the printed document carries copying permission
                     95: notice identical to this one except for the removal of this paragraph
                     96: (this paragraph not being relevant to the printed manual).
                     97: 
                     98: @end ignore
                     99: Permission is granted to copy and distribute modified versions of this
                    100: manual under the conditions for verbatim copying, provided also that the
                    101: sections entitled ``GNU General Public License'' and ``Conditions for
                    102: Using Bison'' are included exactly as in the original, and provided that
                    103: the entire resulting derived work is distributed under the terms of a
                    104: permission notice identical to this one.
                    105: 
                    106: Permission is granted to copy and distribute translations of this manual
                    107: into another language, under the above conditions for modified versions,
                    108: except that the sections entitled ``GNU General Public License'',
                    109: ``Conditions for Using Bison'' and this permission notice may be
                    110: included in translations approved by the Free Software Foundation
                    111: instead of in the original English.
                    112: @sp 2
                    113: Cover art by Etienne Suvasa.
                    114: @end titlepage
                    115: @page
                    116: 
                    117: @node Top, Introduction, (dir), (dir)
                    118: 
                    119: @ifinfo
                    120: This manual documents version 1.20 of Bison.
                    121: @end ifinfo
                    122: 
                    123: @menu
                    124: * Introduction::      
                    125: * Conditions::        
                    126: * Copying::           The GNU General Public License says
                    127:                         how you can copy and share Bison
                    128: 
                    129: Tutorial sections:
                    130: * Concepts::          Basic concepts for understanding Bison.
                    131: * Examples::          Three simple explained examples of using Bison.
                    132: 
                    133: Reference sections:
                    134: * Grammar File::      Writing Bison declarations and rules.
                    135: * Interface::         C-language interface to the parser function @code{yyparse}.
                    136: * Algorithm::         How the Bison parser works at run-time.
                    137: * Error Recovery::    Writing rules for error recovery.
                    138: * Context Dependency::  What to do if your language syntax is too
                    139:                         messy for Bison to handle straightforwardly.
                    140: * Debugging::         Debugging Bison parsers that parse wrong.
                    141: * Invocation::        How to run Bison (to produce the parser source file).
                    142: * Table of Symbols::  All the keywords of the Bison language are explained.
                    143: * Glossary::          Basic concepts are explained.
                    144: * Index::             Cross-references to the text.
                    145: 
                    146:  --- The Detailed Node Listing ---
                    147: 
                    148: The Concepts of Bison
                    149: 
                    150: * Language and Grammar::  Languages and context-free grammars,
                    151:                             as mathematical ideas.
                    152: * Grammar in Bison::  How we represent grammars for Bison's sake.
                    153: * Semantic Values::   Each token or syntactic grouping can have
                    154:                         a semantic value (the value of an integer,
                    155:                         the name of an identifier, etc.).
                    156: * Semantic Actions::  Each rule can have an action containing C code.
                    157: * Bison Parser::      What are Bison's input and output,
                    158:                         how is the output used?
                    159: * Stages::            Stages in writing and running Bison grammars.
                    160: * Grammar Layout::    Overall structure of a Bison grammar file.
                    161: 
                    162: Examples
                    163: 
                    164: * RPN Calc::          Reverse polish notation calculator;
                    165:                         a first example with no operator precedence.
                    166: * Infix Calc::        Infix (algebraic) notation calculator.
                    167:                         Operator precedence is introduced.
                    168: * Simple Error Recovery::  Continuing after syntax errors.
                    169: * Multi-function Calc::    Calculator with memory and trig functions.
                    170:                         It uses multiple data-types for semantic values.
                    171: * Exercises::         Ideas for improving the multi-function calculator.
                    172: 
                    173: Reverse Polish Notation Calculator
                    174: 
                    175: * Decls: Rpcalc Decls.  Bison and C declarations for rpcalc.
                    176: * Rules: Rpcalc Rules.  Grammar Rules for rpcalc, with explanation.
                    177: * Lexer: Rpcalc Lexer.  The lexical analyzer.
                    178: * Main: Rpcalc Main.    The controlling function.
                    179: * Error: Rpcalc Error.  The error reporting function.
                    180: * Gen: Rpcalc Gen.      Running Bison on the grammar file.
                    181: * Comp: Rpcalc Compile. Run the C compiler on the output code.
                    182: 
                    183: Grammar Rules for @code{rpcalc}
                    184: 
                    185: * Rpcalc Input::      
                    186: * Rpcalc Line::       
                    187: * Rpcalc Expr::       
                    188: 
                    189: Multi-Function Calculator: @code{mfcalc}
                    190: 
                    191: * Decl: Mfcalc Decl.      Bison declarations for multi-function calculator.
                    192: * Rules: Mfcalc Rules.    Grammar rules for the calculator.
                    193: * Symtab: Mfcalc Symtab.  Symbol table management subroutines.
                    194: 
                    195: Bison Grammar Files
                    196: 
                    197: * Grammar Outline::   Overall layout of the grammar file.
                    198: * Symbols::           Terminal and nonterminal symbols.
                    199: * Rules::             How to write grammar rules.
                    200: * Recursion::         Writing recursive rules.
                    201: * Semantics::         Semantic values and actions.
                    202: * Declarations::      All kinds of Bison declarations are described here.
                    203: * Multiple Parsers::  Putting more than one Bison parser in one program.
                    204: 
                    205: Outline of a Bison Grammar
                    206: 
                    207: * C Declarations::    Syntax and usage of the C declarations section.
                    208: * Bison Declarations::  Syntax and usage of the Bison declarations section.
                    209: * Grammar Rules::     Syntax and usage of the grammar rules section.
                    210: * C Code::            Syntax and usage of the additional C code section.
                    211: 
                    212: Defining Language Semantics
                    213: 
                    214: * Value Type::        Specifying one data type for all semantic values.
                    215: * Multiple Types::    Specifying several alternative data types.
                    216: * Actions::           An action is the semantic definition of a grammar rule.
                    217: * Action Types::      Specifying data types for actions to operate on.
                    218: * Mid-Rule Actions::  Most actions go at the end of a rule.
                    219:                       This says when, why and how to use the exceptional
                    220:                         action in the middle of a rule.
                    221: 
                    222: Bison Declarations
                    223: 
                    224: * Token Decl::        Declaring terminal symbols.
                    225: * Precedence Decl::   Declaring terminals with precedence and associativity.
                    226: * Union Decl::        Declaring the set of all semantic value types.
                    227: * Type Decl::         Declaring the choice of type for a nonterminal symbol.
                    228: * Expect Decl::       Suppressing warnings about shift/reduce conflicts.
                    229: * Start Decl::        Specifying the start symbol.
                    230: * Pure Decl::         Requesting a reentrant parser.
                    231: * Decl Summary::      Table of all Bison declarations.
                    232: 
                    233: Parser C-Language Interface
                    234: 
                    235: * Parser Function::   How to call @code{yyparse} and what it returns.
                    236: * Lexical::           You must supply a function @code{yylex} 
                    237:                         which reads tokens.
                    238: * Error Reporting::   You must supply a function @code{yyerror}.
                    239: * Action Features::   Special features for use in actions.
                    240: 
                    241: The Lexical Analyzer Function @code{yylex}
                    242: 
                    243: * Calling Convention::  How @code{yyparse} calls @code{yylex}.
                    244: * Token Values::      How @code{yylex} must return the semantic value
                    245:                         of the token it has read.
                    246: * Token Positions::   How @code{yylex} must return the text position
                    247:                         (line number, etc.) of the token, if the
                    248:                          actions want that.
                    249: * Pure Calling::      How the calling convention differs
                    250:                         in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
                    251: 
                    252: The Bison Parser Algorithm 
                    253: 
                    254: * Look-Ahead::        Parser looks one token ahead when deciding what to do.
                    255: * Shift/Reduce::      Conflicts: when either shifting or reduction is valid.
                    256: * Precedence::        Operator precedence works by resolving conflicts.
                    257: * Contextual Precedence::  When an operator's precedence depends on context.
                    258: * Parser States::     The parser is a finite-state-machine with stack.
                    259: * Reduce/Reduce::     When two rules are applicable in the same situation.
                    260: * Mystery Conflicts::  Reduce/reduce conflicts that look unjustified.
                    261: * Stack Overflow::    What happens when stack gets full.  How to avoid it.
                    262: 
                    263: Operator Precedence
                    264: 
                    265: * Why Precedence::    An example showing why precedence is needed.
                    266: * Using Precedence::  How to specify precedence in Bison grammars.
                    267: * Precedence Examples::  How these features are used in the previous example.
                    268: * How Precedence::    How they work.
                    269: 
                    270: Handling Context Dependencies
                    271: 
                    272: * Semantic Tokens::   Token parsing can depend on the semantic context.
                    273: * Lexical Tie-ins::   Token parsing can depend on the syntactic context.
                    274: * Tie-in Recovery::   Lexical tie-ins have implications for how
                    275:                         error recovery rules must be written.
                    276: 
                    277: Invoking Bison
                    278: 
                    279: * Bison Options::     All the options described in detail, 
                    280:                        in alphabetical order by short options.
                    281: * Option Cross Key::  Alphabetical list of long options.
                    282: * VMS Invocation::    Bison command syntax on VMS.
                    283: @end menu
                    284: 
                    285: @node Introduction, Conditions, Top, Top
                    286: @unnumbered Introduction
                    287: @cindex introduction
                    288: 
                    289: @dfn{Bison} is a general-purpose parser generator that converts a
                    290: grammar description for an LALR(1) context-free grammar into a C
                    291: program to parse that grammar.  Once you are proficient with Bison,
                    292: you may use it to develop a wide range of language parsers, from those
                    293: used in simple desk calculators to complex programming languages.
                    294: 
                    295: Bison is upward compatible with Yacc: all properly-written Yacc grammars
                    296: ought to work with Bison with no change.  Anyone familiar with Yacc
                    297: should be able to use Bison with little trouble.  You need to be fluent in
                    298: C programming in order to use Bison or to understand this manual.
                    299: 
                    300: We begin with tutorial chapters that explain the basic concepts of using
                    301: Bison and show three explained examples, each building on the last.  If you
                    302: don't know Bison or Yacc, start by reading these chapters.  Reference
                    303: chapters follow which describe specific aspects of Bison in detail.
                    304: 
                    305: Bison was written primarily by Robert Corbett; Richard Stallman made
                    306: it Yacc-compatible.  This edition corresponds to version 1.20 of Bison.
                    307: 
                    308: @node Conditions, Copying, Introduction, Top
                    309: @unnumbered Conditions for Using Bison
                    310: 
                    311: Bison grammars can be used only in programs that are free software.  This
                    312: is in contrast to what happens with the GNU C compiler and the other
                    313: GNU programming tools.
                    314: 
                    315: The reason Bison is special is that the output of the Bison utility---the
                    316: Bison parser file---contains a verbatim copy of a sizable piece of Bison,
                    317: which is the code for the @code{yyparse} function.  (The actions from your
                    318: grammar are inserted into this function at one point, but the rest of the
                    319: function is not changed.)
                    320: 
                    321: As a result, the Bison parser file is covered by the same copying
                    322: conditions that cover Bison itself and the rest of the GNU system: any
                    323: program containing it has to be distributed under the standard GNU copying
                    324: conditions.
                    325: 
                    326: Occasionally people who would like to use Bison to develop proprietary
                    327: programs complain about this.
                    328: 
                    329: We don't particularly sympathize with their complaints.  The purpose of the
                    330: GNU project is to promote the right to share software and the practice of
                    331: sharing software; it is a means of changing society.  The people who
                    332: complain are planning to be uncooperative toward the rest of the world; why
                    333: should they deserve our help in doing so?
                    334: 
                    335: However, it's possible that a change in these conditions might encourage
                    336: computer companies to use and distribute the GNU system.  If so, then we
                    337: might decide to change the terms on @code{yyparse} as a matter of the
                    338: strategy of promoting the right to share.  Such a change would be
                    339: irrevocable.  Since we stand by the copying permissions we have announced,
                    340: we cannot withdraw them once given.
                    341: 
                    342: We mustn't make an irrevocable change hastily.  We have to wait until there
                    343: is a complete GNU system and there has been time to learn how this issue
                    344: affects its reception.
                    345: 
                    346: @node Copying, Concepts, Conditions, Top
                    347: @unnumbered GNU GENERAL PUBLIC LICENSE
                    348: @center Version 2, June 1991
                    349: 
                    350: @display
                    351: Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
                    352: 675 Mass Ave, Cambridge, MA 02139, USA
                    353: 
                    354: Everyone is permitted to copy and distribute verbatim copies
                    355: of this license document, but changing it is not allowed.
                    356: @end display
                    357: 
                    358: @unnumberedsec Preamble
                    359: 
                    360:   The licenses for most software are designed to take away your
                    361: freedom to share and change it.  By contrast, the GNU General Public
                    362: License is intended to guarantee your freedom to share and change free
                    363: software---to make sure the software is free for all its users.  This
                    364: General Public License applies to most of the Free Software
                    365: Foundation's software and to any other program whose authors commit to
                    366: using it.  (Some other Free Software Foundation software is covered by
                    367: the GNU Library General Public License instead.)  You can apply it to
                    368: your programs, too.
                    369: 
                    370:   When we speak of free software, we are referring to freedom, not
                    371: price.  Our General Public Licenses are designed to make sure that you
                    372: have the freedom to distribute copies of free software (and charge for
                    373: this service if you wish), that you receive source code or can get it
                    374: if you want it, that you can change the software or use pieces of it
                    375: in new free programs; and that you know you can do these things.
                    376: 
                    377:   To protect your rights, we need to make restrictions that forbid
                    378: anyone to deny you these rights or to ask you to surrender the rights.
                    379: These restrictions translate to certain responsibilities for you if you
                    380: distribute copies of the software, or if you modify it.
                    381: 
                    382:   For example, if you distribute copies of such a program, whether
                    383: gratis or for a fee, you must give the recipients all the rights that
                    384: you have.  You must make sure that they, too, receive or can get the
                    385: source code.  And you must show them these terms so they know their
                    386: rights.
                    387: 
                    388:   We protect your rights with two steps: (1) copyright the software, and
                    389: (2) offer you this license which gives you legal permission to copy,
                    390: distribute and/or modify the software.
                    391: 
                    392:   Also, for each author's protection and ours, we want to make certain
                    393: that everyone understands that there is no warranty for this free
                    394: software.  If the software is modified by someone else and passed on, we
                    395: want its recipients to know that what they have is not the original, so
                    396: that any problems introduced by others will not reflect on the original
                    397: authors' reputations.
                    398: 
                    399:   Finally, any free program is threatened constantly by software
                    400: patents.  We wish to avoid the danger that redistributors of a free
                    401: program will individually obtain patent licenses, in effect making the
                    402: program proprietary.  To prevent this, we have made it clear that any
                    403: patent must be licensed for everyone's free use or not licensed at all.
                    404: 
                    405:   The precise terms and conditions for copying, distribution and
                    406: modification follow.
                    407: 
                    408: @iftex
                    409: @unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
                    410: @end iftex
                    411: @ifinfo
                    412: @center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
                    413: @end ifinfo
                    414: 
                    415: @enumerate 0
                    416: @item
                    417: This License applies to any program or other work which contains
                    418: a notice placed by the copyright holder saying it may be distributed
                    419: under the terms of this General Public License.  The ``Program'', below,
                    420: refers to any such program or work, and a ``work based on the Program''
                    421: means either the Program or any derivative work under copyright law:
                    422: that is to say, a work containing the Program or a portion of it,
                    423: either verbatim or with modifications and/or translated into another
                    424: language.  (Hereinafter, translation is included without limitation in
                    425: the term ``modification''.)  Each licensee is addressed as ``you''.
                    426: 
                    427: Activities other than copying, distribution and modification are not
                    428: covered by this License; they are outside its scope.  The act of
                    429: running the Program is not restricted, and the output from the Program
                    430: is covered only if its contents constitute a work based on the
                    431: Program (independent of having been made by running the Program).
                    432: Whether that is true depends on what the Program does.
                    433: 
                    434: @item
                    435: You may copy and distribute verbatim copies of the Program's
                    436: source code as you receive it, in any medium, provided that you
                    437: conspicuously and appropriately publish on each copy an appropriate
                    438: copyright notice and disclaimer of warranty; keep intact all the
                    439: notices that refer to this License and to the absence of any warranty;
                    440: and give any other recipients of the Program a copy of this License
                    441: along with the Program.
                    442: 
                    443: You may charge a fee for the physical act of transferring a copy, and
                    444: you may at your option offer warranty protection in exchange for a fee.
                    445: 
                    446: @item
                    447: You may modify your copy or copies of the Program or any portion
                    448: of it, thus forming a work based on the Program, and copy and
                    449: distribute such modifications or work under the terms of Section 1
                    450: above, provided that you also meet all of these conditions:
                    451: 
                    452: @enumerate a
                    453: @item
                    454: You must cause the modified files to carry prominent notices
                    455: stating that you changed the files and the date of any change.
                    456: 
                    457: @item
                    458: You must cause any work that you distribute or publish, that in
                    459: whole or in part contains or is derived from the Program or any
                    460: part thereof, to be licensed as a whole at no charge to all third
                    461: parties under the terms of this License.
                    462: 
                    463: @item
                    464: If the modified program normally reads commands interactively
                    465: when run, you must cause it, when started running for such
                    466: interactive use in the most ordinary way, to print or display an
                    467: announcement including an appropriate copyright notice and a
                    468: notice that there is no warranty (or else, saying that you provide
                    469: a warranty) and that users may redistribute the program under
                    470: these conditions, and telling the user how to view a copy of this
                    471: License.  (Exception: if the Program itself is interactive but
                    472: does not normally print such an announcement, your work based on
                    473: the Program is not required to print an announcement.)
                    474: @end enumerate
                    475: 
                    476: These requirements apply to the modified work as a whole.  If
                    477: identifiable sections of that work are not derived from the Program,
                    478: and can be reasonably considered independent and separate works in
                    479: themselves, then this License, and its terms, do not apply to those
                    480: sections when you distribute them as separate works.  But when you
                    481: distribute the same sections as part of a whole which is a work based
                    482: on the Program, the distribution of the whole must be on the terms of
                    483: this License, whose permissions for other licensees extend to the
                    484: entire whole, and thus to each and every part regardless of who wrote it.
                    485: 
                    486: Thus, it is not the intent of this section to claim rights or contest
                    487: your rights to work written entirely by you; rather, the intent is to
                    488: exercise the right to control the distribution of derivative or
                    489: collective works based on the Program.
                    490: 
                    491: In addition, mere aggregation of another work not based on the Program
                    492: with the Program (or with a work based on the Program) on a volume of
                    493: a storage or distribution medium does not bring the other work under
                    494: the scope of this License.
                    495: 
                    496: @item
                    497: You may copy and distribute the Program (or a work based on it,
                    498: under Section 2) in object code or executable form under the terms of
                    499: Sections 1 and 2 above provided that you also do one of the following:
                    500: 
                    501: @enumerate a
                    502: @item
                    503: Accompany it with the complete corresponding machine-readable
                    504: source code, which must be distributed under the terms of Sections
                    505: 1 and 2 above on a medium customarily used for software interchange; or,
                    506: 
                    507: @item
                    508: Accompany it with a written offer, valid for at least three
                    509: years, to give any third party, for a charge no more than your
                    510: cost of physically performing source distribution, a complete
                    511: machine-readable copy of the corresponding source code, to be
                    512: distributed under the terms of Sections 1 and 2 above on a medium
                    513: customarily used for software interchange; or,
                    514: 
                    515: @item
                    516: Accompany it with the information you received as to the offer
                    517: to distribute corresponding source code.  (This alternative is
                    518: allowed only for noncommercial distribution and only if you
                    519: received the program in object code or executable form with such
                    520: an offer, in accord with Subsection b above.)
                    521: @end enumerate
                    522: 
                    523: The source code for a work means the preferred form of the work for
                    524: making modifications to it.  For an executable work, complete source
                    525: code means all the source code for all modules it contains, plus any
                    526: associated interface definition files, plus the scripts used to
                    527: control compilation and installation of the executable.  However, as a
                    528: special exception, the source code distributed need not include
                    529: anything that is normally distributed (in either source or binary
                    530: form) with the major components (compiler, kernel, and so on) of the
                    531: operating system on which the executable runs, unless that component
                    532: itself accompanies the executable.
                    533: 
                    534: If distribution of executable or object code is made by offering
                    535: access to copy from a designated place, then offering equivalent
                    536: access to copy the source code from the same place counts as
                    537: distribution of the source code, even though third parties are not
                    538: compelled to copy the source along with the object code.
                    539: 
                    540: @item
                    541: You may not copy, modify, sublicense, or distribute the Program
                    542: except as expressly provided under this License.  Any attempt
                    543: otherwise to copy, modify, sublicense or distribute the Program is
                    544: void, and will automatically terminate your rights under this License.
                    545: However, parties who have received copies, or rights, from you under
                    546: this License will not have their licenses terminated so long as such
                    547: parties remain in full compliance.
                    548: 
                    549: @item
                    550: You are not required to accept this License, since you have not
                    551: signed it.  However, nothing else grants you permission to modify or
                    552: distribute the Program or its derivative works.  These actions are
                    553: prohibited by law if you do not accept this License.  Therefore, by
                    554: modifying or distributing the Program (or any work based on the
                    555: Program), you indicate your acceptance of this License to do so, and
                    556: all its terms and conditions for copying, distributing or modifying
                    557: the Program or works based on it.
                    558: 
                    559: @item
                    560: Each time you redistribute the Program (or any work based on the
                    561: Program), the recipient automatically receives a license from the
                    562: original licensor to copy, distribute or modify the Program subject to
                    563: these terms and conditions.  You may not impose any further
                    564: restrictions on the recipients' exercise of the rights granted herein.
                    565: You are not responsible for enforcing compliance by third parties to
                    566: this License.
                    567: 
                    568: @item
                    569: If, as a consequence of a court judgment or allegation of patent
                    570: infringement or for any other reason (not limited to patent issues),
                    571: conditions are imposed on you (whether by court order, agreement or
                    572: otherwise) that contradict the conditions of this License, they do not
                    573: excuse you from the conditions of this License.  If you cannot
                    574: distribute so as to satisfy simultaneously your obligations under this
                    575: License and any other pertinent obligations, then as a consequence you
                    576: may not distribute the Program at all.  For example, if a patent
                    577: license would not permit royalty-free redistribution of the Program by
                    578: all those who receive copies directly or indirectly through you, then
                    579: the only way you could satisfy both it and this License would be to
                    580: refrain entirely from distribution of the Program.
                    581: 
                    582: If any portion of this section is held invalid or unenforceable under
                    583: any particular circumstance, the balance of the section is intended to
                    584: apply and the section as a whole is intended to apply in other
                    585: circumstances.
                    586: 
                    587: It is not the purpose of this section to induce you to infringe any
                    588: patents or other property right claims or to contest validity of any
                    589: such claims; this section has the sole purpose of protecting the
                    590: integrity of the free software distribution system, which is
                    591: implemented by public license practices.  Many people have made
                    592: generous contributions to the wide range of software distributed
                    593: through that system in reliance on consistent application of that
                    594: system; it is up to the author/donor to decide if he or she is willing
                    595: to distribute software through any other system and a licensee cannot
                    596: impose that choice.
                    597: 
                    598: This section is intended to make thoroughly clear what is believed to
                    599: be a consequence of the rest of this License.
                    600: 
                    601: @item
                    602: If the distribution and/or use of the Program is restricted in
                    603: certain countries either by patents or by copyrighted interfaces, the
                    604: original copyright holder who places the Program under this License
                    605: may add an explicit geographical distribution limitation excluding
                    606: those countries, so that distribution is permitted only in or among
                    607: countries not thus excluded.  In such case, this License incorporates
                    608: the limitation as if written in the body of this License.
                    609: 
                    610: @item
                    611: The Free Software Foundation may publish revised and/or new versions
                    612: of the General Public License from time to time.  Such new versions will
                    613: be similar in spirit to the present version, but may differ in detail to
                    614: address new problems or concerns.
                    615: 
                    616: Each version is given a distinguishing version number.  If the Program
                    617: specifies a version number of this License which applies to it and ``any
                    618: later version'', you have the option of following the terms and conditions
                    619: either of that version or of any later version published by the Free
                    620: Software Foundation.  If the Program does not specify a version number of
                    621: this License, you may choose any version ever published by the Free Software
                    622: Foundation.
                    623: 
                    624: @item
                    625: If you wish to incorporate parts of the Program into other free
                    626: programs whose distribution conditions are different, write to the author
                    627: to ask for permission.  For software which is copyrighted by the Free
                    628: Software Foundation, write to the Free Software Foundation; we sometimes
                    629: make exceptions for this.  Our decision will be guided by the two goals
                    630: of preserving the free status of all derivatives of our free software and
                    631: of promoting the sharing and reuse of software generally.
                    632: 
                    633: @iftex
                    634: @heading NO WARRANTY
                    635: @end iftex
                    636: @ifinfo
                    637: @center NO WARRANTY
                    638: @end ifinfo
                    639: 
                    640: @item
                    641: BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
                    642: FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
                    643: OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
                    644: PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
                    645: OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
                    646: MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
                    647: TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
                    648: PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
                    649: REPAIR OR CORRECTION.
                    650: 
                    651: @item
                    652: IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
                    653: WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
                    654: REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
                    655: INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
                    656: OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
                    657: TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
                    658: YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
                    659: PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
                    660: POSSIBILITY OF SUCH DAMAGES.
                    661: @end enumerate
                    662: 
                    663: @iftex
                    664: @heading END OF TERMS AND CONDITIONS
                    665: @end iftex
                    666: @ifinfo
                    667: @center END OF TERMS AND CONDITIONS
                    668: @end ifinfo
                    669: 
                    670: @page
                    671: @unnumberedsec How to Apply These Terms to Your New Programs
                    672: 
                    673:   If you develop a new program, and you want it to be of the greatest
                    674: possible use to the public, the best way to achieve this is to make it
                    675: free software which everyone can redistribute and change under these terms.
                    676: 
                    677:   To do so, attach the following notices to the program.  It is safest
                    678: to attach them to the start of each source file to most effectively
                    679: convey the exclusion of warranty; and each file should have at least
                    680: the ``copyright'' line and a pointer to where the full notice is found.
                    681: 
                    682: @smallexample
                    683: @var{one line to give the program's name and a brief idea of what it does.}
                    684: Copyright (C) 19@var{yy}  @var{name of author}
                    685: 
                    686: This program is free software; you can redistribute it and/or modify
                    687: it under the terms of the GNU General Public License as published by
                    688: the Free Software Foundation; either version 2 of the License, or
                    689: (at your option) any later version.
                    690: 
                    691: This program is distributed in the hope that it will be useful,
                    692: but WITHOUT ANY WARRANTY; without even the implied warranty of
                    693: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
                    694: GNU General Public License for more details.
                    695: 
                    696: You should have received a copy of the GNU General Public License
                    697: along with this program; if not, write to the Free Software
                    698: Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
                    699: @end smallexample
                    700: 
                    701: Also add information on how to contact you by electronic and paper mail.
                    702: 
                    703: If the program is interactive, make it output a short notice like this
                    704: when it starts in an interactive mode:
                    705: 
                    706: @smallexample
                    707: Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
                    708: Gnomovision comes with ABSOLUTELY NO WARRANTY; for details 
                    709: type `show w'.
                    710: This is free software, and you are welcome to redistribute it
                    711: under certain conditions; type `show c' for details.
                    712: @end smallexample
                    713: 
                    714: The hypothetical commands @samp{show w} and @samp{show c} should show
                    715: the appropriate parts of the General Public License.  Of course, the
                    716: commands you use may be called something other than @samp{show w} and
                    717: @samp{show c}; they could even be mouse-clicks or menu items---whatever
                    718: suits your program.
                    719: 
                    720: You should also get your employer (if you work as a programmer) or your
                    721: school, if any, to sign a ``copyright disclaimer'' for the program, if
                    722: necessary.  Here is a sample; alter the names:
                    723: 
                    724: @smallexample
                    725: Yoyodyne, Inc., hereby disclaims all copyright interest in the program
                    726: `Gnomovision' (which makes passes at compilers) written by James Hacker.
                    727: 
                    728: @var{signature of Ty Coon}, 1 April 1989
                    729: Ty Coon, President of Vice
                    730: @end smallexample
                    731: 
                    732: This General Public License does not permit incorporating your program into
                    733: proprietary programs.  If your program is a subroutine library, you may
                    734: consider it more useful to permit linking proprietary applications with the
                    735: library.  If this is what you want to do, use the GNU Library General
                    736: Public License instead of this License.
                    737: 
                    738: @node Concepts, Examples, Copying, Top
                    739: @chapter The Concepts of Bison
                    740: 
                    741: This chapter introduces many of the basic concepts without which the
                    742: details of Bison will not make sense.  If you do not already know how to
                    743: use Bison or Yacc, we suggest you start by reading this chapter carefully.
                    744: 
                    745: @menu
                    746: * Language and Grammar::  Languages and context-free grammars,
                    747:                             as mathematical ideas.
                    748: * Grammar in Bison::  How we represent grammars for Bison's sake.
                    749: * Semantic Values::   Each token or syntactic grouping can have
                    750:                         a semantic value (the value of an integer,
                    751:                         the name of an identifier, etc.).
                    752: * Semantic Actions::  Each rule can have an action containing C code.
                    753: * Bison Parser::      What are Bison's input and output,
                    754:                         how is the output used?
                    755: * Stages::            Stages in writing and running Bison grammars.
                    756: * Grammar Layout::    Overall structure of a Bison grammar file.
                    757: @end menu
                    758: 
                    759: @node Language and Grammar, Grammar in Bison,  , Concepts
                    760: @section Languages and Context-Free Grammars
                    761: 
                    762: @c !!! ``An expression can be an integer'' is not a valid Bison
                    763: @c expression---Bison cannot read English! --rjc     6 Feb 1992
                    764: @cindex context-free grammar
                    765: @cindex grammar, context-free
                    766: In order for Bison to parse a language, it must be described by a
                    767: @dfn{context-free grammar}.  This means that you specify one or more
                    768: @dfn{syntactic groupings} and give rules for constructing them from their
                    769: parts.  For example, in the C language, one kind of grouping is called an
                    770: `expression'.  One rule for making an expression might be, ``An expression
                    771: can be made of a minus sign and another expression''.  Another would be,
                    772: ``An expression can be an integer''.  As you can see, rules are often
                    773: recursive, but there must be at least one rule which leads out of the
                    774: recursion.
                    775: 
                    776: @cindex BNF
                    777: @cindex Backus-Naur form
                    778: The most common formal system for presenting such rules for humans to read
                    779: is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to
                    780: specify the language Algol 60.  Any grammar expressed in BNF is a
                    781: context-free grammar.  The input to Bison is essentially machine-readable
                    782: BNF.
                    783: 
                    784: Not all context-free languages can be handled by Bison, only those
                    785: that are LALR(1).  In brief, this means that it must be possible to
                    786: tell how to parse any portion of an input string with just a single
                    787: token of look-ahead.  Strictly speaking, that is a description of an
                    788: LR(1) grammar, and LALR(1) involves additional restrictions that are
                    789: hard to explain simply; but it is rare in actual practice to find an
                    790: LR(1) grammar that fails to be LALR(1).  @xref{Mystery Conflicts, ,
                    791: Mysterious Reduce/Reduce Conflicts}, for more information on this.
                    792: 
                    793: @cindex symbols (abstract)
                    794: @cindex token
                    795: @cindex syntactic grouping
                    796: @cindex grouping, syntactic
                    797: In the formal grammatical rules for a language, each kind of syntactic unit
                    798: or grouping is named by a @dfn{symbol}.  Those which are built by grouping
                    799: smaller constructs according to grammatical rules are called
                    800: @dfn{nonterminal symbols}; those which can't be subdivided are called
                    801: @dfn{terminal symbols} or @dfn{token types}.  We call a piece of input
                    802: corresponding to a single terminal symbol a @dfn{token}, and a piece
                    803: corresponding to a single nonterminal symbol a @dfn{grouping}.@refill
                    804: 
                    805: We can use the C language as an example of what symbols, terminal and
                    806: nonterminal, mean.  The tokens of C are identifiers, constants (numeric and
                    807: string), and the various keywords, arithmetic operators and punctuation
                    808: marks.  So the terminal symbols of a grammar for C include `identifier',
                    809: `number', `string', plus one symbol for each keyword, operator or
                    810: punctuation mark: `if', `return', `const', `static', `int', `char',
                    811: `plus-sign', `open-brace', `close-brace', `comma' and many more.  (These
                    812: tokens can be subdivided into characters, but that is a matter of
                    813: lexicography, not grammar.)
                    814: 
                    815: Here is a simple C function subdivided into tokens:
                    816: 
                    817: @example
                    818: int             /* @r{keyword `int'} */
                    819: square (x)      /* @r{identifier, open-paren,} */
                    820:                 /* @r{identifier, close-paren} */
                    821:      int x;     /* @r{keyword `int', identifier, semicolon} */
                    822: @{               /* @r{open-brace} */
                    823:   return x * x; /* @r{keyword `return', identifier,} */
                    824:                 /* @r{asterisk, identifier, semicolon} */
                    825: @}               /* @r{close-brace} */
                    826: @end example
                    827: 
                    828: The syntactic groupings of C include the expression, the statement, the
                    829: declaration, and the function definition.  These are represented in the
                    830: grammar of C by nonterminal symbols `expression', `statement',
                    831: `declaration' and `function definition'.  The full grammar uses dozens of
                    832: additional language constructs, each with its own nonterminal symbol, in
                    833: order to express the meanings of these four.  The example above is a
                    834: function definition; it contains one declaration, and one statement.  In
                    835: the statement, each @samp{x} is an expression and so is @samp{x * x}.
                    836: 
                    837: Each nonterminal symbol must have grammatical rules showing how it is made
                    838: out of simpler constructs.  For example, one kind of C statement is the
                    839: @code{return} statement; this would be described with a grammar rule which
                    840: reads informally as follows:
                    841: 
                    842: @quotation
                    843: A `statement' can be made of a `return' keyword, an `expression' and a
                    844: `semicolon'.
                    845: @end quotation
                    846: 
                    847: @noindent
                    848: There would be many other rules for `statement', one for each kind of
                    849: statement in C.
                    850: 
                    851: @cindex start symbol
                    852: One nonterminal symbol must be distinguished as the special one which
                    853: defines a complete utterance in the language.  It is called the @dfn{start
                    854: symbol}.  In a compiler, this means a complete input program.  In the C
                    855: language, the nonterminal symbol `sequence of definitions and declarations'
                    856: plays this role.
                    857: 
                    858: For example, @samp{1 + 2} is a valid C expression---a valid part of a C
                    859: program---but it is not valid as an @emph{entire} C program.  In the
                    860: context-free grammar of C, this follows from the fact that `expression' is
                    861: not the start symbol.
                    862: 
                    863: The Bison parser reads a sequence of tokens as its input, and groups the
                    864: tokens using the grammar rules.  If the input is valid, the end result is
                    865: that the entire token sequence reduces to a single grouping whose symbol is
                    866: the grammar's start symbol.  If we use a grammar for C, the entire input
                    867: must be a `sequence of definitions and declarations'.  If not, the parser
                    868: reports a syntax error.
                    869: 
                    870: @node Grammar in Bison, Semantic Values, Language and Grammar, Concepts
                    871: @section From Formal Rules to Bison Input
                    872: @cindex Bison grammar
                    873: @cindex grammar, Bison
                    874: @cindex formal grammar
                    875: 
                    876: A formal grammar is a mathematical construct.  To define the language
                    877: for Bison, you must write a file expressing the grammar in Bison syntax:
                    878: a @dfn{Bison grammar} file.  @xref{Grammar File, ,Bison Grammar Files}.
                    879: 
                    880: A nonterminal symbol in the formal grammar is represented in Bison input
                    881: as an identifier, like an identifier in C.  By convention, it should be
                    882: in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
                    883: 
                    884: The Bison representation for a terminal symbol is also called a @dfn{token
                    885: type}.  Token types as well can be represented as C-like identifiers.  By
                    886: convention, these identifiers should be upper case to distinguish them from
                    887: nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
                    888: @code{RETURN}.  A terminal symbol that stands for a particular keyword in
                    889: the language should be named after that keyword converted to upper case.
                    890: The terminal symbol @code{error} is reserved for error recovery.
                    891: @xref{Symbols}.@refill
                    892: 
                    893: A terminal symbol can also be represented as a character literal, just like
                    894: a C character constant.  You should do this whenever a token is just a
                    895: single character (parenthesis, plus-sign, etc.): use that same character in
                    896: a literal as the terminal symbol for that token.
                    897: 
                    898: The grammar rules also have an expression in Bison syntax.  For example,
                    899: here is the Bison rule for a C @code{return} statement.  The semicolon in
                    900: quotes is a literal character token, representing part of the C syntax for
                    901: the statement; the naked semicolon, and the colon, are Bison punctuation
                    902: used in every rule.
                    903: 
                    904: @example
                    905: stmt:   RETURN expr ';'
                    906:         ;
                    907: @end example
                    908: 
                    909: @noindent
                    910: @xref{Rules, ,Syntax of Grammar Rules}.
                    911: 
                    912: @node Semantic Values, Semantic Actions, Grammar in Bison, Concepts
                    913: @section Semantic Values
                    914: @cindex semantic value
                    915: @cindex value, semantic
                    916: 
                    917: A formal grammar selects tokens only by their classifications: for example,
                    918: if a rule mentions the terminal symbol `integer constant', it means that
                    919: @emph{any} integer constant is grammatically valid in that position.  The
                    920: precise value of the constant is irrelevant to how to parse the input: if
                    921: @samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
                    922: grammatical.@refill
                    923: 
                    924: But the precise value is very important for what the input means once it is
                    925: parsed.  A compiler is useless if it fails to distinguish between 4, 1 and
                    926: 3989 as constants in the program!  Therefore, each token in a Bison grammar
                    927: has both a token type and a @dfn{semantic value}.  @xref{Semantics, ,Defining Language Semantics},
                    928: for details.
                    929: 
                    930: The token type is a terminal symbol defined in the grammar, such as
                    931: @code{INTEGER}, @code{IDENTIFIER} or @code{','}.  It tells everything
                    932: you need to know to decide where the token may validly appear and how to
                    933: group it with other tokens.  The grammar rules know nothing about tokens
                    934: except their types.@refill
                    935: 
                    936: The semantic value has all the rest of the information about the
                    937: meaning of the token, such as the value of an integer, or the name of an
                    938: identifier.  (A token such as @code{','} which is just punctuation doesn't
                    939: need to have any semantic value.)
                    940: 
                    941: For example, an input token might be classified as token type
                    942: @code{INTEGER} and have the semantic value 4.  Another input token might
                    943: have the same token type @code{INTEGER} but value 3989.  When a grammar
                    944: rule says that @code{INTEGER} is allowed, either of these tokens is
                    945: acceptable because each is an @code{INTEGER}.  When the parser accepts the
                    946: token, it keeps track of the token's semantic value.
                    947: 
                    948: Each grouping can also have a semantic value as well as its nonterminal
                    949: symbol.  For example, in a calculator, an expression typically has a
                    950: semantic value that is a number.  In a compiler for a programming
                    951: language, an expression typically has a semantic value that is a tree
                    952: structure describing the meaning of the expression.
                    953: 
                    954: @node Semantic Actions, Bison Parser, Semantic Values, Concepts
                    955: @section Semantic Actions
                    956: @cindex semantic actions
                    957: @cindex actions, semantic
                    958: 
                    959: In order to be useful, a program must do more than parse input; it must
                    960: also produce some output based on the input.  In a Bison grammar, a grammar
                    961: rule can have an @dfn{action} made up of C statements.  Each time the
                    962: parser recognizes a match for that rule, the action is executed.
                    963: @xref{Actions}.
                    964:         
                    965: Most of the time, the purpose of an action is to compute the semantic value
                    966: of the whole construct from the semantic values of its parts.  For example,
                    967: suppose we have a rule which says an expression can be the sum of two
                    968: expressions.  When the parser recognizes such a sum, each of the
                    969: subexpressions has a semantic value which describes how it was built up.
                    970: The action for this rule should create a similar sort of value for the
                    971: newly recognized larger expression.
                    972: 
                    973: For example, here is a rule that says an expression can be the sum of
                    974: two subexpressions:
                    975: 
                    976: @example
                    977: expr: expr '+' expr   @{ $$ = $1 + $3; @}
                    978:         ;
                    979: @end example
                    980: 
                    981: @noindent
                    982: The action says how to produce the semantic value of the sum expression
                    983: from the values of the two subexpressions.
                    984: 
                    985: @node Bison Parser, Stages, Semantic Actions, Concepts
                    986: @section Bison Output: the Parser File
                    987: @cindex Bison parser
                    988: @cindex Bison utility
                    989: @cindex lexical analyzer, purpose
                    990: @cindex parser
                    991: 
                    992: When you run Bison, you give it a Bison grammar file as input.  The output
                    993: is a C source file that parses the language described by the grammar.
                    994: This file is called a @dfn{Bison parser}.  Keep in mind that the Bison
                    995: utility and the Bison parser are two distinct programs: the Bison utility
                    996: is a program whose output is the Bison parser that becomes part of your
                    997: program.
                    998: 
                    999: The job of the Bison parser is to group tokens into groupings according to
                   1000: the grammar rules---for example, to build identifiers and operators into
                   1001: expressions.  As it does this, it runs the actions for the grammar rules it
                   1002: uses.
                   1003: 
                   1004: The tokens come from a function called the @dfn{lexical analyzer} that you
                   1005: must supply in some fashion (such as by writing it in C).  The Bison parser
                   1006: calls the lexical analyzer each time it wants a new token.  It doesn't know
                   1007: what is ``inside'' the tokens (though their semantic values may reflect
                   1008: this).  Typically the lexical analyzer makes the tokens by parsing
                   1009: characters of text, but Bison does not depend on this.  @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
                   1010: 
                   1011: The Bison parser file is C code which defines a function named
                   1012: @code{yyparse} which implements that grammar.  This function does not make
                   1013: a complete C program: you must supply some additional functions.  One is
                   1014: the lexical analyzer.  Another is an error-reporting function which the
                   1015: parser calls to report an error.  In addition, a complete C program must
                   1016: start with a function called @code{main}; you have to provide this, and
                   1017: arrange for it to call @code{yyparse} or the parser will never run.
                   1018: @xref{Interface, ,Parser C-Language Interface}.
                   1019: 
                   1020: Aside from the token type names and the symbols in the actions you
                   1021: write, all variable and function names used in the Bison parser file
                   1022: begin with @samp{yy} or @samp{YY}.  This includes interface functions
                   1023: such as the lexical analyzer function @code{yylex}, the error reporting
                   1024: function @code{yyerror} and the parser function @code{yyparse} itself.
                   1025: This also includes numerous identifiers used for internal purposes.
                   1026: Therefore, you should avoid using C identifiers starting with @samp{yy}
                   1027: or @samp{YY} in the Bison grammar file except for the ones defined in
                   1028: this manual.
                   1029: 
                   1030: @node Stages, Grammar Layout, Bison Parser, Concepts
                   1031: @section Stages in Using Bison
                   1032: @cindex stages in using Bison
                   1033: @cindex using Bison
                   1034: 
                   1035: The actual language-design process using Bison, from grammar specification
                   1036: to a working compiler or interpreter, has these parts:
                   1037: 
                   1038: @enumerate
                   1039: @item
                   1040: Formally specify the grammar in a form recognized by Bison
                   1041: (@pxref{Grammar File, ,Bison Grammar Files}).  For each grammatical rule in the language,
                   1042: describe the action that is to be taken when an instance of that rule
                   1043: is recognized.  The action is described by a sequence of C statements.
                   1044: 
                   1045: @item
                   1046: Write a lexical analyzer to process input and pass tokens to the
                   1047: parser.  The lexical analyzer may be written by hand in C
                   1048: (@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}).  It could also be produced using Lex, but the use
                   1049: of Lex is not discussed in this manual.
                   1050: 
                   1051: @item
                   1052: Write a controlling function that calls the Bison-produced parser.
                   1053: 
                   1054: @item
                   1055: Write error-reporting routines.
                   1056: @end enumerate
                   1057: 
                   1058: To turn this source code as written into a runnable program, you
                   1059: must follow these steps:
                   1060: 
                   1061: @enumerate
                   1062: @item
                   1063: Run Bison on the grammar to produce the parser.
                   1064: 
                   1065: @item
                   1066: Compile the code output by Bison, as well as any other source files.
                   1067: 
                   1068: @item
                   1069: Link the object files to produce the finished product.
                   1070: @end enumerate
                   1071: 
                   1072: @node Grammar Layout,  , Stages, Concepts
                   1073: @section The Overall Layout of a Bison Grammar
                   1074: @cindex grammar file
                   1075: @cindex file format
                   1076: @cindex format of grammar file
                   1077: @cindex layout of Bison grammar
                   1078: 
                   1079: The input file for the Bison utility is a @dfn{Bison grammar file}.  The
                   1080: general form of a Bison grammar file is as follows:
                   1081: 
                   1082: @example
                   1083: %@{
                   1084: @var{C declarations}
                   1085: %@}
                   1086: 
                   1087: @var{Bison declarations}
                   1088: 
                   1089: %%
                   1090: @var{Grammar rules}
                   1091: %%
                   1092: @var{Additional C code}
                   1093: @end example
                   1094: 
                   1095: @noindent
                   1096: The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
                   1097: in every Bison grammar file to separate the sections.
                   1098: 
                   1099: The C declarations may define types and variables used in the actions.
                   1100: You can also use preprocessor commands to define macros used there, and use
                   1101: @code{#include} to include header files that do any of these things.
                   1102: 
                   1103: The Bison declarations declare the names of the terminal and nonterminal
                   1104: symbols, and may also describe operator precedence and the data types of
                   1105: semantic values of various symbols.
                   1106: 
                   1107: The grammar rules define how to construct each nonterminal symbol from its
                   1108: parts.
                   1109: 
                   1110: The additional C code can contain any C code you want to use.  Often the
                   1111: definition of the lexical analyzer @code{yylex} goes here, plus subroutines
                   1112: called by the actions in the grammar rules.  In a simple program, all the
                   1113: rest of the program can go here.
                   1114: 
                   1115: @node Examples, Grammar File, Concepts, Top
                   1116: @chapter Examples
                   1117: @cindex simple examples
                   1118: @cindex examples, simple
                   1119: 
                   1120: Now we show and explain three sample programs written using Bison: a
                   1121: reverse polish notation calculator, an algebraic (infix) notation
                   1122: calculator, and a multi-function calculator.  All three have been tested
                   1123: under BSD Unix 4.3; each produces a usable, though limited, interactive
                   1124: desk-top calculator.
                   1125: 
                   1126: These examples are simple, but Bison grammars for real programming
                   1127: languages are written the same way.
                   1128: @ifinfo
                   1129: You can copy these examples out of the Info file and into a source file
                   1130: to try them.
                   1131: @end ifinfo
                   1132: 
                   1133: @menu
                   1134: * RPN Calc::          Reverse polish notation calculator;
                   1135:                         a first example with no operator precedence.
                   1136: * Infix Calc::        Infix (algebraic) notation calculator.
                   1137:                         Operator precedence is introduced.
                   1138: * Simple Error Recovery::  Continuing after syntax errors.
                   1139: * Multi-function Calc::  Calculator with memory and trig functions.
                   1140:                            It uses multiple data-types for semantic values.
                   1141: * Exercises::         Ideas for improving the multi-function calculator.
                   1142: @end menu
                   1143: 
                   1144: @node RPN Calc, Infix Calc,  , Examples
                   1145: @section Reverse Polish Notation Calculator
                   1146: @cindex reverse polish notation
                   1147: @cindex polish notation calculator
                   1148: @cindex @code{rpcalc}
                   1149: @cindex calculator, simple
                   1150: 
                   1151: The first example is that of a simple double-precision @dfn{reverse polish
                   1152: notation} calculator (a calculator using postfix operators).  This example
                   1153: provides a good starting point, since operator precedence is not an issue.
                   1154: The second example will illustrate how operator precedence is handled.
                   1155: 
                   1156: The source code for this calculator is named @file{rpcalc.y}.  The
                   1157: @samp{.y} extension is a convention used for Bison input files.
                   1158: 
                   1159: @menu
                   1160: * Decls: Rpcalc Decls.  Bison and C declarations for rpcalc.
                   1161: * Rules: Rpcalc Rules.  Grammar Rules for rpcalc, with explanation.
                   1162: * Lexer: Rpcalc Lexer.  The lexical analyzer.
                   1163: * Main: Rpcalc Main.    The controlling function.
                   1164: * Error: Rpcalc Error.  The error reporting function.
                   1165: * Gen: Rpcalc Gen.      Running Bison on the grammar file.
                   1166: * Comp: Rpcalc Compile. Run the C compiler on the output code.
                   1167: @end menu
                   1168: 
                   1169: @node Rpcalc Decls, Rpcalc Rules,  , RPN Calc
                   1170: @subsection Declarations for @code{rpcalc}
                   1171: 
                   1172: Here are the C and Bison declarations for the reverse polish notation
                   1173: calculator.  As in C, comments are placed between @samp{/*@dots{}*/}.
                   1174: 
                   1175: @example
                   1176: /* Reverse polish notation calculator. */
                   1177: 
                   1178: %@{
                   1179: #define YYSTYPE double
                   1180: #include <math.h>
                   1181: %@}
                   1182: 
                   1183: %token NUM
                   1184: 
                   1185: %% /* Grammar rules and actions follow */
                   1186: @end example
                   1187: 
                   1188: The C declarations section (@pxref{C Declarations, ,The C Declarations Section}) contains two
                   1189: preprocessor directives.
                   1190: 
                   1191: The @code{#define} directive defines the macro @code{YYSTYPE}, thus
                   1192: specifying the C data type for semantic values of both tokens and groupings
                   1193: (@pxref{Value Type, ,Data Types of Semantic Values}).  The Bison parser will use whatever type
                   1194: @code{YYSTYPE} is defined as; if you don't define it, @code{int} is the
                   1195: default.  Because we specify @code{double}, each token and each expression
                   1196: has an associated value, which is a floating point number.
                   1197: 
                   1198: The @code{#include} directive is used to declare the exponentiation
                   1199: function @code{pow}.
                   1200: 
                   1201: The second section, Bison declarations, provides information to Bison about
                   1202: the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}).  Each terminal symbol that is
                   1203: not a single-character literal must be declared here.  (Single-character
                   1204: literals normally don't need to be declared.)  In this example, all the
                   1205: arithmetic operators are designated by single-character literals, so the
                   1206: only terminal symbol that needs to be declared is @code{NUM}, the token
                   1207: type for numeric constants.
                   1208: 
                   1209: @node Rpcalc Rules, Rpcalc Lexer, Rpcalc Decls, RPN Calc
                   1210: @subsection Grammar Rules for @code{rpcalc}
                   1211: 
                   1212: Here are the grammar rules for the reverse polish notation calculator.
                   1213: 
                   1214: @example
                   1215: input:    /* empty */
                   1216:         | input line
                   1217: ;
                   1218: 
                   1219: line:     '\n'
                   1220:         | exp '\n'  @{ printf ("\t%.10g\n", $1); @}
                   1221: ;
                   1222: 
                   1223: exp:      NUM             @{ $$ = $1;         @}
                   1224:         | exp exp '+'     @{ $$ = $1 + $2;    @}
                   1225:         | exp exp '-'     @{ $$ = $1 - $2;    @}
                   1226:         | exp exp '*'     @{ $$ = $1 * $2;    @}
                   1227:         | exp exp '/'     @{ $$ = $1 / $2;    @}
                   1228:       /* Exponentiation */
                   1229:         | exp exp '^'     @{ $$ = pow ($1, $2); @}
                   1230:       /* Unary minus    */
                   1231:         | exp 'n'         @{ $$ = -$1;        @}
                   1232: ;
                   1233: %%
                   1234: @end example
                   1235: 
                   1236: The groupings of the rpcalc ``language'' defined here are the expression
                   1237: (given the name @code{exp}), the line of input (@code{line}), and the
                   1238: complete input transcript (@code{input}).  Each of these nonterminal
                   1239: symbols has several alternate rules, joined by the @samp{|} punctuator
                   1240: which is read as ``or''.  The following sections explain what these rules
                   1241: mean.
                   1242: 
                   1243: The semantics of the language is determined by the actions taken when a
                   1244: grouping is recognized.  The actions are the C code that appears inside
                   1245: braces.  @xref{Actions}.
                   1246: 
                   1247: You must specify these actions in C, but Bison provides the means for
                   1248: passing semantic values between the rules.  In each action, the
                   1249: pseudo-variable @code{$$} stands for the semantic value for the grouping
                   1250: that the rule is going to construct.  Assigning a value to @code{$$} is the
                   1251: main job of most actions.  The semantic values of the components of the
                   1252: rule are referred to as @code{$1}, @code{$2}, and so on.
                   1253: 
                   1254: @menu
                   1255: * Rpcalc Input::      
                   1256: * Rpcalc Line::       
                   1257: * Rpcalc Expr::       
                   1258: @end menu
                   1259: 
                   1260: @node Rpcalc Input, Rpcalc Line,  , Rpcalc Rules
                   1261: @subsubsection Explanation of @code{input}
                   1262: 
                   1263: Consider the definition of @code{input}:
                   1264: 
                   1265: @example
                   1266: input:    /* empty */
                   1267:         | input line
                   1268: ;
                   1269: @end example
                   1270: 
                   1271: This definition reads as follows: ``A complete input is either an empty
                   1272: string, or a complete input followed by an input line''.  Notice that
                   1273: ``complete input'' is defined in terms of itself.  This definition is said
                   1274: to be @dfn{left recursive} since @code{input} appears always as the
                   1275: leftmost symbol in the sequence.  @xref{Recursion, ,Recursive Rules}.
                   1276: 
                   1277: The first alternative is empty because there are no symbols between the
                   1278: colon and the first @samp{|}; this means that @code{input} can match an
                   1279: empty string of input (no tokens).  We write the rules this way because it
                   1280: is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
                   1281: It's conventional to put an empty alternative first and write the comment
                   1282: @samp{/* empty */} in it.
                   1283: 
                   1284: The second alternate rule (@code{input line}) handles all nontrivial input.
                   1285: It means, ``After reading any number of lines, read one more line if
                   1286: possible.''  The left recursion makes this rule into a loop.  Since the
                   1287: first alternative matches empty input, the loop can be executed zero or
                   1288: more times.
                   1289: 
                   1290: The parser function @code{yyparse} continues to process input until a
                   1291: grammatical error is seen or the lexical analyzer says there are no more
                   1292: input tokens; we will arrange for the latter to happen at end of file.
                   1293: 
                   1294: @node Rpcalc Line, Rpcalc Expr, Rpcalc Input, Rpcalc Rules
                   1295: @subsubsection Explanation of @code{line}
                   1296: 
                   1297: Now consider the definition of @code{line}:
                   1298: 
                   1299: @example
                   1300: line:     '\n'
                   1301:         | exp '\n'  @{ printf ("\t%.10g\n", $1); @}
                   1302: ;
                   1303: @end example
                   1304: 
                   1305: The first alternative is a token which is a newline character; this means
                   1306: that rpcalc accepts a blank line (and ignores it, since there is no
                   1307: action).  The second alternative is an expression followed by a newline.
                   1308: This is the alternative that makes rpcalc useful.  The semantic value of
                   1309: the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
                   1310: question is the first symbol in the alternative.  The action prints this
                   1311: value, which is the result of the computation the user asked for.
                   1312: 
                   1313: This action is unusual because it does not assign a value to @code{$$}.  As
                   1314: a consequence, the semantic value associated with the @code{line} is
                   1315: uninitialized (its value will be unpredictable).  This would be a bug if
                   1316: that value were ever used, but we don't use it: once rpcalc has printed the
                   1317: value of the user's input line, that value is no longer needed.
                   1318: 
                   1319: @node Rpcalc Expr,  , Rpcalc Line, Rpcalc Rules
                   1320: @subsubsection Explanation of @code{expr}
                   1321: 
                   1322: The @code{exp} grouping has several rules, one for each kind of expression.
                   1323: The first rule handles the simplest expressions: those that are just numbers.
                   1324: The second handles an addition-expression, which looks like two expressions
                   1325: followed by a plus-sign.  The third handles subtraction, and so on.
                   1326: 
                   1327: @example
                   1328: exp:      NUM
                   1329:         | exp exp '+'     @{ $$ = $1 + $2;    @}
                   1330:         | exp exp '-'     @{ $$ = $1 - $2;    @}
                   1331:         @dots{}
                   1332:         ;
                   1333: @end example
                   1334: 
                   1335: We have used @samp{|} to join all the rules for @code{exp}, but we could
                   1336: equally well have written them separately:
                   1337: 
                   1338: @example
                   1339: exp:      NUM ;
                   1340: exp:      exp exp '+'     @{ $$ = $1 + $2;    @} ;
                   1341: exp:      exp exp '-'     @{ $$ = $1 - $2;    @} ;
                   1342:         @dots{}
                   1343: @end example
                   1344: 
                   1345: Most of the rules have actions that compute the value of the expression in
                   1346: terms of the value of its parts.  For example, in the rule for addition,
                   1347: @code{$1} refers to the first component @code{exp} and @code{$2} refers to
                   1348: the second one.  The third component, @code{'+'}, has no meaningful
                   1349: associated semantic value, but if it had one you could refer to it as
                   1350: @code{$3}.  When @code{yyparse} recognizes a sum expression using this
                   1351: rule, the sum of the two subexpressions' values is produced as the value of
                   1352: the entire expression.  @xref{Actions}.
                   1353: 
                   1354: You don't have to give an action for every rule.  When a rule has no
                   1355: action, Bison by default copies the value of @code{$1} into @code{$$}.
                   1356: This is what happens in the first rule (the one that uses @code{NUM}).
                   1357: 
                   1358: The formatting shown here is the recommended convention, but Bison does
                   1359: not require it.  You can add or change whitespace as much as you wish.
                   1360: For example, this:
                   1361: 
                   1362: @example
                   1363: exp   : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
                   1364: @end example
                   1365: 
                   1366: @noindent
                   1367: means the same thing as this:
                   1368: 
                   1369: @example
                   1370: exp:      NUM
                   1371:         | exp exp '+'    @{ $$ = $1 + $2; @}
                   1372:         | @dots{}
                   1373: @end example
                   1374: 
                   1375: @noindent
                   1376: The latter, however, is much more readable.
                   1377: 
                   1378: @node Rpcalc Lexer, Rpcalc Main, Rpcalc Rules, RPN Calc
                   1379: @subsection The @code{rpcalc} Lexical Analyzer
                   1380: @cindex writing a lexical analyzer
                   1381: @cindex lexical analyzer, writing
                   1382: 
                   1383: The lexical analyzer's job is low-level parsing: converting characters or
                   1384: sequences of characters into tokens.  The Bison parser gets its tokens by
                   1385: calling the lexical analyzer.  @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
                   1386: 
                   1387: Only a simple lexical analyzer is needed for the RPN calculator.  This
                   1388: lexical analyzer skips blanks and tabs, then reads in numbers as
                   1389: @code{double} and returns them as @code{NUM} tokens.  Any other character
                   1390: that isn't part of a number is a separate token.  Note that the token-code
                   1391: for such a single-character token is the character itself.
                   1392: 
                   1393: The return value of the lexical analyzer function is a numeric code which
                   1394: represents a token type.  The same text used in Bison rules to stand for
                   1395: this token type is also a C expression for the numeric code for the type.
                   1396: This works in two ways.  If the token type is a character literal, then its
                   1397: numeric code is the ASCII code for that character; you can use the same
                   1398: character literal in the lexical analyzer to express the number.  If the
                   1399: token type is an identifier, that identifier is defined by Bison as a C
                   1400: macro whose definition is the appropriate number.  In this example,
                   1401: therefore, @code{NUM} becomes a macro for @code{yylex} to use.
                   1402: 
                   1403: The semantic value of the token (if it has one) is stored into the global
                   1404: variable @code{yylval}, which is where the Bison parser will look for it.
                   1405: (The C data type of @code{yylval} is @code{YYSTYPE}, which was defined
                   1406: at the beginning of the grammar; @pxref{Rpcalc Decls, ,Declarations for @code{rpcalc}}.)
                   1407: 
                   1408: A token type code of zero is returned if the end-of-file is encountered.
                   1409: (Bison recognizes any nonpositive value as indicating the end of the
                   1410: input.)
                   1411: 
                   1412: Here is the code for the lexical analyzer:
                   1413: 
                   1414: @example
                   1415: @group
                   1416: /* Lexical analyzer returns a double floating point 
                   1417:    number on the stack and the token NUM, or the ASCII
                   1418:    character read if not a number.  Skips all blanks
                   1419:    and tabs, returns 0 for EOF. */
                   1420: 
                   1421: #include <ctype.h>
                   1422: @end group
                   1423: 
                   1424: @group
                   1425: yylex ()
                   1426: @{
                   1427:   int c;
                   1428: 
                   1429:   /* skip white space  */
                   1430:   while ((c = getchar ()) == ' ' || c == '\t')  
                   1431:     ;
                   1432: @end group
                   1433: @group
                   1434:   /* process numbers   */
                   1435:   if (c == '.' || isdigit (c))                
                   1436:     @{
                   1437:       ungetc (c, stdin);
                   1438:       scanf ("%lf", &yylval);
                   1439:       return NUM;
                   1440:     @}
                   1441: @end group
                   1442: @group
                   1443:   /* return end-of-file  */
                   1444:   if (c == EOF)                            
                   1445:     return 0;
                   1446:   /* return single chars */
                   1447:   return c;                                
                   1448: @}
                   1449: @end group
                   1450: @end example
                   1451: 
                   1452: @node Rpcalc Main, Rpcalc Error, Rpcalc Lexer, RPN Calc
                   1453: @subsection The Controlling Function
                   1454: @cindex controlling function
                   1455: @cindex main function in simple example
                   1456: 
                   1457: In keeping with the spirit of this example, the controlling function is
                   1458: kept to the bare minimum.  The only requirement is that it call
                   1459: @code{yyparse} to start the process of parsing.
                   1460: 
                   1461: @example
                   1462: @group
                   1463: main ()
                   1464: @{
                   1465:   yyparse ();
                   1466: @}
                   1467: @end group
                   1468: @end example
                   1469: 
                   1470: @node Rpcalc Error, Rpcalc Gen, Rpcalc Main, RPN Calc
                   1471: @subsection The Error Reporting Routine
                   1472: @cindex error reporting routine
                   1473: 
                   1474: When @code{yyparse} detects a syntax error, it calls the error reporting
                   1475: function @code{yyerror} to print an error message (usually but not always
                   1476: @code{"parse error"}).  It is up to the programmer to supply @code{yyerror}
                   1477: (@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use:
                   1478: 
                   1479: @example
                   1480: @group
                   1481: #include <stdio.h>
                   1482: 
                   1483: yyerror (s)  /* Called by yyparse on error */
                   1484:      char *s;
                   1485: @{
                   1486:   printf ("%s\n", s);
                   1487: @}
                   1488: @end group
                   1489: @end example
                   1490: 
                   1491: After @code{yyerror} returns, the Bison parser may recover from the error
                   1492: and continue parsing if the grammar contains a suitable error rule
                   1493: (@pxref{Error Recovery}).  Otherwise, @code{yyparse} returns nonzero.  We
                   1494: have not written any error rules in this example, so any invalid input will
                   1495: cause the calculator program to exit.  This is not clean behavior for a
                   1496: real calculator, but it is adequate in the first example.
                   1497: 
                   1498: @node Rpcalc Gen, Rpcalc Compile, Rpcalc Error, RPN Calc
                   1499: @subsection Running Bison to Make the Parser
                   1500: @cindex running Bison (introduction)
                   1501: 
                   1502: Before running Bison to produce a parser, we need to decide how to arrange
                   1503: all the source code in one or more source files.  For such a simple example,
                   1504: the easiest thing is to put everything in one file.  The definitions of
                   1505: @code{yylex}, @code{yyerror} and @code{main} go at the end, in the
                   1506: ``additional C code'' section of the file (@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
                   1507: 
                   1508: For a large project, you would probably have several source files, and use
                   1509: @code{make} to arrange to recompile them.
                   1510: 
                   1511: With all the source in a single file, you use the following command to
                   1512: convert it into a parser file:
                   1513: 
                   1514: @example
                   1515: bison @var{file_name}.y
                   1516: @end example
                   1517: 
                   1518: @noindent
                   1519: In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
                   1520: CALCulator'').  Bison produces a file named @file{@var{file_name}.tab.c},
                   1521: removing the @samp{.y} from the original file name. The file output by
                   1522: Bison contains the source code for @code{yyparse}.  The additional
                   1523: functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
                   1524: are copied verbatim to the output.
                   1525: 
                   1526: @node Rpcalc Compile,  , Rpcalc Gen, RPN Calc
                   1527: @subsection Compiling the Parser File
                   1528: @cindex compiling the parser
                   1529: 
                   1530: Here is how to compile and run the parser file:
                   1531: 
                   1532: @example
                   1533: @group
                   1534: # @r{List files in current directory.}
                   1535: % ls
                   1536: rpcalc.tab.c  rpcalc.y
                   1537: @end group
                   1538: 
                   1539: @group
                   1540: # @r{Compile the Bison parser.}
                   1541: # @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
                   1542: % cc rpcalc.tab.c -lm -o rpcalc
                   1543: @end group
                   1544: 
                   1545: @group
                   1546: # @r{List files again.}
                   1547: % ls
                   1548: rpcalc  rpcalc.tab.c  rpcalc.y
                   1549: @end group
                   1550: @end example
                   1551: 
                   1552: The file @file{rpcalc} now contains the executable code.  Here is an
                   1553: example session using @code{rpcalc}.
                   1554: 
                   1555: @example
                   1556: % rpcalc
                   1557: 4 9 +
                   1558: 13
                   1559: 3 7 + 3 4 5 *+-
                   1560: -13
                   1561: 3 7 + 3 4 5 * + - n              @r{Note the unary minus, @samp{n}}
                   1562: 13
                   1563: 5 6 / 4 n +
                   1564: -3.166666667
                   1565: 3 4 ^                            @r{Exponentiation}
                   1566: 81
                   1567: ^D                               @r{End-of-file indicator}
                   1568: %
                   1569: @end example
                   1570: 
                   1571: @node Infix Calc, Simple Error Recovery, RPN Calc, Examples
                   1572: @section Infix Notation Calculator: @code{calc}
                   1573: @cindex infix notation calculator
                   1574: @cindex @code{calc}
                   1575: @cindex calculator, infix notation
                   1576: 
                   1577: We now modify rpcalc to handle infix operators instead of postfix.  Infix
                   1578: notation involves the concept of operator precedence and the need for
                   1579: parentheses nested to arbitrary depth.  Here is the Bison code for
                   1580: @file{calc.y}, an infix desk-top calculator.
                   1581: 
                   1582: @example
                   1583: /* Infix notation calculator--calc */
                   1584: 
                   1585: %@{
                   1586: #define YYSTYPE double
                   1587: #include <math.h>
                   1588: %@}
                   1589: 
                   1590: /* BISON Declarations */
                   1591: %token NUM
                   1592: %left '-' '+'
                   1593: %left '*' '/'
                   1594: %left NEG     /* negation--unary minus */
                   1595: %right '^'    /* exponentiation        */
                   1596: 
                   1597: /* Grammar follows */
                   1598: %%
                   1599: input:    /* empty string */
                   1600:         | input line
                   1601: ;
                   1602: 
                   1603: line:     '\n'
                   1604:         | exp '\n'  @{ printf ("\t%.10g\n", $1); @}
                   1605: ;
                   1606: 
                   1607: exp:      NUM                @{ $$ = $1;         @}
                   1608:         | exp '+' exp        @{ $$ = $1 + $3;    @}
                   1609:         | exp '-' exp        @{ $$ = $1 - $3;    @}
                   1610:         | exp '*' exp        @{ $$ = $1 * $3;    @}
                   1611:         | exp '/' exp        @{ $$ = $1 / $3;    @}
                   1612:         | '-' exp  %prec NEG @{ $$ = -$2;        @}
                   1613:         | exp '^' exp        @{ $$ = pow ($1, $3); @}
                   1614:         | '(' exp ')'        @{ $$ = $2;         @}
                   1615: ;
                   1616: %%
                   1617: @end example
                   1618: 
                   1619: @noindent
                   1620: The functions @code{yylex}, @code{yyerror} and @code{main} can be the same
                   1621: as before.
                   1622: 
                   1623: There are two important new features shown in this code.
                   1624: 
                   1625: In the second section (Bison declarations), @code{%left} declares token
                   1626: types and says they are left-associative operators.  The declarations
                   1627: @code{%left} and @code{%right} (right associativity) take the place of
                   1628: @code{%token} which is used to declare a token type name without
                   1629: associativity.  (These tokens are single-character literals, which
                   1630: ordinarily don't need to be declared.  We declare them here to specify
                   1631: the associativity.)
                   1632: 
                   1633: Operator precedence is determined by the line ordering of the
                   1634: declarations; the higher the line number of the declaration (lower on
                   1635: the page or screen), the higher the precedence.  Hence, exponentiation
                   1636: has the highest precedence, unary minus (@code{NEG}) is next, followed
                   1637: by @samp{*} and @samp{/}, and so on.  @xref{Precedence, ,Operator Precedence}.
                   1638: 
                   1639: The other important new feature is the @code{%prec} in the grammar section
                   1640: for the unary minus operator.  The @code{%prec} simply instructs Bison that
                   1641: the rule @samp{| '-' exp} has the same precedence as @code{NEG}---in this
                   1642: case the next-to-highest.  @xref{Contextual Precedence, ,Context-Dependent Precedence}.
                   1643: 
                   1644: Here is a sample run of @file{calc.y}:
                   1645: 
                   1646: @need 500
                   1647: @example
                   1648: % calc
                   1649: 4 + 4.5 - (34/(8*3+-3))
                   1650: 6.880952381
                   1651: -56 + 2
                   1652: -54
                   1653: 3 ^ 2
                   1654: 9
                   1655: @end example
                   1656: 
                   1657: @node Simple Error Recovery, Multi-function Calc, Infix Calc, Examples
                   1658: @section Simple Error Recovery
                   1659: @cindex error recovery, simple
                   1660: 
                   1661: Up to this point, this manual has not addressed the issue of @dfn{error
                   1662: recovery}---how to continue parsing after the parser detects a syntax
                   1663: error.  All we have handled is error reporting with @code{yyerror}.  Recall
                   1664: that by default @code{yyparse} returns after calling @code{yyerror}.  This
                   1665: means that an erroneous input line causes the calculator program to exit.
                   1666: Now we show how to rectify this deficiency.
                   1667: 
                   1668: The Bison language itself includes the reserved word @code{error}, which
                   1669: may be included in the grammar rules.  In the example below it has
                   1670: been added to one of the alternatives for @code{line}:
                   1671: 
                   1672: @example
                   1673: @group
                   1674: line:     '\n'
                   1675:         | exp '\n'   @{ printf ("\t%.10g\n", $1); @}
                   1676:         | error '\n' @{ yyerrok;                  @}
                   1677: ;
                   1678: @end group
                   1679: @end example
                   1680: 
                   1681: This addition to the grammar allows for simple error recovery in the event
                   1682: of a parse error.  If an expression that cannot be evaluated is read, the
                   1683: error will be recognized by the third rule for @code{line}, and parsing
                   1684: will continue.  (The @code{yyerror} function is still called upon to print
                   1685: its message as well.)  The action executes the statement @code{yyerrok}, a
                   1686: macro defined automatically by Bison; its meaning is that error recovery is
                   1687: complete (@pxref{Error Recovery}).  Note the difference between
                   1688: @code{yyerrok} and @code{yyerror}; neither one is a misprint.@refill
                   1689: 
                   1690: This form of error recovery deals with syntax errors.  There are other
                   1691: kinds of errors; for example, division by zero, which raises an exception
                   1692: signal that is normally fatal.  A real calculator program must handle this
                   1693: signal and use @code{longjmp} to return to @code{main} and resume parsing
                   1694: input lines; it would also have to discard the rest of the current line of
                   1695: input.  We won't discuss this issue further because it is not specific to
                   1696: Bison programs.
                   1697: 
                   1698: @node Multi-function Calc, Exercises, Simple Error Recovery, Examples
                   1699: @section Multi-Function Calculator: @code{mfcalc}
                   1700: @cindex multi-function calculator
                   1701: @cindex @code{mfcalc}
                   1702: @cindex calculator, multi-function
                   1703: 
                   1704: Now that the basics of Bison have been discussed, it is time to move on to
                   1705: a more advanced problem.  The above calculators provided only five
                   1706: functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}.  It would
                   1707: be nice to have a calculator that provides other mathematical functions such
                   1708: as @code{sin}, @code{cos}, etc.
                   1709: 
                   1710: It is easy to add new operators to the infix calculator as long as they are
                   1711: only single-character literals.  The lexical analyzer @code{yylex} passes
                   1712: back all non-number characters as tokens, so new grammar rules suffice for
                   1713: adding a new operator.  But we want something more flexible: built-in
                   1714: functions whose syntax has this form:
                   1715: 
                   1716: @example
                   1717: @var{function_name} (@var{argument})
                   1718: @end example
                   1719: 
                   1720: @noindent
                   1721: At the same time, we will add memory to the calculator, by allowing you
                   1722: to create named variables, store values in them, and use them later.
                   1723: Here is a sample session with the multi-function calculator:
                   1724: 
                   1725: @example
                   1726: % acalc
                   1727: pi = 3.141592653589
                   1728: 3.1415926536
                   1729: sin(pi)
                   1730: 0.0000000000
                   1731: alpha = beta1 = 2.3
                   1732: 2.3000000000
                   1733: alpha
                   1734: 2.3000000000
                   1735: ln(alpha)
                   1736: 0.8329091229
                   1737: exp(ln(beta1))
                   1738: 2.3000000000
                   1739: %
                   1740: @end example
                   1741: 
                   1742: Note that multiple assignment and nested function calls are permitted.
                   1743: 
                   1744: @menu
                   1745: * Decl: Mfcalc Decl.      Bison declarations for multi-function calculator.
                   1746: * Rules: Mfcalc Rules.    Grammar rules for the calculator.
                   1747: * Symtab: Mfcalc Symtab.  Symbol table management subroutines.
                   1748: @end menu
                   1749: 
                   1750: @node Mfcalc Decl, Mfcalc Rules,  , Multi-function Calc
                   1751: @subsection Declarations for @code{mfcalc}
                   1752: 
                   1753: Here are the C and Bison declarations for the multi-function calculator.
                   1754: 
                   1755: @smallexample
                   1756: %@{
                   1757: #include <math.h>  /* For math functions, cos(), sin(), etc. */
                   1758: #include "calc.h"  /* Contains definition of `symrec'        */
                   1759: %@}
                   1760: %union @{
                   1761: double     val;  /* For returning numbers.                   */
                   1762: symrec  *tptr;   /* For returning symbol-table pointers      */
                   1763: @}
                   1764: 
                   1765: %token <val>  NUM        /* Simple double precision number   */
                   1766: %token <tptr> VAR FNCT   /* Variable and Function            */
                   1767: %type  <val>  exp
                   1768: 
                   1769: %right '='
                   1770: %left '-' '+'
                   1771: %left '*' '/'
                   1772: %left NEG     /* Negation--unary minus */
                   1773: %right '^'    /* Exponentiation        */
                   1774: 
                   1775: /* Grammar follows */
                   1776: 
                   1777: %%
                   1778: @end smallexample
                   1779: 
                   1780: The above grammar introduces only two new features of the Bison language.
                   1781: These features allow semantic values to have various data types
                   1782: (@pxref{Multiple Types, ,More Than One Value Type}).
                   1783: 
                   1784: The @code{%union} declaration specifies the entire list of possible types;
                   1785: this is instead of defining @code{YYSTYPE}.  The allowable types are now
                   1786: double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
                   1787: the symbol table.  @xref{Union Decl, ,The Collection of Value Types}.
                   1788: 
                   1789: Since values can now have various types, it is necessary to associate a
                   1790: type with each grammar symbol whose semantic value is used.  These symbols
                   1791: are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}.  Their
                   1792: declarations are augmented with information about their data type (placed
                   1793: between angle brackets).
                   1794: 
                   1795: The Bison construct @code{%type} is used for declaring nonterminal symbols,
                   1796: just as @code{%token} is used for declaring token types.  We have not used
                   1797: @code{%type} before because nonterminal symbols are normally declared
                   1798: implicitly by the rules that define them.  But @code{exp} must be declared
                   1799: explicitly so we can specify its value type.  @xref{Type Decl, ,Nonterminal Symbols}.
                   1800: 
                   1801: @node Mfcalc Rules, Mfcalc Symtab, Mfcalc Decl, Multi-function Calc
                   1802: @subsection Grammar Rules for @code{mfcalc}
                   1803: 
                   1804: Here are the grammar rules for the multi-function calculator.
                   1805: Most of them are copied directly from @code{calc}; three rules,
                   1806: those which mention @code{VAR} or @code{FNCT}, are new.
                   1807: 
                   1808: @smallexample
                   1809: input:   /* empty */
                   1810:         | input line
                   1811: ;
                   1812: 
                   1813: line:
                   1814:           '\n'
                   1815:         | exp '\n'   @{ printf ("\t%.10g\n", $1); @}
                   1816:         | error '\n' @{ yyerrok;                  @}
                   1817: ;
                   1818: 
                   1819: exp:      NUM                @{ $$ = $1;                         @}
                   1820:         | VAR                @{ $$ = $1->value.var;              @}
                   1821:         | VAR '=' exp        @{ $$ = $3; $1->value.var = $3;     @}
                   1822:         | FNCT '(' exp ')'   @{ $$ = (*($1->value.fnctptr))($3); @}
                   1823:         | exp '+' exp        @{ $$ = $1 + $3;                    @}
                   1824:         | exp '-' exp        @{ $$ = $1 - $3;                    @}
                   1825:         | exp '*' exp        @{ $$ = $1 * $3;                    @}
                   1826:         | exp '/' exp        @{ $$ = $1 / $3;                    @}
                   1827:         | '-' exp  %prec NEG @{ $$ = -$2;                        @}
                   1828:         | exp '^' exp        @{ $$ = pow ($1, $3);               @}
                   1829:         | '(' exp ')'        @{ $$ = $2;                         @}
                   1830: ;
                   1831: /* End of grammar */
                   1832: %%
                   1833: @end smallexample
                   1834: 
                   1835: @node Mfcalc Symtab,  , Mfcalc Rules, Multi-function Calc
                   1836: @subsection The @code{mfcalc} Symbol Table
                   1837: @cindex symbol table example
                   1838: 
                   1839: The multi-function calculator requires a symbol table to keep track of the
                   1840: names and meanings of variables and functions.  This doesn't affect the
                   1841: grammar rules (except for the actions) or the Bison declarations, but it
                   1842: requires some additional C functions for support.
                   1843: 
                   1844: The symbol table itself consists of a linked list of records.  Its
                   1845: definition, which is kept in the header @file{calc.h}, is as follows.  It
                   1846: provides for either functions or variables to be placed in the table.
                   1847: 
                   1848: @smallexample
                   1849: @group
                   1850: /* Data type for links in the chain of symbols.      */
                   1851: struct symrec
                   1852: @{
                   1853:   char *name;  /* name of symbol                     */
                   1854:   int type;    /* type of symbol: either VAR or FNCT */
                   1855:   union @{
                   1856:     double var;           /* value of a VAR          */
                   1857:     double (*fnctptr)();  /* value of a FNCT         */
                   1858:   @} value;
                   1859:   struct symrec *next;    /* link field              */
                   1860: @};
                   1861: @end group
                   1862: 
                   1863: @group
                   1864: typedef struct symrec symrec;
                   1865: 
                   1866: /* The symbol table: a chain of `struct symrec'.     */
                   1867: extern symrec *sym_table;
                   1868: 
                   1869: symrec *putsym ();
                   1870: symrec *getsym ();
                   1871: @end group
                   1872: @end smallexample
                   1873: 
                   1874: The new version of @code{main} includes a call to @code{init_table}, a
                   1875: function that initializes the symbol table.  Here it is, and
                   1876: @code{init_table} as well:
                   1877: 
                   1878: @smallexample
                   1879: @group
                   1880: #include <stdio.h>
                   1881: 
                   1882: main ()
                   1883: @{
                   1884:   init_table ();
                   1885:   yyparse ();
                   1886: @}
                   1887: @end group
                   1888: 
                   1889: @group
                   1890: yyerror (s)  /* Called by yyparse on error */
                   1891:      char *s;
                   1892: @{
                   1893:   printf ("%s\n", s);
                   1894: @}
                   1895: 
                   1896: struct init
                   1897: @{
                   1898:   char *fname;
                   1899:   double (*fnct)();
                   1900: @};
                   1901: @end group
                   1902: 
                   1903: @group
                   1904: struct init arith_fncts[]
                   1905:   = @{
                   1906:       "sin", sin,
                   1907:       "cos", cos,
                   1908:       "atan", atan,
                   1909:       "ln", log,
                   1910:       "exp", exp,
                   1911:       "sqrt", sqrt,
                   1912:       0, 0
                   1913:     @};
                   1914: 
                   1915: /* The symbol table: a chain of `struct symrec'.  */
                   1916: symrec *sym_table = (symrec *)0;
                   1917: @end group
                   1918: 
                   1919: @group
                   1920: init_table ()  /* puts arithmetic functions in table. */
                   1921: @{
                   1922:   int i;
                   1923:   symrec *ptr;
                   1924:   for (i = 0; arith_fncts[i].fname != 0; i++)
                   1925:     @{
                   1926:       ptr = putsym (arith_fncts[i].fname, FNCT);
                   1927:       ptr->value.fnctptr = arith_fncts[i].fnct;
                   1928:     @}
                   1929: @}
                   1930: @end group
                   1931: @end smallexample
                   1932: 
                   1933: By simply editing the initialization list and adding the necessary include
                   1934: files, you can add additional functions to the calculator.
                   1935: 
                   1936: Two important functions allow look-up and installation of symbols in the
                   1937: symbol table.  The function @code{putsym} is passed a name and the type
                   1938: (@code{VAR} or @code{FNCT}) of the object to be installed.  The object is
                   1939: linked to the front of the list, and a pointer to the object is returned.
                   1940: The function @code{getsym} is passed the name of the symbol to look up.  If
                   1941: found, a pointer to that symbol is returned; otherwise zero is returned.
                   1942: 
                   1943: @smallexample
                   1944: symrec *
                   1945: putsym (sym_name,sym_type)
                   1946:      char *sym_name;
                   1947:      int sym_type;
                   1948: @{
                   1949:   symrec *ptr;
                   1950:   ptr = (symrec *) malloc (sizeof (symrec));
                   1951:   ptr->name = (char *) malloc (strlen (sym_name) + 1);
                   1952:   strcpy (ptr->name,sym_name);
                   1953:   ptr->type = sym_type;
                   1954:   ptr->value.var = 0; /* set value to 0 even if fctn.  */
                   1955:   ptr->next = (struct symrec *)sym_table;
                   1956:   sym_table = ptr;
                   1957:   return ptr;
                   1958: @}
                   1959: 
                   1960: symrec *
                   1961: getsym (sym_name)
                   1962:      char *sym_name;
                   1963: @{
                   1964:   symrec *ptr;
                   1965:   for (ptr = sym_table; ptr != (symrec *) 0;
                   1966:        ptr = (symrec *)ptr->next)
                   1967:     if (strcmp (ptr->name,sym_name) == 0)
                   1968:       return ptr;
                   1969:   return 0;
                   1970: @}
                   1971: @end smallexample
                   1972: 
                   1973: The function @code{yylex} must now recognize variables, numeric values, and
                   1974: the single-character arithmetic operators.  Strings of alphanumeric
                   1975: characters with a leading nondigit are recognized as either variables or
                   1976: functions depending on what the symbol table says about them.
                   1977: 
                   1978: The string is passed to @code{getsym} for look up in the symbol table.  If
                   1979: the name appears in the table, a pointer to its location and its type
                   1980: (@code{VAR} or @code{FNCT}) is returned to @code{yyparse}.  If it is not
                   1981: already in the table, then it is installed as a @code{VAR} using
                   1982: @code{putsym}.  Again, a pointer and its type (which must be @code{VAR}) is
                   1983: returned to @code{yyparse}.@refill
                   1984: 
                   1985: No change is needed in the handling of numeric values and arithmetic
                   1986: operators in @code{yylex}.
                   1987: 
                   1988: @smallexample
                   1989: @group
                   1990: #include <ctype.h>
                   1991: yylex ()
                   1992: @{
                   1993:   int c;
                   1994: 
                   1995:   /* Ignore whitespace, get first nonwhite character.  */
                   1996:   while ((c = getchar ()) == ' ' || c == '\t');
                   1997: 
                   1998:   if (c == EOF)
                   1999:     return 0;
                   2000: @end group
                   2001: 
                   2002: @group
                   2003:   /* Char starts a number => parse the number.         */
                   2004:   if (c == '.' || isdigit (c))
                   2005:     @{
                   2006:       ungetc (c, stdin);
                   2007:       scanf ("%lf", &yylval.val);
                   2008:       return NUM;
                   2009:     @}
                   2010: @end group
                   2011: 
                   2012: @group
                   2013:   /* Char starts an identifier => read the name.       */
                   2014:   if (isalpha (c))
                   2015:     @{
                   2016:       symrec *s;
                   2017:       static char *symbuf = 0;
                   2018:       static int length = 0;
                   2019:       int i;
                   2020: @end group
                   2021: 
                   2022: @group
                   2023:       /* Initially make the buffer long enough
                   2024:          for a 40-character symbol name.  */
                   2025:       if (length == 0)
                   2026:         length = 40, symbuf = (char *)malloc (length + 1);
                   2027: 
                   2028:       i = 0;
                   2029:       do
                   2030: @end group
                   2031: @group
                   2032:         @{
                   2033:           /* If buffer is full, make it bigger.        */
                   2034:           if (i == length)
                   2035:             @{
                   2036:               length *= 2;
                   2037:               symbuf = (char *)realloc (symbuf, length + 1);
                   2038:             @}
                   2039:           /* Add this character to the buffer.         */
                   2040:           symbuf[i++] = c;
                   2041:           /* Get another character.                    */
                   2042:           c = getchar ();
                   2043:         @}
                   2044: @end group
                   2045: @group
                   2046:       while (c != EOF && isalnum (c));
                   2047: 
                   2048:       ungetc (c, stdin);
                   2049:       symbuf[i] = '\0';
                   2050: @end group
                   2051: 
                   2052: @group
                   2053:       s = getsym (symbuf);
                   2054:       if (s == 0)
                   2055:         s = putsym (symbuf, VAR);
                   2056:       yylval.tptr = s;
                   2057:       return s->type;
                   2058:     @}
                   2059: 
                   2060:   /* Any other character is a token by itself.        */
                   2061:   return c;
                   2062: @}
                   2063: @end group
                   2064: @end smallexample
                   2065: 
                   2066: This program is both powerful and flexible. You may easily add new
                   2067: functions, and it is a simple job to modify this code to install predefined
                   2068: variables such as @code{pi} or @code{e} as well.
                   2069: 
                   2070: @node Exercises,  , Multi-function Calc, Examples
                   2071: @section Exercises
                   2072: @cindex exercises
                   2073: 
                   2074: @enumerate
                   2075: @item
                   2076: Add some new functions from @file{math.h} to the initialization list.
                   2077: 
                   2078: @item
                   2079: Add another array that contains constants and their values.  Then
                   2080: modify @code{init_table} to add these constants to the symbol table.
                   2081: It will be easiest to give the constants type @code{VAR}.
                   2082: 
                   2083: @item
                   2084: Make the program report an error if the user refers to an
                   2085: uninitialized variable in any way except to store a value in it.
                   2086: @end enumerate
                   2087: 
                   2088: @node Grammar File, Interface, Examples, Top
                   2089: @chapter Bison Grammar Files
                   2090: 
                   2091: Bison takes as input a context-free grammar specification and produces a
                   2092: C-language function that recognizes correct instances of the grammar.
                   2093: 
                   2094: The Bison grammar input file conventionally has a name ending in @samp{.y}.
                   2095: 
                   2096: @menu
                   2097: * Grammar Outline::   Overall layout of the grammar file.
                   2098: * Symbols::           Terminal and nonterminal symbols.
                   2099: * Rules::             How to write grammar rules.
                   2100: * Recursion::         Writing recursive rules.
                   2101: * Semantics::         Semantic values and actions.
                   2102: * Declarations::      All kinds of Bison declarations are described here.
                   2103: * Multiple Parsers::  Putting more than one Bison parser in one program.
                   2104: @end menu
                   2105: 
                   2106: @node Grammar Outline, Symbols,  , Grammar File
                   2107: @section Outline of a Bison Grammar
                   2108: 
                   2109: A Bison grammar file has four main sections, shown here with the
                   2110: appropriate delimiters:
                   2111: 
                   2112: @example
                   2113: %@{
                   2114: @var{C declarations}
                   2115: %@}
                   2116: 
                   2117: @var{Bison declarations}
                   2118: 
                   2119: %%
                   2120: @var{Grammar rules}
                   2121: %%
                   2122: 
                   2123: @var{Additional C code}
                   2124: @end example
                   2125: 
                   2126: Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
                   2127: 
                   2128: @menu
                   2129: * C Declarations::    Syntax and usage of the C declarations section.
                   2130: * Bison Declarations::  Syntax and usage of the Bison declarations section.
                   2131: * Grammar Rules::     Syntax and usage of the grammar rules section.
                   2132: * C Code::            Syntax and usage of the additional C code section.
                   2133: @end menu
                   2134: 
                   2135: @node C Declarations, Bison Declarations,  , Grammar Outline
                   2136: @subsection The C Declarations Section
                   2137: @cindex C declarations section
                   2138: @cindex declarations, C
                   2139: 
                   2140: The @var{C declarations} section contains macro definitions and
                   2141: declarations of functions and variables that are used in the actions in the
                   2142: grammar rules.  These are copied to the beginning of the parser file so
                   2143: that they precede the definition of @code{yyparse}.  You can use
                   2144: @samp{#include} to get the declarations from a header file.  If you don't
                   2145: need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
                   2146: delimiters that bracket this section.
                   2147: 
                   2148: @node Bison Declarations, Grammar Rules, C Declarations, Grammar Outline
                   2149: @subsection The Bison Declarations Section
                   2150: @cindex Bison declarations (introduction)
                   2151: @cindex declarations, Bison (introduction)
                   2152: 
                   2153: The @var{Bison declarations} section contains declarations that define
                   2154: terminal and nonterminal symbols, specify precedence, and so on.
                   2155: In some simple grammars you may not need any declarations.
                   2156: @xref{Declarations, ,Bison Declarations}.
                   2157: 
                   2158: @node Grammar Rules, C Code, Bison Declarations, Grammar Outline
                   2159: @subsection The Grammar Rules Section
                   2160: @cindex grammar rules section
                   2161: @cindex rules section for grammar
                   2162: 
                   2163: The @dfn{grammar rules} section contains one or more Bison grammar
                   2164: rules, and nothing else.  @xref{Rules, ,Syntax of Grammar Rules}.
                   2165: 
                   2166: There must always be at least one grammar rule, and the first
                   2167: @samp{%%} (which precedes the grammar rules) may never be omitted even
                   2168: if it is the first thing in the file.
                   2169: 
                   2170: @node C Code,  , Grammar Rules, Grammar Outline
                   2171: @subsection The Additional C Code Section
                   2172: @cindex additional C code section
                   2173: @cindex C code, section for additional
                   2174: 
                   2175: The @var{additional C code} section is copied verbatim to the end of
                   2176: the parser file, just as the @var{C declarations} section is copied to
                   2177: the beginning.  This is the most convenient place to put anything
                   2178: that you want to have in the parser file but which need not come before
                   2179: the definition of @code{yyparse}.  For example, the definitions of
                   2180: @code{yylex} and @code{yyerror} often go here.  @xref{Interface, ,Parser C-Language Interface}.
                   2181: 
                   2182: If the last section is empty, you may omit the @samp{%%} that separates it
                   2183: from the grammar rules.
                   2184: 
                   2185: The Bison parser itself contains many static variables whose names start
                   2186: with @samp{yy} and many macros whose names start with @samp{YY}.  It is a
                   2187: good idea to avoid using any such names (except those documented in this
                   2188: manual) in the additional C code section of the grammar file.
                   2189: 
                   2190: @node Symbols, Rules, Grammar Outline, Grammar File
                   2191: @section Symbols, Terminal and Nonterminal
                   2192: @cindex nonterminal symbol
                   2193: @cindex terminal symbol
                   2194: @cindex token type
                   2195: @cindex symbol
                   2196: 
                   2197: @dfn{Symbols} in Bison grammars represent the grammatical classifications
                   2198: of the language.
                   2199: 
                   2200: A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
                   2201: class of syntactically equivalent tokens.  You use the symbol in grammar
                   2202: rules to mean that a token in that class is allowed.  The symbol is
                   2203: represented in the Bison parser by a numeric code, and the @code{yylex}
                   2204: function returns a token type code to indicate what kind of token has been
                   2205: read.  You don't need to know what the code value is; you can use the
                   2206: symbol to stand for it.
                   2207: 
                   2208: A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
                   2209: groupings.  The symbol name is used in writing grammar rules.  By convention,
                   2210: it should be all lower case.
                   2211: 
                   2212: Symbol names can contain letters, digits (not at the beginning),
                   2213: underscores and periods.  Periods make sense only in nonterminals.
                   2214: 
                   2215: There are two ways of writing terminal symbols in the grammar:
                   2216: 
                   2217: @itemize @bullet
                   2218: @item
                   2219: A @dfn{named token type} is written with an identifier, like an
                   2220: identifier in C.  By convention, it should be all upper case.  Each
                   2221: such name must be defined with a Bison declaration such as
                   2222: @code{%token}.  @xref{Token Decl, ,Token Type Names}.
                   2223: 
                   2224: @item
                   2225: @cindex character token
                   2226: @cindex literal token
                   2227: @cindex single-character literal
                   2228: A @dfn{character token type} (or @dfn{literal token}) is written in
                   2229: the grammar using the same syntax used in C for character constants;
                   2230: for example, @code{'+'} is a character token type.  A character token
                   2231: type doesn't need to be declared unless you need to specify its
                   2232: semantic value data type (@pxref{Value Type, ,Data Types of Semantic Values}), associativity, or
                   2233: precedence (@pxref{Precedence, ,Operator Precedence}).
                   2234: 
                   2235: By convention, a character token type is used only to represent a
                   2236: token that consists of that particular character.  Thus, the token
                   2237: type @code{'+'} is used to represent the character @samp{+} as a
                   2238: token.  Nothing enforces this convention, but if you depart from it,
                   2239: your program will confuse other readers.
                   2240: 
                   2241: All the usual escape sequences used in character literals in C can be
                   2242: used in Bison as well, but you must not use the null character as a
                   2243: character literal because its ASCII code, zero, is the code
                   2244: @code{yylex} returns for end-of-input (@pxref{Calling Convention, ,Calling Convention for @code{yylex}}).
                   2245: @end itemize
                   2246: 
                   2247: How you choose to write a terminal symbol has no effect on its
                   2248: grammatical meaning.  That depends only on where it appears in rules and
                   2249: on when the parser function returns that symbol.
                   2250: 
                   2251: The value returned by @code{yylex} is always one of the terminal symbols
                   2252: (or 0 for end-of-input).  Whichever way you write the token type in the
                   2253: grammar rules, you write it the same way in the definition of @code{yylex}.
                   2254: The numeric code for a character token type is simply the ASCII code for
                   2255: the character, so @code{yylex} can use the identical character constant to
                   2256: generate the requisite code.  Each named token type becomes a C macro in
                   2257: the parser file, so @code{yylex} can use the name to stand for the code.
                   2258: (This is why periods don't make sense in terminal symbols.)  
                   2259: @xref{Calling Convention, ,Calling Convention for @code{yylex}}.
                   2260: 
                   2261: If @code{yylex} is defined in a separate file, you need to arrange for the
                   2262: token-type macro definitions to be available there.  Use the @samp{-d}
                   2263: option when you run Bison, so that it will write these macro definitions
                   2264: into a separate header file @file{@var{name}.tab.h} which you can include
                   2265: in the other source files that need it.  @xref{Invocation, ,Invoking Bison}.
                   2266: 
                   2267: The symbol @code{error} is a terminal symbol reserved for error recovery
                   2268: (@pxref{Error Recovery}); you shouldn't use it for any other purpose.
                   2269: In particular, @code{yylex} should never return this value.
                   2270: 
                   2271: @node Rules, Recursion, Symbols, Grammar File
                   2272: @section Syntax of Grammar Rules
                   2273: @cindex rule syntax
                   2274: @cindex grammar rule syntax
                   2275: @cindex syntax of grammar rules
                   2276: 
                   2277: A Bison grammar rule has the following general form:
                   2278: 
                   2279: @example
                   2280: @var{result}: @var{components}@dots{}
                   2281:         ;
                   2282: @end example
                   2283: 
                   2284: @noindent
                   2285: where @var{result} is the nonterminal symbol that this rule describes
                   2286: and @var{components} are various terminal and nonterminal symbols that
                   2287: are put together by this rule (@pxref{Symbols}).  
                   2288: 
                   2289: For example,
                   2290: 
                   2291: @example
                   2292: @group
                   2293: exp:      exp '+' exp
                   2294:         ;
                   2295: @end group
                   2296: @end example
                   2297: 
                   2298: @noindent
                   2299: says that two groupings of type @code{exp}, with a @samp{+} token in between,
                   2300: can be combined into a larger grouping of type @code{exp}.
                   2301: 
                   2302: Whitespace in rules is significant only to separate symbols.  You can add
                   2303: extra whitespace as you wish.
                   2304: 
                   2305: Scattered among the components can be @var{actions} that determine
                   2306: the semantics of the rule.  An action looks like this:
                   2307: 
                   2308: @example
                   2309: @{@var{C statements}@}
                   2310: @end example
                   2311: 
                   2312: @noindent
                   2313: Usually there is only one action and it follows the components.
                   2314: @xref{Actions}.
                   2315: 
                   2316: @findex |
                   2317: Multiple rules for the same @var{result} can be written separately or can
                   2318: be joined with the vertical-bar character @samp{|} as follows:
                   2319: 
                   2320: @ifinfo
                   2321: @example
                   2322: @var{result}:   @var{rule1-components}@dots{}
                   2323:         | @var{rule2-components}@dots{}
                   2324:         @dots{}
                   2325:         ;
                   2326: @end example
                   2327: @end ifinfo
                   2328: @iftex
                   2329: @example
                   2330: @group
                   2331: @var{result}:    @var{rule1-components}@dots{}
                   2332:         | @var{rule2-components}@dots{}
                   2333:         @dots{}
                   2334:         ;
                   2335: @end group
                   2336: @end example
                   2337: @end iftex
                   2338: 
                   2339: @noindent
                   2340: They are still considered distinct rules even when joined in this way.
                   2341: 
                   2342: If @var{components} in a rule is empty, it means that @var{result} can
                   2343: match the empty string.  For example, here is how to define a
                   2344: comma-separated sequence of zero or more @code{exp} groupings:
                   2345: 
                   2346: @example
                   2347: @group
                   2348: expseq:   /* empty */
                   2349:         | expseq1
                   2350:         ;
                   2351: @end group
                   2352: 
                   2353: @group
                   2354: expseq1:  exp
                   2355:         | expseq1 ',' exp
                   2356:         ;
                   2357: @end group
                   2358: @end example
                   2359: 
                   2360: @noindent
                   2361: It is customary to write a comment @samp{/* empty */} in each rule
                   2362: with no components.
                   2363: 
                   2364: @node Recursion, Semantics, Rules, Grammar File
                   2365: @section Recursive Rules
                   2366: @cindex recursive rule
                   2367: 
                   2368: A rule is called @dfn{recursive} when its @var{result} nonterminal appears
                   2369: also on its right hand side.  Nearly all Bison grammars need to use
                   2370: recursion, because that is the only way to define a sequence of any number
                   2371: of somethings.  Consider this recursive definition of a comma-separated
                   2372: sequence of one or more expressions:
                   2373: 
                   2374: @example
                   2375: @group
                   2376: expseq1:  exp
                   2377:         | expseq1 ',' exp
                   2378:         ;
                   2379: @end group
                   2380: @end example
                   2381: 
                   2382: @cindex left recursion
                   2383: @cindex right recursion
                   2384: @noindent
                   2385: Since the recursive use of @code{expseq1} is the leftmost symbol in the
                   2386: right hand side, we call this @dfn{left recursion}.  By contrast, here
                   2387: the same construct is defined using @dfn{right recursion}:
                   2388: 
                   2389: @example
                   2390: @group
                   2391: expseq1:  exp
                   2392:         | exp ',' expseq1
                   2393:         ;
                   2394: @end group
                   2395: @end example
                   2396: 
                   2397: @noindent
                   2398: Any kind of sequence can be defined using either left recursion or
                   2399: right recursion, but you should always use left recursion, because it
                   2400: can parse a sequence of any number of elements with bounded stack
                   2401: space.  Right recursion uses up space on the Bison stack in proportion
                   2402: to the number of elements in the sequence, because all the elements
                   2403: must be shifted onto the stack before the rule can be applied even
                   2404: once.  @xref{Algorithm, ,The Bison Parser Algorithm }, for
                   2405: further explanation of this.
                   2406: 
                   2407: @cindex mutual recursion
                   2408: @dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
                   2409: rule does not appear directly on its right hand side, but does appear
                   2410: in rules for other nonterminals which do appear on its right hand
                   2411: side.  
                   2412: 
                   2413: For example:
                   2414: 
                   2415: @example
                   2416: @group
                   2417: expr:     primary
                   2418:         | primary '+' primary
                   2419:         ;
                   2420: @end group
                   2421: 
                   2422: @group
                   2423: primary:  constant
                   2424:         | '(' expr ')'
                   2425:         ;
                   2426: @end group
                   2427: @end example
                   2428: 
                   2429: @noindent
                   2430: defines two mutually-recursive nonterminals, since each refers to the
                   2431: other.
                   2432: 
                   2433: @node Semantics, Declarations, Recursion, Grammar File
                   2434: @section Defining Language Semantics
                   2435: @cindex defining language semantics
                   2436: @cindex language semantics, defining 
                   2437: 
                   2438: The grammar rules for a language determine only the syntax.  The semantics
                   2439: are determined by the semantic values associated with various tokens and
                   2440: groupings, and by the actions taken when various groupings are recognized.
                   2441: 
                   2442: For example, the calculator calculates properly because the value
                   2443: associated with each expression is the proper number; it adds properly
                   2444: because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
                   2445: the numbers associated with @var{x} and @var{y}.
                   2446: 
                   2447: @menu
                   2448: * Value Type::        Specifying one data type for all semantic values.
                   2449: * Multiple Types::    Specifying several alternative data types.
                   2450: * Actions::           An action is the semantic definition of a grammar rule.
                   2451: * Action Types::      Specifying data types for actions to operate on.
                   2452: * Mid-Rule Actions::  Most actions go at the end of a rule.
                   2453:                       This says when, why and how to use the exceptional
                   2454:                         action in the middle of a rule.
                   2455: @end menu
                   2456: 
                   2457: @node Value Type, Multiple Types,  , Semantics
                   2458: @subsection Data Types of Semantic Values
                   2459: @cindex semantic value type
                   2460: @cindex value type, semantic
                   2461: @cindex data types of semantic values
                   2462: @cindex default data type
                   2463: 
                   2464: In a simple program it may be sufficient to use the same data type for
                   2465: the semantic values of all language constructs.  This was true in the
                   2466: RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish Notation Calculator}).
                   2467: 
                   2468: Bison's default is to use type @code{int} for all semantic values.  To
                   2469: specify some other type, define @code{YYSTYPE} as a macro, like this:
                   2470: 
                   2471: @example
                   2472: #define YYSTYPE double
                   2473: @end example
                   2474: 
                   2475: @noindent
                   2476: This macro definition must go in the C declarations section of the grammar
                   2477: file (@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
                   2478: 
                   2479: @node Multiple Types, Actions, Value Type, Semantics
                   2480: @subsection More Than One Value Type
                   2481: 
                   2482: In most programs, you will need different data types for different kinds
                   2483: of tokens and groupings.  For example, a numeric constant may need type
                   2484: @code{int} or @code{long}, while a string constant needs type @code{char *},
                   2485: and an identifier might need a pointer to an entry in the symbol table.
                   2486: 
                   2487: To use more than one data type for semantic values in one parser, Bison
                   2488: requires you to do two things:
                   2489: 
                   2490: @itemize @bullet
                   2491: @item
                   2492: Specify the entire collection of possible data types, with the
                   2493: @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}).
                   2494: 
                   2495: @item
                   2496: Choose one of those types for each symbol (terminal or nonterminal)
                   2497: for which semantic values are used.  This is done for tokens with the
                   2498: @code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings
                   2499: with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}).
                   2500: @end itemize
                   2501: 
                   2502: @node Actions, Action Types, Multiple Types, Semantics
                   2503: @subsection Actions
                   2504: @cindex action
                   2505: @vindex $$
                   2506: @vindex $@var{n}
                   2507: 
                   2508: An action accompanies a syntactic rule and contains C code to be executed
                   2509: each time an instance of that rule is recognized.  The task of most actions
                   2510: is to compute a semantic value for the grouping built by the rule from the
                   2511: semantic values associated with tokens or smaller groupings.
                   2512: 
                   2513: An action consists of C statements surrounded by braces, much like a
                   2514: compound statement in C.  It can be placed at any position in the rule; it
                   2515: is executed at that position.  Most rules have just one action at the end
                   2516: of the rule, following all the components.  Actions in the middle of a rule
                   2517: are tricky and used only for special purposes (@pxref{Mid-Rule Actions, ,Actions in Mid-Rule}).
                   2518: 
                   2519: The C code in an action can refer to the semantic values of the components
                   2520: matched by the rule with the construct @code{$@var{n}}, which stands for
                   2521: the value of the @var{n}th component.  The semantic value for the grouping
                   2522: being constructed is @code{$$}.  (Bison translates both of these constructs
                   2523: into array element references when it copies the actions into the parser
                   2524: file.)
                   2525: 
                   2526: Here is a typical example:
                   2527: 
                   2528: @example
                   2529: @group
                   2530: exp:    @dots{}
                   2531:         | exp '+' exp
                   2532:             @{ $$ = $1 + $3; @}
                   2533: @end group
                   2534: @end example
                   2535: 
                   2536: @noindent
                   2537: This rule constructs an @code{exp} from two smaller @code{exp} groupings
                   2538: connected by a plus-sign token.  In the action, @code{$1} and @code{$3}
                   2539: refer to the semantic values of the two component @code{exp} groupings,
                   2540: which are the first and third symbols on the right hand side of the rule.
                   2541: The sum is stored into @code{$$} so that it becomes the semantic value of
                   2542: the addition-expression just recognized by the rule.  If there were a
                   2543: useful semantic value associated with the @samp{+} token, it could be
                   2544: referred to as @code{$2}.@refill
                   2545: 
                   2546: @cindex default action
                   2547: If you don't specify an action for a rule, Bison supplies a default:
                   2548: @w{@code{$$ = $1}.}  Thus, the value of the first symbol in the rule becomes
                   2549: the value of the whole rule.  Of course, the default rule is valid only
                   2550: if the two data types match.  There is no meaningful default action for
                   2551: an empty rule; every empty rule must have an explicit action unless the
                   2552: rule's value does not matter.
                   2553: 
                   2554: @code{$@var{n}} with @var{n} zero or negative is allowed for reference
                   2555: to tokens and groupings on the stack @emph{before} those that match the
                   2556: current rule.  This is a very risky practice, and to use it reliably
                   2557: you must be certain of the context in which the rule is applied.  Here
                   2558: is a case in which you can use this reliably:
                   2559: 
                   2560: @example
                   2561: @group
                   2562: foo:      expr bar '+' expr  @{ @dots{} @}
                   2563:         | expr bar '-' expr  @{ @dots{} @}
                   2564:         ;
                   2565: @end group
                   2566: 
                   2567: @group
                   2568: bar:      /* empty */
                   2569:         @{ previous_expr = $0; @}
                   2570:         ;
                   2571: @end group
                   2572: @end example
                   2573: 
                   2574: As long as @code{bar} is used only in the fashion shown here, @code{$0}
                   2575: always refers to the @code{expr} which precedes @code{bar} in the
                   2576: definition of @code{foo}.
                   2577: 
                   2578: @node Action Types, Mid-Rule Actions, Actions, Semantics
                   2579: @subsection Data Types of Values in Actions
                   2580: @cindex action data types
                   2581: @cindex data types in actions
                   2582: 
                   2583: If you have chosen a single data type for semantic values, the @code{$$}
                   2584: and @code{$@var{n}} constructs always have that data type.
                   2585: 
                   2586: If you have used @code{%union} to specify a variety of data types, then you
                   2587: must declare a choice among these types for each terminal or nonterminal
                   2588: symbol that can have a semantic value.  Then each time you use @code{$$} or
                   2589: @code{$@var{n}}, its data type is determined by which symbol it refers to
                   2590: in the rule.  In this example,@refill
                   2591: 
                   2592: @example
                   2593: @group
                   2594: exp:    @dots{}
                   2595:         | exp '+' exp
                   2596:             @{ $$ = $1 + $3; @}
                   2597: @end group
                   2598: @end example
                   2599: 
                   2600: @noindent
                   2601: @code{$1} and @code{$3} refer to instances of @code{exp}, so they all
                   2602: have the data type declared for the nonterminal symbol @code{exp}.  If
                   2603: @code{$2} were used, it would have the data type declared for the
                   2604: terminal symbol @code{'+'}, whatever that might be.@refill
                   2605: 
                   2606: Alternatively, you can specify the data type when you refer to the value,
                   2607: by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
                   2608: reference.  For example, if you have defined types as shown here:
                   2609: 
                   2610: @example
                   2611: @group
                   2612: %union @{
                   2613:   int itype;
                   2614:   double dtype;
                   2615: @}
                   2616: @end group
                   2617: @end example
                   2618: 
                   2619: @noindent
                   2620: then you can write @code{$<itype>1} to refer to the first subunit of the
                   2621: rule as an integer, or @code{$<dtype>1} to refer to it as a double.
                   2622: 
                   2623: @node Mid-Rule Actions,  , Action Types, Semantics
                   2624: @subsection Actions in Mid-Rule
                   2625: @cindex actions in mid-rule
                   2626: @cindex mid-rule actions
                   2627: 
                   2628: Occasionally it is useful to put an action in the middle of a rule.
                   2629: These actions are written just like usual end-of-rule actions, but they
                   2630: are executed before the parser even recognizes the following components.
                   2631: 
                   2632: A mid-rule action may refer to the components preceding it using
                   2633: @code{$@var{n}}, but it may not refer to subsequent components because
                   2634: it is run before they are parsed.
                   2635: 
                   2636: The mid-rule action itself counts as one of the components of the rule.
                   2637: This makes a difference when there is another action later in the same rule
                   2638: (and usually there is another at the end): you have to count the actions
                   2639: along with the symbols when working out which number @var{n} to use in
                   2640: @code{$@var{n}}.
                   2641: 
                   2642: The mid-rule action can also have a semantic value.  The action can set
                   2643: its value with an assignment to @code{$$}, and actions later in the rule
                   2644: can refer to the value using @code{$@var{n}}.  Since there is no symbol
                   2645: to name the action, there is no way to declare a data type for the value
                   2646: in advance, so you must use the @samp{$<@dots{}>} construct to specify a
                   2647: data type each time you refer to this value.
                   2648: 
                   2649: There is no way to set the value of the entire rule with a mid-rule
                   2650: action, because assignments to @code{$$} do not have that effect.  The
                   2651: only way to set the value for the entire rule is with an ordinary action
                   2652: at the end of the rule.
                   2653: 
                   2654: Here is an example from a hypothetical compiler, handling a @code{let}
                   2655: statement that looks like @samp{let (@var{variable}) @var{statement}} and
                   2656: serves to create a variable named @var{variable} temporarily for the
                   2657: duration of @var{statement}.  To parse this construct, we must put
                   2658: @var{variable} into the symbol table while @var{statement} is parsed, then
                   2659: remove it afterward.  Here is how it is done:
                   2660: 
                   2661: @example
                   2662: @group
                   2663: stmt:   LET '(' var ')'
                   2664:                 @{ $<context>$ = push_context ();
                   2665:                   declare_variable ($3); @}
                   2666:         stmt    @{ $$ = $6;
                   2667:                   pop_context ($<context>5); @}
                   2668: @end group
                   2669: @end example
                   2670: 
                   2671: @noindent
                   2672: As soon as @samp{let (@var{variable})} has been recognized, the first
                   2673: action is run.  It saves a copy of the current semantic context (the
                   2674: list of accessible variables) as its semantic value, using alternative
                   2675: @code{context} in the data-type union.  Then it calls
                   2676: @code{declare_variable} to add the new variable to that list.  Once the
                   2677: first action is finished, the embedded statement @code{stmt} can be
                   2678: parsed.  Note that the mid-rule action is component number 5, so the
                   2679: @samp{stmt} is component number 6.
                   2680: 
                   2681: After the embedded statement is parsed, its semantic value becomes the
                   2682: value of the entire @code{let}-statement.  Then the semantic value from the
                   2683: earlier action is used to restore the prior list of variables.  This
                   2684: removes the temporary @code{let}-variable from the list so that it won't
                   2685: appear to exist while the rest of the program is parsed.
                   2686: 
                   2687: Taking action before a rule is completely recognized often leads to
                   2688: conflicts since the parser must commit to a parse in order to execute the
                   2689: action.  For example, the following two rules, without mid-rule actions,
                   2690: can coexist in a working parser because the parser can shift the open-brace
                   2691: token and look at what follows before deciding whether there is a
                   2692: declaration or not:
                   2693: 
                   2694: @example
                   2695: @group
                   2696: compound: '@{' declarations statements '@}'
                   2697:         | '@{' statements '@}'
                   2698:         ;
                   2699: @end group
                   2700: @end example
                   2701: 
                   2702: @noindent
                   2703: But when we add a mid-rule action as follows, the rules become nonfunctional:
                   2704: 
                   2705: @example
                   2706: @group
                   2707: compound: @{ prepare_for_local_variables (); @}
                   2708:           '@{' declarations statements '@}'
                   2709: @end group
                   2710: @group
                   2711:         | '@{' statements '@}'
                   2712:         ;
                   2713: @end group
                   2714: @end example
                   2715: 
                   2716: @noindent
                   2717: Now the parser is forced to decide whether to run the mid-rule action
                   2718: when it has read no farther than the open-brace.  In other words, it
                   2719: must commit to using one rule or the other, without sufficient
                   2720: information to do it correctly.  (The open-brace token is what is called
                   2721: the @dfn{look-ahead} token at this time, since the parser is still
                   2722: deciding what to do about it.  @xref{Look-Ahead, ,Look-Ahead Tokens}.)
                   2723: 
                   2724: You might think that you could correct the problem by putting identical
                   2725: actions into the two rules, like this:
                   2726: 
                   2727: @example
                   2728: @group
                   2729: compound: @{ prepare_for_local_variables (); @}
                   2730:           '@{' declarations statements '@}'
                   2731:         | @{ prepare_for_local_variables (); @}
                   2732:           '@{' statements '@}'
                   2733:         ;
                   2734: @end group
                   2735: @end example
                   2736: 
                   2737: @noindent
                   2738: But this does not help, because Bison does not realize that the two actions
                   2739: are identical.  (Bison never tries to understand the C code in an action.)
                   2740: 
                   2741: If the grammar is such that a declaration can be distinguished from a
                   2742: statement by the first token (which is true in C), then one solution which
                   2743: does work is to put the action after the open-brace, like this:
                   2744: 
                   2745: @example
                   2746: @group
                   2747: compound: '@{' @{ prepare_for_local_variables (); @}
                   2748:           declarations statements '@}'
                   2749:         | '@{' statements '@}'
                   2750:         ;
                   2751: @end group
                   2752: @end example
                   2753: 
                   2754: @noindent
                   2755: Now the first token of the following declaration or statement,
                   2756: which would in any case tell Bison which rule to use, can still do so.
                   2757: 
                   2758: Another solution is to bury the action inside a nonterminal symbol which
                   2759: serves as a subroutine:
                   2760: 
                   2761: @example
                   2762: @group
                   2763: subroutine: /* empty */
                   2764:           @{ prepare_for_local_variables (); @}
                   2765:         ;
                   2766: 
                   2767: @end group
                   2768: 
                   2769: @group
                   2770: compound: subroutine
                   2771:           '@{' declarations statements '@}'
                   2772:         | subroutine
                   2773:           '@{' statements '@}'
                   2774:         ;
                   2775: @end group
                   2776: @end example
                   2777: 
                   2778: @noindent
                   2779: Now Bison can execute the action in the rule for @code{subroutine} without
                   2780: deciding which rule for @code{compound} it will eventually use.  Note that
                   2781: the action is now at the end of its rule.  Any mid-rule action can be
                   2782: converted to an end-of-rule action in this way, and this is what Bison
                   2783: actually does to implement mid-rule actions.
                   2784: 
                   2785: @node Declarations, Multiple Parsers, Semantics, Grammar File
                   2786: @section Bison Declarations
                   2787: @cindex declarations, Bison
                   2788: @cindex Bison declarations
                   2789: 
                   2790: The @dfn{Bison declarations} section of a Bison grammar defines the symbols
                   2791: used in formulating the grammar and the data types of semantic values.
                   2792: @xref{Symbols}.
                   2793: 
                   2794: All token type names (but not single-character literal tokens such as
                   2795: @code{'+'} and @code{'*'}) must be declared.  Nonterminal symbols must be
                   2796: declared if you need to specify which data type to use for the semantic
                   2797: value (@pxref{Multiple Types, ,More Than One Value Type}).
                   2798: 
                   2799: The first rule in the file also specifies the start symbol, by default.
                   2800: If you want some other symbol to be the start symbol, you must declare
                   2801: it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
                   2802: 
                   2803: @menu
                   2804: * Token Decl::        Declaring terminal symbols.
                   2805: * Precedence Decl::   Declaring terminals with precedence and associativity.
                   2806: * Union Decl::        Declaring the set of all semantic value types.
                   2807: * Type Decl::         Declaring the choice of type for a nonterminal symbol.
                   2808: * Expect Decl::       Suppressing warnings about shift/reduce conflicts.
                   2809: * Start Decl::        Specifying the start symbol.
                   2810: * Pure Decl::         Requesting a reentrant parser.
                   2811: * Decl Summary::      Table of all Bison declarations.
                   2812: @end menu
                   2813: 
                   2814: @node Token Decl, Precedence Decl,  , Declarations
                   2815: @subsection Token Type Names
                   2816: @cindex declaring token type names
                   2817: @cindex token type names, declaring
                   2818: @findex %token
                   2819: 
                   2820: The basic way to declare a token type name (terminal symbol) is as follows:
                   2821: 
                   2822: @example
                   2823: %token @var{name}
                   2824: @end example
                   2825: 
                   2826: Bison will convert this into a @code{#define} directive in
                   2827: the parser, so that the function @code{yylex} (if it is in this file)
                   2828: can use the name @var{name} to stand for this token type's code.
                   2829: 
                   2830: Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc}
                   2831: instead of @code{%token}, if you wish to specify precedence.
                   2832: @xref{Precedence Decl, ,Operator Precedence}.
                   2833: 
                   2834: You can explicitly specify the numeric code for a token type by appending
                   2835: an integer value in the field immediately following the token name:
                   2836: 
                   2837: @example
                   2838: %token NUM 300
                   2839: @end example
                   2840: 
                   2841: @noindent
                   2842: It is generally best, however, to let Bison choose the numeric codes for
                   2843: all token types.  Bison will automatically select codes that don't conflict
                   2844: with each other or with ASCII characters.
                   2845: 
                   2846: In the event that the stack type is a union, you must augment the
                   2847: @code{%token} or other token declaration to include the data type
                   2848: alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}).  
                   2849: 
                   2850: For example:
                   2851: 
                   2852: @example
                   2853: @group
                   2854: %union @{              /* define stack type */
                   2855:   double val;
                   2856:   symrec *tptr;
                   2857: @}
                   2858: %token <val> NUM      /* define token NUM and its type */
                   2859: @end group
                   2860: @end example
                   2861: 
                   2862: @node Precedence Decl, Union Decl, Token Decl, Declarations
                   2863: @subsection Operator Precedence
                   2864: @cindex precedence declarations
                   2865: @cindex declaring operator precedence
                   2866: @cindex operator precedence, declaring
                   2867: 
                   2868: Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
                   2869: declare a token and specify its precedence and associativity, all at
                   2870: once.  These are called @dfn{precedence declarations}.
                   2871: @xref{Precedence, ,Operator Precedence}, for general information on operator precedence.
                   2872: 
                   2873: The syntax of a precedence declaration is the same as that of
                   2874: @code{%token}: either
                   2875: 
                   2876: @example
                   2877: %left @var{symbols}@dots{}
                   2878: @end example
                   2879: 
                   2880: @noindent
                   2881: or
                   2882: 
                   2883: @example
                   2884: %left <@var{type}> @var{symbols}@dots{}
                   2885: @end example
                   2886: 
                   2887: And indeed any of these declarations serves the purposes of @code{%token}.
                   2888: But in addition, they specify the associativity and relative precedence for
                   2889: all the @var{symbols}:
                   2890: 
                   2891: @itemize @bullet
                   2892: @item
                   2893: The associativity of an operator @var{op} determines how repeated uses
                   2894: of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
                   2895: @var{z}} is parsed by grouping @var{x} with @var{y} first or by
                   2896: grouping @var{y} with @var{z} first.  @code{%left} specifies
                   2897: left-associativity (grouping @var{x} with @var{y} first) and
                   2898: @code{%right} specifies right-associativity (grouping @var{y} with
                   2899: @var{z} first).  @code{%nonassoc} specifies no associativity, which
                   2900: means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
                   2901: considered a syntax error.
                   2902: 
                   2903: @item
                   2904: The precedence of an operator determines how it nests with other operators.
                   2905: All the tokens declared in a single precedence declaration have equal
                   2906: precedence and nest together according to their associativity.
                   2907: When two tokens declared in different precedence declarations associate,
                   2908: the one declared later has the higher precedence and is grouped first.
                   2909: @end itemize
                   2910: 
                   2911: @node Union Decl, Type Decl, Precedence Decl, Declarations
                   2912: @subsection The Collection of Value Types
                   2913: @cindex declaring value types
                   2914: @cindex value types, declaring
                   2915: @findex %union
                   2916: 
                   2917: The @code{%union} declaration specifies the entire collection of possible
                   2918: data types for semantic values.  The keyword @code{%union} is followed by a
                   2919: pair of braces containing the same thing that goes inside a @code{union} in
                   2920: C.  
                   2921: 
                   2922: For example:
                   2923: 
                   2924: @example
                   2925: @group
                   2926: %union @{
                   2927:   double val;
                   2928:   symrec *tptr;
                   2929: @}
                   2930: @end group
                   2931: @end example
                   2932: 
                   2933: @noindent
                   2934: This says that the two alternative types are @code{double} and @code{symrec
                   2935: *}.  They are given names @code{val} and @code{tptr}; these names are used
                   2936: in the @code{%token} and @code{%type} declarations to pick one of the types
                   2937: for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
                   2938: 
                   2939: Note that, unlike making a @code{union} declaration in C, you do not write
                   2940: a semicolon after the closing brace.
                   2941: 
                   2942: @node Type Decl, Expect Decl, Union Decl, Declarations
                   2943: @subsection Nonterminal Symbols
                   2944: @cindex declaring value types, nonterminals
                   2945: @cindex value types, nonterminals, declaring
                   2946: @findex %type
                   2947: 
                   2948: @noindent
                   2949: When you use @code{%union} to specify multiple value types, you must
                   2950: declare the value type of each nonterminal symbol for which values are
                   2951: used.  This is done with a @code{%type} declaration, like this:
                   2952: 
                   2953: @example
                   2954: %type <@var{type}> @var{nonterminal}@dots{}
                   2955: @end example
                   2956: 
                   2957: @noindent
                   2958: Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type}
                   2959: is the name given in the @code{%union} to the alternative that you want
                   2960: (@pxref{Union Decl, ,The Collection of Value Types}).  You can give any number of nonterminal symbols in
                   2961: the same @code{%type} declaration, if they have the same value type.  Use
                   2962: spaces to separate the symbol names.
                   2963: 
                   2964: @node Expect Decl, Start Decl, Type Decl, Declarations
                   2965: @subsection Suppressing Conflict Warnings
                   2966: @cindex suppressing conflict warnings
                   2967: @cindex preventing warnings about conflicts
                   2968: @cindex warnings, preventing
                   2969: @cindex conflicts, suppressing warnings of
                   2970: @findex %expect
                   2971: 
                   2972: Bison normally warns if there are any conflicts in the grammar
                   2973: (@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars have harmless shift/reduce
                   2974: conflicts which are resolved in a predictable way and would be difficult to
                   2975: eliminate.  It is desirable to suppress the warning about these conflicts
                   2976: unless the number of conflicts changes.  You can do this with the
                   2977: @code{%expect} declaration.
                   2978: 
                   2979: The declaration looks like this:
                   2980: 
                   2981: @example
                   2982: %expect @var{n}
                   2983: @end example
                   2984: 
                   2985: Here @var{n} is a decimal integer.  The declaration says there should be no
                   2986: warning if there are @var{n} shift/reduce conflicts and no reduce/reduce
                   2987: conflicts.  The usual warning is given if there are either more or fewer
                   2988: conflicts, or if there are any reduce/reduce conflicts.
                   2989: 
                   2990: In general, using @code{%expect} involves these steps:
                   2991: 
                   2992: @itemize @bullet
                   2993: @item
                   2994: Compile your grammar without @code{%expect}.  Use the @samp{-v} option
                   2995: to get a verbose list of where the conflicts occur.  Bison will also
                   2996: print the number of conflicts.
                   2997: 
                   2998: @item
                   2999: Check each of the conflicts to make sure that Bison's default
                   3000: resolution is what you really want.  If not, rewrite the grammar and
                   3001: go back to the beginning.
                   3002: 
                   3003: @item
                   3004: Add an @code{%expect} declaration, copying the number @var{n} from the
                   3005: number which Bison printed.
                   3006: @end itemize
                   3007: 
                   3008: Now Bison will stop annoying you about the conflicts you have checked, but
                   3009: it will warn you again if changes in the grammar result in additional
                   3010: conflicts.
                   3011: 
                   3012: @node Start Decl, Pure Decl, Expect Decl, Declarations
                   3013: @subsection The Start-Symbol
                   3014: @cindex declaring the start symbol
                   3015: @cindex start symbol, declaring
                   3016: @cindex default start symbol
                   3017: @findex %start
                   3018: 
                   3019: Bison assumes by default that the start symbol for the grammar is the first
                   3020: nonterminal specified in the grammar specification section.  The programmer
                   3021: may override this restriction with the @code{%start} declaration as follows:
                   3022: 
                   3023: @example
                   3024: %start @var{symbol}
                   3025: @end example
                   3026: 
                   3027: @node Pure Decl, Decl Summary, Start Decl, Declarations
                   3028: @subsection A Pure (Reentrant) Parser
                   3029: @cindex reentrant parser
                   3030: @cindex pure parser
                   3031: @findex %pure_parser
                   3032: 
                   3033: A @dfn{reentrant} program is one which does not alter in the course of
                   3034: execution; in other words, it consists entirely of @dfn{pure} (read-only)
                   3035: code.  Reentrancy is important whenever asynchronous execution is possible;
                   3036: for example, a nonreentrant program may not be safe to call from a signal
                   3037: handler.  In systems with multiple threads of control, a nonreentrant
                   3038: program must be called only within interlocks.
                   3039: 
                   3040: The Bison parser is not normally a reentrant program, because it uses
                   3041: statically allocated variables for communication with @code{yylex}.  These
                   3042: variables include @code{yylval} and @code{yylloc}.
                   3043: 
                   3044: The Bison declaration @code{%pure_parser} says that you want the parser
                   3045: to be reentrant.  It looks like this:
                   3046: 
                   3047: @example
                   3048: %pure_parser
                   3049: @end example
                   3050: 
                   3051: The effect is that the two communication variables become local
                   3052: variables in @code{yyparse}, and a different calling convention is used for
                   3053: the lexical analyzer function @code{yylex}.  @xref{Pure Calling, ,Calling for Pure Parsers}, for the
                   3054: details of this.  The variable @code{yynerrs} also becomes local in
                   3055: @code{yyparse} (@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).  The convention for calling
                   3056: @code{yyparse} itself is unchanged.
                   3057: 
                   3058: @node Decl Summary,  , Pure Decl, Declarations
                   3059: @subsection Bison Declaration Summary
                   3060: @cindex Bison declaration summary
                   3061: @cindex declaration summary
                   3062: @cindex summary, Bison declaration
                   3063: 
                   3064: Here is a summary of all Bison declarations:
                   3065: 
                   3066: @table @code
                   3067: @item %union
                   3068: Declare the collection of data types that semantic values may have
                   3069: (@pxref{Union Decl, ,The Collection of Value Types}).
                   3070: 
                   3071: @item %token
                   3072: Declare a terminal symbol (token type name) with no precedence
                   3073: or associativity specified (@pxref{Token Decl, ,Token Type Names}).
                   3074: 
                   3075: @item %right
                   3076: Declare a terminal symbol (token type name) that is right-associative
                   3077: (@pxref{Precedence Decl, ,Operator Precedence}).
                   3078: 
                   3079: @item %left
                   3080: Declare a terminal symbol (token type name) that is left-associative
                   3081: (@pxref{Precedence Decl, ,Operator Precedence}).
                   3082: 
                   3083: @item %nonassoc
                   3084: Declare a terminal symbol (token type name) that is nonassociative
                   3085: (using it in a way that would be associative is a syntax error)
                   3086: (@pxref{Precedence Decl, ,Operator Precedence}).
                   3087: 
                   3088: @item %type
                   3089: Declare the type of semantic values for a nonterminal symbol
                   3090: (@pxref{Type Decl, ,Nonterminal Symbols}).
                   3091: 
                   3092: @item %start
                   3093: Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}).
                   3094: 
                   3095: @item %expect
                   3096: Declare the expected number of shift-reduce conflicts
                   3097: (@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
                   3098: 
                   3099: @item %pure_parser
                   3100: Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
                   3101: @end table
                   3102: 
                   3103: @node Multiple Parsers,  , Declarations, Grammar File
                   3104: @section Multiple Parsers in the Same Program
                   3105: 
                   3106: Most programs that use Bison parse only one language and therefore contain
                   3107: only one Bison parser.  But what if you want to parse more than one
                   3108: language with the same program?  Then you need to avoid a name conflict
                   3109: between different definitions of @code{yyparse}, @code{yylval}, and so on.
                   3110: 
                   3111: The easy way to do this is to use the option @samp{-p @var{prefix}}
                   3112: (@pxref{Invocation, ,Invoking Bison}).  This renames the interface functions and
                   3113: variables of the Bison parser to start with @var{prefix} instead of
                   3114: @samp{yy}.  You can use this to give each parser distinct names that do
                   3115: not conflict.
                   3116: 
                   3117: The precise list of symbols renamed is @code{yyparse}, @code{yylex},
                   3118: @code{yyerror}, @code{yylval}, @code{yychar} and @code{yydebug}.  For
                   3119: example, if you use @samp{-p c}, the names become @code{cparse},
                   3120: @code{clex}, and so on.
                   3121: 
                   3122: @strong{All the other variables and macros associated with Bison are not
                   3123: renamed.} These others are not global; there is no conflict if the same
                   3124: name is used in different parsers.  For example, @code{YYSTYPE} is not
                   3125: renamed, but defining this in different ways in different parsers causes
                   3126: no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
                   3127: 
                   3128: The @samp{-p} option works by adding macro definitions to the beginning
                   3129: of the parser source file, defining @code{yyparse} as
                   3130: @code{@var{prefix}parse}, and so on.  This effectively substitutes one
                   3131: name for the other in the entire parser file.
                   3132: 
                   3133: @node Interface, Algorithm, Grammar File, Top
                   3134: @chapter Parser C-Language Interface
                   3135: @cindex C-language interface
                   3136: @cindex interface
                   3137: 
                   3138: The Bison parser is actually a C function named @code{yyparse}.  Here we
                   3139: describe the interface conventions of @code{yyparse} and the other
                   3140: functions that it needs to use.
                   3141: 
                   3142: Keep in mind that the parser uses many C identifiers starting with
                   3143: @samp{yy} and @samp{YY} for internal purposes.  If you use such an
                   3144: identifier (aside from those in this manual) in an action or in additional
                   3145: C code in the grammar file, you are likely to run into trouble.
                   3146: 
                   3147: @menu
                   3148: * Parser Function::   How to call @code{yyparse} and what it returns.
                   3149: * Lexical::           You must supply a function @code{yylex} 
                   3150:                         which reads tokens.
                   3151: * Error Reporting::   You must supply a function @code{yyerror}.
                   3152: * Action Features::   Special features for use in actions.
                   3153: @end menu
                   3154: 
                   3155: @node Parser Function, Lexical,  , Interface
                   3156: @section The Parser Function @code{yyparse}
                   3157: @findex yyparse
                   3158: 
                   3159: You call the function @code{yyparse} to cause parsing to occur.  This
                   3160: function reads tokens, executes actions, and ultimately returns when it
                   3161: encounters end-of-input or an unrecoverable syntax error.  You can also
                   3162: write an action which directs @code{yyparse} to return immediately without
                   3163: reading further.
                   3164: 
                   3165: The value returned by @code{yyparse} is 0 if parsing was successful (return
                   3166: is due to end-of-input).
                   3167: 
                   3168: The value is 1 if parsing failed (return is due to a syntax error).
                   3169: 
                   3170: In an action, you can cause immediate return from @code{yyparse} by using
                   3171: these macros:
                   3172: 
                   3173: @table @code
                   3174: @item YYACCEPT
                   3175: @findex YYACCEPT
                   3176: Return immediately with value 0 (to report success).
                   3177: 
                   3178: @item YYABORT
                   3179: @findex YYABORT
                   3180: Return immediately with value 1 (to report failure).
                   3181: @end table
                   3182: 
                   3183: @node Lexical, Error Reporting, Parser Function, Interface
                   3184: @section The Lexical Analyzer Function @code{yylex}
                   3185: @findex yylex
                   3186: @cindex lexical analyzer
                   3187: 
                   3188: The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
                   3189: the input stream and returns them to the parser.  Bison does not create
                   3190: this function automatically; you must write it so that @code{yyparse} can
                   3191: call it.  The function is sometimes referred to as a lexical scanner.
                   3192: 
                   3193: In simple programs, @code{yylex} is often defined at the end of the Bison
                   3194: grammar file.  If @code{yylex} is defined in a separate source file, you
                   3195: need to arrange for the token-type macro definitions to be available there.
                   3196: To do this, use the @samp{-d} option when you run Bison, so that it will
                   3197: write these macro definitions into a separate header file
                   3198: @file{@var{name}.tab.h} which you can include in the other source files
                   3199: that need it.  @xref{Invocation, ,Invoking Bison}.@refill
                   3200: 
                   3201: @menu
                   3202: * Calling Convention::  How @code{yyparse} calls @code{yylex}.
                   3203: * Token Values::      How @code{yylex} must return the semantic value
                   3204:                         of the token it has read.
                   3205: * Token Positions::   How @code{yylex} must return the text position
                   3206:                         (line number, etc.) of the token, if the
                   3207:                         actions want that.
                   3208: * Pure Calling::      How the calling convention differs
                   3209:                         in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
                   3210: @end menu
                   3211: 
                   3212: @node Calling Convention, Token Values,  , Lexical
                   3213: @subsection Calling Convention for @code{yylex}
                   3214: 
                   3215: The value that @code{yylex} returns must be the numeric code for the type
                   3216: of token it has just found, or 0 for end-of-input.
                   3217: 
                   3218: When a token is referred to in the grammar rules by a name, that name
                   3219: in the parser file becomes a C macro whose definition is the proper
                   3220: numeric code for that token type.  So @code{yylex} can use the name
                   3221: to indicate that type.  @xref{Symbols}.
                   3222: 
                   3223: When a token is referred to in the grammar rules by a character literal,
                   3224: the numeric code for that character is also the code for the token type.
                   3225: So @code{yylex} can simply return that character code.  The null character
                   3226: must not be used this way, because its code is zero and that is what
                   3227: signifies end-of-input.
                   3228: 
                   3229: Here is an example showing these things:
                   3230: 
                   3231: @example
                   3232: yylex ()
                   3233: @{
                   3234:   @dots{}
                   3235:   if (c == EOF)     /* Detect end of file. */
                   3236:     return 0;
                   3237:   @dots{}
                   3238:   if (c == '+' || c == '-')
                   3239:     return c;      /* Assume token type for `+' is '+'. */
                   3240:   @dots{}
                   3241:   return INT;      /* Return the type of the token. */
                   3242:   @dots{}
                   3243: @}
                   3244: @end example
                   3245: 
                   3246: @noindent
                   3247: This interface has been designed so that the output from the @code{lex}
                   3248: utility can be used without change as the definition of @code{yylex}.
                   3249: 
                   3250: @node Token Values, Token Positions, Calling Convention, Lexical
                   3251: @subsection Semantic Values of Tokens
                   3252: 
                   3253: @vindex yylval
                   3254: In an ordinary (nonreentrant) parser, the semantic value of the token must
                   3255: be stored into the global variable @code{yylval}.  When you are using
                   3256: just one data type for semantic values, @code{yylval} has that type.
                   3257: Thus, if the type is @code{int} (the default), you might write this in
                   3258: @code{yylex}:
                   3259: 
                   3260: @example
                   3261: @group
                   3262:   @dots{}
                   3263:   yylval = value;  /* Put value onto Bison stack. */
                   3264:   return INT;      /* Return the type of the token. */
                   3265:   @dots{}
                   3266: @end group
                   3267: @end example
                   3268: 
                   3269: When you are using multiple data types, @code{yylval}'s type is a union
                   3270: made from the @code{%union} declaration (@pxref{Union Decl, ,The Collection of Value Types}).  So when
                   3271: you store a token's value, you must use the proper member of the union.
                   3272: If the @code{%union} declaration looks like this:
                   3273: 
                   3274: @example
                   3275: @group
                   3276: %union @{
                   3277:   int intval;
                   3278:   double val;
                   3279:   symrec *tptr;
                   3280: @}
                   3281: @end group
                   3282: @end example
                   3283: 
                   3284: @noindent
                   3285: then the code in @code{yylex} might look like this:
                   3286: 
                   3287: @example
                   3288: @group
                   3289:   @dots{}
                   3290:   yylval.intval = value; /* Put value onto Bison stack. */
                   3291:   return INT;          /* Return the type of the token. */
                   3292:   @dots{}
                   3293: @end group
                   3294: @end example
                   3295: 
                   3296: @node Token Positions, Pure Calling, Token Values, Lexical
                   3297: @subsection Textual Positions of Tokens
                   3298: 
                   3299: @vindex yylloc
                   3300: If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in
                   3301: actions to keep track of the textual locations of tokens and groupings,
                   3302: then you must provide this information in @code{yylex}.  The function
                   3303: @code{yyparse} expects to find the textual location of a token just parsed
                   3304: in the global variable @code{yylloc}.  So @code{yylex} must store the
                   3305: proper data in that variable.  The value of @code{yylloc} is a structure
                   3306: and you need only initialize the members that are going to be used by the
                   3307: actions.  The four members are called @code{first_line},
                   3308: @code{first_column}, @code{last_line} and @code{last_column}.  Note that
                   3309: the use of this feature makes the parser noticeably slower.
                   3310: 
                   3311: @tindex YYLTYPE
                   3312: The data type of @code{yylloc} has the name @code{YYLTYPE}.
                   3313: 
                   3314: @node Pure Calling,  , Token Positions, Lexical
                   3315: @subsection Calling for Pure Parsers
                   3316: 
                   3317: When you use the Bison declaration @code{%pure_parser} to request a pure,
                   3318: reentrant parser, the global communication variables @code{yylval} and
                   3319: @code{yylloc} cannot be used.  (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.)  In such parsers the
                   3320: two global variables are replaced by pointers passed as arguments to
                   3321: @code{yylex}.  You must declare them as shown here, and pass the
                   3322: information back by storing it through those pointers.
                   3323: 
                   3324: @example
                   3325: yylex (lvalp, llocp)
                   3326:      YYSTYPE *lvalp;
                   3327:      YYLTYPE *llocp;
                   3328: @{
                   3329:   @dots{}
                   3330:   *lvalp = value;  /* Put value onto Bison stack.  */
                   3331:   return INT;      /* Return the type of the token.  */
                   3332:   @dots{}
                   3333: @}
                   3334: @end example
                   3335: 
                   3336: If the grammar file does not use the @samp{@@} constructs to refer to
                   3337: textual positions, then the type @code{YYLTYPE} will not be defined.  In
                   3338: this case, omit the second argument; @code{yylex} will be called with
                   3339: only one argument.
                   3340: 
                   3341: @node Error Reporting, Action Features, Lexical, Interface
                   3342: @section The Error Reporting Function @code{yyerror}
                   3343: @cindex error reporting function
                   3344: @findex yyerror
                   3345: @cindex parse error
                   3346: @cindex syntax error
                   3347: 
                   3348: The Bison parser detects a @dfn{parse error} or @dfn{syntax error}
                   3349: whenever it reads a token which cannot satisfy any syntax rule.  A
                   3350: action in the grammar can also explicitly proclaim an error, using the
                   3351: macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use in Actions}).
                   3352: 
                   3353: The Bison parser expects to report the error by calling an error
                   3354: reporting function named @code{yyerror}, which you must supply.  It is
                   3355: called by @code{yyparse} whenever a syntax error is found, and it
                   3356: receives one argument.  For a parse error, the string is normally
                   3357: @w{@code{"parse error"}}.
                   3358: 
                   3359: @findex YYERROR_VERBOSE
                   3360: If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations
                   3361: section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose
                   3362: and specific error message string instead of just plain @w{@code{"parse
                   3363: error"}}.  It doesn't matter what definition you use for
                   3364: @code{YYERROR_VERBOSE}, just whether you define it.
                   3365: 
                   3366: The parser can detect one other kind of error: stack overflow.  This
                   3367: happens when the input contains constructions that are very deeply
                   3368: nested.  It isn't likely you will encounter this, since the Bison
                   3369: parser extends its stack automatically up to a very large limit.  But
                   3370: if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
                   3371: fashion, except that the argument string is @w{@code{"parser stack
                   3372: overflow"}}.
                   3373: 
                   3374: The following definition suffices in simple programs:
                   3375: 
                   3376: @example
                   3377: @group
                   3378: yyerror (s)
                   3379:      char *s;
                   3380: @{
                   3381: @end group
                   3382: @group
                   3383:   fprintf (stderr, "%s\n", s);
                   3384: @}
                   3385: @end group
                   3386: @end example
                   3387: 
                   3388: After @code{yyerror} returns to @code{yyparse}, the latter will attempt
                   3389: error recovery if you have written suitable error recovery grammar rules
                   3390: (@pxref{Error Recovery}).  If recovery is impossible, @code{yyparse} will
                   3391: immediately return 1.
                   3392: 
                   3393: @vindex yynerrs
                   3394: The variable @code{yynerrs} contains the number of syntax errors
                   3395: encountered so far.  Normally this variable is global; but if you
                   3396: request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable
                   3397: which only the actions can access.
                   3398: 
                   3399: @node Action Features,  , Error Reporting, Interface
                   3400: @section Special Features for Use in Actions
                   3401: @cindex summary, action features
                   3402: @cindex action features summary
                   3403: 
                   3404: Here is a table of Bison constructs, variables and macros that
                   3405: are useful in actions.
                   3406: 
                   3407: @table @samp
                   3408: @item $$
                   3409: Acts like a variable that contains the semantic value for the
                   3410: grouping made by the current rule.  @xref{Actions}.
                   3411: 
                   3412: @item $@var{n}
                   3413: Acts like a variable that contains the semantic value for the
                   3414: @var{n}th component of the current rule.  @xref{Actions}.
                   3415: 
                   3416: @item $<@var{typealt}>$
                   3417: Like @code{$$} but specifies alternative @var{typealt} in the union
                   3418: specified by the @code{%union} declaration.  @xref{Action Types, ,Data Types of Values in Actions}.
                   3419: 
                   3420: @item $<@var{typealt}>@var{n}
                   3421: Like @code{$@var{n}} but specifies alternative @var{typealt} in the
                   3422: union specified by the @code{%union} declaration.  
                   3423: @xref{Action Types, ,Data Types of Values in Actions}.@refill
                   3424: 
                   3425: @item YYABORT;
                   3426: Return immediately from @code{yyparse}, indicating failure.
                   3427: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
                   3428: 
                   3429: @item YYACCEPT;
                   3430: Return immediately from @code{yyparse}, indicating success.
                   3431: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
                   3432: 
                   3433: @item YYBACKUP (@var{token}, @var{value});
                   3434: @findex YYBACKUP
                   3435: Unshift a token.  This macro is allowed only for rules that reduce
                   3436: a single value, and only when there is no look-ahead token.
                   3437: It installs a look-ahead token with token type @var{token} and
                   3438: semantic value @var{value}; then it discards the value that was
                   3439: going to be reduced by this rule.
                   3440: 
                   3441: If the macro is used when it is not valid, such as when there is
                   3442: a look-ahead token already, then it reports a syntax error with
                   3443: a message @samp{cannot back up} and performs ordinary error
                   3444: recovery.
                   3445: 
                   3446: In either case, the rest of the action is not executed.
                   3447: 
                   3448: @item YYEMPTY
                   3449: @vindex YYEMPTY
                   3450: Value stored in @code{yychar} when there is no look-ahead token.
                   3451: 
                   3452: @item YYERROR;
                   3453: @findex YYERROR
                   3454: Cause an immediate syntax error.  This statement initiates error
                   3455: recovery just as if the parser itself had detected an error; however, it
                   3456: does not call @code{yyerror}, and does not print any message.  If you
                   3457: want to print an error message, call @code{yyerror} explicitly before
                   3458: the @samp{YYERROR;} statement.  @xref{Error Recovery}.
                   3459: 
                   3460: @item YYRECOVERING
                   3461: This macro stands for an expression that has the value 1 when the parser
                   3462: is recovering from a syntax error, and 0 the rest of the time.
                   3463: @xref{Error Recovery}.
                   3464: 
                   3465: @item yychar
                   3466: Variable containing the current look-ahead token.  (In a pure parser,
                   3467: this is actually a local variable within @code{yyparse}.)  When there is
                   3468: no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
                   3469: @xref{Look-Ahead, ,Look-Ahead Tokens}.
                   3470: 
                   3471: @item yyclearin;
                   3472: Discard the current look-ahead token.  This is useful primarily in
                   3473: error rules.  @xref{Error Recovery}.
                   3474: 
                   3475: @item yyerrok;
                   3476: Resume generating error messages immediately for subsequent syntax
                   3477: errors.  This is useful primarily in error rules.  
                   3478: @xref{Error Recovery}.
                   3479: 
                   3480: @item @@@var{n}
                   3481: @findex @@@var{n}
                   3482: Acts like a structure variable containing information on the line
                   3483: numbers and column numbers of the @var{n}th component of the current
                   3484: rule.  The structure has four members, like this:
                   3485: 
                   3486: @example
                   3487: struct @{
                   3488:   int first_line, last_line;
                   3489:   int first_column, last_column;
                   3490: @};
                   3491: @end example
                   3492: 
                   3493: Thus, to get the starting line number of the third component, use
                   3494: @samp{@@3.first_line}.
                   3495: 
                   3496: In order for the members of this structure to contain valid information,
                   3497: you must make @code{yylex} supply this information about each token.
                   3498: If you need only certain members, then @code{yylex} need only fill in
                   3499: those members.
                   3500: 
                   3501: The use of this feature makes the parser noticeably slower.
                   3502: @end table
                   3503: 
                   3504: @node Algorithm, Error Recovery, Interface, Top
                   3505: @chapter The Bison Parser Algorithm 
                   3506: @cindex Bison parser algorithm 
                   3507: @cindex algorithm of parser
                   3508: @cindex shifting
                   3509: @cindex reduction
                   3510: @cindex parser stack
                   3511: @cindex stack, parser
                   3512: 
                   3513: As Bison reads tokens, it pushes them onto a stack along with their
                   3514: semantic values.  The stack is called the @dfn{parser stack}.  Pushing a
                   3515: token is traditionally called @dfn{shifting}.
                   3516: 
                   3517: For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
                   3518: @samp{3} to come.  The stack will have four elements, one for each token
                   3519: that was shifted.
                   3520: 
                   3521: But the stack does not always have an element for each token read.  When
                   3522: the last @var{n} tokens and groupings shifted match the components of a
                   3523: grammar rule, they can be combined according to that rule.  This is called
                   3524: @dfn{reduction}.  Those tokens and groupings are replaced on the stack by a
                   3525: single grouping whose symbol is the result (left hand side) of that rule.
                   3526: Running the rule's action is part of the process of reduction, because this
                   3527: is what computes the semantic value of the resulting grouping.
                   3528: 
                   3529: For example, if the infix calculator's parser stack contains this:
                   3530: 
                   3531: @example
                   3532: 1 + 5 * 3
                   3533: @end example
                   3534: 
                   3535: @noindent
                   3536: and the next input token is a newline character, then the last three
                   3537: elements can be reduced to 15 via the rule:
                   3538: 
                   3539: @example
                   3540: expr: expr '*' expr;
                   3541: @end example
                   3542: 
                   3543: @noindent
                   3544: Then the stack contains just these three elements:
                   3545: 
                   3546: @example
                   3547: 1 + 15
                   3548: @end example
                   3549: 
                   3550: @noindent
                   3551: At this point, another reduction can be made, resulting in the single value
                   3552: 16.  Then the newline token can be shifted.
                   3553: 
                   3554: The parser tries, by shifts and reductions, to reduce the entire input down
                   3555: to a single grouping whose symbol is the grammar's start-symbol
                   3556: (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
                   3557: 
                   3558: This kind of parser is known in the literature as a bottom-up parser.
                   3559: 
                   3560: @menu
                   3561: * Look-Ahead::        Parser looks one token ahead when deciding what to do.
                   3562: * Shift/Reduce::      Conflicts: when either shifting or reduction is valid.
                   3563: * Precedence::        Operator precedence works by resolving conflicts.
                   3564: * Contextual Precedence::  When an operator's precedence depends on context.
                   3565: * Parser States::     The parser is a finite-state-machine with stack.
                   3566: * Reduce/Reduce::     When two rules are applicable in the same situation.
                   3567: * Mystery Conflicts::  Reduce/reduce conflicts that look unjustified.
                   3568: * Stack Overflow::    What happens when stack gets full.  How to avoid it.
                   3569: @end menu
                   3570: 
                   3571: @node Look-Ahead, Shift/Reduce,  , Algorithm
                   3572: @section Look-Ahead Tokens
                   3573: @cindex look-ahead token
                   3574: 
                   3575: The Bison parser does @emph{not} always reduce immediately as soon as the
                   3576: last @var{n} tokens and groupings match a rule.  This is because such a
                   3577: simple strategy is inadequate to handle most languages.  Instead, when a
                   3578: reduction is possible, the parser sometimes ``looks ahead'' at the next
                   3579: token in order to decide what to do.
                   3580: 
                   3581: When a token is read, it is not immediately shifted; first it becomes the
                   3582: @dfn{look-ahead token}, which is not on the stack.  Now the parser can
                   3583: perform one or more reductions of tokens and groupings on the stack, while
                   3584: the look-ahead token remains off to the side.  When no more reductions
                   3585: should take place, the look-ahead token is shifted onto the stack.  This
                   3586: does not mean that all possible reductions have been done; depending on the
                   3587: token type of the look-ahead token, some rules may choose to delay their
                   3588: application.
                   3589: 
                   3590: Here is a simple case where look-ahead is needed.  These three rules define
                   3591: expressions which contain binary addition operators and postfix unary
                   3592: factorial operators (@samp{!}), and allow parentheses for grouping.
                   3593: 
                   3594: @example
                   3595: @group
                   3596: expr:     term '+' expr
                   3597:         | term
                   3598:         ;
                   3599: @end group
                   3600: 
                   3601: @group
                   3602: term:     '(' expr ')'
                   3603:         | term '!'
                   3604:         | NUMBER
                   3605:         ;
                   3606: @end group
                   3607: @end example
                   3608: 
                   3609: Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
                   3610: should be done?  If the following token is @samp{)}, then the first three
                   3611: tokens must be reduced to form an @code{expr}.  This is the only valid
                   3612: course, because shifting the @samp{)} would produce a sequence of symbols
                   3613: @w{@code{term ')'}}, and no rule allows this.
                   3614: 
                   3615: If the following token is @samp{!}, then it must be shifted immediately so
                   3616: that @w{@samp{2 !}} can be reduced to make a @code{term}.  If instead the
                   3617: parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
                   3618: @code{expr}.  It would then be impossible to shift the @samp{!} because
                   3619: doing so would produce on the stack the sequence of symbols @code{expr
                   3620: '!'}.  No rule allows that sequence.
                   3621: 
                   3622: @vindex yychar
                   3623: The current look-ahead token is stored in the variable @code{yychar}.
                   3624: @xref{Action Features, ,Special Features for Use in Actions}.
                   3625: 
                   3626: @node Shift/Reduce, Precedence, Look-Ahead, Algorithm
                   3627: @section Shift/Reduce Conflicts
                   3628: @cindex conflicts
                   3629: @cindex shift/reduce conflicts
                   3630: @cindex dangling @code{else}
                   3631: @cindex @code{else}, dangling
                   3632: 
                   3633: Suppose we are parsing a language which has if-then and if-then-else
                   3634: statements, with a pair of rules like this:
                   3635: 
                   3636: @example
                   3637: @group
                   3638: if_stmt:
                   3639:           IF expr THEN stmt
                   3640:         | IF expr THEN stmt ELSE stmt
                   3641:         ;
                   3642: @end group
                   3643: @end example
                   3644: 
                   3645: @noindent
                   3646: Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
                   3647: terminal symbols for specific keyword tokens.
                   3648: 
                   3649: When the @code{ELSE} token is read and becomes the look-ahead token, the
                   3650: contents of the stack (assuming the input is valid) are just right for
                   3651: reduction by the first rule.  But it is also legitimate to shift the
                   3652: @code{ELSE}, because that would lead to eventual reduction by the second
                   3653: rule.
                   3654: 
                   3655: This situation, where either a shift or a reduction would be valid, is
                   3656: called a @dfn{shift/reduce conflict}.  Bison is designed to resolve
                   3657: these conflicts by choosing to shift, unless otherwise directed by
                   3658: operator precedence declarations.  To see the reason for this, let's
                   3659: contrast it with the other alternative.
                   3660: 
                   3661: Since the parser prefers to shift the @code{ELSE}, the result is to attach
                   3662: the else-clause to the innermost if-statement, making these two inputs
                   3663: equivalent:
                   3664: 
                   3665: @example
                   3666: if x then if y then win (); else lose;
                   3667: 
                   3668: if x then do; if y then win (); else lose; end;
                   3669: @end example
                   3670: 
                   3671: But if the parser chose to reduce when possible rather than shift, the
                   3672: result would be to attach the else-clause to the outermost if-statement,
                   3673: making these two inputs equivalent:
                   3674: 
                   3675: @example
                   3676: if x then if y then win (); else lose;
                   3677: 
                   3678: if x then do; if y then win (); end; else lose;
                   3679: @end example
                   3680: 
                   3681: The conflict exists because the grammar as written is ambiguous: either
                   3682: parsing of the simple nested if-statement is legitimate.  The established
                   3683: convention is that these ambiguities are resolved by attaching the
                   3684: else-clause to the innermost if-statement; this is what Bison accomplishes
                   3685: by choosing to shift rather than reduce.  (It would ideally be cleaner to
                   3686: write an unambiguous grammar, but that is very hard to do in this case.)
                   3687: This particular ambiguity was first encountered in the specifications of
                   3688: Algol 60 and is called the ``dangling @code{else}'' ambiguity.
                   3689: 
                   3690: To avoid warnings from Bison about predictable, legitimate shift/reduce
                   3691: conflicts, use the @code{%expect @var{n}} declaration.  There will be no
                   3692: warning as long as the number of shift/reduce conflicts is exactly @var{n}.
                   3693: @xref{Expect Decl, ,Suppressing Conflict Warnings}.
                   3694: 
                   3695: The definition of @code{if_stmt} above is solely to blame for the
                   3696: conflict, but the conflict does not actually appear without additional
                   3697: rules.  Here is a complete Bison input file that actually manifests the
                   3698: conflict:
                   3699: 
                   3700: @example
                   3701: @group
                   3702: %token IF THEN ELSE variable
                   3703: %%
                   3704: @end group
                   3705: @group
                   3706: stmt:     expr
                   3707:         | if_stmt
                   3708:         ;
                   3709: @end group
                   3710: 
                   3711: @group
                   3712: if_stmt:
                   3713:           IF expr THEN stmt
                   3714:         | IF expr THEN stmt ELSE stmt
                   3715:         ;
                   3716: @end group
                   3717: 
                   3718: expr:     variable
                   3719:         ;
                   3720: @end example
                   3721: 
                   3722: @node Precedence, Contextual Precedence, Shift/Reduce, Algorithm
                   3723: @section Operator Precedence
                   3724: @cindex operator precedence
                   3725: @cindex precedence of operators
                   3726: 
                   3727: Another situation where shift/reduce conflicts appear is in arithmetic
                   3728: expressions.  Here shifting is not always the preferred resolution; the
                   3729: Bison declarations for operator precedence allow you to specify when to
                   3730: shift and when to reduce.
                   3731: 
                   3732: @menu
                   3733: * Why Precedence::    An example showing why precedence is needed.
                   3734: * Using Precedence::  How to specify precedence in Bison grammars.
                   3735: * Precedence Examples::  How these features are used in the previous example.
                   3736: * How Precedence::    How they work.
                   3737: @end menu
                   3738: 
                   3739: @node Why Precedence, Using Precedence,  , Precedence
                   3740: @subsection When Precedence is Needed
                   3741: 
                   3742: Consider the following ambiguous grammar fragment (ambiguous because the
                   3743: input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
                   3744: 
                   3745: @example
                   3746: @group
                   3747: expr:     expr '-' expr
                   3748:         | expr '*' expr
                   3749:         | expr '<' expr
                   3750:         | '(' expr ')'
                   3751:         @dots{}
                   3752:         ;
                   3753: @end group
                   3754: @end example
                   3755: 
                   3756: @noindent
                   3757: Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
                   3758: should it reduce them via the rule for the addition operator?  It depends
                   3759: on the next token.  Of course, if the next token is @samp{)}, we must
                   3760: reduce; shifting is invalid because no single rule can reduce the token
                   3761: sequence @w{@samp{- 2 )}} or anything starting with that.  But if the next
                   3762: token is @samp{*} or @samp{<}, we have a choice: either shifting or
                   3763: reduction would allow the parse to complete, but with different
                   3764: results.
                   3765: 
                   3766: To decide which one Bison should do, we must consider the
                   3767: results.  If the next operator token @var{op} is shifted, then it
                   3768: must be reduced first in order to permit another opportunity to
                   3769: reduce the sum.  The result is (in effect) @w{@samp{1 - (2
                   3770: @var{op} 3)}}.  On the other hand, if the subtraction is reduced
                   3771: before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op}
                   3772: 3}}.  Clearly, then, the choice of shift or reduce should depend
                   3773: on the relative precedence of the operators @samp{-} and
                   3774: @var{op}: @samp{*} should be shifted first, but not @samp{<}.
                   3775: 
                   3776: @cindex associativity
                   3777: What about input such as @w{@samp{1 - 2 - 5}}; should this be
                   3778: @w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}?  For
                   3779: most operators we prefer the former, which is called @dfn{left
                   3780: association}.  The latter alternative, @dfn{right association}, is
                   3781: desirable for assignment operators.  The choice of left or right
                   3782: association is a matter of whether the parser chooses to shift or
                   3783: reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead
                   3784: token is @samp{-}: shifting makes right-associativity.
                   3785: 
                   3786: @node Using Precedence, Precedence Examples, Why Precedence, Precedence
                   3787: @subsection Specifying Operator Precedence
                   3788: @findex %left
                   3789: @findex %right
                   3790: @findex %nonassoc
                   3791: 
                   3792: Bison allows you to specify these choices with the operator precedence
                   3793: declarations @code{%left} and @code{%right}.  Each such declaration
                   3794: contains a list of tokens, which are operators whose precedence and
                   3795: associativity is being declared.  The @code{%left} declaration makes all
                   3796: those operators left-associative and the @code{%right} declaration makes
                   3797: them right-associative.  A third alternative is @code{%nonassoc}, which
                   3798: declares that it is a syntax error to find the same operator twice ``in a
                   3799: row''.
                   3800: 
                   3801: The relative precedence of different operators is controlled by the
                   3802: order in which they are declared.  The first @code{%left} or
                   3803: @code{%right} declaration in the file declares the operators whose
                   3804: precedence is lowest, the next such declaration declares the operators
                   3805: whose precedence is a little higher, and so on.
                   3806: 
                   3807: @node Precedence Examples, How Precedence, Using Precedence, Precedence
                   3808: @subsection Precedence Examples
                   3809: 
                   3810: In our example, we would want the following declarations:
                   3811: 
                   3812: @example
                   3813: %left '<'
                   3814: %left '-'
                   3815: %left '*'
                   3816: @end example
                   3817: 
                   3818: In a more complete example, which supports other operators as well, we
                   3819: would declare them in groups of equal precedence.  For example, @code{'+'} is
                   3820: declared with @code{'-'}:
                   3821: 
                   3822: @example
                   3823: %left '<' '>' '=' NE LE GE
                   3824: %left '+' '-'
                   3825: %left '*' '/'
                   3826: @end example
                   3827: 
                   3828: @noindent
                   3829: (Here @code{NE} and so on stand for the operators for ``not equal''
                   3830: and so on.  We assume that these tokens are more than one character long
                   3831: and therefore are represented by names, not character literals.)
                   3832: 
                   3833: @node How Precedence,  , Precedence Examples, Precedence
                   3834: @subsection How Precedence Works
                   3835: 
                   3836: The first effect of the precedence declarations is to assign precedence
                   3837: levels to the terminal symbols declared.  The second effect is to assign
                   3838: precedence levels to certain rules: each rule gets its precedence from the
                   3839: last terminal symbol mentioned in the components.  (You can also specify
                   3840: explicitly the precedence of a rule.  @xref{Contextual Precedence, ,Context-Dependent Precedence}.)
                   3841: 
                   3842: Finally, the resolution of conflicts works by comparing the
                   3843: precedence of the rule being considered with that of the
                   3844: look-ahead token.  If the token's precedence is higher, the
                   3845: choice is to shift.  If the rule's precedence is higher, the
                   3846: choice is to reduce.  If they have equal precedence, the choice
                   3847: is made based on the associativity of that precedence level.  The
                   3848: verbose output file made by @samp{-v} (@pxref{Invocation, ,Invoking Bison}) says
                   3849: how each conflict was resolved.
                   3850: 
                   3851: Not all rules and not all tokens have precedence.  If either the rule or
                   3852: the look-ahead token has no precedence, then the default is to shift.
                   3853: 
                   3854: @node Contextual Precedence, Parser States, Precedence, Algorithm
                   3855: @section Context-Dependent Precedence
                   3856: @cindex context-dependent precedence
                   3857: @cindex unary operator precedence
                   3858: @cindex precedence, context-dependent
                   3859: @cindex precedence, unary operator
                   3860: @findex %prec
                   3861: 
                   3862: Often the precedence of an operator depends on the context.  This sounds
                   3863: outlandish at first, but it is really very common.  For example, a minus
                   3864: sign typically has a very high precedence as a unary operator, and a
                   3865: somewhat lower precedence (lower than multiplication) as a binary operator.
                   3866: 
                   3867: The Bison precedence declarations, @code{%left}, @code{%right} and
                   3868: @code{%nonassoc}, can only be used once for a given token; so a token has
                   3869: only one precedence declared in this way.  For context-dependent
                   3870: precedence, you need to use an additional mechanism: the @code{%prec}
                   3871: modifier for rules.@refill
                   3872: 
                   3873: The @code{%prec} modifier declares the precedence of a particular rule by
                   3874: specifying a terminal symbol whose precedence should be used for that rule.
                   3875: It's not necessary for that symbol to appear otherwise in the rule.  The
                   3876: modifier's syntax is:
                   3877: 
                   3878: @example
                   3879: %prec @var{terminal-symbol}
                   3880: @end example
                   3881: 
                   3882: @noindent
                   3883: and it is written after the components of the rule.  Its effect is to
                   3884: assign the rule the precedence of @var{terminal-symbol}, overriding
                   3885: the precedence that would be deduced for it in the ordinary way.  The
                   3886: altered rule precedence then affects how conflicts involving that rule
                   3887: are resolved (@pxref{Precedence, ,Operator Precedence}).
                   3888: 
                   3889: Here is how @code{%prec} solves the problem of unary minus.  First, declare
                   3890: a precedence for a fictitious terminal symbol named @code{UMINUS}.  There
                   3891: are no tokens of this type, but the symbol serves to stand for its
                   3892: precedence:
                   3893: 
                   3894: @example
                   3895: @dots{}
                   3896: %left '+' '-'
                   3897: %left '*'
                   3898: %left UMINUS
                   3899: @end example
                   3900: 
                   3901: Now the precedence of @code{UMINUS} can be used in specific rules:
                   3902: 
                   3903: @example
                   3904: @group
                   3905: exp:    @dots{}
                   3906:         | exp '-' exp
                   3907:         @dots{}
                   3908:         | '-' exp %prec UMINUS
                   3909: @end group
                   3910: @end example
                   3911: 
                   3912: @node Parser States, Reduce/Reduce, Contextual Precedence, Algorithm
                   3913: @section Parser States
                   3914: @cindex finite-state machine
                   3915: @cindex parser state
                   3916: @cindex state (of parser)
                   3917: 
                   3918: The function @code{yyparse} is implemented using a finite-state machine.
                   3919: The values pushed on the parser stack are not simply token type codes; they
                   3920: represent the entire sequence of terminal and nonterminal symbols at or
                   3921: near the top of the stack.  The current state collects all the information
                   3922: about previous input which is relevant to deciding what to do next.
                   3923: 
                   3924: Each time a look-ahead token is read, the current parser state together
                   3925: with the type of look-ahead token are looked up in a table.  This table
                   3926: entry can say, ``Shift the look-ahead token.''  In this case, it also
                   3927: specifies the new parser state, which is pushed onto the top of the
                   3928: parser stack.  Or it can say, ``Reduce using rule number @var{n}.''
                   3929: This means that a certain number of tokens or groupings are taken off
                   3930: the top of the stack, and replaced by one grouping.  In other words,
                   3931: that number of states are popped from the stack, and one new state is
                   3932: pushed.
                   3933: 
                   3934: There is one other alternative: the table can say that the look-ahead token
                   3935: is erroneous in the current state.  This causes error processing to begin
                   3936: (@pxref{Error Recovery}).
                   3937: 
                   3938: @node Reduce/Reduce, Mystery Conflicts, Parser States, Algorithm
                   3939: @section Reduce/Reduce Conflicts
                   3940: @cindex reduce/reduce conflict
                   3941: @cindex conflicts, reduce/reduce
                   3942: 
                   3943: A reduce/reduce conflict occurs if there are two or more rules that apply
                   3944: to the same sequence of input.  This usually indicates a serious error
                   3945: in the grammar.
                   3946: 
                   3947: For example, here is an erroneous attempt to define a sequence
                   3948: of zero or more @code{word} groupings.
                   3949: 
                   3950: @example
                   3951: sequence: /* empty */
                   3952:                 @{ printf ("empty sequence\n"); @}
                   3953:         | maybeword
                   3954:         | sequence word
                   3955:                 @{ printf ("added word %s\n", $2); @}
                   3956:         ;
                   3957: 
                   3958: maybeword: /* empty */
                   3959:                 @{ printf ("empty maybeword\n"); @}
                   3960:         | word
                   3961:                 @{ printf ("single word %s\n", $1); @}
                   3962:         ;
                   3963: @end example
                   3964: 
                   3965: @noindent
                   3966: The error is an ambiguity: there is more than one way to parse a single
                   3967: @code{word} into a @code{sequence}.  It could be reduced to a
                   3968: @code{maybeword} and then into a @code{sequence} via the second rule.
                   3969: Alternatively, nothing-at-all could be reduced into a @code{sequence}
                   3970: via the first rule, and this could be combined with the @code{word}
                   3971: using the third rule for @code{sequence}.
                   3972: 
                   3973: There is also more than one way to reduce nothing-at-all into a
                   3974: @code{sequence}.  This can be done directly via the first rule,
                   3975: or indirectly via @code{maybeword} and then the second rule.
                   3976: 
                   3977: You might think that this is a distinction without a difference, because it
                   3978: does not change whether any particular input is valid or not.  But it does
                   3979: affect which actions are run.  One parsing order runs the second rule's
                   3980: action; the other runs the first rule's action and the third rule's action.
                   3981: In this example, the output of the program changes.
                   3982: 
                   3983: Bison resolves a reduce/reduce conflict by choosing to use the rule that
                   3984: appears first in the grammar, but it is very risky to rely on this.  Every
                   3985: reduce/reduce conflict must be studied and usually eliminated.  Here is the
                   3986: proper way to define @code{sequence}:
                   3987: 
                   3988: @example
                   3989: sequence: /* empty */
                   3990:                 @{ printf ("empty sequence\n"); @}
                   3991:         | sequence word
                   3992:                 @{ printf ("added word %s\n", $2); @}
                   3993:         ;
                   3994: @end example
                   3995: 
                   3996: Here is another common error that yields a reduce/reduce conflict:
                   3997: 
                   3998: @example
                   3999: sequence: /* empty */
                   4000:         | sequence words
                   4001:         | sequence redirects
                   4002:         ;
                   4003: 
                   4004: words:    /* empty */
                   4005:         | words word
                   4006:         ;
                   4007: 
                   4008: redirects:/* empty */
                   4009:         | redirects redirect
                   4010:         ;
                   4011: @end example
                   4012: 
                   4013: @noindent
                   4014: The intention here is to define a sequence which can contain either
                   4015: @code{word} or @code{redirect} groupings.  The individual definitions of
                   4016: @code{sequence}, @code{words} and @code{redirects} are error-free, but the
                   4017: three together make a subtle ambiguity: even an empty input can be parsed
                   4018: in infinitely many ways!
                   4019: 
                   4020: Consider: nothing-at-all could be a @code{words}.  Or it could be two
                   4021: @code{words} in a row, or three, or any number.  It could equally well be a
                   4022: @code{redirects}, or two, or any number.  Or it could be a @code{words}
                   4023: followed by three @code{redirects} and another @code{words}.  And so on.
                   4024: 
                   4025: Here are two ways to correct these rules.  First, to make it a single level
                   4026: of sequence:
                   4027: 
                   4028: @example
                   4029: sequence: /* empty */
                   4030:         | sequence word
                   4031:         | sequence redirect
                   4032:         ;
                   4033: @end example
                   4034: 
                   4035: Second, to prevent either a @code{words} or a @code{redirects}
                   4036: from being empty:
                   4037: 
                   4038: @example
                   4039: sequence: /* empty */
                   4040:         | sequence words
                   4041:         | sequence redirects
                   4042:         ;
                   4043: 
                   4044: words:    word
                   4045:         | words word
                   4046:         ;
                   4047: 
                   4048: redirects:redirect
                   4049:         | redirects redirect
                   4050:         ;
                   4051: @end example
                   4052: 
                   4053: @node Mystery Conflicts, Stack Overflow, Reduce/Reduce, Algorithm
                   4054: @section Mysterious Reduce/Reduce Conflicts
                   4055: 
                   4056: Sometimes reduce/reduce conflicts can occur that don't look warranted.
                   4057: Here is an example:
                   4058: 
                   4059: @example
                   4060: @group
                   4061: %token ID
                   4062: 
                   4063: %%
                   4064: def:    param_spec return_spec ','
                   4065:         ;
                   4066: param_spec:
                   4067:              type
                   4068:         |    name_list ':' type
                   4069:         ;
                   4070: @end group
                   4071: @group
                   4072: return_spec:
                   4073:              type
                   4074:         |    name ':' type
                   4075:         ;
                   4076: @end group
                   4077: @group
                   4078: type:        ID
                   4079:         ;
                   4080: @end group
                   4081: @group
                   4082: name:        ID
                   4083:         ;
                   4084: name_list:
                   4085:              name
                   4086:         |    name ',' name_list
                   4087:         ;
                   4088: @end group
                   4089: @end example
                   4090: 
                   4091: It would seem that this grammar can be parsed with only a single token
                   4092: of look-ahead: when a @code{param_spec} is being read, an @code{ID} is 
                   4093: a @code{name} if a comma or colon follows, or a @code{type} if another
                   4094: @code{ID} follows.  In other words, this grammar is LR(1).
                   4095: 
                   4096: @cindex LR(1)
                   4097: @cindex LALR(1)
                   4098: However, Bison, like most parser generators, cannot actually handle all
                   4099: LR(1) grammars.  In this grammar, two contexts, that after an @code{ID}
                   4100: at the beginning of a @code{param_spec} and likewise at the beginning of
                   4101: a @code{return_spec}, are similar enough that Bison assumes they are the
                   4102: same.  They appear similar because the same set of rules would be
                   4103: active---the rule for reducing to a @code{name} and that for reducing to
                   4104: a @code{type}.  Bison is unable to determine at that stage of processing
                   4105: that the rules would require different look-ahead tokens in the two
                   4106: contexts, so it makes a single parser state for them both.  Combining
                   4107: the two contexts causes a conflict later.  In parser terminology, this
                   4108: occurrence means that the grammar is not LALR(1).
                   4109: 
                   4110: In general, it is better to fix deficiencies than to document them.  But
                   4111: this particular deficiency is intrinsically hard to fix; parser
                   4112: generators that can handle LR(1) grammars are hard to write and tend to
                   4113: produce parsers that are very large.  In practice, Bison is more useful
                   4114: as it is now.
                   4115: 
                   4116: When the problem arises, you can often fix it by identifying the two
                   4117: parser states that are being confused, and adding something to make them
                   4118: look distinct.  In the above example, adding one rule to
                   4119: @code{return_spec} as follows makes the problem go away:
                   4120: 
                   4121: @example
                   4122: @group
                   4123: %token BOGUS
                   4124: @dots{}
                   4125: %%
                   4126: @dots{}
                   4127: return_spec:
                   4128:              type
                   4129:         |    name ':' type
                   4130:         /* This rule is never used.  */
                   4131:         |    ID BOGUS
                   4132:         ;
                   4133: @end group
                   4134: @end example
                   4135: 
                   4136: This corrects the problem because it introduces the possibility of an
                   4137: additional active rule in the context after the @code{ID} at the beginning of
                   4138: @code{return_spec}.  This rule is not active in the corresponding context
                   4139: in a @code{param_spec}, so the two contexts receive distinct parser states.
                   4140: As long as the token @code{BOGUS} is never generated by @code{yylex},
                   4141: the added rule cannot alter the way actual input is parsed.
                   4142: 
                   4143: In this particular example, there is another way to solve the problem:
                   4144: rewrite the rule for @code{return_spec} to use @code{ID} directly
                   4145: instead of via @code{name}.  This also causes the two confusing
                   4146: contexts to have different sets of active rules, because the one for
                   4147: @code{return_spec} activates the altered rule for @code{return_spec}
                   4148: rather than the one for @code{name}.
                   4149: 
                   4150: @example
                   4151: param_spec:
                   4152:              type
                   4153:         |    name_list ':' type
                   4154:         ;
                   4155: return_spec:
                   4156:              type
                   4157:         |    ID ':' type
                   4158:         ;
                   4159: @end example
                   4160: 
                   4161: @node Stack Overflow,  , Mystery Conflicts, Algorithm
                   4162: @section Stack Overflow, and How to Avoid It
                   4163: @cindex stack overflow
                   4164: @cindex parser stack overflow
                   4165: @cindex overflow of parser stack
                   4166: 
                   4167: The Bison parser stack can overflow if too many tokens are shifted and
                   4168: not reduced.  When this happens, the parser function @code{yyparse}
                   4169: returns a nonzero value, pausing only to call @code{yyerror} to report
                   4170: the overflow.
                   4171: 
                   4172: @vindex YYMAXDEPTH
                   4173: By defining the macro @code{YYMAXDEPTH}, you can control how deep the
                   4174: parser stack can become before a stack overflow occurs.  Define the
                   4175: macro with a value that is an integer.  This value is the maximum number
                   4176: of tokens that can be shifted (and not reduced) before overflow.
                   4177: It must be a constant expression whose value is known at compile time.
                   4178: 
                   4179: The stack space allowed is not necessarily allocated.  If you specify a
                   4180: large value for @code{YYMAXDEPTH}, the parser actually allocates a small
                   4181: stack at first, and then makes it bigger by stages as needed.  This
                   4182: increasing allocation happens automatically and silently.  Therefore,
                   4183: you do not need to make @code{YYMAXDEPTH} painfully small merely to save
                   4184: space for ordinary inputs that do not need much stack.
                   4185: 
                   4186: @cindex default stack limit
                   4187: The default value of @code{YYMAXDEPTH}, if you do not define it, is
                   4188: 10000.
                   4189: 
                   4190: @vindex YYINITDEPTH
                   4191: You can control how much stack is allocated initially by defining the
                   4192: macro @code{YYINITDEPTH}.  This value too must be a compile-time
                   4193: constant integer.  The default is 200.
                   4194: 
                   4195: @node Error Recovery, Context Dependency, Algorithm, Top
                   4196: @chapter Error Recovery
                   4197: @cindex error recovery
                   4198: @cindex recovery from errors
                   4199: 
                   4200: It is not usually acceptable to have a program terminate on a parse
                   4201: error.  For example, a compiler should recover sufficiently to parse the
                   4202: rest of the input file and check it for errors; a calculator should accept
                   4203: another expression.
                   4204: 
                   4205: In a simple interactive command parser where each input is one line, it may
                   4206: be sufficient to allow @code{yyparse} to return 1 on error and have the
                   4207: caller ignore the rest of the input line when that happens (and then call
                   4208: @code{yyparse} again).  But this is inadequate for a compiler, because it
                   4209: forgets all the syntactic context leading up to the error.  A syntax error
                   4210: deep within a function in the compiler input should not cause the compiler
                   4211: to treat the following line like the beginning of a source file.
                   4212: 
                   4213: @findex error
                   4214: You can define how to recover from a syntax error by writing rules to
                   4215: recognize the special token @code{error}.  This is a terminal symbol that
                   4216: is always defined (you need not declare it) and reserved for error
                   4217: handling.  The Bison parser generates an @code{error} token whenever a
                   4218: syntax error happens; if you have provided a rule to recognize this token
                   4219: in the current context, the parse can continue.  
                   4220: 
                   4221: For example:
                   4222: 
                   4223: @example
                   4224: stmnts:  /* empty string */
                   4225:         | stmnts '\n'
                   4226:         | stmnts exp '\n'
                   4227:         | stmnts error '\n'
                   4228: @end example
                   4229: 
                   4230: The fourth rule in this example says that an error followed by a newline
                   4231: makes a valid addition to any @code{stmnts}.
                   4232: 
                   4233: What happens if a syntax error occurs in the middle of an @code{exp}?  The
                   4234: error recovery rule, interpreted strictly, applies to the precise sequence
                   4235: of a @code{stmnts}, an @code{error} and a newline.  If an error occurs in
                   4236: the middle of an @code{exp}, there will probably be some additional tokens
                   4237: and subexpressions on the stack after the last @code{stmnts}, and there
                   4238: will be tokens to read before the next newline.  So the rule is not
                   4239: applicable in the ordinary way.
                   4240: 
                   4241: But Bison can force the situation to fit the rule, by discarding part of
                   4242: the semantic context and part of the input.  First it discards states and
                   4243: objects from the stack until it gets back to a state in which the
                   4244: @code{error} token is acceptable.  (This means that the subexpressions
                   4245: already parsed are discarded, back to the last complete @code{stmnts}.)  At
                   4246: this point the @code{error} token can be shifted.  Then, if the old
                   4247: look-ahead token is not acceptable to be shifted next, the parser reads
                   4248: tokens and discards them until it finds a token which is acceptable.  In
                   4249: this example, Bison reads and discards input until the next newline
                   4250: so that the fourth rule can apply.
                   4251: 
                   4252: The choice of error rules in the grammar is a choice of strategies for
                   4253: error recovery.  A simple and useful strategy is simply to skip the rest of
                   4254: the current input line or current statement if an error is detected:
                   4255: 
                   4256: @example
                   4257: stmnt: error ';'  /* on error, skip until ';' is read */
                   4258: @end example
                   4259: 
                   4260: It is also useful to recover to the matching close-delimiter of an
                   4261: opening-delimiter that has already been parsed.  Otherwise the
                   4262: close-delimiter will probably appear to be unmatched, and generate another,
                   4263: spurious error message:
                   4264: 
                   4265: @example
                   4266: primary:  '(' expr ')'
                   4267:         | '(' error ')'
                   4268:         @dots{}
                   4269:         ;
                   4270: @end example
                   4271: 
                   4272: Error recovery strategies are necessarily guesses.  When they guess wrong,
                   4273: one syntax error often leads to another.  In the above example, the error
                   4274: recovery rule guesses that an error is due to bad input within one
                   4275: @code{stmnt}.  Suppose that instead a spurious semicolon is inserted in the
                   4276: middle of a valid @code{stmnt}.  After the error recovery rule recovers
                   4277: from the first error, another syntax error will be found straightaway,
                   4278: since the text following the spurious semicolon is also an invalid
                   4279: @code{stmnt}.
                   4280: 
                   4281: To prevent an outpouring of error messages, the parser will output no error
                   4282: message for another syntax error that happens shortly after the first; only
                   4283: after three consecutive input tokens have been successfully shifted will
                   4284: error messages resume.
                   4285: 
                   4286: Note that rules which accept the @code{error} token may have actions, just
                   4287: as any other rules can.
                   4288: 
                   4289: @findex yyerrok
                   4290: You can make error messages resume immediately by using the macro
                   4291: @code{yyerrok} in an action.  If you do this in the error rule's action, no
                   4292: error messages will be suppressed.  This macro requires no arguments;
                   4293: @samp{yyerrok;} is a valid C statement.
                   4294: 
                   4295: @findex yyclearin
                   4296: The previous look-ahead token is reanalyzed immediately after an error.  If
                   4297: this is unacceptable, then the macro @code{yyclearin} may be used to clear
                   4298: this token.  Write the statement @samp{yyclearin;} in the error rule's
                   4299: action.
                   4300: 
                   4301: For example, suppose that on a parse error, an error handling routine is
                   4302: called that advances the input stream to some point where parsing should
                   4303: once again commence.  The next symbol returned by the lexical scanner is
                   4304: probably correct.  The previous look-ahead token ought to be discarded
                   4305: with @samp{yyclearin;}.
                   4306: 
                   4307: @vindex YYRECOVERING
                   4308: The macro @code{YYRECOVERING} stands for an expression that has the
                   4309: value 1 when the parser is recovering from a syntax error, and 0 the
                   4310: rest of the time.  A value of 1 indicates that error messages are
                   4311: currently suppressed for new syntax errors.
                   4312: 
                   4313: @node Context Dependency, Debugging, Error Recovery, Top
                   4314: @chapter Handling Context Dependencies
                   4315: 
                   4316: The Bison paradigm is to parse tokens first, then group them into larger
                   4317: syntactic units.  In many languages, the meaning of a token is affected by
                   4318: its context.  Although this violates the Bison paradigm, certain techniques
                   4319: (known as @dfn{kludges}) may enable you to write Bison parsers for such
                   4320: languages.
                   4321: 
                   4322: @menu
                   4323: * Semantic Tokens::   Token parsing can depend on the semantic context.
                   4324: * Lexical Tie-ins::   Token parsing can depend on the syntactic context.
                   4325: * Tie-in Recovery::   Lexical tie-ins have implications for how
                   4326:                         error recovery rules must be written.
                   4327: @end menu
                   4328: 
                   4329: (Actually, ``kludge'' means any technique that gets its job done but is
                   4330: neither clean nor robust.)
                   4331: 
                   4332: @node Semantic Tokens, Lexical Tie-ins,  , Context Dependency
                   4333: @section Semantic Info in Token Types
                   4334: 
                   4335: The C language has a context dependency: the way an identifier is used
                   4336: depends on what its current meaning is.  For example, consider this:
                   4337: 
                   4338: @example
                   4339: foo (x);
                   4340: @end example
                   4341: 
                   4342: This looks like a function call statement, but if @code{foo} is a typedef
                   4343: name, then this is actually a declaration of @code{x}.  How can a Bison
                   4344: parser for C decide how to parse this input?
                   4345: 
                   4346: The method used in GNU C is to have two different token types,
                   4347: @code{IDENTIFIER} and @code{TYPENAME}.  When @code{yylex} finds an
                   4348: identifier, it looks up the current declaration of the identifier in order
                   4349: to decide which token type to return: @code{TYPENAME} if the identifier is
                   4350: declared as a typedef, @code{IDENTIFIER} otherwise.
                   4351: 
                   4352: The grammar rules can then express the context dependency by the choice of
                   4353: token type to recognize.  @code{IDENTIFIER} is accepted as an expression,
                   4354: but @code{TYPENAME} is not.  @code{TYPENAME} can start a declaration, but
                   4355: @code{IDENTIFIER} cannot.  In contexts where the meaning of the identifier
                   4356: is @emph{not} significant, such as in declarations that can shadow a
                   4357: typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
                   4358: accepted---there is one rule for each of the two token types.
                   4359: 
                   4360: This technique is simple to use if the decision of which kinds of
                   4361: identifiers to allow is made at a place close to where the identifier is
                   4362: parsed.  But in C this is not always so: C allows a declaration to
                   4363: redeclare a typedef name provided an explicit type has been specified
                   4364: earlier:
                   4365: 
                   4366: @example
                   4367: typedef int foo, bar, lose;
                   4368: static foo (bar);        /* @r{redeclare @code{bar} as static variable} */
                   4369: static int foo (lose);   /* @r{redeclare @code{foo} as function} */
                   4370: @end example
                   4371: 
                   4372: Unfortunately, the name being declared is separated from the declaration
                   4373: construct itself by a complicated syntactic structure---the ``declarator''.
                   4374: 
                   4375: As a result, the part of Bison parser for C needs to be duplicated, with
                   4376: all the nonterminal names changed: once for parsing a declaration in which
                   4377: a typedef name can be redefined, and once for parsing a declaration in
                   4378: which that can't be done.  Here is a part of the duplication, with actions
                   4379: omitted for brevity:
                   4380: 
                   4381: @example
                   4382: initdcl:
                   4383:           declarator maybeasm '='
                   4384:           init
                   4385:         | declarator maybeasm
                   4386:         ;
                   4387: 
                   4388: notype_initdcl:
                   4389:           notype_declarator maybeasm '='
                   4390:           init
                   4391:         | notype_declarator maybeasm
                   4392:         ;
                   4393: @end example
                   4394: 
                   4395: @noindent
                   4396: Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
                   4397: cannot.  The distinction between @code{declarator} and
                   4398: @code{notype_declarator} is the same sort of thing.
                   4399: 
                   4400: There is some similarity between this technique and a lexical tie-in
                   4401: (described next), in that information which alters the lexical analysis is
                   4402: changed during parsing by other parts of the program.  The difference is
                   4403: here the information is global, and is used for other purposes in the
                   4404: program.  A true lexical tie-in has a special-purpose flag controlled by
                   4405: the syntactic context.
                   4406: 
                   4407: @node Lexical Tie-ins, Tie-in Recovery, Semantic Tokens, Context Dependency
                   4408: @section Lexical Tie-ins
                   4409: @cindex lexical tie-in
                   4410: 
                   4411: One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
                   4412: which is set by Bison actions, whose purpose is to alter the way tokens are
                   4413: parsed.
                   4414: 
                   4415: For example, suppose we have a language vaguely like C, but with a special
                   4416: construct @samp{hex (@var{hex-expr})}.  After the keyword @code{hex} comes
                   4417: an expression in parentheses in which all integers are hexadecimal.  In
                   4418: particular, the token @samp{a1b} must be treated as an integer rather than
                   4419: as an identifier if it appears in that context.  Here is how you can do it:
                   4420: 
                   4421: @example
                   4422: @group
                   4423: %@{
                   4424: int hexflag;
                   4425: %@}
                   4426: %%
                   4427: @dots{}
                   4428: @end group
                   4429: @group
                   4430: expr:   IDENTIFIER
                   4431:         | constant
                   4432:         | HEX '('
                   4433:                 @{ hexflag = 1; @}
                   4434:           expr ')'
                   4435:                 @{ hexflag = 0;
                   4436:                    $$ = $4; @}
                   4437:         | expr '+' expr
                   4438:                 @{ $$ = make_sum ($1, $3); @}
                   4439:         @dots{}
                   4440:         ;
                   4441: @end group
                   4442: 
                   4443: @group
                   4444: constant:
                   4445:           INTEGER
                   4446:         | STRING
                   4447:         ;
                   4448: @end group
                   4449: @end example
                   4450: 
                   4451: @noindent
                   4452: Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
                   4453: it is nonzero, all integers are parsed in hexadecimal, and tokens starting
                   4454: with letters are parsed as integers if possible.
                   4455: 
                   4456: The declaration of @code{hexflag} shown in the C declarations section of
                   4457: the parser file is needed to make it accessible to the actions 
                   4458: (@pxref{C Declarations, ,The C Declarations Section}).  You must also write the code in @code{yylex}
                   4459: to obey the flag.
                   4460: 
                   4461: @node Tie-in Recovery,  , Lexical Tie-ins, Context Dependency
                   4462: @section Lexical Tie-ins and Error Recovery
                   4463: 
                   4464: Lexical tie-ins make strict demands on any error recovery rules you have.
                   4465: @xref{Error Recovery}.
                   4466: 
                   4467: The reason for this is that the purpose of an error recovery rule is to
                   4468: abort the parsing of one construct and resume in some larger construct.
                   4469: For example, in C-like languages, a typical error recovery rule is to skip
                   4470: tokens until the next semicolon, and then start a new statement, like this:
                   4471: 
                   4472: @example
                   4473: stmt:   expr ';'
                   4474:         | IF '(' expr ')' stmt @{ @dots{} @}
                   4475:         @dots{}
                   4476:         error ';'
                   4477:                 @{ hexflag = 0; @}
                   4478:         ;
                   4479: @end example
                   4480: 
                   4481: If there is a syntax error in the middle of a @samp{hex (@var{expr})}
                   4482: construct, this error rule will apply, and then the action for the
                   4483: completed @samp{hex (@var{expr})} will never run.  So @code{hexflag} would
                   4484: remain set for the entire rest of the input, or until the next @code{hex}
                   4485: keyword, causing identifiers to be misinterpreted as integers.
                   4486: 
                   4487: To avoid this problem the error recovery rule itself clears @code{hexflag}.
                   4488: 
                   4489: There may also be an error recovery rule that works within expressions.
                   4490: For example, there could be a rule which applies within parentheses
                   4491: and skips to the close-parenthesis:
                   4492: 
                   4493: @example
                   4494: @group
                   4495: expr:   @dots{}
                   4496:         | '(' expr ')'
                   4497:                 @{ $$ = $2; @}
                   4498:         | '(' error ')'
                   4499:         @dots{}
                   4500: @end group
                   4501: @end example
                   4502: 
                   4503: If this rule acts within the @code{hex} construct, it is not going to abort
                   4504: that construct (since it applies to an inner level of parentheses within
                   4505: the construct).  Therefore, it should not clear the flag: the rest of
                   4506: the @code{hex} construct should be parsed with the flag still in effect.
                   4507: 
                   4508: What if there is an error recovery rule which might abort out of the
                   4509: @code{hex} construct or might not, depending on circumstances?  There is no
                   4510: way you can write the action to determine whether a @code{hex} construct is
                   4511: being aborted or not.  So if you are using a lexical tie-in, you had better
                   4512: make sure your error recovery rules are not of this kind.  Each rule must
                   4513: be such that you can be sure that it always will, or always won't, have to
                   4514: clear the flag.
                   4515: 
                   4516: @node Debugging, Invocation, Context Dependency, Top
                   4517: @chapter Debugging Your Parser
                   4518: @findex YYDEBUG
                   4519: @findex yydebug
                   4520: @cindex debugging
                   4521: @cindex tracing the parser
                   4522: 
                   4523: If a Bison grammar compiles properly but doesn't do what you want when it
                   4524: runs, the @code{yydebug} parser-trace feature can help you figure out why.
                   4525: 
                   4526: To enable compilation of trace facilities, you must define the macro
                   4527: @code{YYDEBUG} when you compile the parser.  You could use
                   4528: @samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
                   4529: YYDEBUG 1} in the C declarations section of the grammar file 
                   4530: (@pxref{C Declarations, ,The C Declarations Section}).  Alternatively, use the @samp{-t} option when
                   4531: you run Bison (@pxref{Invocation, ,Invoking Bison}).  We always define @code{YYDEBUG} so that
                   4532: debugging is always possible.
                   4533: 
                   4534: The trace facility uses @code{stderr}, so you must add @w{@code{#include
                   4535: <stdio.h>}} to the C declarations section unless it is already there.
                   4536: 
                   4537: Once you have compiled the program with trace facilities, the way to
                   4538: request a trace is to store a nonzero value in the variable @code{yydebug}.
                   4539: You can do this by making the C code do it (in @code{main}, perhaps), or
                   4540: you can alter the value with a C debugger.
                   4541: 
                   4542: Each step taken by the parser when @code{yydebug} is nonzero produces a
                   4543: line or two of trace information, written on @code{stderr}.  The trace
                   4544: messages tell you these things:
                   4545: 
                   4546: @itemize @bullet
                   4547: @item
                   4548: Each time the parser calls @code{yylex}, what kind of token was read.
                   4549: 
                   4550: @item
                   4551: Each time a token is shifted, the depth and complete contents of the
                   4552: state stack (@pxref{Parser States}).
                   4553: 
                   4554: @item
                   4555: Each time a rule is reduced, which rule it is, and the complete contents
                   4556: of the state stack afterward.
                   4557: @end itemize
                   4558: 
                   4559: To make sense of this information, it helps to refer to the listing file
                   4560: produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking Bison}).  This file
                   4561: shows the meaning of each state in terms of positions in various rules, and
                   4562: also what each state will do with each possible input token.  As you read
                   4563: the successive trace messages, you can see that the parser is functioning
                   4564: according to its specification in the listing file.  Eventually you will
                   4565: arrive at the place where something undesirable happens, and you will see
                   4566: which parts of the grammar are to blame.
                   4567: 
                   4568: The parser file is a C program and you can use C debuggers on it, but it's
                   4569: not easy to interpret what it is doing.  The parser function is a
                   4570: finite-state machine interpreter, and aside from the actions it executes
                   4571: the same code over and over.  Only the values of variables show where in
                   4572: the grammar it is working.
                   4573: 
                   4574: @findex YYPRINT
                   4575: The debugging information normally gives the token type of each token
                   4576: read, but not its semantic value.  You can optionally define a macro
                   4577: named @code{YYPRINT} to provide a way to print the value.  If you define
                   4578: @code{YYPRINT}, it should take three arguments.  The parser will pass a
                   4579: standard I/O stream, the numeric code for the token type, and the token
                   4580: value (from @code{yylval}).
                   4581: 
                   4582: Here is an example of @code{YYPRINT} suitable for the multi-function
                   4583: calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
                   4584: 
                   4585: @smallexample
                   4586: #define YYPRINT(file, type, value)   yyprint (file, type, value)
                   4587: 
                   4588: static void
                   4589: yyprint (file, type, value)
                   4590:      FILE *file;
                   4591:      int type;
                   4592:      YYSTYPE value;
                   4593: @{
                   4594:   if (type == VAR)
                   4595:     fprintf (file, " %s", value.tptr->name);
                   4596:   else if (type == NUM)
                   4597:     fprintf (file, " %d", value.val);
                   4598: @}
                   4599: @end smallexample
                   4600: 
                   4601: @node Invocation, Table of Symbols, Debugging, Top
                   4602: @chapter Invoking Bison
                   4603: @cindex invoking Bison
                   4604: @cindex Bison invocation
                   4605: @cindex options for invoking Bison
                   4606: 
                   4607: The usual way to invoke Bison is as follows:
                   4608: 
                   4609: @example
                   4610: bison @var{infile}
                   4611: @end example
                   4612: 
                   4613: Here @var{infile} is the grammar file name, which usually ends in
                   4614: @samp{.y}.  The parser file's name is made by replacing the @samp{.y}
                   4615: with @samp{.tab.c}.  Thus, the @samp{bison foo.y} filename yields
                   4616: @file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields
                   4617: @file{hack/foo.tab.c}.@refill
                   4618: 
                   4619: @menu
                   4620: * Bison Options::     All the options described in detail, 
                   4621:                        in alphabetical order by short options.
                   4622: * Option Cross Key::  Alphabetical list of long options.
                   4623: * VMS Invocation::    Bison command syntax on VMS.
                   4624: @end menu
                   4625: 
                   4626: @node Bison Options, Option Cross Key,  , Invocation
                   4627: @section Bison Options
                   4628: 
                   4629: Bison supports both traditional single-letter options and mnemonic long
                   4630: option names.  Long option names are indicated with @samp{--} instead of
                   4631: @samp{-}.  Abbreviations for option names are allowed as long as they
                   4632: are unique.  When a long option takes an argument, like
                   4633: @samp{--file-prefix}, connect the option name and the argument with
                   4634: @samp{=}.
                   4635: 
                   4636: Here is a list of options that can be used with Bison, alphabetized by
                   4637: short option.  It is followed by a cross key alphabetized by long
                   4638: option.
                   4639: 
                   4640: @table @samp
                   4641: @item -b @var{file-prefix}
                   4642: @itemx --file-prefix=@var{prefix}
                   4643: Specify a prefix to use for all Bison output file names.  The names are
                   4644: chosen as if the input file were named @file{@var{prefix}.c}.
                   4645: 
                   4646: @item -d
                   4647: @itemx --defines
                   4648: Write an extra output file containing macro definitions for the token
                   4649: type names defined in the grammar and the semantic value type
                   4650: @code{YYSTYPE}, as well as a few @code{extern} variable declarations.
                   4651: 
                   4652: If the parser output file is named @file{@var{name}.c} then this file
                   4653: is named @file{@var{name}.h}.@refill
                   4654: 
                   4655: This output file is essential if you wish to put the definition of
                   4656: @code{yylex} in a separate source file, because @code{yylex} needs to
                   4657: be able to refer to token type codes and the variable
                   4658: @code{yylval}.  @xref{Token Values, ,Semantic Values of Tokens}.@refill
                   4659: 
                   4660: @item -l
                   4661: @itemx --no-lines
                   4662: Don't put any @code{#line} preprocessor commands in the parser file.
                   4663: Ordinarily Bison puts them in the parser file so that the C compiler
                   4664: and debuggers will associate errors with your source file, the
                   4665: grammar file.  This option causes them to associate errors with the
                   4666: parser file, treating it an independent source file in its own right.
                   4667: 
                   4668: @item -o @var{outfile}
                   4669: @itemx --output-file=@var{outfile}
                   4670: Specify the name @var{outfile} for the parser file.
                   4671: 
                   4672: The other output files' names are constructed from @var{outfile}
                   4673: as described under the @samp{-v} and @samp{-d} switches.
                   4674: 
                   4675: @item -p @var{prefix}
                   4676: @itemx --name-prefix=@var{prefix}
                   4677: Rename the external symbols used in the parser so that they start with
                   4678: @var{prefix} instead of @samp{yy}.  The precise list of symbols renamed
                   4679: is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yylval},
                   4680: @code{yychar} and @code{yydebug}.
                   4681: 
                   4682: For example, if you use @samp{-p c}, the names become @code{cparse},
                   4683: @code{clex}, and so on.
                   4684: 
                   4685: @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
                   4686: 
                   4687: @item -t
                   4688: @itemx --debug
                   4689: Output a definition of the macro @code{YYDEBUG} into the parser file,
                   4690: so that the debugging facilities are compiled.  @xref{Debugging, ,Debugging Your Parser}.
                   4691: 
                   4692: @item -v
                   4693: @itemx --verbose
                   4694: Write an extra output file containing verbose descriptions of the
                   4695: parser states and what is done for each type of look-ahead token in
                   4696: that state.
                   4697: 
                   4698: This file also describes all the conflicts, both those resolved by
                   4699: operator precedence and the unresolved ones.
                   4700: 
                   4701: The file's name is made by removing @samp{.tab.c} or @samp{.c} from
                   4702: the parser output file name, and adding @samp{.output} instead.@refill
                   4703: 
                   4704: Therefore, if the input file is @file{foo.y}, then the parser file is
                   4705: called @file{foo.tab.c} by default.  As a consequence, the verbose
                   4706: output file is called @file{foo.output}.@refill
                   4707: 
                   4708: @item -V
                   4709: @itemx --version
                   4710: Print the version number of Bison and exit.
                   4711: 
                   4712: @item -h
                   4713: @itemx --help
                   4714: Print a summary of the command-line options to Bison and exit.
                   4715: 
                   4716: @need 1750
                   4717: @item -y
                   4718: @itemx --yacc
                   4719: @itemx --fixed-output-files
                   4720: Equivalent to @samp{-o y.tab.c}; the parser output file is called
                   4721: @file{y.tab.c}, and the other outputs are called @file{y.output} and
                   4722: @file{y.tab.h}.  The purpose of this switch is to imitate Yacc's output
                   4723: file name conventions.  Thus, the following shell script can substitute
                   4724: for Yacc:@refill
                   4725: 
                   4726: @example
                   4727: bison -y $*
                   4728: @end example
                   4729: @end table
                   4730: 
                   4731: @node Option Cross Key, VMS Invocation, Bison Options, Invocation
                   4732: @section Option Cross Key
                   4733: 
                   4734: Here is a list of options, alphabetized by long option, to help you find
                   4735: the corresponding short option.
                   4736: 
                   4737: @tex
                   4738: \def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
                   4739: 
                   4740: {\tt
                   4741: \line{ --debug \leaderfill -t}
                   4742: \line{ --defines \leaderfill -d}
                   4743: \line{ --file-prefix \leaderfill -b}
                   4744: \line{ --fixed-output-files \leaderfill -y}
                   4745: \line{ --help \leaderfill -h}
                   4746: \line{ --name-prefix \leaderfill -p}
                   4747: \line{ --no-lines \leaderfill -l}
                   4748: \line{ --output-file \leaderfill -o}
                   4749: \line{ --verbose \leaderfill -v}
                   4750: \line{ --version \leaderfill -V}
                   4751: \line{ --yacc \leaderfill -y}
                   4752: }
                   4753: @end tex
                   4754: 
                   4755: @ifinfo
                   4756: @example
                   4757: --debug                               -t
                   4758: --defines                             -d
                   4759: --file-prefix=@var{prefix}                  -b @var{file-prefix}
                   4760: --fixed-output-files --yacc           -y
                   4761: --help                                -h
                   4762: --name-prefix                         -p
                   4763: --no-lines                            -l
                   4764: --output-file=@var{outfile}                 -o @var{outfile}
                   4765: --verbose                             -v
                   4766: --version                             -V
                   4767: @end example
                   4768: @end ifinfo
                   4769: 
                   4770: @node VMS Invocation,  , Option Cross Key, Invocation
                   4771: @section Invoking Bison under VMS
                   4772: @cindex invoking Bison under VMS
                   4773: @cindex VMS
                   4774: 
                   4775: The command line syntax for Bison on VMS is a variant of the usual
                   4776: Bison command syntax---adapted to fit VMS conventions.
                   4777: 
                   4778: To find the VMS equivalent for any Bison option, start with the long
                   4779: option, and substitute a @samp{/} for the leading @samp{--}, and
                   4780: substitute a @samp{_} for each @samp{-} in the name of the long option.
                   4781: For example, the following invocation under VMS:
                   4782: 
                   4783: @example
                   4784: bison /debug/name_prefix=bar foo.y
                   4785: @end example
                   4786: 
                   4787: @noindent
                   4788: is equivalent to the following command under POSIX.
                   4789: 
                   4790: @example
                   4791: bison --debug --name-prefix=bar foo.y
                   4792: @end example
                   4793: 
                   4794: The VMS file system does not permit filenames such as
                   4795: @file{foo.tab.c}.  In the above example, the output file
                   4796: would instead be named @file{foo_tab.c}.
                   4797: 
                   4798: @node Table of Symbols, Glossary, Invocation, Top
                   4799: @appendix Bison Symbols
                   4800: @cindex Bison symbols, table of
                   4801: @cindex symbols in Bison, table of
                   4802: 
                   4803: @table @code
                   4804: @item error
                   4805: A token name reserved for error recovery.  This token may be used in
                   4806: grammar rules so as to allow the Bison parser to recognize an error in
                   4807: the grammar without halting the process.  In effect, a sentence
                   4808: containing an error may be recognized as valid.  On a parse error, the
                   4809: token @code{error} becomes the current look-ahead token.  Actions
                   4810: corresponding to @code{error} are then executed, and the look-ahead
                   4811: token is reset to the token that originally caused the violation.
                   4812: @xref{Error Recovery}.
                   4813: 
                   4814: @item YYABORT
                   4815: Macro to pretend that an unrecoverable syntax error has occurred, by
                   4816: making @code{yyparse} return 1 immediately.  The error reporting
                   4817: function @code{yyerror} is not called.  @xref{Parser Function, ,The Parser Function @code{yyparse}}.
                   4818: 
                   4819: @item YYACCEPT
                   4820: Macro to pretend that a complete utterance of the language has been
                   4821: read, by making @code{yyparse} return 0 immediately.  
                   4822: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
                   4823: 
                   4824: @item YYBACKUP
                   4825: Macro to discard a value from the parser stack and fake a look-ahead
                   4826: token.  @xref{Action Features, ,Special Features for Use in Actions}.
                   4827: 
                   4828: @item YYERROR
                   4829: Macro to pretend that a syntax error has just been detected: call
                   4830: @code{yyerror} and then perform normal error recovery if possible
                   4831: (@pxref{Error Recovery}), or (if recovery is impossible) make
                   4832: @code{yyparse} return 1.  @xref{Error Recovery}.
                   4833: 
                   4834: @item YYERROR_VERBOSE
                   4835: Macro that you define with @code{#define} in the Bison declarations
                   4836: section to request verbose, specific error message strings when
                   4837: @code{yyerror} is called.
                   4838: 
                   4839: @item YYINITDEPTH
                   4840: Macro for specifying the initial size of the parser stack.
                   4841: @xref{Stack Overflow}.
                   4842: 
                   4843: @item YYLTYPE
                   4844: Macro for the data type of @code{yylloc}; a structure with four
                   4845: members.  @xref{Token Positions, ,Textual Positions of Tokens}.
                   4846: 
                   4847: @item YYMAXDEPTH
                   4848: Macro for specifying the maximum size of the parser stack.
                   4849: @xref{Stack Overflow}.
                   4850: 
                   4851: @item YYRECOVERING
                   4852: Macro whose value indicates whether the parser is recovering from a
                   4853: syntax error.  @xref{Action Features, ,Special Features for Use in Actions}.
                   4854: 
                   4855: @item YYSTYPE
                   4856: Macro for the data type of semantic values; @code{int} by default.
                   4857: @xref{Value Type, ,Data Types of Semantic Values}.
                   4858: 
                   4859: @item yychar
                   4860: External integer variable that contains the integer value of the
                   4861: current look-ahead token.  (In a pure parser, it is a local variable
                   4862: within @code{yyparse}.)  Error-recovery rule actions may examine this
                   4863: variable.  @xref{Action Features, ,Special Features for Use in Actions}.
                   4864: 
                   4865: @item yyclearin
                   4866: Macro used in error-recovery rule actions.  It clears the previous
                   4867: look-ahead token.  @xref{Error Recovery}.
                   4868: 
                   4869: @item yydebug
                   4870: External integer variable set to zero by default.  If @code{yydebug}
                   4871: is given a nonzero value, the parser will output information on input
                   4872: symbols and parser action.  @xref{Debugging, ,Debugging Your Parser}.
                   4873: 
                   4874: @item yyerrok
                   4875: Macro to cause parser to recover immediately to its normal mode
                   4876: after a parse error.  @xref{Error Recovery}.
                   4877: 
                   4878: @item yyerror
                   4879: User-supplied function to be called by @code{yyparse} on error.  The
                   4880: function receives one argument, a pointer to a character string
                   4881: containing an error message.  @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
                   4882: 
                   4883: @item yylex
                   4884: User-supplied lexical analyzer function, called with no arguments
                   4885: to get the next token.  @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
                   4886: 
                   4887: @item yylval
                   4888: External variable in which @code{yylex} should place the semantic
                   4889: value associated with a token.  (In a pure parser, it is a local
                   4890: variable within @code{yyparse}, and its address is passed to
                   4891: @code{yylex}.)  @xref{Token Values, ,Semantic Values of Tokens}.
                   4892: 
                   4893: @item yylloc
                   4894: External variable in which @code{yylex} should place the line and
                   4895: column numbers associated with a token.  (In a pure parser, it is a
                   4896: local variable within @code{yyparse}, and its address is passed to
                   4897: @code{yylex}.)  You can ignore this variable if you don't use the
                   4898: @samp{@@} feature in the grammar actions.  @xref{Token Positions, ,Textual Positions of Tokens}.
                   4899: 
                   4900: @item yynerrs
                   4901: Global variable which Bison increments each time there is a parse
                   4902: error.  (In a pure parser, it is a local variable within
                   4903: @code{yyparse}.)  @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
                   4904: 
                   4905: @item yyparse
                   4906: The parser function produced by Bison; call this function to start
                   4907: parsing.  @xref{Parser Function, ,The Parser Function @code{yyparse}}.
                   4908: 
                   4909: @item %left
                   4910: Bison declaration to assign left associativity to token(s).
                   4911: @xref{Precedence Decl, ,Operator Precedence}.
                   4912: 
                   4913: @item %nonassoc
                   4914: Bison declaration to assign nonassociativity to token(s).
                   4915: @xref{Precedence Decl, ,Operator Precedence}.
                   4916: 
                   4917: @item %prec
                   4918: Bison declaration to assign a precedence to a specific rule.
                   4919: @xref{Contextual Precedence, ,Context-Dependent Precedence}.
                   4920: 
                   4921: @item %pure_parser
                   4922: Bison declaration to request a pure (reentrant) parser.
                   4923: @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
                   4924: 
                   4925: @item %right
                   4926: Bison declaration to assign right associativity to token(s).
                   4927: @xref{Precedence Decl, ,Operator Precedence}.
                   4928: 
                   4929: @item %start
                   4930: Bison declaration to specify the start symbol.  @xref{Start Decl, ,The Start-Symbol}.
                   4931: 
                   4932: @item %token
                   4933: Bison declaration to declare token(s) without specifying precedence.
                   4934: @xref{Token Decl, ,Token Type Names}.
                   4935: 
                   4936: @item %type
                   4937: Bison declaration to declare nonterminals.  @xref{Type Decl, ,Nonterminal Symbols}.
                   4938: 
                   4939: @item %union
                   4940: Bison declaration to specify several possible data types for semantic
                   4941: values.  @xref{Union Decl, ,The Collection of Value Types}.
                   4942: @end table
                   4943: 
                   4944: These are the punctuation and delimiters used in Bison input:
                   4945: 
                   4946: @table @samp
                   4947: @item %%
                   4948: Delimiter used to separate the grammar rule section from the
                   4949: Bison declarations section or the additional C code section.
                   4950: @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
                   4951: 
                   4952: @item %@{ %@}
                   4953: All code listed between @samp{%@{} and @samp{%@}} is copied directly
                   4954: to the output file uninterpreted.  Such code forms the ``C
                   4955: declarations'' section of the input file.  @xref{Grammar Outline, ,Outline of a Bison Grammar}.
                   4956: 
                   4957: @item /*@dots{}*/
                   4958: Comment delimiters, as in C.
                   4959: 
                   4960: @item :
                   4961: Separates a rule's result from its components.  @xref{Rules, ,Syntax of Grammar Rules}.
                   4962: 
                   4963: @item ;
                   4964: Terminates a rule.  @xref{Rules, ,Syntax of Grammar Rules}.
                   4965: 
                   4966: @item |
                   4967: Separates alternate rules for the same result nonterminal.
                   4968: @xref{Rules, ,Syntax of Grammar Rules}.
                   4969: @end table
                   4970: 
                   4971: @node Glossary, Index, Table of Symbols, Top
                   4972: @appendix Glossary
                   4973: @cindex glossary
                   4974: 
                   4975: @table @asis
                   4976: @item Backus-Naur Form (BNF)
                   4977: Formal method of specifying context-free grammars.  BNF was first used
                   4978: in the @cite{ALGOL-60} report, 1963.  @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
                   4979: 
                   4980: @item Context-free grammars
                   4981: Grammars specified as rules that can be applied regardless of context.
                   4982: Thus, if there is a rule which says that an integer can be used as an
                   4983: expression, integers are allowed @emph{anywhere} an expression is
                   4984: permitted.  @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
                   4985: 
                   4986: @item Dynamic allocation
                   4987: Allocation of memory that occurs during execution, rather than at
                   4988: compile time or on entry to a function.
                   4989: 
                   4990: @item Empty string
                   4991: Analogous to the empty set in set theory, the empty string is a
                   4992: character string of length zero.
                   4993: 
                   4994: @item Finite-state stack machine
                   4995: A ``machine'' that has discrete states in which it is said to exist at
                   4996: each instant in time.  As input to the machine is processed, the
                   4997: machine moves from state to state as specified by the logic of the
                   4998: machine.  In the case of the parser, the input is the language being
                   4999: parsed, and the states correspond to various stages in the grammar
                   5000: rules.  @xref{Algorithm, ,The Bison Parser Algorithm }.
                   5001: 
                   5002: @item Grouping
                   5003: A language construct that is (in general) grammatically divisible;
                   5004: for example, `expression' or `declaration' in C.  
                   5005: @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
                   5006: 
                   5007: @item Infix operator
                   5008: An arithmetic operator that is placed between the operands on which it
                   5009: performs some operation.
                   5010: 
                   5011: @item Input stream
                   5012: A continuous flow of data between devices or programs.
                   5013: 
                   5014: @item Language construct
                   5015: One of the typical usage schemas of the language.  For example, one of
                   5016: the constructs of the C language is the @code{if} statement.
                   5017: @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
                   5018: 
                   5019: @item Left associativity
                   5020: Operators having left associativity are analyzed from left to right:
                   5021: @samp{a+b+c} first computes @samp{a+b} and then combines with
                   5022: @samp{c}.  @xref{Precedence, ,Operator Precedence}.
                   5023: 
                   5024: @item Left recursion
                   5025: A rule whose result symbol is also its first component symbol;
                   5026: for example, @samp{expseq1 : expseq1 ',' exp;}.  @xref{Recursion, ,Recursive Rules}.
                   5027: 
                   5028: @item Left-to-right parsing
                   5029: Parsing a sentence of a language by analyzing it token by token from
                   5030: left to right.  @xref{Algorithm, ,The Bison Parser Algorithm }.
                   5031: 
                   5032: @item Lexical analyzer (scanner)
                   5033: A function that reads an input stream and returns tokens one by one.
                   5034: @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
                   5035: 
                   5036: @item Lexical tie-in
                   5037: A flag, set by actions in the grammar rules, which alters the way
                   5038: tokens are parsed.  @xref{Lexical Tie-ins}.
                   5039: 
                   5040: @item Look-ahead token
                   5041: A token already read but not yet shifted.  @xref{Look-Ahead, ,Look-Ahead Tokens}.
                   5042: 
                   5043: @item LALR(1)
                   5044: The class of context-free grammars that Bison (like most other parser
                   5045: generators) can handle; a subset of LR(1).  @xref{Mystery Conflicts, ,
                   5046: Mysterious Reduce/Reduce Conflicts}.
                   5047: 
                   5048: @item LR(1)
                   5049: The class of context-free grammars in which at most one token of
                   5050: look-ahead is needed to disambiguate the parsing of any piece of input.
                   5051: 
                   5052: @item Nonterminal symbol
                   5053: A grammar symbol standing for a grammatical construct that can
                   5054: be expressed through rules in terms of smaller constructs; in other
                   5055: words, a construct that is not a token.  @xref{Symbols}.
                   5056: 
                   5057: @item Parse error
                   5058: An error encountered during parsing of an input stream due to invalid
                   5059: syntax.  @xref{Error Recovery}.
                   5060: 
                   5061: @item Parser
                   5062: A function that recognizes valid sentences of a language by analyzing
                   5063: the syntax structure of a set of tokens passed to it from a lexical
                   5064: analyzer.
                   5065: 
                   5066: @item Postfix operator
                   5067: An arithmetic operator that is placed after the operands upon which it
                   5068: performs some operation.
                   5069: 
                   5070: @item Reduction
                   5071: Replacing a string of nonterminals and/or terminals with a single
                   5072: nonterminal, according to a grammar rule.  @xref{Algorithm, ,The Bison Parser Algorithm }.
                   5073: 
                   5074: @item Reentrant
                   5075: A reentrant subprogram is a subprogram which can be in invoked any
                   5076: number of times in parallel, without interference between the various
                   5077: invocations.  @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
                   5078: 
                   5079: @item Reverse polish notation
                   5080: A language in which all operators are postfix operators.
                   5081: 
                   5082: @item Right recursion
                   5083: A rule whose result symbol is also its last component symbol;
                   5084: for example, @samp{expseq1: exp ',' expseq1;}.  @xref{Recursion, ,Recursive Rules}.
                   5085: 
                   5086: @item Semantics
                   5087: In computer languages, the semantics are specified by the actions
                   5088: taken for each instance of the language, i.e., the meaning of
                   5089: each statement.  @xref{Semantics, ,Defining Language Semantics}.
                   5090: 
                   5091: @item Shift
                   5092: A parser is said to shift when it makes the choice of analyzing
                   5093: further input from the stream rather than reducing immediately some
                   5094: already-recognized rule.  @xref{Algorithm, ,The Bison Parser Algorithm }.
                   5095: 
                   5096: @item Single-character literal
                   5097: A single character that is recognized and interpreted as is.
                   5098: @xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
                   5099: 
                   5100: @item Start symbol
                   5101: The nonterminal symbol that stands for a complete valid utterance in
                   5102: the language being parsed.  The start symbol is usually listed as the
                   5103: first nonterminal symbol in a language specification.  
                   5104: @xref{Start Decl, ,The Start-Symbol}.
                   5105: 
                   5106: @item Symbol table
                   5107: A data structure where symbol names and associated data are stored
                   5108: during parsing to allow for recognition and use of existing
                   5109: information in repeated uses of a symbol.  @xref{Multi-function Calc}.
                   5110: 
                   5111: @item Token
                   5112: A basic, grammatically indivisible unit of a language.  The symbol
                   5113: that describes a token in the grammar is a terminal symbol.
                   5114: The input of the Bison parser is a stream of tokens which comes from
                   5115: the lexical analyzer.  @xref{Symbols}.
                   5116: 
                   5117: @item Terminal symbol
                   5118: A grammar symbol that has no rules in the grammar and therefore
                   5119: is grammatically indivisible.  The piece of text it represents
                   5120: is a token.  @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
                   5121: @end table
                   5122: 
                   5123: @node Index,  , Glossary, Top
                   5124: @unnumbered Index
                   5125: 
                   5126: @printindex cp
                   5127: 
                   5128: @contents
                   5129: 
                   5130: @bye
                   5131: 
                   5132: 
                   5133: 
                   5134: 
                   5135: @c old menu
                   5136: 
                   5137: * Introduction::
                   5138: * Conditions::
                   5139: * Copying::           The GNU General Public License says
                   5140:                         how you can copy and share Bison
                   5141: 
                   5142: Tutorial sections:
                   5143: * Concepts::          Basic concepts for understanding Bison.
                   5144: * Examples::          Three simple explained examples of using Bison.
                   5145: 
                   5146: Reference sections:
                   5147: * Grammar File::      Writing Bison declarations and rules.
                   5148: * Interface::         C-language interface to the parser function @code{yyparse}.
                   5149: * Algorithm::         How the Bison parser works at run-time.
                   5150: * Error Recovery::    Writing rules for error recovery.
                   5151: * Context Dependency::What to do if your language syntax is too
                   5152:                         messy for Bison to handle straightforwardly.
                   5153: * Debugging::         Debugging Bison parsers that parse wrong.
                   5154: * Invocation::        How to run Bison (to produce the parser source file).
                   5155: * Table of Symbols::  All the keywords of the Bison language are explained.
                   5156: * Glossary::          Basic concepts are explained.
                   5157: * Index::             Cross-references to the text.
                   5158:
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.