|
|
1.1 ! root 1: \input texinfo @c -*-texinfo-*- ! 2: @comment %**start of header ! 3: @setfilename bison.info ! 4: @settitle Bison 1.20 ! 5: @setchapternewpage odd ! 6: ! 7: @c SMALL BOOK version ! 8: @c This edition has been formatted so that you can format and print it in ! 9: @c the smallbook format. ! 10: @c @smallbook ! 11: ! 12: @c next time, consider using @set for edition number, etc... ! 13: ! 14: @c Set following if you have the new `shorttitlepage' command ! 15: @c @clear shorttitlepage-enabled ! 16: @c @set shorttitlepage-enabled ! 17: ! 18: @c ISPELL CHECK: done, 14 Jan 1993 --bob ! 19: ! 20: @c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo ! 21: @c titlepage; should NOT be changed in the GPL. --mew ! 22: ! 23: @iftex ! 24: @syncodeindex fn cp ! 25: @syncodeindex vr cp ! 26: @syncodeindex tp cp ! 27: @end iftex ! 28: @ifinfo ! 29: @synindex fn cp ! 30: @synindex vr cp ! 31: @synindex tp cp ! 32: @end ifinfo ! 33: @comment %**end of header ! 34: ! 35: @ifinfo ! 36: This file documents the Bison parser generator. ! 37: ! 38: Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation, Inc. ! 39: ! 40: Permission is granted to make and distribute verbatim copies of ! 41: this manual provided the copyright notice and this permission notice ! 42: are preserved on all copies. ! 43: ! 44: @ignore ! 45: Permission is granted to process this file through Tex and print the ! 46: results, provided the printed document carries copying permission ! 47: notice identical to this one except for the removal of this paragraph ! 48: (this paragraph not being relevant to the printed manual). ! 49: ! 50: @end ignore ! 51: Permission is granted to copy and distribute modified versions of this ! 52: manual under the conditions for verbatim copying, provided also that the ! 53: sections entitled ``GNU General Public License'' and ``Conditions for ! 54: Using Bison'' are included exactly as in the original, and provided that ! 55: the entire resulting derived work is distributed under the terms of a ! 56: permission notice identical to this one. ! 57: ! 58: Permission is granted to copy and distribute translations of this manual ! 59: into another language, under the above conditions for modified versions, ! 60: except that the sections entitled ``GNU General Public License'', ! 61: ``Conditions for Using Bison'' and this permission notice may be ! 62: included in translations approved by the Free Software Foundation ! 63: instead of in the original English. ! 64: @end ifinfo ! 65: ! 66: @ifset shorttitlepage-enabled ! 67: @shorttitlepage Bison ! 68: @end ifset ! 69: @titlepage ! 70: @title Bison ! 71: @subtitle The YACC-compatible Parser Generator ! 72: @subtitle December 1992, Bison Version 1.20 ! 73: ! 74: @author by Charles Donnelly and Richard Stallman ! 75: ! 76: @page ! 77: @vskip 0pt plus 1filll ! 78: Copyright @copyright{} 1988, 1989, 1990, 1991, 1992 Free Software ! 79: Foundation ! 80: ! 81: @sp 2 ! 82: Published by the Free Software Foundation @* ! 83: 675 Massachusetts Avenue @* ! 84: Cambridge, MA 02139 USA @* ! 85: Printed copies are available for $15 each.@* ! 86: ISBN-1-882114-30-2 ! 87: ! 88: Permission is granted to make and distribute verbatim copies of ! 89: this manual provided the copyright notice and this permission notice ! 90: are preserved on all copies. ! 91: ! 92: @ignore ! 93: Permission is granted to process this file through TeX and print the ! 94: results, provided the printed document carries copying permission ! 95: notice identical to this one except for the removal of this paragraph ! 96: (this paragraph not being relevant to the printed manual). ! 97: ! 98: @end ignore ! 99: Permission is granted to copy and distribute modified versions of this ! 100: manual under the conditions for verbatim copying, provided also that the ! 101: sections entitled ``GNU General Public License'' and ``Conditions for ! 102: Using Bison'' are included exactly as in the original, and provided that ! 103: the entire resulting derived work is distributed under the terms of a ! 104: permission notice identical to this one. ! 105: ! 106: Permission is granted to copy and distribute translations of this manual ! 107: into another language, under the above conditions for modified versions, ! 108: except that the sections entitled ``GNU General Public License'', ! 109: ``Conditions for Using Bison'' and this permission notice may be ! 110: included in translations approved by the Free Software Foundation ! 111: instead of in the original English. ! 112: @sp 2 ! 113: Cover art by Etienne Suvasa. ! 114: @end titlepage ! 115: @page ! 116: ! 117: @node Top, Introduction, (dir), (dir) ! 118: ! 119: @ifinfo ! 120: This manual documents version 1.20 of Bison. ! 121: @end ifinfo ! 122: ! 123: @menu ! 124: * Introduction:: ! 125: * Conditions:: ! 126: * Copying:: The GNU General Public License says ! 127: how you can copy and share Bison ! 128: ! 129: Tutorial sections: ! 130: * Concepts:: Basic concepts for understanding Bison. ! 131: * Examples:: Three simple explained examples of using Bison. ! 132: ! 133: Reference sections: ! 134: * Grammar File:: Writing Bison declarations and rules. ! 135: * Interface:: C-language interface to the parser function @code{yyparse}. ! 136: * Algorithm:: How the Bison parser works at run-time. ! 137: * Error Recovery:: Writing rules for error recovery. ! 138: * Context Dependency:: What to do if your language syntax is too ! 139: messy for Bison to handle straightforwardly. ! 140: * Debugging:: Debugging Bison parsers that parse wrong. ! 141: * Invocation:: How to run Bison (to produce the parser source file). ! 142: * Table of Symbols:: All the keywords of the Bison language are explained. ! 143: * Glossary:: Basic concepts are explained. ! 144: * Index:: Cross-references to the text. ! 145: ! 146: --- The Detailed Node Listing --- ! 147: ! 148: The Concepts of Bison ! 149: ! 150: * Language and Grammar:: Languages and context-free grammars, ! 151: as mathematical ideas. ! 152: * Grammar in Bison:: How we represent grammars for Bison's sake. ! 153: * Semantic Values:: Each token or syntactic grouping can have ! 154: a semantic value (the value of an integer, ! 155: the name of an identifier, etc.). ! 156: * Semantic Actions:: Each rule can have an action containing C code. ! 157: * Bison Parser:: What are Bison's input and output, ! 158: how is the output used? ! 159: * Stages:: Stages in writing and running Bison grammars. ! 160: * Grammar Layout:: Overall structure of a Bison grammar file. ! 161: ! 162: Examples ! 163: ! 164: * RPN Calc:: Reverse polish notation calculator; ! 165: a first example with no operator precedence. ! 166: * Infix Calc:: Infix (algebraic) notation calculator. ! 167: Operator precedence is introduced. ! 168: * Simple Error Recovery:: Continuing after syntax errors. ! 169: * Multi-function Calc:: Calculator with memory and trig functions. ! 170: It uses multiple data-types for semantic values. ! 171: * Exercises:: Ideas for improving the multi-function calculator. ! 172: ! 173: Reverse Polish Notation Calculator ! 174: ! 175: * Decls: Rpcalc Decls. Bison and C declarations for rpcalc. ! 176: * Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. ! 177: * Lexer: Rpcalc Lexer. The lexical analyzer. ! 178: * Main: Rpcalc Main. The controlling function. ! 179: * Error: Rpcalc Error. The error reporting function. ! 180: * Gen: Rpcalc Gen. Running Bison on the grammar file. ! 181: * Comp: Rpcalc Compile. Run the C compiler on the output code. ! 182: ! 183: Grammar Rules for @code{rpcalc} ! 184: ! 185: * Rpcalc Input:: ! 186: * Rpcalc Line:: ! 187: * Rpcalc Expr:: ! 188: ! 189: Multi-Function Calculator: @code{mfcalc} ! 190: ! 191: * Decl: Mfcalc Decl. Bison declarations for multi-function calculator. ! 192: * Rules: Mfcalc Rules. Grammar rules for the calculator. ! 193: * Symtab: Mfcalc Symtab. Symbol table management subroutines. ! 194: ! 195: Bison Grammar Files ! 196: ! 197: * Grammar Outline:: Overall layout of the grammar file. ! 198: * Symbols:: Terminal and nonterminal symbols. ! 199: * Rules:: How to write grammar rules. ! 200: * Recursion:: Writing recursive rules. ! 201: * Semantics:: Semantic values and actions. ! 202: * Declarations:: All kinds of Bison declarations are described here. ! 203: * Multiple Parsers:: Putting more than one Bison parser in one program. ! 204: ! 205: Outline of a Bison Grammar ! 206: ! 207: * C Declarations:: Syntax and usage of the C declarations section. ! 208: * Bison Declarations:: Syntax and usage of the Bison declarations section. ! 209: * Grammar Rules:: Syntax and usage of the grammar rules section. ! 210: * C Code:: Syntax and usage of the additional C code section. ! 211: ! 212: Defining Language Semantics ! 213: ! 214: * Value Type:: Specifying one data type for all semantic values. ! 215: * Multiple Types:: Specifying several alternative data types. ! 216: * Actions:: An action is the semantic definition of a grammar rule. ! 217: * Action Types:: Specifying data types for actions to operate on. ! 218: * Mid-Rule Actions:: Most actions go at the end of a rule. ! 219: This says when, why and how to use the exceptional ! 220: action in the middle of a rule. ! 221: ! 222: Bison Declarations ! 223: ! 224: * Token Decl:: Declaring terminal symbols. ! 225: * Precedence Decl:: Declaring terminals with precedence and associativity. ! 226: * Union Decl:: Declaring the set of all semantic value types. ! 227: * Type Decl:: Declaring the choice of type for a nonterminal symbol. ! 228: * Expect Decl:: Suppressing warnings about shift/reduce conflicts. ! 229: * Start Decl:: Specifying the start symbol. ! 230: * Pure Decl:: Requesting a reentrant parser. ! 231: * Decl Summary:: Table of all Bison declarations. ! 232: ! 233: Parser C-Language Interface ! 234: ! 235: * Parser Function:: How to call @code{yyparse} and what it returns. ! 236: * Lexical:: You must supply a function @code{yylex} ! 237: which reads tokens. ! 238: * Error Reporting:: You must supply a function @code{yyerror}. ! 239: * Action Features:: Special features for use in actions. ! 240: ! 241: The Lexical Analyzer Function @code{yylex} ! 242: ! 243: * Calling Convention:: How @code{yyparse} calls @code{yylex}. ! 244: * Token Values:: How @code{yylex} must return the semantic value ! 245: of the token it has read. ! 246: * Token Positions:: How @code{yylex} must return the text position ! 247: (line number, etc.) of the token, if the ! 248: actions want that. ! 249: * Pure Calling:: How the calling convention differs ! 250: in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). ! 251: ! 252: The Bison Parser Algorithm ! 253: ! 254: * Look-Ahead:: Parser looks one token ahead when deciding what to do. ! 255: * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. ! 256: * Precedence:: Operator precedence works by resolving conflicts. ! 257: * Contextual Precedence:: When an operator's precedence depends on context. ! 258: * Parser States:: The parser is a finite-state-machine with stack. ! 259: * Reduce/Reduce:: When two rules are applicable in the same situation. ! 260: * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. ! 261: * Stack Overflow:: What happens when stack gets full. How to avoid it. ! 262: ! 263: Operator Precedence ! 264: ! 265: * Why Precedence:: An example showing why precedence is needed. ! 266: * Using Precedence:: How to specify precedence in Bison grammars. ! 267: * Precedence Examples:: How these features are used in the previous example. ! 268: * How Precedence:: How they work. ! 269: ! 270: Handling Context Dependencies ! 271: ! 272: * Semantic Tokens:: Token parsing can depend on the semantic context. ! 273: * Lexical Tie-ins:: Token parsing can depend on the syntactic context. ! 274: * Tie-in Recovery:: Lexical tie-ins have implications for how ! 275: error recovery rules must be written. ! 276: ! 277: Invoking Bison ! 278: ! 279: * Bison Options:: All the options described in detail, ! 280: in alphabetical order by short options. ! 281: * Option Cross Key:: Alphabetical list of long options. ! 282: * VMS Invocation:: Bison command syntax on VMS. ! 283: @end menu ! 284: ! 285: @node Introduction, Conditions, Top, Top ! 286: @unnumbered Introduction ! 287: @cindex introduction ! 288: ! 289: @dfn{Bison} is a general-purpose parser generator that converts a ! 290: grammar description for an LALR(1) context-free grammar into a C ! 291: program to parse that grammar. Once you are proficient with Bison, ! 292: you may use it to develop a wide range of language parsers, from those ! 293: used in simple desk calculators to complex programming languages. ! 294: ! 295: Bison is upward compatible with Yacc: all properly-written Yacc grammars ! 296: ought to work with Bison with no change. Anyone familiar with Yacc ! 297: should be able to use Bison with little trouble. You need to be fluent in ! 298: C programming in order to use Bison or to understand this manual. ! 299: ! 300: We begin with tutorial chapters that explain the basic concepts of using ! 301: Bison and show three explained examples, each building on the last. If you ! 302: don't know Bison or Yacc, start by reading these chapters. Reference ! 303: chapters follow which describe specific aspects of Bison in detail. ! 304: ! 305: Bison was written primarily by Robert Corbett; Richard Stallman made ! 306: it Yacc-compatible. This edition corresponds to version 1.20 of Bison. ! 307: ! 308: @node Conditions, Copying, Introduction, Top ! 309: @unnumbered Conditions for Using Bison ! 310: ! 311: Bison grammars can be used only in programs that are free software. This ! 312: is in contrast to what happens with the GNU C compiler and the other ! 313: GNU programming tools. ! 314: ! 315: The reason Bison is special is that the output of the Bison utility---the ! 316: Bison parser file---contains a verbatim copy of a sizable piece of Bison, ! 317: which is the code for the @code{yyparse} function. (The actions from your ! 318: grammar are inserted into this function at one point, but the rest of the ! 319: function is not changed.) ! 320: ! 321: As a result, the Bison parser file is covered by the same copying ! 322: conditions that cover Bison itself and the rest of the GNU system: any ! 323: program containing it has to be distributed under the standard GNU copying ! 324: conditions. ! 325: ! 326: Occasionally people who would like to use Bison to develop proprietary ! 327: programs complain about this. ! 328: ! 329: We don't particularly sympathize with their complaints. The purpose of the ! 330: GNU project is to promote the right to share software and the practice of ! 331: sharing software; it is a means of changing society. The people who ! 332: complain are planning to be uncooperative toward the rest of the world; why ! 333: should they deserve our help in doing so? ! 334: ! 335: However, it's possible that a change in these conditions might encourage ! 336: computer companies to use and distribute the GNU system. If so, then we ! 337: might decide to change the terms on @code{yyparse} as a matter of the ! 338: strategy of promoting the right to share. Such a change would be ! 339: irrevocable. Since we stand by the copying permissions we have announced, ! 340: we cannot withdraw them once given. ! 341: ! 342: We mustn't make an irrevocable change hastily. We have to wait until there ! 343: is a complete GNU system and there has been time to learn how this issue ! 344: affects its reception. ! 345: ! 346: @node Copying, Concepts, Conditions, Top ! 347: @unnumbered GNU GENERAL PUBLIC LICENSE ! 348: @center Version 2, June 1991 ! 349: ! 350: @display ! 351: Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc. ! 352: 675 Mass Ave, Cambridge, MA 02139, USA ! 353: ! 354: Everyone is permitted to copy and distribute verbatim copies ! 355: of this license document, but changing it is not allowed. ! 356: @end display ! 357: ! 358: @unnumberedsec Preamble ! 359: ! 360: The licenses for most software are designed to take away your ! 361: freedom to share and change it. By contrast, the GNU General Public ! 362: License is intended to guarantee your freedom to share and change free ! 363: software---to make sure the software is free for all its users. This ! 364: General Public License applies to most of the Free Software ! 365: Foundation's software and to any other program whose authors commit to ! 366: using it. (Some other Free Software Foundation software is covered by ! 367: the GNU Library General Public License instead.) You can apply it to ! 368: your programs, too. ! 369: ! 370: When we speak of free software, we are referring to freedom, not ! 371: price. Our General Public Licenses are designed to make sure that you ! 372: have the freedom to distribute copies of free software (and charge for ! 373: this service if you wish), that you receive source code or can get it ! 374: if you want it, that you can change the software or use pieces of it ! 375: in new free programs; and that you know you can do these things. ! 376: ! 377: To protect your rights, we need to make restrictions that forbid ! 378: anyone to deny you these rights or to ask you to surrender the rights. ! 379: These restrictions translate to certain responsibilities for you if you ! 380: distribute copies of the software, or if you modify it. ! 381: ! 382: For example, if you distribute copies of such a program, whether ! 383: gratis or for a fee, you must give the recipients all the rights that ! 384: you have. You must make sure that they, too, receive or can get the ! 385: source code. And you must show them these terms so they know their ! 386: rights. ! 387: ! 388: We protect your rights with two steps: (1) copyright the software, and ! 389: (2) offer you this license which gives you legal permission to copy, ! 390: distribute and/or modify the software. ! 391: ! 392: Also, for each author's protection and ours, we want to make certain ! 393: that everyone understands that there is no warranty for this free ! 394: software. If the software is modified by someone else and passed on, we ! 395: want its recipients to know that what they have is not the original, so ! 396: that any problems introduced by others will not reflect on the original ! 397: authors' reputations. ! 398: ! 399: Finally, any free program is threatened constantly by software ! 400: patents. We wish to avoid the danger that redistributors of a free ! 401: program will individually obtain patent licenses, in effect making the ! 402: program proprietary. To prevent this, we have made it clear that any ! 403: patent must be licensed for everyone's free use or not licensed at all. ! 404: ! 405: The precise terms and conditions for copying, distribution and ! 406: modification follow. ! 407: ! 408: @iftex ! 409: @unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION ! 410: @end iftex ! 411: @ifinfo ! 412: @center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION ! 413: @end ifinfo ! 414: ! 415: @enumerate 0 ! 416: @item ! 417: This License applies to any program or other work which contains ! 418: a notice placed by the copyright holder saying it may be distributed ! 419: under the terms of this General Public License. The ``Program'', below, ! 420: refers to any such program or work, and a ``work based on the Program'' ! 421: means either the Program or any derivative work under copyright law: ! 422: that is to say, a work containing the Program or a portion of it, ! 423: either verbatim or with modifications and/or translated into another ! 424: language. (Hereinafter, translation is included without limitation in ! 425: the term ``modification''.) Each licensee is addressed as ``you''. ! 426: ! 427: Activities other than copying, distribution and modification are not ! 428: covered by this License; they are outside its scope. The act of ! 429: running the Program is not restricted, and the output from the Program ! 430: is covered only if its contents constitute a work based on the ! 431: Program (independent of having been made by running the Program). ! 432: Whether that is true depends on what the Program does. ! 433: ! 434: @item ! 435: You may copy and distribute verbatim copies of the Program's ! 436: source code as you receive it, in any medium, provided that you ! 437: conspicuously and appropriately publish on each copy an appropriate ! 438: copyright notice and disclaimer of warranty; keep intact all the ! 439: notices that refer to this License and to the absence of any warranty; ! 440: and give any other recipients of the Program a copy of this License ! 441: along with the Program. ! 442: ! 443: You may charge a fee for the physical act of transferring a copy, and ! 444: you may at your option offer warranty protection in exchange for a fee. ! 445: ! 446: @item ! 447: You may modify your copy or copies of the Program or any portion ! 448: of it, thus forming a work based on the Program, and copy and ! 449: distribute such modifications or work under the terms of Section 1 ! 450: above, provided that you also meet all of these conditions: ! 451: ! 452: @enumerate a ! 453: @item ! 454: You must cause the modified files to carry prominent notices ! 455: stating that you changed the files and the date of any change. ! 456: ! 457: @item ! 458: You must cause any work that you distribute or publish, that in ! 459: whole or in part contains or is derived from the Program or any ! 460: part thereof, to be licensed as a whole at no charge to all third ! 461: parties under the terms of this License. ! 462: ! 463: @item ! 464: If the modified program normally reads commands interactively ! 465: when run, you must cause it, when started running for such ! 466: interactive use in the most ordinary way, to print or display an ! 467: announcement including an appropriate copyright notice and a ! 468: notice that there is no warranty (or else, saying that you provide ! 469: a warranty) and that users may redistribute the program under ! 470: these conditions, and telling the user how to view a copy of this ! 471: License. (Exception: if the Program itself is interactive but ! 472: does not normally print such an announcement, your work based on ! 473: the Program is not required to print an announcement.) ! 474: @end enumerate ! 475: ! 476: These requirements apply to the modified work as a whole. If ! 477: identifiable sections of that work are not derived from the Program, ! 478: and can be reasonably considered independent and separate works in ! 479: themselves, then this License, and its terms, do not apply to those ! 480: sections when you distribute them as separate works. But when you ! 481: distribute the same sections as part of a whole which is a work based ! 482: on the Program, the distribution of the whole must be on the terms of ! 483: this License, whose permissions for other licensees extend to the ! 484: entire whole, and thus to each and every part regardless of who wrote it. ! 485: ! 486: Thus, it is not the intent of this section to claim rights or contest ! 487: your rights to work written entirely by you; rather, the intent is to ! 488: exercise the right to control the distribution of derivative or ! 489: collective works based on the Program. ! 490: ! 491: In addition, mere aggregation of another work not based on the Program ! 492: with the Program (or with a work based on the Program) on a volume of ! 493: a storage or distribution medium does not bring the other work under ! 494: the scope of this License. ! 495: ! 496: @item ! 497: You may copy and distribute the Program (or a work based on it, ! 498: under Section 2) in object code or executable form under the terms of ! 499: Sections 1 and 2 above provided that you also do one of the following: ! 500: ! 501: @enumerate a ! 502: @item ! 503: Accompany it with the complete corresponding machine-readable ! 504: source code, which must be distributed under the terms of Sections ! 505: 1 and 2 above on a medium customarily used for software interchange; or, ! 506: ! 507: @item ! 508: Accompany it with a written offer, valid for at least three ! 509: years, to give any third party, for a charge no more than your ! 510: cost of physically performing source distribution, a complete ! 511: machine-readable copy of the corresponding source code, to be ! 512: distributed under the terms of Sections 1 and 2 above on a medium ! 513: customarily used for software interchange; or, ! 514: ! 515: @item ! 516: Accompany it with the information you received as to the offer ! 517: to distribute corresponding source code. (This alternative is ! 518: allowed only for noncommercial distribution and only if you ! 519: received the program in object code or executable form with such ! 520: an offer, in accord with Subsection b above.) ! 521: @end enumerate ! 522: ! 523: The source code for a work means the preferred form of the work for ! 524: making modifications to it. For an executable work, complete source ! 525: code means all the source code for all modules it contains, plus any ! 526: associated interface definition files, plus the scripts used to ! 527: control compilation and installation of the executable. However, as a ! 528: special exception, the source code distributed need not include ! 529: anything that is normally distributed (in either source or binary ! 530: form) with the major components (compiler, kernel, and so on) of the ! 531: operating system on which the executable runs, unless that component ! 532: itself accompanies the executable. ! 533: ! 534: If distribution of executable or object code is made by offering ! 535: access to copy from a designated place, then offering equivalent ! 536: access to copy the source code from the same place counts as ! 537: distribution of the source code, even though third parties are not ! 538: compelled to copy the source along with the object code. ! 539: ! 540: @item ! 541: You may not copy, modify, sublicense, or distribute the Program ! 542: except as expressly provided under this License. Any attempt ! 543: otherwise to copy, modify, sublicense or distribute the Program is ! 544: void, and will automatically terminate your rights under this License. ! 545: However, parties who have received copies, or rights, from you under ! 546: this License will not have their licenses terminated so long as such ! 547: parties remain in full compliance. ! 548: ! 549: @item ! 550: You are not required to accept this License, since you have not ! 551: signed it. However, nothing else grants you permission to modify or ! 552: distribute the Program or its derivative works. These actions are ! 553: prohibited by law if you do not accept this License. Therefore, by ! 554: modifying or distributing the Program (or any work based on the ! 555: Program), you indicate your acceptance of this License to do so, and ! 556: all its terms and conditions for copying, distributing or modifying ! 557: the Program or works based on it. ! 558: ! 559: @item ! 560: Each time you redistribute the Program (or any work based on the ! 561: Program), the recipient automatically receives a license from the ! 562: original licensor to copy, distribute or modify the Program subject to ! 563: these terms and conditions. You may not impose any further ! 564: restrictions on the recipients' exercise of the rights granted herein. ! 565: You are not responsible for enforcing compliance by third parties to ! 566: this License. ! 567: ! 568: @item ! 569: If, as a consequence of a court judgment or allegation of patent ! 570: infringement or for any other reason (not limited to patent issues), ! 571: conditions are imposed on you (whether by court order, agreement or ! 572: otherwise) that contradict the conditions of this License, they do not ! 573: excuse you from the conditions of this License. If you cannot ! 574: distribute so as to satisfy simultaneously your obligations under this ! 575: License and any other pertinent obligations, then as a consequence you ! 576: may not distribute the Program at all. For example, if a patent ! 577: license would not permit royalty-free redistribution of the Program by ! 578: all those who receive copies directly or indirectly through you, then ! 579: the only way you could satisfy both it and this License would be to ! 580: refrain entirely from distribution of the Program. ! 581: ! 582: If any portion of this section is held invalid or unenforceable under ! 583: any particular circumstance, the balance of the section is intended to ! 584: apply and the section as a whole is intended to apply in other ! 585: circumstances. ! 586: ! 587: It is not the purpose of this section to induce you to infringe any ! 588: patents or other property right claims or to contest validity of any ! 589: such claims; this section has the sole purpose of protecting the ! 590: integrity of the free software distribution system, which is ! 591: implemented by public license practices. Many people have made ! 592: generous contributions to the wide range of software distributed ! 593: through that system in reliance on consistent application of that ! 594: system; it is up to the author/donor to decide if he or she is willing ! 595: to distribute software through any other system and a licensee cannot ! 596: impose that choice. ! 597: ! 598: This section is intended to make thoroughly clear what is believed to ! 599: be a consequence of the rest of this License. ! 600: ! 601: @item ! 602: If the distribution and/or use of the Program is restricted in ! 603: certain countries either by patents or by copyrighted interfaces, the ! 604: original copyright holder who places the Program under this License ! 605: may add an explicit geographical distribution limitation excluding ! 606: those countries, so that distribution is permitted only in or among ! 607: countries not thus excluded. In such case, this License incorporates ! 608: the limitation as if written in the body of this License. ! 609: ! 610: @item ! 611: The Free Software Foundation may publish revised and/or new versions ! 612: of the General Public License from time to time. Such new versions will ! 613: be similar in spirit to the present version, but may differ in detail to ! 614: address new problems or concerns. ! 615: ! 616: Each version is given a distinguishing version number. If the Program ! 617: specifies a version number of this License which applies to it and ``any ! 618: later version'', you have the option of following the terms and conditions ! 619: either of that version or of any later version published by the Free ! 620: Software Foundation. If the Program does not specify a version number of ! 621: this License, you may choose any version ever published by the Free Software ! 622: Foundation. ! 623: ! 624: @item ! 625: If you wish to incorporate parts of the Program into other free ! 626: programs whose distribution conditions are different, write to the author ! 627: to ask for permission. For software which is copyrighted by the Free ! 628: Software Foundation, write to the Free Software Foundation; we sometimes ! 629: make exceptions for this. Our decision will be guided by the two goals ! 630: of preserving the free status of all derivatives of our free software and ! 631: of promoting the sharing and reuse of software generally. ! 632: ! 633: @iftex ! 634: @heading NO WARRANTY ! 635: @end iftex ! 636: @ifinfo ! 637: @center NO WARRANTY ! 638: @end ifinfo ! 639: ! 640: @item ! 641: BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY ! 642: FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN ! 643: OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES ! 644: PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED ! 645: OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ! 646: MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS ! 647: TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE ! 648: PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, ! 649: REPAIR OR CORRECTION. ! 650: ! 651: @item ! 652: IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING ! 653: WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR ! 654: REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, ! 655: INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING ! 656: OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED ! 657: TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY ! 658: YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER ! 659: PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE ! 660: POSSIBILITY OF SUCH DAMAGES. ! 661: @end enumerate ! 662: ! 663: @iftex ! 664: @heading END OF TERMS AND CONDITIONS ! 665: @end iftex ! 666: @ifinfo ! 667: @center END OF TERMS AND CONDITIONS ! 668: @end ifinfo ! 669: ! 670: @page ! 671: @unnumberedsec How to Apply These Terms to Your New Programs ! 672: ! 673: If you develop a new program, and you want it to be of the greatest ! 674: possible use to the public, the best way to achieve this is to make it ! 675: free software which everyone can redistribute and change under these terms. ! 676: ! 677: To do so, attach the following notices to the program. It is safest ! 678: to attach them to the start of each source file to most effectively ! 679: convey the exclusion of warranty; and each file should have at least ! 680: the ``copyright'' line and a pointer to where the full notice is found. ! 681: ! 682: @smallexample ! 683: @var{one line to give the program's name and a brief idea of what it does.} ! 684: Copyright (C) 19@var{yy} @var{name of author} ! 685: ! 686: This program is free software; you can redistribute it and/or modify ! 687: it under the terms of the GNU General Public License as published by ! 688: the Free Software Foundation; either version 2 of the License, or ! 689: (at your option) any later version. ! 690: ! 691: This program is distributed in the hope that it will be useful, ! 692: but WITHOUT ANY WARRANTY; without even the implied warranty of ! 693: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ! 694: GNU General Public License for more details. ! 695: ! 696: You should have received a copy of the GNU General Public License ! 697: along with this program; if not, write to the Free Software ! 698: Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. ! 699: @end smallexample ! 700: ! 701: Also add information on how to contact you by electronic and paper mail. ! 702: ! 703: If the program is interactive, make it output a short notice like this ! 704: when it starts in an interactive mode: ! 705: ! 706: @smallexample ! 707: Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author} ! 708: Gnomovision comes with ABSOLUTELY NO WARRANTY; for details ! 709: type `show w'. ! 710: This is free software, and you are welcome to redistribute it ! 711: under certain conditions; type `show c' for details. ! 712: @end smallexample ! 713: ! 714: The hypothetical commands @samp{show w} and @samp{show c} should show ! 715: the appropriate parts of the General Public License. Of course, the ! 716: commands you use may be called something other than @samp{show w} and ! 717: @samp{show c}; they could even be mouse-clicks or menu items---whatever ! 718: suits your program. ! 719: ! 720: You should also get your employer (if you work as a programmer) or your ! 721: school, if any, to sign a ``copyright disclaimer'' for the program, if ! 722: necessary. Here is a sample; alter the names: ! 723: ! 724: @smallexample ! 725: Yoyodyne, Inc., hereby disclaims all copyright interest in the program ! 726: `Gnomovision' (which makes passes at compilers) written by James Hacker. ! 727: ! 728: @var{signature of Ty Coon}, 1 April 1989 ! 729: Ty Coon, President of Vice ! 730: @end smallexample ! 731: ! 732: This General Public License does not permit incorporating your program into ! 733: proprietary programs. If your program is a subroutine library, you may ! 734: consider it more useful to permit linking proprietary applications with the ! 735: library. If this is what you want to do, use the GNU Library General ! 736: Public License instead of this License. ! 737: ! 738: @node Concepts, Examples, Copying, Top ! 739: @chapter The Concepts of Bison ! 740: ! 741: This chapter introduces many of the basic concepts without which the ! 742: details of Bison will not make sense. If you do not already know how to ! 743: use Bison or Yacc, we suggest you start by reading this chapter carefully. ! 744: ! 745: @menu ! 746: * Language and Grammar:: Languages and context-free grammars, ! 747: as mathematical ideas. ! 748: * Grammar in Bison:: How we represent grammars for Bison's sake. ! 749: * Semantic Values:: Each token or syntactic grouping can have ! 750: a semantic value (the value of an integer, ! 751: the name of an identifier, etc.). ! 752: * Semantic Actions:: Each rule can have an action containing C code. ! 753: * Bison Parser:: What are Bison's input and output, ! 754: how is the output used? ! 755: * Stages:: Stages in writing and running Bison grammars. ! 756: * Grammar Layout:: Overall structure of a Bison grammar file. ! 757: @end menu ! 758: ! 759: @node Language and Grammar, Grammar in Bison, , Concepts ! 760: @section Languages and Context-Free Grammars ! 761: ! 762: @c !!! ``An expression can be an integer'' is not a valid Bison ! 763: @c expression---Bison cannot read English! --rjc 6 Feb 1992 ! 764: @cindex context-free grammar ! 765: @cindex grammar, context-free ! 766: In order for Bison to parse a language, it must be described by a ! 767: @dfn{context-free grammar}. This means that you specify one or more ! 768: @dfn{syntactic groupings} and give rules for constructing them from their ! 769: parts. For example, in the C language, one kind of grouping is called an ! 770: `expression'. One rule for making an expression might be, ``An expression ! 771: can be made of a minus sign and another expression''. Another would be, ! 772: ``An expression can be an integer''. As you can see, rules are often ! 773: recursive, but there must be at least one rule which leads out of the ! 774: recursion. ! 775: ! 776: @cindex BNF ! 777: @cindex Backus-Naur form ! 778: The most common formal system for presenting such rules for humans to read ! 779: is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to ! 780: specify the language Algol 60. Any grammar expressed in BNF is a ! 781: context-free grammar. The input to Bison is essentially machine-readable ! 782: BNF. ! 783: ! 784: Not all context-free languages can be handled by Bison, only those ! 785: that are LALR(1). In brief, this means that it must be possible to ! 786: tell how to parse any portion of an input string with just a single ! 787: token of look-ahead. Strictly speaking, that is a description of an ! 788: LR(1) grammar, and LALR(1) involves additional restrictions that are ! 789: hard to explain simply; but it is rare in actual practice to find an ! 790: LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, , ! 791: Mysterious Reduce/Reduce Conflicts}, for more information on this. ! 792: ! 793: @cindex symbols (abstract) ! 794: @cindex token ! 795: @cindex syntactic grouping ! 796: @cindex grouping, syntactic ! 797: In the formal grammatical rules for a language, each kind of syntactic unit ! 798: or grouping is named by a @dfn{symbol}. Those which are built by grouping ! 799: smaller constructs according to grammatical rules are called ! 800: @dfn{nonterminal symbols}; those which can't be subdivided are called ! 801: @dfn{terminal symbols} or @dfn{token types}. We call a piece of input ! 802: corresponding to a single terminal symbol a @dfn{token}, and a piece ! 803: corresponding to a single nonterminal symbol a @dfn{grouping}.@refill ! 804: ! 805: We can use the C language as an example of what symbols, terminal and ! 806: nonterminal, mean. The tokens of C are identifiers, constants (numeric and ! 807: string), and the various keywords, arithmetic operators and punctuation ! 808: marks. So the terminal symbols of a grammar for C include `identifier', ! 809: `number', `string', plus one symbol for each keyword, operator or ! 810: punctuation mark: `if', `return', `const', `static', `int', `char', ! 811: `plus-sign', `open-brace', `close-brace', `comma' and many more. (These ! 812: tokens can be subdivided into characters, but that is a matter of ! 813: lexicography, not grammar.) ! 814: ! 815: Here is a simple C function subdivided into tokens: ! 816: ! 817: @example ! 818: int /* @r{keyword `int'} */ ! 819: square (x) /* @r{identifier, open-paren,} */ ! 820: /* @r{identifier, close-paren} */ ! 821: int x; /* @r{keyword `int', identifier, semicolon} */ ! 822: @{ /* @r{open-brace} */ ! 823: return x * x; /* @r{keyword `return', identifier,} */ ! 824: /* @r{asterisk, identifier, semicolon} */ ! 825: @} /* @r{close-brace} */ ! 826: @end example ! 827: ! 828: The syntactic groupings of C include the expression, the statement, the ! 829: declaration, and the function definition. These are represented in the ! 830: grammar of C by nonterminal symbols `expression', `statement', ! 831: `declaration' and `function definition'. The full grammar uses dozens of ! 832: additional language constructs, each with its own nonterminal symbol, in ! 833: order to express the meanings of these four. The example above is a ! 834: function definition; it contains one declaration, and one statement. In ! 835: the statement, each @samp{x} is an expression and so is @samp{x * x}. ! 836: ! 837: Each nonterminal symbol must have grammatical rules showing how it is made ! 838: out of simpler constructs. For example, one kind of C statement is the ! 839: @code{return} statement; this would be described with a grammar rule which ! 840: reads informally as follows: ! 841: ! 842: @quotation ! 843: A `statement' can be made of a `return' keyword, an `expression' and a ! 844: `semicolon'. ! 845: @end quotation ! 846: ! 847: @noindent ! 848: There would be many other rules for `statement', one for each kind of ! 849: statement in C. ! 850: ! 851: @cindex start symbol ! 852: One nonterminal symbol must be distinguished as the special one which ! 853: defines a complete utterance in the language. It is called the @dfn{start ! 854: symbol}. In a compiler, this means a complete input program. In the C ! 855: language, the nonterminal symbol `sequence of definitions and declarations' ! 856: plays this role. ! 857: ! 858: For example, @samp{1 + 2} is a valid C expression---a valid part of a C ! 859: program---but it is not valid as an @emph{entire} C program. In the ! 860: context-free grammar of C, this follows from the fact that `expression' is ! 861: not the start symbol. ! 862: ! 863: The Bison parser reads a sequence of tokens as its input, and groups the ! 864: tokens using the grammar rules. If the input is valid, the end result is ! 865: that the entire token sequence reduces to a single grouping whose symbol is ! 866: the grammar's start symbol. If we use a grammar for C, the entire input ! 867: must be a `sequence of definitions and declarations'. If not, the parser ! 868: reports a syntax error. ! 869: ! 870: @node Grammar in Bison, Semantic Values, Language and Grammar, Concepts ! 871: @section From Formal Rules to Bison Input ! 872: @cindex Bison grammar ! 873: @cindex grammar, Bison ! 874: @cindex formal grammar ! 875: ! 876: A formal grammar is a mathematical construct. To define the language ! 877: for Bison, you must write a file expressing the grammar in Bison syntax: ! 878: a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}. ! 879: ! 880: A nonterminal symbol in the formal grammar is represented in Bison input ! 881: as an identifier, like an identifier in C. By convention, it should be ! 882: in lower case, such as @code{expr}, @code{stmt} or @code{declaration}. ! 883: ! 884: The Bison representation for a terminal symbol is also called a @dfn{token ! 885: type}. Token types as well can be represented as C-like identifiers. By ! 886: convention, these identifiers should be upper case to distinguish them from ! 887: nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or ! 888: @code{RETURN}. A terminal symbol that stands for a particular keyword in ! 889: the language should be named after that keyword converted to upper case. ! 890: The terminal symbol @code{error} is reserved for error recovery. ! 891: @xref{Symbols}.@refill ! 892: ! 893: A terminal symbol can also be represented as a character literal, just like ! 894: a C character constant. You should do this whenever a token is just a ! 895: single character (parenthesis, plus-sign, etc.): use that same character in ! 896: a literal as the terminal symbol for that token. ! 897: ! 898: The grammar rules also have an expression in Bison syntax. For example, ! 899: here is the Bison rule for a C @code{return} statement. The semicolon in ! 900: quotes is a literal character token, representing part of the C syntax for ! 901: the statement; the naked semicolon, and the colon, are Bison punctuation ! 902: used in every rule. ! 903: ! 904: @example ! 905: stmt: RETURN expr ';' ! 906: ; ! 907: @end example ! 908: ! 909: @noindent ! 910: @xref{Rules, ,Syntax of Grammar Rules}. ! 911: ! 912: @node Semantic Values, Semantic Actions, Grammar in Bison, Concepts ! 913: @section Semantic Values ! 914: @cindex semantic value ! 915: @cindex value, semantic ! 916: ! 917: A formal grammar selects tokens only by their classifications: for example, ! 918: if a rule mentions the terminal symbol `integer constant', it means that ! 919: @emph{any} integer constant is grammatically valid in that position. The ! 920: precise value of the constant is irrelevant to how to parse the input: if ! 921: @samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally ! 922: grammatical.@refill ! 923: ! 924: But the precise value is very important for what the input means once it is ! 925: parsed. A compiler is useless if it fails to distinguish between 4, 1 and ! 926: 3989 as constants in the program! Therefore, each token in a Bison grammar ! 927: has both a token type and a @dfn{semantic value}. @xref{Semantics, ,Defining Language Semantics}, ! 928: for details. ! 929: ! 930: The token type is a terminal symbol defined in the grammar, such as ! 931: @code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything ! 932: you need to know to decide where the token may validly appear and how to ! 933: group it with other tokens. The grammar rules know nothing about tokens ! 934: except their types.@refill ! 935: ! 936: The semantic value has all the rest of the information about the ! 937: meaning of the token, such as the value of an integer, or the name of an ! 938: identifier. (A token such as @code{','} which is just punctuation doesn't ! 939: need to have any semantic value.) ! 940: ! 941: For example, an input token might be classified as token type ! 942: @code{INTEGER} and have the semantic value 4. Another input token might ! 943: have the same token type @code{INTEGER} but value 3989. When a grammar ! 944: rule says that @code{INTEGER} is allowed, either of these tokens is ! 945: acceptable because each is an @code{INTEGER}. When the parser accepts the ! 946: token, it keeps track of the token's semantic value. ! 947: ! 948: Each grouping can also have a semantic value as well as its nonterminal ! 949: symbol. For example, in a calculator, an expression typically has a ! 950: semantic value that is a number. In a compiler for a programming ! 951: language, an expression typically has a semantic value that is a tree ! 952: structure describing the meaning of the expression. ! 953: ! 954: @node Semantic Actions, Bison Parser, Semantic Values, Concepts ! 955: @section Semantic Actions ! 956: @cindex semantic actions ! 957: @cindex actions, semantic ! 958: ! 959: In order to be useful, a program must do more than parse input; it must ! 960: also produce some output based on the input. In a Bison grammar, a grammar ! 961: rule can have an @dfn{action} made up of C statements. Each time the ! 962: parser recognizes a match for that rule, the action is executed. ! 963: @xref{Actions}. ! 964: ! 965: Most of the time, the purpose of an action is to compute the semantic value ! 966: of the whole construct from the semantic values of its parts. For example, ! 967: suppose we have a rule which says an expression can be the sum of two ! 968: expressions. When the parser recognizes such a sum, each of the ! 969: subexpressions has a semantic value which describes how it was built up. ! 970: The action for this rule should create a similar sort of value for the ! 971: newly recognized larger expression. ! 972: ! 973: For example, here is a rule that says an expression can be the sum of ! 974: two subexpressions: ! 975: ! 976: @example ! 977: expr: expr '+' expr @{ $$ = $1 + $3; @} ! 978: ; ! 979: @end example ! 980: ! 981: @noindent ! 982: The action says how to produce the semantic value of the sum expression ! 983: from the values of the two subexpressions. ! 984: ! 985: @node Bison Parser, Stages, Semantic Actions, Concepts ! 986: @section Bison Output: the Parser File ! 987: @cindex Bison parser ! 988: @cindex Bison utility ! 989: @cindex lexical analyzer, purpose ! 990: @cindex parser ! 991: ! 992: When you run Bison, you give it a Bison grammar file as input. The output ! 993: is a C source file that parses the language described by the grammar. ! 994: This file is called a @dfn{Bison parser}. Keep in mind that the Bison ! 995: utility and the Bison parser are two distinct programs: the Bison utility ! 996: is a program whose output is the Bison parser that becomes part of your ! 997: program. ! 998: ! 999: The job of the Bison parser is to group tokens into groupings according to ! 1000: the grammar rules---for example, to build identifiers and operators into ! 1001: expressions. As it does this, it runs the actions for the grammar rules it ! 1002: uses. ! 1003: ! 1004: The tokens come from a function called the @dfn{lexical analyzer} that you ! 1005: must supply in some fashion (such as by writing it in C). The Bison parser ! 1006: calls the lexical analyzer each time it wants a new token. It doesn't know ! 1007: what is ``inside'' the tokens (though their semantic values may reflect ! 1008: this). Typically the lexical analyzer makes the tokens by parsing ! 1009: characters of text, but Bison does not depend on this. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. ! 1010: ! 1011: The Bison parser file is C code which defines a function named ! 1012: @code{yyparse} which implements that grammar. This function does not make ! 1013: a complete C program: you must supply some additional functions. One is ! 1014: the lexical analyzer. Another is an error-reporting function which the ! 1015: parser calls to report an error. In addition, a complete C program must ! 1016: start with a function called @code{main}; you have to provide this, and ! 1017: arrange for it to call @code{yyparse} or the parser will never run. ! 1018: @xref{Interface, ,Parser C-Language Interface}. ! 1019: ! 1020: Aside from the token type names and the symbols in the actions you ! 1021: write, all variable and function names used in the Bison parser file ! 1022: begin with @samp{yy} or @samp{YY}. This includes interface functions ! 1023: such as the lexical analyzer function @code{yylex}, the error reporting ! 1024: function @code{yyerror} and the parser function @code{yyparse} itself. ! 1025: This also includes numerous identifiers used for internal purposes. ! 1026: Therefore, you should avoid using C identifiers starting with @samp{yy} ! 1027: or @samp{YY} in the Bison grammar file except for the ones defined in ! 1028: this manual. ! 1029: ! 1030: @node Stages, Grammar Layout, Bison Parser, Concepts ! 1031: @section Stages in Using Bison ! 1032: @cindex stages in using Bison ! 1033: @cindex using Bison ! 1034: ! 1035: The actual language-design process using Bison, from grammar specification ! 1036: to a working compiler or interpreter, has these parts: ! 1037: ! 1038: @enumerate ! 1039: @item ! 1040: Formally specify the grammar in a form recognized by Bison ! 1041: (@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule in the language, ! 1042: describe the action that is to be taken when an instance of that rule ! 1043: is recognized. The action is described by a sequence of C statements. ! 1044: ! 1045: @item ! 1046: Write a lexical analyzer to process input and pass tokens to the ! 1047: parser. The lexical analyzer may be written by hand in C ! 1048: (@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}). It could also be produced using Lex, but the use ! 1049: of Lex is not discussed in this manual. ! 1050: ! 1051: @item ! 1052: Write a controlling function that calls the Bison-produced parser. ! 1053: ! 1054: @item ! 1055: Write error-reporting routines. ! 1056: @end enumerate ! 1057: ! 1058: To turn this source code as written into a runnable program, you ! 1059: must follow these steps: ! 1060: ! 1061: @enumerate ! 1062: @item ! 1063: Run Bison on the grammar to produce the parser. ! 1064: ! 1065: @item ! 1066: Compile the code output by Bison, as well as any other source files. ! 1067: ! 1068: @item ! 1069: Link the object files to produce the finished product. ! 1070: @end enumerate ! 1071: ! 1072: @node Grammar Layout, , Stages, Concepts ! 1073: @section The Overall Layout of a Bison Grammar ! 1074: @cindex grammar file ! 1075: @cindex file format ! 1076: @cindex format of grammar file ! 1077: @cindex layout of Bison grammar ! 1078: ! 1079: The input file for the Bison utility is a @dfn{Bison grammar file}. The ! 1080: general form of a Bison grammar file is as follows: ! 1081: ! 1082: @example ! 1083: %@{ ! 1084: @var{C declarations} ! 1085: %@} ! 1086: ! 1087: @var{Bison declarations} ! 1088: ! 1089: %% ! 1090: @var{Grammar rules} ! 1091: %% ! 1092: @var{Additional C code} ! 1093: @end example ! 1094: ! 1095: @noindent ! 1096: The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears ! 1097: in every Bison grammar file to separate the sections. ! 1098: ! 1099: The C declarations may define types and variables used in the actions. ! 1100: You can also use preprocessor commands to define macros used there, and use ! 1101: @code{#include} to include header files that do any of these things. ! 1102: ! 1103: The Bison declarations declare the names of the terminal and nonterminal ! 1104: symbols, and may also describe operator precedence and the data types of ! 1105: semantic values of various symbols. ! 1106: ! 1107: The grammar rules define how to construct each nonterminal symbol from its ! 1108: parts. ! 1109: ! 1110: The additional C code can contain any C code you want to use. Often the ! 1111: definition of the lexical analyzer @code{yylex} goes here, plus subroutines ! 1112: called by the actions in the grammar rules. In a simple program, all the ! 1113: rest of the program can go here. ! 1114: ! 1115: @node Examples, Grammar File, Concepts, Top ! 1116: @chapter Examples ! 1117: @cindex simple examples ! 1118: @cindex examples, simple ! 1119: ! 1120: Now we show and explain three sample programs written using Bison: a ! 1121: reverse polish notation calculator, an algebraic (infix) notation ! 1122: calculator, and a multi-function calculator. All three have been tested ! 1123: under BSD Unix 4.3; each produces a usable, though limited, interactive ! 1124: desk-top calculator. ! 1125: ! 1126: These examples are simple, but Bison grammars for real programming ! 1127: languages are written the same way. ! 1128: @ifinfo ! 1129: You can copy these examples out of the Info file and into a source file ! 1130: to try them. ! 1131: @end ifinfo ! 1132: ! 1133: @menu ! 1134: * RPN Calc:: Reverse polish notation calculator; ! 1135: a first example with no operator precedence. ! 1136: * Infix Calc:: Infix (algebraic) notation calculator. ! 1137: Operator precedence is introduced. ! 1138: * Simple Error Recovery:: Continuing after syntax errors. ! 1139: * Multi-function Calc:: Calculator with memory and trig functions. ! 1140: It uses multiple data-types for semantic values. ! 1141: * Exercises:: Ideas for improving the multi-function calculator. ! 1142: @end menu ! 1143: ! 1144: @node RPN Calc, Infix Calc, , Examples ! 1145: @section Reverse Polish Notation Calculator ! 1146: @cindex reverse polish notation ! 1147: @cindex polish notation calculator ! 1148: @cindex @code{rpcalc} ! 1149: @cindex calculator, simple ! 1150: ! 1151: The first example is that of a simple double-precision @dfn{reverse polish ! 1152: notation} calculator (a calculator using postfix operators). This example ! 1153: provides a good starting point, since operator precedence is not an issue. ! 1154: The second example will illustrate how operator precedence is handled. ! 1155: ! 1156: The source code for this calculator is named @file{rpcalc.y}. The ! 1157: @samp{.y} extension is a convention used for Bison input files. ! 1158: ! 1159: @menu ! 1160: * Decls: Rpcalc Decls. Bison and C declarations for rpcalc. ! 1161: * Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. ! 1162: * Lexer: Rpcalc Lexer. The lexical analyzer. ! 1163: * Main: Rpcalc Main. The controlling function. ! 1164: * Error: Rpcalc Error. The error reporting function. ! 1165: * Gen: Rpcalc Gen. Running Bison on the grammar file. ! 1166: * Comp: Rpcalc Compile. Run the C compiler on the output code. ! 1167: @end menu ! 1168: ! 1169: @node Rpcalc Decls, Rpcalc Rules, , RPN Calc ! 1170: @subsection Declarations for @code{rpcalc} ! 1171: ! 1172: Here are the C and Bison declarations for the reverse polish notation ! 1173: calculator. As in C, comments are placed between @samp{/*@dots{}*/}. ! 1174: ! 1175: @example ! 1176: /* Reverse polish notation calculator. */ ! 1177: ! 1178: %@{ ! 1179: #define YYSTYPE double ! 1180: #include <math.h> ! 1181: %@} ! 1182: ! 1183: %token NUM ! 1184: ! 1185: %% /* Grammar rules and actions follow */ ! 1186: @end example ! 1187: ! 1188: The C declarations section (@pxref{C Declarations, ,The C Declarations Section}) contains two ! 1189: preprocessor directives. ! 1190: ! 1191: The @code{#define} directive defines the macro @code{YYSTYPE}, thus ! 1192: specifying the C data type for semantic values of both tokens and groupings ! 1193: (@pxref{Value Type, ,Data Types of Semantic Values}). The Bison parser will use whatever type ! 1194: @code{YYSTYPE} is defined as; if you don't define it, @code{int} is the ! 1195: default. Because we specify @code{double}, each token and each expression ! 1196: has an associated value, which is a floating point number. ! 1197: ! 1198: The @code{#include} directive is used to declare the exponentiation ! 1199: function @code{pow}. ! 1200: ! 1201: The second section, Bison declarations, provides information to Bison about ! 1202: the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}). Each terminal symbol that is ! 1203: not a single-character literal must be declared here. (Single-character ! 1204: literals normally don't need to be declared.) In this example, all the ! 1205: arithmetic operators are designated by single-character literals, so the ! 1206: only terminal symbol that needs to be declared is @code{NUM}, the token ! 1207: type for numeric constants. ! 1208: ! 1209: @node Rpcalc Rules, Rpcalc Lexer, Rpcalc Decls, RPN Calc ! 1210: @subsection Grammar Rules for @code{rpcalc} ! 1211: ! 1212: Here are the grammar rules for the reverse polish notation calculator. ! 1213: ! 1214: @example ! 1215: input: /* empty */ ! 1216: | input line ! 1217: ; ! 1218: ! 1219: line: '\n' ! 1220: | exp '\n' @{ printf ("\t%.10g\n", $1); @} ! 1221: ; ! 1222: ! 1223: exp: NUM @{ $$ = $1; @} ! 1224: | exp exp '+' @{ $$ = $1 + $2; @} ! 1225: | exp exp '-' @{ $$ = $1 - $2; @} ! 1226: | exp exp '*' @{ $$ = $1 * $2; @} ! 1227: | exp exp '/' @{ $$ = $1 / $2; @} ! 1228: /* Exponentiation */ ! 1229: | exp exp '^' @{ $$ = pow ($1, $2); @} ! 1230: /* Unary minus */ ! 1231: | exp 'n' @{ $$ = -$1; @} ! 1232: ; ! 1233: %% ! 1234: @end example ! 1235: ! 1236: The groupings of the rpcalc ``language'' defined here are the expression ! 1237: (given the name @code{exp}), the line of input (@code{line}), and the ! 1238: complete input transcript (@code{input}). Each of these nonterminal ! 1239: symbols has several alternate rules, joined by the @samp{|} punctuator ! 1240: which is read as ``or''. The following sections explain what these rules ! 1241: mean. ! 1242: ! 1243: The semantics of the language is determined by the actions taken when a ! 1244: grouping is recognized. The actions are the C code that appears inside ! 1245: braces. @xref{Actions}. ! 1246: ! 1247: You must specify these actions in C, but Bison provides the means for ! 1248: passing semantic values between the rules. In each action, the ! 1249: pseudo-variable @code{$$} stands for the semantic value for the grouping ! 1250: that the rule is going to construct. Assigning a value to @code{$$} is the ! 1251: main job of most actions. The semantic values of the components of the ! 1252: rule are referred to as @code{$1}, @code{$2}, and so on. ! 1253: ! 1254: @menu ! 1255: * Rpcalc Input:: ! 1256: * Rpcalc Line:: ! 1257: * Rpcalc Expr:: ! 1258: @end menu ! 1259: ! 1260: @node Rpcalc Input, Rpcalc Line, , Rpcalc Rules ! 1261: @subsubsection Explanation of @code{input} ! 1262: ! 1263: Consider the definition of @code{input}: ! 1264: ! 1265: @example ! 1266: input: /* empty */ ! 1267: | input line ! 1268: ; ! 1269: @end example ! 1270: ! 1271: This definition reads as follows: ``A complete input is either an empty ! 1272: string, or a complete input followed by an input line''. Notice that ! 1273: ``complete input'' is defined in terms of itself. This definition is said ! 1274: to be @dfn{left recursive} since @code{input} appears always as the ! 1275: leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}. ! 1276: ! 1277: The first alternative is empty because there are no symbols between the ! 1278: colon and the first @samp{|}; this means that @code{input} can match an ! 1279: empty string of input (no tokens). We write the rules this way because it ! 1280: is legitimate to type @kbd{Ctrl-d} right after you start the calculator. ! 1281: It's conventional to put an empty alternative first and write the comment ! 1282: @samp{/* empty */} in it. ! 1283: ! 1284: The second alternate rule (@code{input line}) handles all nontrivial input. ! 1285: It means, ``After reading any number of lines, read one more line if ! 1286: possible.'' The left recursion makes this rule into a loop. Since the ! 1287: first alternative matches empty input, the loop can be executed zero or ! 1288: more times. ! 1289: ! 1290: The parser function @code{yyparse} continues to process input until a ! 1291: grammatical error is seen or the lexical analyzer says there are no more ! 1292: input tokens; we will arrange for the latter to happen at end of file. ! 1293: ! 1294: @node Rpcalc Line, Rpcalc Expr, Rpcalc Input, Rpcalc Rules ! 1295: @subsubsection Explanation of @code{line} ! 1296: ! 1297: Now consider the definition of @code{line}: ! 1298: ! 1299: @example ! 1300: line: '\n' ! 1301: | exp '\n' @{ printf ("\t%.10g\n", $1); @} ! 1302: ; ! 1303: @end example ! 1304: ! 1305: The first alternative is a token which is a newline character; this means ! 1306: that rpcalc accepts a blank line (and ignores it, since there is no ! 1307: action). The second alternative is an expression followed by a newline. ! 1308: This is the alternative that makes rpcalc useful. The semantic value of ! 1309: the @code{exp} grouping is the value of @code{$1} because the @code{exp} in ! 1310: question is the first symbol in the alternative. The action prints this ! 1311: value, which is the result of the computation the user asked for. ! 1312: ! 1313: This action is unusual because it does not assign a value to @code{$$}. As ! 1314: a consequence, the semantic value associated with the @code{line} is ! 1315: uninitialized (its value will be unpredictable). This would be a bug if ! 1316: that value were ever used, but we don't use it: once rpcalc has printed the ! 1317: value of the user's input line, that value is no longer needed. ! 1318: ! 1319: @node Rpcalc Expr, , Rpcalc Line, Rpcalc Rules ! 1320: @subsubsection Explanation of @code{expr} ! 1321: ! 1322: The @code{exp} grouping has several rules, one for each kind of expression. ! 1323: The first rule handles the simplest expressions: those that are just numbers. ! 1324: The second handles an addition-expression, which looks like two expressions ! 1325: followed by a plus-sign. The third handles subtraction, and so on. ! 1326: ! 1327: @example ! 1328: exp: NUM ! 1329: | exp exp '+' @{ $$ = $1 + $2; @} ! 1330: | exp exp '-' @{ $$ = $1 - $2; @} ! 1331: @dots{} ! 1332: ; ! 1333: @end example ! 1334: ! 1335: We have used @samp{|} to join all the rules for @code{exp}, but we could ! 1336: equally well have written them separately: ! 1337: ! 1338: @example ! 1339: exp: NUM ; ! 1340: exp: exp exp '+' @{ $$ = $1 + $2; @} ; ! 1341: exp: exp exp '-' @{ $$ = $1 - $2; @} ; ! 1342: @dots{} ! 1343: @end example ! 1344: ! 1345: Most of the rules have actions that compute the value of the expression in ! 1346: terms of the value of its parts. For example, in the rule for addition, ! 1347: @code{$1} refers to the first component @code{exp} and @code{$2} refers to ! 1348: the second one. The third component, @code{'+'}, has no meaningful ! 1349: associated semantic value, but if it had one you could refer to it as ! 1350: @code{$3}. When @code{yyparse} recognizes a sum expression using this ! 1351: rule, the sum of the two subexpressions' values is produced as the value of ! 1352: the entire expression. @xref{Actions}. ! 1353: ! 1354: You don't have to give an action for every rule. When a rule has no ! 1355: action, Bison by default copies the value of @code{$1} into @code{$$}. ! 1356: This is what happens in the first rule (the one that uses @code{NUM}). ! 1357: ! 1358: The formatting shown here is the recommended convention, but Bison does ! 1359: not require it. You can add or change whitespace as much as you wish. ! 1360: For example, this: ! 1361: ! 1362: @example ! 1363: exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ! 1364: @end example ! 1365: ! 1366: @noindent ! 1367: means the same thing as this: ! 1368: ! 1369: @example ! 1370: exp: NUM ! 1371: | exp exp '+' @{ $$ = $1 + $2; @} ! 1372: | @dots{} ! 1373: @end example ! 1374: ! 1375: @noindent ! 1376: The latter, however, is much more readable. ! 1377: ! 1378: @node Rpcalc Lexer, Rpcalc Main, Rpcalc Rules, RPN Calc ! 1379: @subsection The @code{rpcalc} Lexical Analyzer ! 1380: @cindex writing a lexical analyzer ! 1381: @cindex lexical analyzer, writing ! 1382: ! 1383: The lexical analyzer's job is low-level parsing: converting characters or ! 1384: sequences of characters into tokens. The Bison parser gets its tokens by ! 1385: calling the lexical analyzer. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. ! 1386: ! 1387: Only a simple lexical analyzer is needed for the RPN calculator. This ! 1388: lexical analyzer skips blanks and tabs, then reads in numbers as ! 1389: @code{double} and returns them as @code{NUM} tokens. Any other character ! 1390: that isn't part of a number is a separate token. Note that the token-code ! 1391: for such a single-character token is the character itself. ! 1392: ! 1393: The return value of the lexical analyzer function is a numeric code which ! 1394: represents a token type. The same text used in Bison rules to stand for ! 1395: this token type is also a C expression for the numeric code for the type. ! 1396: This works in two ways. If the token type is a character literal, then its ! 1397: numeric code is the ASCII code for that character; you can use the same ! 1398: character literal in the lexical analyzer to express the number. If the ! 1399: token type is an identifier, that identifier is defined by Bison as a C ! 1400: macro whose definition is the appropriate number. In this example, ! 1401: therefore, @code{NUM} becomes a macro for @code{yylex} to use. ! 1402: ! 1403: The semantic value of the token (if it has one) is stored into the global ! 1404: variable @code{yylval}, which is where the Bison parser will look for it. ! 1405: (The C data type of @code{yylval} is @code{YYSTYPE}, which was defined ! 1406: at the beginning of the grammar; @pxref{Rpcalc Decls, ,Declarations for @code{rpcalc}}.) ! 1407: ! 1408: A token type code of zero is returned if the end-of-file is encountered. ! 1409: (Bison recognizes any nonpositive value as indicating the end of the ! 1410: input.) ! 1411: ! 1412: Here is the code for the lexical analyzer: ! 1413: ! 1414: @example ! 1415: @group ! 1416: /* Lexical analyzer returns a double floating point ! 1417: number on the stack and the token NUM, or the ASCII ! 1418: character read if not a number. Skips all blanks ! 1419: and tabs, returns 0 for EOF. */ ! 1420: ! 1421: #include <ctype.h> ! 1422: @end group ! 1423: ! 1424: @group ! 1425: yylex () ! 1426: @{ ! 1427: int c; ! 1428: ! 1429: /* skip white space */ ! 1430: while ((c = getchar ()) == ' ' || c == '\t') ! 1431: ; ! 1432: @end group ! 1433: @group ! 1434: /* process numbers */ ! 1435: if (c == '.' || isdigit (c)) ! 1436: @{ ! 1437: ungetc (c, stdin); ! 1438: scanf ("%lf", &yylval); ! 1439: return NUM; ! 1440: @} ! 1441: @end group ! 1442: @group ! 1443: /* return end-of-file */ ! 1444: if (c == EOF) ! 1445: return 0; ! 1446: /* return single chars */ ! 1447: return c; ! 1448: @} ! 1449: @end group ! 1450: @end example ! 1451: ! 1452: @node Rpcalc Main, Rpcalc Error, Rpcalc Lexer, RPN Calc ! 1453: @subsection The Controlling Function ! 1454: @cindex controlling function ! 1455: @cindex main function in simple example ! 1456: ! 1457: In keeping with the spirit of this example, the controlling function is ! 1458: kept to the bare minimum. The only requirement is that it call ! 1459: @code{yyparse} to start the process of parsing. ! 1460: ! 1461: @example ! 1462: @group ! 1463: main () ! 1464: @{ ! 1465: yyparse (); ! 1466: @} ! 1467: @end group ! 1468: @end example ! 1469: ! 1470: @node Rpcalc Error, Rpcalc Gen, Rpcalc Main, RPN Calc ! 1471: @subsection The Error Reporting Routine ! 1472: @cindex error reporting routine ! 1473: ! 1474: When @code{yyparse} detects a syntax error, it calls the error reporting ! 1475: function @code{yyerror} to print an error message (usually but not always ! 1476: @code{"parse error"}). It is up to the programmer to supply @code{yyerror} ! 1477: (@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use: ! 1478: ! 1479: @example ! 1480: @group ! 1481: #include <stdio.h> ! 1482: ! 1483: yyerror (s) /* Called by yyparse on error */ ! 1484: char *s; ! 1485: @{ ! 1486: printf ("%s\n", s); ! 1487: @} ! 1488: @end group ! 1489: @end example ! 1490: ! 1491: After @code{yyerror} returns, the Bison parser may recover from the error ! 1492: and continue parsing if the grammar contains a suitable error rule ! 1493: (@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We ! 1494: have not written any error rules in this example, so any invalid input will ! 1495: cause the calculator program to exit. This is not clean behavior for a ! 1496: real calculator, but it is adequate in the first example. ! 1497: ! 1498: @node Rpcalc Gen, Rpcalc Compile, Rpcalc Error, RPN Calc ! 1499: @subsection Running Bison to Make the Parser ! 1500: @cindex running Bison (introduction) ! 1501: ! 1502: Before running Bison to produce a parser, we need to decide how to arrange ! 1503: all the source code in one or more source files. For such a simple example, ! 1504: the easiest thing is to put everything in one file. The definitions of ! 1505: @code{yylex}, @code{yyerror} and @code{main} go at the end, in the ! 1506: ``additional C code'' section of the file (@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}). ! 1507: ! 1508: For a large project, you would probably have several source files, and use ! 1509: @code{make} to arrange to recompile them. ! 1510: ! 1511: With all the source in a single file, you use the following command to ! 1512: convert it into a parser file: ! 1513: ! 1514: @example ! 1515: bison @var{file_name}.y ! 1516: @end example ! 1517: ! 1518: @noindent ! 1519: In this example the file was called @file{rpcalc.y} (for ``Reverse Polish ! 1520: CALCulator''). Bison produces a file named @file{@var{file_name}.tab.c}, ! 1521: removing the @samp{.y} from the original file name. The file output by ! 1522: Bison contains the source code for @code{yyparse}. The additional ! 1523: functions in the input file (@code{yylex}, @code{yyerror} and @code{main}) ! 1524: are copied verbatim to the output. ! 1525: ! 1526: @node Rpcalc Compile, , Rpcalc Gen, RPN Calc ! 1527: @subsection Compiling the Parser File ! 1528: @cindex compiling the parser ! 1529: ! 1530: Here is how to compile and run the parser file: ! 1531: ! 1532: @example ! 1533: @group ! 1534: # @r{List files in current directory.} ! 1535: % ls ! 1536: rpcalc.tab.c rpcalc.y ! 1537: @end group ! 1538: ! 1539: @group ! 1540: # @r{Compile the Bison parser.} ! 1541: # @r{@samp{-lm} tells compiler to search math library for @code{pow}.} ! 1542: % cc rpcalc.tab.c -lm -o rpcalc ! 1543: @end group ! 1544: ! 1545: @group ! 1546: # @r{List files again.} ! 1547: % ls ! 1548: rpcalc rpcalc.tab.c rpcalc.y ! 1549: @end group ! 1550: @end example ! 1551: ! 1552: The file @file{rpcalc} now contains the executable code. Here is an ! 1553: example session using @code{rpcalc}. ! 1554: ! 1555: @example ! 1556: % rpcalc ! 1557: 4 9 + ! 1558: 13 ! 1559: 3 7 + 3 4 5 *+- ! 1560: -13 ! 1561: 3 7 + 3 4 5 * + - n @r{Note the unary minus, @samp{n}} ! 1562: 13 ! 1563: 5 6 / 4 n + ! 1564: -3.166666667 ! 1565: 3 4 ^ @r{Exponentiation} ! 1566: 81 ! 1567: ^D @r{End-of-file indicator} ! 1568: % ! 1569: @end example ! 1570: ! 1571: @node Infix Calc, Simple Error Recovery, RPN Calc, Examples ! 1572: @section Infix Notation Calculator: @code{calc} ! 1573: @cindex infix notation calculator ! 1574: @cindex @code{calc} ! 1575: @cindex calculator, infix notation ! 1576: ! 1577: We now modify rpcalc to handle infix operators instead of postfix. Infix ! 1578: notation involves the concept of operator precedence and the need for ! 1579: parentheses nested to arbitrary depth. Here is the Bison code for ! 1580: @file{calc.y}, an infix desk-top calculator. ! 1581: ! 1582: @example ! 1583: /* Infix notation calculator--calc */ ! 1584: ! 1585: %@{ ! 1586: #define YYSTYPE double ! 1587: #include <math.h> ! 1588: %@} ! 1589: ! 1590: /* BISON Declarations */ ! 1591: %token NUM ! 1592: %left '-' '+' ! 1593: %left '*' '/' ! 1594: %left NEG /* negation--unary minus */ ! 1595: %right '^' /* exponentiation */ ! 1596: ! 1597: /* Grammar follows */ ! 1598: %% ! 1599: input: /* empty string */ ! 1600: | input line ! 1601: ; ! 1602: ! 1603: line: '\n' ! 1604: | exp '\n' @{ printf ("\t%.10g\n", $1); @} ! 1605: ; ! 1606: ! 1607: exp: NUM @{ $$ = $1; @} ! 1608: | exp '+' exp @{ $$ = $1 + $3; @} ! 1609: | exp '-' exp @{ $$ = $1 - $3; @} ! 1610: | exp '*' exp @{ $$ = $1 * $3; @} ! 1611: | exp '/' exp @{ $$ = $1 / $3; @} ! 1612: | '-' exp %prec NEG @{ $$ = -$2; @} ! 1613: | exp '^' exp @{ $$ = pow ($1, $3); @} ! 1614: | '(' exp ')' @{ $$ = $2; @} ! 1615: ; ! 1616: %% ! 1617: @end example ! 1618: ! 1619: @noindent ! 1620: The functions @code{yylex}, @code{yyerror} and @code{main} can be the same ! 1621: as before. ! 1622: ! 1623: There are two important new features shown in this code. ! 1624: ! 1625: In the second section (Bison declarations), @code{%left} declares token ! 1626: types and says they are left-associative operators. The declarations ! 1627: @code{%left} and @code{%right} (right associativity) take the place of ! 1628: @code{%token} which is used to declare a token type name without ! 1629: associativity. (These tokens are single-character literals, which ! 1630: ordinarily don't need to be declared. We declare them here to specify ! 1631: the associativity.) ! 1632: ! 1633: Operator precedence is determined by the line ordering of the ! 1634: declarations; the higher the line number of the declaration (lower on ! 1635: the page or screen), the higher the precedence. Hence, exponentiation ! 1636: has the highest precedence, unary minus (@code{NEG}) is next, followed ! 1637: by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator Precedence}. ! 1638: ! 1639: The other important new feature is the @code{%prec} in the grammar section ! 1640: for the unary minus operator. The @code{%prec} simply instructs Bison that ! 1641: the rule @samp{| '-' exp} has the same precedence as @code{NEG}---in this ! 1642: case the next-to-highest. @xref{Contextual Precedence, ,Context-Dependent Precedence}. ! 1643: ! 1644: Here is a sample run of @file{calc.y}: ! 1645: ! 1646: @need 500 ! 1647: @example ! 1648: % calc ! 1649: 4 + 4.5 - (34/(8*3+-3)) ! 1650: 6.880952381 ! 1651: -56 + 2 ! 1652: -54 ! 1653: 3 ^ 2 ! 1654: 9 ! 1655: @end example ! 1656: ! 1657: @node Simple Error Recovery, Multi-function Calc, Infix Calc, Examples ! 1658: @section Simple Error Recovery ! 1659: @cindex error recovery, simple ! 1660: ! 1661: Up to this point, this manual has not addressed the issue of @dfn{error ! 1662: recovery}---how to continue parsing after the parser detects a syntax ! 1663: error. All we have handled is error reporting with @code{yyerror}. Recall ! 1664: that by default @code{yyparse} returns after calling @code{yyerror}. This ! 1665: means that an erroneous input line causes the calculator program to exit. ! 1666: Now we show how to rectify this deficiency. ! 1667: ! 1668: The Bison language itself includes the reserved word @code{error}, which ! 1669: may be included in the grammar rules. In the example below it has ! 1670: been added to one of the alternatives for @code{line}: ! 1671: ! 1672: @example ! 1673: @group ! 1674: line: '\n' ! 1675: | exp '\n' @{ printf ("\t%.10g\n", $1); @} ! 1676: | error '\n' @{ yyerrok; @} ! 1677: ; ! 1678: @end group ! 1679: @end example ! 1680: ! 1681: This addition to the grammar allows for simple error recovery in the event ! 1682: of a parse error. If an expression that cannot be evaluated is read, the ! 1683: error will be recognized by the third rule for @code{line}, and parsing ! 1684: will continue. (The @code{yyerror} function is still called upon to print ! 1685: its message as well.) The action executes the statement @code{yyerrok}, a ! 1686: macro defined automatically by Bison; its meaning is that error recovery is ! 1687: complete (@pxref{Error Recovery}). Note the difference between ! 1688: @code{yyerrok} and @code{yyerror}; neither one is a misprint.@refill ! 1689: ! 1690: This form of error recovery deals with syntax errors. There are other ! 1691: kinds of errors; for example, division by zero, which raises an exception ! 1692: signal that is normally fatal. A real calculator program must handle this ! 1693: signal and use @code{longjmp} to return to @code{main} and resume parsing ! 1694: input lines; it would also have to discard the rest of the current line of ! 1695: input. We won't discuss this issue further because it is not specific to ! 1696: Bison programs. ! 1697: ! 1698: @node Multi-function Calc, Exercises, Simple Error Recovery, Examples ! 1699: @section Multi-Function Calculator: @code{mfcalc} ! 1700: @cindex multi-function calculator ! 1701: @cindex @code{mfcalc} ! 1702: @cindex calculator, multi-function ! 1703: ! 1704: Now that the basics of Bison have been discussed, it is time to move on to ! 1705: a more advanced problem. The above calculators provided only five ! 1706: functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would ! 1707: be nice to have a calculator that provides other mathematical functions such ! 1708: as @code{sin}, @code{cos}, etc. ! 1709: ! 1710: It is easy to add new operators to the infix calculator as long as they are ! 1711: only single-character literals. The lexical analyzer @code{yylex} passes ! 1712: back all non-number characters as tokens, so new grammar rules suffice for ! 1713: adding a new operator. But we want something more flexible: built-in ! 1714: functions whose syntax has this form: ! 1715: ! 1716: @example ! 1717: @var{function_name} (@var{argument}) ! 1718: @end example ! 1719: ! 1720: @noindent ! 1721: At the same time, we will add memory to the calculator, by allowing you ! 1722: to create named variables, store values in them, and use them later. ! 1723: Here is a sample session with the multi-function calculator: ! 1724: ! 1725: @example ! 1726: % acalc ! 1727: pi = 3.141592653589 ! 1728: 3.1415926536 ! 1729: sin(pi) ! 1730: 0.0000000000 ! 1731: alpha = beta1 = 2.3 ! 1732: 2.3000000000 ! 1733: alpha ! 1734: 2.3000000000 ! 1735: ln(alpha) ! 1736: 0.8329091229 ! 1737: exp(ln(beta1)) ! 1738: 2.3000000000 ! 1739: % ! 1740: @end example ! 1741: ! 1742: Note that multiple assignment and nested function calls are permitted. ! 1743: ! 1744: @menu ! 1745: * Decl: Mfcalc Decl. Bison declarations for multi-function calculator. ! 1746: * Rules: Mfcalc Rules. Grammar rules for the calculator. ! 1747: * Symtab: Mfcalc Symtab. Symbol table management subroutines. ! 1748: @end menu ! 1749: ! 1750: @node Mfcalc Decl, Mfcalc Rules, , Multi-function Calc ! 1751: @subsection Declarations for @code{mfcalc} ! 1752: ! 1753: Here are the C and Bison declarations for the multi-function calculator. ! 1754: ! 1755: @smallexample ! 1756: %@{ ! 1757: #include <math.h> /* For math functions, cos(), sin(), etc. */ ! 1758: #include "calc.h" /* Contains definition of `symrec' */ ! 1759: %@} ! 1760: %union @{ ! 1761: double val; /* For returning numbers. */ ! 1762: symrec *tptr; /* For returning symbol-table pointers */ ! 1763: @} ! 1764: ! 1765: %token <val> NUM /* Simple double precision number */ ! 1766: %token <tptr> VAR FNCT /* Variable and Function */ ! 1767: %type <val> exp ! 1768: ! 1769: %right '=' ! 1770: %left '-' '+' ! 1771: %left '*' '/' ! 1772: %left NEG /* Negation--unary minus */ ! 1773: %right '^' /* Exponentiation */ ! 1774: ! 1775: /* Grammar follows */ ! 1776: ! 1777: %% ! 1778: @end smallexample ! 1779: ! 1780: The above grammar introduces only two new features of the Bison language. ! 1781: These features allow semantic values to have various data types ! 1782: (@pxref{Multiple Types, ,More Than One Value Type}). ! 1783: ! 1784: The @code{%union} declaration specifies the entire list of possible types; ! 1785: this is instead of defining @code{YYSTYPE}. The allowable types are now ! 1786: double-floats (for @code{exp} and @code{NUM}) and pointers to entries in ! 1787: the symbol table. @xref{Union Decl, ,The Collection of Value Types}. ! 1788: ! 1789: Since values can now have various types, it is necessary to associate a ! 1790: type with each grammar symbol whose semantic value is used. These symbols ! 1791: are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their ! 1792: declarations are augmented with information about their data type (placed ! 1793: between angle brackets). ! 1794: ! 1795: The Bison construct @code{%type} is used for declaring nonterminal symbols, ! 1796: just as @code{%token} is used for declaring token types. We have not used ! 1797: @code{%type} before because nonterminal symbols are normally declared ! 1798: implicitly by the rules that define them. But @code{exp} must be declared ! 1799: explicitly so we can specify its value type. @xref{Type Decl, ,Nonterminal Symbols}. ! 1800: ! 1801: @node Mfcalc Rules, Mfcalc Symtab, Mfcalc Decl, Multi-function Calc ! 1802: @subsection Grammar Rules for @code{mfcalc} ! 1803: ! 1804: Here are the grammar rules for the multi-function calculator. ! 1805: Most of them are copied directly from @code{calc}; three rules, ! 1806: those which mention @code{VAR} or @code{FNCT}, are new. ! 1807: ! 1808: @smallexample ! 1809: input: /* empty */ ! 1810: | input line ! 1811: ; ! 1812: ! 1813: line: ! 1814: '\n' ! 1815: | exp '\n' @{ printf ("\t%.10g\n", $1); @} ! 1816: | error '\n' @{ yyerrok; @} ! 1817: ; ! 1818: ! 1819: exp: NUM @{ $$ = $1; @} ! 1820: | VAR @{ $$ = $1->value.var; @} ! 1821: | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @} ! 1822: | FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @} ! 1823: | exp '+' exp @{ $$ = $1 + $3; @} ! 1824: | exp '-' exp @{ $$ = $1 - $3; @} ! 1825: | exp '*' exp @{ $$ = $1 * $3; @} ! 1826: | exp '/' exp @{ $$ = $1 / $3; @} ! 1827: | '-' exp %prec NEG @{ $$ = -$2; @} ! 1828: | exp '^' exp @{ $$ = pow ($1, $3); @} ! 1829: | '(' exp ')' @{ $$ = $2; @} ! 1830: ; ! 1831: /* End of grammar */ ! 1832: %% ! 1833: @end smallexample ! 1834: ! 1835: @node Mfcalc Symtab, , Mfcalc Rules, Multi-function Calc ! 1836: @subsection The @code{mfcalc} Symbol Table ! 1837: @cindex symbol table example ! 1838: ! 1839: The multi-function calculator requires a symbol table to keep track of the ! 1840: names and meanings of variables and functions. This doesn't affect the ! 1841: grammar rules (except for the actions) or the Bison declarations, but it ! 1842: requires some additional C functions for support. ! 1843: ! 1844: The symbol table itself consists of a linked list of records. Its ! 1845: definition, which is kept in the header @file{calc.h}, is as follows. It ! 1846: provides for either functions or variables to be placed in the table. ! 1847: ! 1848: @smallexample ! 1849: @group ! 1850: /* Data type for links in the chain of symbols. */ ! 1851: struct symrec ! 1852: @{ ! 1853: char *name; /* name of symbol */ ! 1854: int type; /* type of symbol: either VAR or FNCT */ ! 1855: union @{ ! 1856: double var; /* value of a VAR */ ! 1857: double (*fnctptr)(); /* value of a FNCT */ ! 1858: @} value; ! 1859: struct symrec *next; /* link field */ ! 1860: @}; ! 1861: @end group ! 1862: ! 1863: @group ! 1864: typedef struct symrec symrec; ! 1865: ! 1866: /* The symbol table: a chain of `struct symrec'. */ ! 1867: extern symrec *sym_table; ! 1868: ! 1869: symrec *putsym (); ! 1870: symrec *getsym (); ! 1871: @end group ! 1872: @end smallexample ! 1873: ! 1874: The new version of @code{main} includes a call to @code{init_table}, a ! 1875: function that initializes the symbol table. Here it is, and ! 1876: @code{init_table} as well: ! 1877: ! 1878: @smallexample ! 1879: @group ! 1880: #include <stdio.h> ! 1881: ! 1882: main () ! 1883: @{ ! 1884: init_table (); ! 1885: yyparse (); ! 1886: @} ! 1887: @end group ! 1888: ! 1889: @group ! 1890: yyerror (s) /* Called by yyparse on error */ ! 1891: char *s; ! 1892: @{ ! 1893: printf ("%s\n", s); ! 1894: @} ! 1895: ! 1896: struct init ! 1897: @{ ! 1898: char *fname; ! 1899: double (*fnct)(); ! 1900: @}; ! 1901: @end group ! 1902: ! 1903: @group ! 1904: struct init arith_fncts[] ! 1905: = @{ ! 1906: "sin", sin, ! 1907: "cos", cos, ! 1908: "atan", atan, ! 1909: "ln", log, ! 1910: "exp", exp, ! 1911: "sqrt", sqrt, ! 1912: 0, 0 ! 1913: @}; ! 1914: ! 1915: /* The symbol table: a chain of `struct symrec'. */ ! 1916: symrec *sym_table = (symrec *)0; ! 1917: @end group ! 1918: ! 1919: @group ! 1920: init_table () /* puts arithmetic functions in table. */ ! 1921: @{ ! 1922: int i; ! 1923: symrec *ptr; ! 1924: for (i = 0; arith_fncts[i].fname != 0; i++) ! 1925: @{ ! 1926: ptr = putsym (arith_fncts[i].fname, FNCT); ! 1927: ptr->value.fnctptr = arith_fncts[i].fnct; ! 1928: @} ! 1929: @} ! 1930: @end group ! 1931: @end smallexample ! 1932: ! 1933: By simply editing the initialization list and adding the necessary include ! 1934: files, you can add additional functions to the calculator. ! 1935: ! 1936: Two important functions allow look-up and installation of symbols in the ! 1937: symbol table. The function @code{putsym} is passed a name and the type ! 1938: (@code{VAR} or @code{FNCT}) of the object to be installed. The object is ! 1939: linked to the front of the list, and a pointer to the object is returned. ! 1940: The function @code{getsym} is passed the name of the symbol to look up. If ! 1941: found, a pointer to that symbol is returned; otherwise zero is returned. ! 1942: ! 1943: @smallexample ! 1944: symrec * ! 1945: putsym (sym_name,sym_type) ! 1946: char *sym_name; ! 1947: int sym_type; ! 1948: @{ ! 1949: symrec *ptr; ! 1950: ptr = (symrec *) malloc (sizeof (symrec)); ! 1951: ptr->name = (char *) malloc (strlen (sym_name) + 1); ! 1952: strcpy (ptr->name,sym_name); ! 1953: ptr->type = sym_type; ! 1954: ptr->value.var = 0; /* set value to 0 even if fctn. */ ! 1955: ptr->next = (struct symrec *)sym_table; ! 1956: sym_table = ptr; ! 1957: return ptr; ! 1958: @} ! 1959: ! 1960: symrec * ! 1961: getsym (sym_name) ! 1962: char *sym_name; ! 1963: @{ ! 1964: symrec *ptr; ! 1965: for (ptr = sym_table; ptr != (symrec *) 0; ! 1966: ptr = (symrec *)ptr->next) ! 1967: if (strcmp (ptr->name,sym_name) == 0) ! 1968: return ptr; ! 1969: return 0; ! 1970: @} ! 1971: @end smallexample ! 1972: ! 1973: The function @code{yylex} must now recognize variables, numeric values, and ! 1974: the single-character arithmetic operators. Strings of alphanumeric ! 1975: characters with a leading nondigit are recognized as either variables or ! 1976: functions depending on what the symbol table says about them. ! 1977: ! 1978: The string is passed to @code{getsym} for look up in the symbol table. If ! 1979: the name appears in the table, a pointer to its location and its type ! 1980: (@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not ! 1981: already in the table, then it is installed as a @code{VAR} using ! 1982: @code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is ! 1983: returned to @code{yyparse}.@refill ! 1984: ! 1985: No change is needed in the handling of numeric values and arithmetic ! 1986: operators in @code{yylex}. ! 1987: ! 1988: @smallexample ! 1989: @group ! 1990: #include <ctype.h> ! 1991: yylex () ! 1992: @{ ! 1993: int c; ! 1994: ! 1995: /* Ignore whitespace, get first nonwhite character. */ ! 1996: while ((c = getchar ()) == ' ' || c == '\t'); ! 1997: ! 1998: if (c == EOF) ! 1999: return 0; ! 2000: @end group ! 2001: ! 2002: @group ! 2003: /* Char starts a number => parse the number. */ ! 2004: if (c == '.' || isdigit (c)) ! 2005: @{ ! 2006: ungetc (c, stdin); ! 2007: scanf ("%lf", &yylval.val); ! 2008: return NUM; ! 2009: @} ! 2010: @end group ! 2011: ! 2012: @group ! 2013: /* Char starts an identifier => read the name. */ ! 2014: if (isalpha (c)) ! 2015: @{ ! 2016: symrec *s; ! 2017: static char *symbuf = 0; ! 2018: static int length = 0; ! 2019: int i; ! 2020: @end group ! 2021: ! 2022: @group ! 2023: /* Initially make the buffer long enough ! 2024: for a 40-character symbol name. */ ! 2025: if (length == 0) ! 2026: length = 40, symbuf = (char *)malloc (length + 1); ! 2027: ! 2028: i = 0; ! 2029: do ! 2030: @end group ! 2031: @group ! 2032: @{ ! 2033: /* If buffer is full, make it bigger. */ ! 2034: if (i == length) ! 2035: @{ ! 2036: length *= 2; ! 2037: symbuf = (char *)realloc (symbuf, length + 1); ! 2038: @} ! 2039: /* Add this character to the buffer. */ ! 2040: symbuf[i++] = c; ! 2041: /* Get another character. */ ! 2042: c = getchar (); ! 2043: @} ! 2044: @end group ! 2045: @group ! 2046: while (c != EOF && isalnum (c)); ! 2047: ! 2048: ungetc (c, stdin); ! 2049: symbuf[i] = '\0'; ! 2050: @end group ! 2051: ! 2052: @group ! 2053: s = getsym (symbuf); ! 2054: if (s == 0) ! 2055: s = putsym (symbuf, VAR); ! 2056: yylval.tptr = s; ! 2057: return s->type; ! 2058: @} ! 2059: ! 2060: /* Any other character is a token by itself. */ ! 2061: return c; ! 2062: @} ! 2063: @end group ! 2064: @end smallexample ! 2065: ! 2066: This program is both powerful and flexible. You may easily add new ! 2067: functions, and it is a simple job to modify this code to install predefined ! 2068: variables such as @code{pi} or @code{e} as well. ! 2069: ! 2070: @node Exercises, , Multi-function Calc, Examples ! 2071: @section Exercises ! 2072: @cindex exercises ! 2073: ! 2074: @enumerate ! 2075: @item ! 2076: Add some new functions from @file{math.h} to the initialization list. ! 2077: ! 2078: @item ! 2079: Add another array that contains constants and their values. Then ! 2080: modify @code{init_table} to add these constants to the symbol table. ! 2081: It will be easiest to give the constants type @code{VAR}. ! 2082: ! 2083: @item ! 2084: Make the program report an error if the user refers to an ! 2085: uninitialized variable in any way except to store a value in it. ! 2086: @end enumerate ! 2087: ! 2088: @node Grammar File, Interface, Examples, Top ! 2089: @chapter Bison Grammar Files ! 2090: ! 2091: Bison takes as input a context-free grammar specification and produces a ! 2092: C-language function that recognizes correct instances of the grammar. ! 2093: ! 2094: The Bison grammar input file conventionally has a name ending in @samp{.y}. ! 2095: ! 2096: @menu ! 2097: * Grammar Outline:: Overall layout of the grammar file. ! 2098: * Symbols:: Terminal and nonterminal symbols. ! 2099: * Rules:: How to write grammar rules. ! 2100: * Recursion:: Writing recursive rules. ! 2101: * Semantics:: Semantic values and actions. ! 2102: * Declarations:: All kinds of Bison declarations are described here. ! 2103: * Multiple Parsers:: Putting more than one Bison parser in one program. ! 2104: @end menu ! 2105: ! 2106: @node Grammar Outline, Symbols, , Grammar File ! 2107: @section Outline of a Bison Grammar ! 2108: ! 2109: A Bison grammar file has four main sections, shown here with the ! 2110: appropriate delimiters: ! 2111: ! 2112: @example ! 2113: %@{ ! 2114: @var{C declarations} ! 2115: %@} ! 2116: ! 2117: @var{Bison declarations} ! 2118: ! 2119: %% ! 2120: @var{Grammar rules} ! 2121: %% ! 2122: ! 2123: @var{Additional C code} ! 2124: @end example ! 2125: ! 2126: Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections. ! 2127: ! 2128: @menu ! 2129: * C Declarations:: Syntax and usage of the C declarations section. ! 2130: * Bison Declarations:: Syntax and usage of the Bison declarations section. ! 2131: * Grammar Rules:: Syntax and usage of the grammar rules section. ! 2132: * C Code:: Syntax and usage of the additional C code section. ! 2133: @end menu ! 2134: ! 2135: @node C Declarations, Bison Declarations, , Grammar Outline ! 2136: @subsection The C Declarations Section ! 2137: @cindex C declarations section ! 2138: @cindex declarations, C ! 2139: ! 2140: The @var{C declarations} section contains macro definitions and ! 2141: declarations of functions and variables that are used in the actions in the ! 2142: grammar rules. These are copied to the beginning of the parser file so ! 2143: that they precede the definition of @code{yyparse}. You can use ! 2144: @samp{#include} to get the declarations from a header file. If you don't ! 2145: need any C declarations, you may omit the @samp{%@{} and @samp{%@}} ! 2146: delimiters that bracket this section. ! 2147: ! 2148: @node Bison Declarations, Grammar Rules, C Declarations, Grammar Outline ! 2149: @subsection The Bison Declarations Section ! 2150: @cindex Bison declarations (introduction) ! 2151: @cindex declarations, Bison (introduction) ! 2152: ! 2153: The @var{Bison declarations} section contains declarations that define ! 2154: terminal and nonterminal symbols, specify precedence, and so on. ! 2155: In some simple grammars you may not need any declarations. ! 2156: @xref{Declarations, ,Bison Declarations}. ! 2157: ! 2158: @node Grammar Rules, C Code, Bison Declarations, Grammar Outline ! 2159: @subsection The Grammar Rules Section ! 2160: @cindex grammar rules section ! 2161: @cindex rules section for grammar ! 2162: ! 2163: The @dfn{grammar rules} section contains one or more Bison grammar ! 2164: rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}. ! 2165: ! 2166: There must always be at least one grammar rule, and the first ! 2167: @samp{%%} (which precedes the grammar rules) may never be omitted even ! 2168: if it is the first thing in the file. ! 2169: ! 2170: @node C Code, , Grammar Rules, Grammar Outline ! 2171: @subsection The Additional C Code Section ! 2172: @cindex additional C code section ! 2173: @cindex C code, section for additional ! 2174: ! 2175: The @var{additional C code} section is copied verbatim to the end of ! 2176: the parser file, just as the @var{C declarations} section is copied to ! 2177: the beginning. This is the most convenient place to put anything ! 2178: that you want to have in the parser file but which need not come before ! 2179: the definition of @code{yyparse}. For example, the definitions of ! 2180: @code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser C-Language Interface}. ! 2181: ! 2182: If the last section is empty, you may omit the @samp{%%} that separates it ! 2183: from the grammar rules. ! 2184: ! 2185: The Bison parser itself contains many static variables whose names start ! 2186: with @samp{yy} and many macros whose names start with @samp{YY}. It is a ! 2187: good idea to avoid using any such names (except those documented in this ! 2188: manual) in the additional C code section of the grammar file. ! 2189: ! 2190: @node Symbols, Rules, Grammar Outline, Grammar File ! 2191: @section Symbols, Terminal and Nonterminal ! 2192: @cindex nonterminal symbol ! 2193: @cindex terminal symbol ! 2194: @cindex token type ! 2195: @cindex symbol ! 2196: ! 2197: @dfn{Symbols} in Bison grammars represent the grammatical classifications ! 2198: of the language. ! 2199: ! 2200: A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a ! 2201: class of syntactically equivalent tokens. You use the symbol in grammar ! 2202: rules to mean that a token in that class is allowed. The symbol is ! 2203: represented in the Bison parser by a numeric code, and the @code{yylex} ! 2204: function returns a token type code to indicate what kind of token has been ! 2205: read. You don't need to know what the code value is; you can use the ! 2206: symbol to stand for it. ! 2207: ! 2208: A @dfn{nonterminal symbol} stands for a class of syntactically equivalent ! 2209: groupings. The symbol name is used in writing grammar rules. By convention, ! 2210: it should be all lower case. ! 2211: ! 2212: Symbol names can contain letters, digits (not at the beginning), ! 2213: underscores and periods. Periods make sense only in nonterminals. ! 2214: ! 2215: There are two ways of writing terminal symbols in the grammar: ! 2216: ! 2217: @itemize @bullet ! 2218: @item ! 2219: A @dfn{named token type} is written with an identifier, like an ! 2220: identifier in C. By convention, it should be all upper case. Each ! 2221: such name must be defined with a Bison declaration such as ! 2222: @code{%token}. @xref{Token Decl, ,Token Type Names}. ! 2223: ! 2224: @item ! 2225: @cindex character token ! 2226: @cindex literal token ! 2227: @cindex single-character literal ! 2228: A @dfn{character token type} (or @dfn{literal token}) is written in ! 2229: the grammar using the same syntax used in C for character constants; ! 2230: for example, @code{'+'} is a character token type. A character token ! 2231: type doesn't need to be declared unless you need to specify its ! 2232: semantic value data type (@pxref{Value Type, ,Data Types of Semantic Values}), associativity, or ! 2233: precedence (@pxref{Precedence, ,Operator Precedence}). ! 2234: ! 2235: By convention, a character token type is used only to represent a ! 2236: token that consists of that particular character. Thus, the token ! 2237: type @code{'+'} is used to represent the character @samp{+} as a ! 2238: token. Nothing enforces this convention, but if you depart from it, ! 2239: your program will confuse other readers. ! 2240: ! 2241: All the usual escape sequences used in character literals in C can be ! 2242: used in Bison as well, but you must not use the null character as a ! 2243: character literal because its ASCII code, zero, is the code ! 2244: @code{yylex} returns for end-of-input (@pxref{Calling Convention, ,Calling Convention for @code{yylex}}). ! 2245: @end itemize ! 2246: ! 2247: How you choose to write a terminal symbol has no effect on its ! 2248: grammatical meaning. That depends only on where it appears in rules and ! 2249: on when the parser function returns that symbol. ! 2250: ! 2251: The value returned by @code{yylex} is always one of the terminal symbols ! 2252: (or 0 for end-of-input). Whichever way you write the token type in the ! 2253: grammar rules, you write it the same way in the definition of @code{yylex}. ! 2254: The numeric code for a character token type is simply the ASCII code for ! 2255: the character, so @code{yylex} can use the identical character constant to ! 2256: generate the requisite code. Each named token type becomes a C macro in ! 2257: the parser file, so @code{yylex} can use the name to stand for the code. ! 2258: (This is why periods don't make sense in terminal symbols.) ! 2259: @xref{Calling Convention, ,Calling Convention for @code{yylex}}. ! 2260: ! 2261: If @code{yylex} is defined in a separate file, you need to arrange for the ! 2262: token-type macro definitions to be available there. Use the @samp{-d} ! 2263: option when you run Bison, so that it will write these macro definitions ! 2264: into a separate header file @file{@var{name}.tab.h} which you can include ! 2265: in the other source files that need it. @xref{Invocation, ,Invoking Bison}. ! 2266: ! 2267: The symbol @code{error} is a terminal symbol reserved for error recovery ! 2268: (@pxref{Error Recovery}); you shouldn't use it for any other purpose. ! 2269: In particular, @code{yylex} should never return this value. ! 2270: ! 2271: @node Rules, Recursion, Symbols, Grammar File ! 2272: @section Syntax of Grammar Rules ! 2273: @cindex rule syntax ! 2274: @cindex grammar rule syntax ! 2275: @cindex syntax of grammar rules ! 2276: ! 2277: A Bison grammar rule has the following general form: ! 2278: ! 2279: @example ! 2280: @var{result}: @var{components}@dots{} ! 2281: ; ! 2282: @end example ! 2283: ! 2284: @noindent ! 2285: where @var{result} is the nonterminal symbol that this rule describes ! 2286: and @var{components} are various terminal and nonterminal symbols that ! 2287: are put together by this rule (@pxref{Symbols}). ! 2288: ! 2289: For example, ! 2290: ! 2291: @example ! 2292: @group ! 2293: exp: exp '+' exp ! 2294: ; ! 2295: @end group ! 2296: @end example ! 2297: ! 2298: @noindent ! 2299: says that two groupings of type @code{exp}, with a @samp{+} token in between, ! 2300: can be combined into a larger grouping of type @code{exp}. ! 2301: ! 2302: Whitespace in rules is significant only to separate symbols. You can add ! 2303: extra whitespace as you wish. ! 2304: ! 2305: Scattered among the components can be @var{actions} that determine ! 2306: the semantics of the rule. An action looks like this: ! 2307: ! 2308: @example ! 2309: @{@var{C statements}@} ! 2310: @end example ! 2311: ! 2312: @noindent ! 2313: Usually there is only one action and it follows the components. ! 2314: @xref{Actions}. ! 2315: ! 2316: @findex | ! 2317: Multiple rules for the same @var{result} can be written separately or can ! 2318: be joined with the vertical-bar character @samp{|} as follows: ! 2319: ! 2320: @ifinfo ! 2321: @example ! 2322: @var{result}: @var{rule1-components}@dots{} ! 2323: | @var{rule2-components}@dots{} ! 2324: @dots{} ! 2325: ; ! 2326: @end example ! 2327: @end ifinfo ! 2328: @iftex ! 2329: @example ! 2330: @group ! 2331: @var{result}: @var{rule1-components}@dots{} ! 2332: | @var{rule2-components}@dots{} ! 2333: @dots{} ! 2334: ; ! 2335: @end group ! 2336: @end example ! 2337: @end iftex ! 2338: ! 2339: @noindent ! 2340: They are still considered distinct rules even when joined in this way. ! 2341: ! 2342: If @var{components} in a rule is empty, it means that @var{result} can ! 2343: match the empty string. For example, here is how to define a ! 2344: comma-separated sequence of zero or more @code{exp} groupings: ! 2345: ! 2346: @example ! 2347: @group ! 2348: expseq: /* empty */ ! 2349: | expseq1 ! 2350: ; ! 2351: @end group ! 2352: ! 2353: @group ! 2354: expseq1: exp ! 2355: | expseq1 ',' exp ! 2356: ; ! 2357: @end group ! 2358: @end example ! 2359: ! 2360: @noindent ! 2361: It is customary to write a comment @samp{/* empty */} in each rule ! 2362: with no components. ! 2363: ! 2364: @node Recursion, Semantics, Rules, Grammar File ! 2365: @section Recursive Rules ! 2366: @cindex recursive rule ! 2367: ! 2368: A rule is called @dfn{recursive} when its @var{result} nonterminal appears ! 2369: also on its right hand side. Nearly all Bison grammars need to use ! 2370: recursion, because that is the only way to define a sequence of any number ! 2371: of somethings. Consider this recursive definition of a comma-separated ! 2372: sequence of one or more expressions: ! 2373: ! 2374: @example ! 2375: @group ! 2376: expseq1: exp ! 2377: | expseq1 ',' exp ! 2378: ; ! 2379: @end group ! 2380: @end example ! 2381: ! 2382: @cindex left recursion ! 2383: @cindex right recursion ! 2384: @noindent ! 2385: Since the recursive use of @code{expseq1} is the leftmost symbol in the ! 2386: right hand side, we call this @dfn{left recursion}. By contrast, here ! 2387: the same construct is defined using @dfn{right recursion}: ! 2388: ! 2389: @example ! 2390: @group ! 2391: expseq1: exp ! 2392: | exp ',' expseq1 ! 2393: ; ! 2394: @end group ! 2395: @end example ! 2396: ! 2397: @noindent ! 2398: Any kind of sequence can be defined using either left recursion or ! 2399: right recursion, but you should always use left recursion, because it ! 2400: can parse a sequence of any number of elements with bounded stack ! 2401: space. Right recursion uses up space on the Bison stack in proportion ! 2402: to the number of elements in the sequence, because all the elements ! 2403: must be shifted onto the stack before the rule can be applied even ! 2404: once. @xref{Algorithm, ,The Bison Parser Algorithm }, for ! 2405: further explanation of this. ! 2406: ! 2407: @cindex mutual recursion ! 2408: @dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the ! 2409: rule does not appear directly on its right hand side, but does appear ! 2410: in rules for other nonterminals which do appear on its right hand ! 2411: side. ! 2412: ! 2413: For example: ! 2414: ! 2415: @example ! 2416: @group ! 2417: expr: primary ! 2418: | primary '+' primary ! 2419: ; ! 2420: @end group ! 2421: ! 2422: @group ! 2423: primary: constant ! 2424: | '(' expr ')' ! 2425: ; ! 2426: @end group ! 2427: @end example ! 2428: ! 2429: @noindent ! 2430: defines two mutually-recursive nonterminals, since each refers to the ! 2431: other. ! 2432: ! 2433: @node Semantics, Declarations, Recursion, Grammar File ! 2434: @section Defining Language Semantics ! 2435: @cindex defining language semantics ! 2436: @cindex language semantics, defining ! 2437: ! 2438: The grammar rules for a language determine only the syntax. The semantics ! 2439: are determined by the semantic values associated with various tokens and ! 2440: groupings, and by the actions taken when various groupings are recognized. ! 2441: ! 2442: For example, the calculator calculates properly because the value ! 2443: associated with each expression is the proper number; it adds properly ! 2444: because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add ! 2445: the numbers associated with @var{x} and @var{y}. ! 2446: ! 2447: @menu ! 2448: * Value Type:: Specifying one data type for all semantic values. ! 2449: * Multiple Types:: Specifying several alternative data types. ! 2450: * Actions:: An action is the semantic definition of a grammar rule. ! 2451: * Action Types:: Specifying data types for actions to operate on. ! 2452: * Mid-Rule Actions:: Most actions go at the end of a rule. ! 2453: This says when, why and how to use the exceptional ! 2454: action in the middle of a rule. ! 2455: @end menu ! 2456: ! 2457: @node Value Type, Multiple Types, , Semantics ! 2458: @subsection Data Types of Semantic Values ! 2459: @cindex semantic value type ! 2460: @cindex value type, semantic ! 2461: @cindex data types of semantic values ! 2462: @cindex default data type ! 2463: ! 2464: In a simple program it may be sufficient to use the same data type for ! 2465: the semantic values of all language constructs. This was true in the ! 2466: RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish Notation Calculator}). ! 2467: ! 2468: Bison's default is to use type @code{int} for all semantic values. To ! 2469: specify some other type, define @code{YYSTYPE} as a macro, like this: ! 2470: ! 2471: @example ! 2472: #define YYSTYPE double ! 2473: @end example ! 2474: ! 2475: @noindent ! 2476: This macro definition must go in the C declarations section of the grammar ! 2477: file (@pxref{Grammar Outline, ,Outline of a Bison Grammar}). ! 2478: ! 2479: @node Multiple Types, Actions, Value Type, Semantics ! 2480: @subsection More Than One Value Type ! 2481: ! 2482: In most programs, you will need different data types for different kinds ! 2483: of tokens and groupings. For example, a numeric constant may need type ! 2484: @code{int} or @code{long}, while a string constant needs type @code{char *}, ! 2485: and an identifier might need a pointer to an entry in the symbol table. ! 2486: ! 2487: To use more than one data type for semantic values in one parser, Bison ! 2488: requires you to do two things: ! 2489: ! 2490: @itemize @bullet ! 2491: @item ! 2492: Specify the entire collection of possible data types, with the ! 2493: @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}). ! 2494: ! 2495: @item ! 2496: Choose one of those types for each symbol (terminal or nonterminal) ! 2497: for which semantic values are used. This is done for tokens with the ! 2498: @code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings ! 2499: with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}). ! 2500: @end itemize ! 2501: ! 2502: @node Actions, Action Types, Multiple Types, Semantics ! 2503: @subsection Actions ! 2504: @cindex action ! 2505: @vindex $$ ! 2506: @vindex $@var{n} ! 2507: ! 2508: An action accompanies a syntactic rule and contains C code to be executed ! 2509: each time an instance of that rule is recognized. The task of most actions ! 2510: is to compute a semantic value for the grouping built by the rule from the ! 2511: semantic values associated with tokens or smaller groupings. ! 2512: ! 2513: An action consists of C statements surrounded by braces, much like a ! 2514: compound statement in C. It can be placed at any position in the rule; it ! 2515: is executed at that position. Most rules have just one action at the end ! 2516: of the rule, following all the components. Actions in the middle of a rule ! 2517: are tricky and used only for special purposes (@pxref{Mid-Rule Actions, ,Actions in Mid-Rule}). ! 2518: ! 2519: The C code in an action can refer to the semantic values of the components ! 2520: matched by the rule with the construct @code{$@var{n}}, which stands for ! 2521: the value of the @var{n}th component. The semantic value for the grouping ! 2522: being constructed is @code{$$}. (Bison translates both of these constructs ! 2523: into array element references when it copies the actions into the parser ! 2524: file.) ! 2525: ! 2526: Here is a typical example: ! 2527: ! 2528: @example ! 2529: @group ! 2530: exp: @dots{} ! 2531: | exp '+' exp ! 2532: @{ $$ = $1 + $3; @} ! 2533: @end group ! 2534: @end example ! 2535: ! 2536: @noindent ! 2537: This rule constructs an @code{exp} from two smaller @code{exp} groupings ! 2538: connected by a plus-sign token. In the action, @code{$1} and @code{$3} ! 2539: refer to the semantic values of the two component @code{exp} groupings, ! 2540: which are the first and third symbols on the right hand side of the rule. ! 2541: The sum is stored into @code{$$} so that it becomes the semantic value of ! 2542: the addition-expression just recognized by the rule. If there were a ! 2543: useful semantic value associated with the @samp{+} token, it could be ! 2544: referred to as @code{$2}.@refill ! 2545: ! 2546: @cindex default action ! 2547: If you don't specify an action for a rule, Bison supplies a default: ! 2548: @w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule becomes ! 2549: the value of the whole rule. Of course, the default rule is valid only ! 2550: if the two data types match. There is no meaningful default action for ! 2551: an empty rule; every empty rule must have an explicit action unless the ! 2552: rule's value does not matter. ! 2553: ! 2554: @code{$@var{n}} with @var{n} zero or negative is allowed for reference ! 2555: to tokens and groupings on the stack @emph{before} those that match the ! 2556: current rule. This is a very risky practice, and to use it reliably ! 2557: you must be certain of the context in which the rule is applied. Here ! 2558: is a case in which you can use this reliably: ! 2559: ! 2560: @example ! 2561: @group ! 2562: foo: expr bar '+' expr @{ @dots{} @} ! 2563: | expr bar '-' expr @{ @dots{} @} ! 2564: ; ! 2565: @end group ! 2566: ! 2567: @group ! 2568: bar: /* empty */ ! 2569: @{ previous_expr = $0; @} ! 2570: ; ! 2571: @end group ! 2572: @end example ! 2573: ! 2574: As long as @code{bar} is used only in the fashion shown here, @code{$0} ! 2575: always refers to the @code{expr} which precedes @code{bar} in the ! 2576: definition of @code{foo}. ! 2577: ! 2578: @node Action Types, Mid-Rule Actions, Actions, Semantics ! 2579: @subsection Data Types of Values in Actions ! 2580: @cindex action data types ! 2581: @cindex data types in actions ! 2582: ! 2583: If you have chosen a single data type for semantic values, the @code{$$} ! 2584: and @code{$@var{n}} constructs always have that data type. ! 2585: ! 2586: If you have used @code{%union} to specify a variety of data types, then you ! 2587: must declare a choice among these types for each terminal or nonterminal ! 2588: symbol that can have a semantic value. Then each time you use @code{$$} or ! 2589: @code{$@var{n}}, its data type is determined by which symbol it refers to ! 2590: in the rule. In this example,@refill ! 2591: ! 2592: @example ! 2593: @group ! 2594: exp: @dots{} ! 2595: | exp '+' exp ! 2596: @{ $$ = $1 + $3; @} ! 2597: @end group ! 2598: @end example ! 2599: ! 2600: @noindent ! 2601: @code{$1} and @code{$3} refer to instances of @code{exp}, so they all ! 2602: have the data type declared for the nonterminal symbol @code{exp}. If ! 2603: @code{$2} were used, it would have the data type declared for the ! 2604: terminal symbol @code{'+'}, whatever that might be.@refill ! 2605: ! 2606: Alternatively, you can specify the data type when you refer to the value, ! 2607: by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the ! 2608: reference. For example, if you have defined types as shown here: ! 2609: ! 2610: @example ! 2611: @group ! 2612: %union @{ ! 2613: int itype; ! 2614: double dtype; ! 2615: @} ! 2616: @end group ! 2617: @end example ! 2618: ! 2619: @noindent ! 2620: then you can write @code{$<itype>1} to refer to the first subunit of the ! 2621: rule as an integer, or @code{$<dtype>1} to refer to it as a double. ! 2622: ! 2623: @node Mid-Rule Actions, , Action Types, Semantics ! 2624: @subsection Actions in Mid-Rule ! 2625: @cindex actions in mid-rule ! 2626: @cindex mid-rule actions ! 2627: ! 2628: Occasionally it is useful to put an action in the middle of a rule. ! 2629: These actions are written just like usual end-of-rule actions, but they ! 2630: are executed before the parser even recognizes the following components. ! 2631: ! 2632: A mid-rule action may refer to the components preceding it using ! 2633: @code{$@var{n}}, but it may not refer to subsequent components because ! 2634: it is run before they are parsed. ! 2635: ! 2636: The mid-rule action itself counts as one of the components of the rule. ! 2637: This makes a difference when there is another action later in the same rule ! 2638: (and usually there is another at the end): you have to count the actions ! 2639: along with the symbols when working out which number @var{n} to use in ! 2640: @code{$@var{n}}. ! 2641: ! 2642: The mid-rule action can also have a semantic value. The action can set ! 2643: its value with an assignment to @code{$$}, and actions later in the rule ! 2644: can refer to the value using @code{$@var{n}}. Since there is no symbol ! 2645: to name the action, there is no way to declare a data type for the value ! 2646: in advance, so you must use the @samp{$<@dots{}>} construct to specify a ! 2647: data type each time you refer to this value. ! 2648: ! 2649: There is no way to set the value of the entire rule with a mid-rule ! 2650: action, because assignments to @code{$$} do not have that effect. The ! 2651: only way to set the value for the entire rule is with an ordinary action ! 2652: at the end of the rule. ! 2653: ! 2654: Here is an example from a hypothetical compiler, handling a @code{let} ! 2655: statement that looks like @samp{let (@var{variable}) @var{statement}} and ! 2656: serves to create a variable named @var{variable} temporarily for the ! 2657: duration of @var{statement}. To parse this construct, we must put ! 2658: @var{variable} into the symbol table while @var{statement} is parsed, then ! 2659: remove it afterward. Here is how it is done: ! 2660: ! 2661: @example ! 2662: @group ! 2663: stmt: LET '(' var ')' ! 2664: @{ $<context>$ = push_context (); ! 2665: declare_variable ($3); @} ! 2666: stmt @{ $$ = $6; ! 2667: pop_context ($<context>5); @} ! 2668: @end group ! 2669: @end example ! 2670: ! 2671: @noindent ! 2672: As soon as @samp{let (@var{variable})} has been recognized, the first ! 2673: action is run. It saves a copy of the current semantic context (the ! 2674: list of accessible variables) as its semantic value, using alternative ! 2675: @code{context} in the data-type union. Then it calls ! 2676: @code{declare_variable} to add the new variable to that list. Once the ! 2677: first action is finished, the embedded statement @code{stmt} can be ! 2678: parsed. Note that the mid-rule action is component number 5, so the ! 2679: @samp{stmt} is component number 6. ! 2680: ! 2681: After the embedded statement is parsed, its semantic value becomes the ! 2682: value of the entire @code{let}-statement. Then the semantic value from the ! 2683: earlier action is used to restore the prior list of variables. This ! 2684: removes the temporary @code{let}-variable from the list so that it won't ! 2685: appear to exist while the rest of the program is parsed. ! 2686: ! 2687: Taking action before a rule is completely recognized often leads to ! 2688: conflicts since the parser must commit to a parse in order to execute the ! 2689: action. For example, the following two rules, without mid-rule actions, ! 2690: can coexist in a working parser because the parser can shift the open-brace ! 2691: token and look at what follows before deciding whether there is a ! 2692: declaration or not: ! 2693: ! 2694: @example ! 2695: @group ! 2696: compound: '@{' declarations statements '@}' ! 2697: | '@{' statements '@}' ! 2698: ; ! 2699: @end group ! 2700: @end example ! 2701: ! 2702: @noindent ! 2703: But when we add a mid-rule action as follows, the rules become nonfunctional: ! 2704: ! 2705: @example ! 2706: @group ! 2707: compound: @{ prepare_for_local_variables (); @} ! 2708: '@{' declarations statements '@}' ! 2709: @end group ! 2710: @group ! 2711: | '@{' statements '@}' ! 2712: ; ! 2713: @end group ! 2714: @end example ! 2715: ! 2716: @noindent ! 2717: Now the parser is forced to decide whether to run the mid-rule action ! 2718: when it has read no farther than the open-brace. In other words, it ! 2719: must commit to using one rule or the other, without sufficient ! 2720: information to do it correctly. (The open-brace token is what is called ! 2721: the @dfn{look-ahead} token at this time, since the parser is still ! 2722: deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.) ! 2723: ! 2724: You might think that you could correct the problem by putting identical ! 2725: actions into the two rules, like this: ! 2726: ! 2727: @example ! 2728: @group ! 2729: compound: @{ prepare_for_local_variables (); @} ! 2730: '@{' declarations statements '@}' ! 2731: | @{ prepare_for_local_variables (); @} ! 2732: '@{' statements '@}' ! 2733: ; ! 2734: @end group ! 2735: @end example ! 2736: ! 2737: @noindent ! 2738: But this does not help, because Bison does not realize that the two actions ! 2739: are identical. (Bison never tries to understand the C code in an action.) ! 2740: ! 2741: If the grammar is such that a declaration can be distinguished from a ! 2742: statement by the first token (which is true in C), then one solution which ! 2743: does work is to put the action after the open-brace, like this: ! 2744: ! 2745: @example ! 2746: @group ! 2747: compound: '@{' @{ prepare_for_local_variables (); @} ! 2748: declarations statements '@}' ! 2749: | '@{' statements '@}' ! 2750: ; ! 2751: @end group ! 2752: @end example ! 2753: ! 2754: @noindent ! 2755: Now the first token of the following declaration or statement, ! 2756: which would in any case tell Bison which rule to use, can still do so. ! 2757: ! 2758: Another solution is to bury the action inside a nonterminal symbol which ! 2759: serves as a subroutine: ! 2760: ! 2761: @example ! 2762: @group ! 2763: subroutine: /* empty */ ! 2764: @{ prepare_for_local_variables (); @} ! 2765: ; ! 2766: ! 2767: @end group ! 2768: ! 2769: @group ! 2770: compound: subroutine ! 2771: '@{' declarations statements '@}' ! 2772: | subroutine ! 2773: '@{' statements '@}' ! 2774: ; ! 2775: @end group ! 2776: @end example ! 2777: ! 2778: @noindent ! 2779: Now Bison can execute the action in the rule for @code{subroutine} without ! 2780: deciding which rule for @code{compound} it will eventually use. Note that ! 2781: the action is now at the end of its rule. Any mid-rule action can be ! 2782: converted to an end-of-rule action in this way, and this is what Bison ! 2783: actually does to implement mid-rule actions. ! 2784: ! 2785: @node Declarations, Multiple Parsers, Semantics, Grammar File ! 2786: @section Bison Declarations ! 2787: @cindex declarations, Bison ! 2788: @cindex Bison declarations ! 2789: ! 2790: The @dfn{Bison declarations} section of a Bison grammar defines the symbols ! 2791: used in formulating the grammar and the data types of semantic values. ! 2792: @xref{Symbols}. ! 2793: ! 2794: All token type names (but not single-character literal tokens such as ! 2795: @code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be ! 2796: declared if you need to specify which data type to use for the semantic ! 2797: value (@pxref{Multiple Types, ,More Than One Value Type}). ! 2798: ! 2799: The first rule in the file also specifies the start symbol, by default. ! 2800: If you want some other symbol to be the start symbol, you must declare ! 2801: it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). ! 2802: ! 2803: @menu ! 2804: * Token Decl:: Declaring terminal symbols. ! 2805: * Precedence Decl:: Declaring terminals with precedence and associativity. ! 2806: * Union Decl:: Declaring the set of all semantic value types. ! 2807: * Type Decl:: Declaring the choice of type for a nonterminal symbol. ! 2808: * Expect Decl:: Suppressing warnings about shift/reduce conflicts. ! 2809: * Start Decl:: Specifying the start symbol. ! 2810: * Pure Decl:: Requesting a reentrant parser. ! 2811: * Decl Summary:: Table of all Bison declarations. ! 2812: @end menu ! 2813: ! 2814: @node Token Decl, Precedence Decl, , Declarations ! 2815: @subsection Token Type Names ! 2816: @cindex declaring token type names ! 2817: @cindex token type names, declaring ! 2818: @findex %token ! 2819: ! 2820: The basic way to declare a token type name (terminal symbol) is as follows: ! 2821: ! 2822: @example ! 2823: %token @var{name} ! 2824: @end example ! 2825: ! 2826: Bison will convert this into a @code{#define} directive in ! 2827: the parser, so that the function @code{yylex} (if it is in this file) ! 2828: can use the name @var{name} to stand for this token type's code. ! 2829: ! 2830: Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc} ! 2831: instead of @code{%token}, if you wish to specify precedence. ! 2832: @xref{Precedence Decl, ,Operator Precedence}. ! 2833: ! 2834: You can explicitly specify the numeric code for a token type by appending ! 2835: an integer value in the field immediately following the token name: ! 2836: ! 2837: @example ! 2838: %token NUM 300 ! 2839: @end example ! 2840: ! 2841: @noindent ! 2842: It is generally best, however, to let Bison choose the numeric codes for ! 2843: all token types. Bison will automatically select codes that don't conflict ! 2844: with each other or with ASCII characters. ! 2845: ! 2846: In the event that the stack type is a union, you must augment the ! 2847: @code{%token} or other token declaration to include the data type ! 2848: alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}). ! 2849: ! 2850: For example: ! 2851: ! 2852: @example ! 2853: @group ! 2854: %union @{ /* define stack type */ ! 2855: double val; ! 2856: symrec *tptr; ! 2857: @} ! 2858: %token <val> NUM /* define token NUM and its type */ ! 2859: @end group ! 2860: @end example ! 2861: ! 2862: @node Precedence Decl, Union Decl, Token Decl, Declarations ! 2863: @subsection Operator Precedence ! 2864: @cindex precedence declarations ! 2865: @cindex declaring operator precedence ! 2866: @cindex operator precedence, declaring ! 2867: ! 2868: Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to ! 2869: declare a token and specify its precedence and associativity, all at ! 2870: once. These are called @dfn{precedence declarations}. ! 2871: @xref{Precedence, ,Operator Precedence}, for general information on operator precedence. ! 2872: ! 2873: The syntax of a precedence declaration is the same as that of ! 2874: @code{%token}: either ! 2875: ! 2876: @example ! 2877: %left @var{symbols}@dots{} ! 2878: @end example ! 2879: ! 2880: @noindent ! 2881: or ! 2882: ! 2883: @example ! 2884: %left <@var{type}> @var{symbols}@dots{} ! 2885: @end example ! 2886: ! 2887: And indeed any of these declarations serves the purposes of @code{%token}. ! 2888: But in addition, they specify the associativity and relative precedence for ! 2889: all the @var{symbols}: ! 2890: ! 2891: @itemize @bullet ! 2892: @item ! 2893: The associativity of an operator @var{op} determines how repeated uses ! 2894: of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op} ! 2895: @var{z}} is parsed by grouping @var{x} with @var{y} first or by ! 2896: grouping @var{y} with @var{z} first. @code{%left} specifies ! 2897: left-associativity (grouping @var{x} with @var{y} first) and ! 2898: @code{%right} specifies right-associativity (grouping @var{y} with ! 2899: @var{z} first). @code{%nonassoc} specifies no associativity, which ! 2900: means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is ! 2901: considered a syntax error. ! 2902: ! 2903: @item ! 2904: The precedence of an operator determines how it nests with other operators. ! 2905: All the tokens declared in a single precedence declaration have equal ! 2906: precedence and nest together according to their associativity. ! 2907: When two tokens declared in different precedence declarations associate, ! 2908: the one declared later has the higher precedence and is grouped first. ! 2909: @end itemize ! 2910: ! 2911: @node Union Decl, Type Decl, Precedence Decl, Declarations ! 2912: @subsection The Collection of Value Types ! 2913: @cindex declaring value types ! 2914: @cindex value types, declaring ! 2915: @findex %union ! 2916: ! 2917: The @code{%union} declaration specifies the entire collection of possible ! 2918: data types for semantic values. The keyword @code{%union} is followed by a ! 2919: pair of braces containing the same thing that goes inside a @code{union} in ! 2920: C. ! 2921: ! 2922: For example: ! 2923: ! 2924: @example ! 2925: @group ! 2926: %union @{ ! 2927: double val; ! 2928: symrec *tptr; ! 2929: @} ! 2930: @end group ! 2931: @end example ! 2932: ! 2933: @noindent ! 2934: This says that the two alternative types are @code{double} and @code{symrec ! 2935: *}. They are given names @code{val} and @code{tptr}; these names are used ! 2936: in the @code{%token} and @code{%type} declarations to pick one of the types ! 2937: for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). ! 2938: ! 2939: Note that, unlike making a @code{union} declaration in C, you do not write ! 2940: a semicolon after the closing brace. ! 2941: ! 2942: @node Type Decl, Expect Decl, Union Decl, Declarations ! 2943: @subsection Nonterminal Symbols ! 2944: @cindex declaring value types, nonterminals ! 2945: @cindex value types, nonterminals, declaring ! 2946: @findex %type ! 2947: ! 2948: @noindent ! 2949: When you use @code{%union} to specify multiple value types, you must ! 2950: declare the value type of each nonterminal symbol for which values are ! 2951: used. This is done with a @code{%type} declaration, like this: ! 2952: ! 2953: @example ! 2954: %type <@var{type}> @var{nonterminal}@dots{} ! 2955: @end example ! 2956: ! 2957: @noindent ! 2958: Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type} ! 2959: is the name given in the @code{%union} to the alternative that you want ! 2960: (@pxref{Union Decl, ,The Collection of Value Types}). You can give any number of nonterminal symbols in ! 2961: the same @code{%type} declaration, if they have the same value type. Use ! 2962: spaces to separate the symbol names. ! 2963: ! 2964: @node Expect Decl, Start Decl, Type Decl, Declarations ! 2965: @subsection Suppressing Conflict Warnings ! 2966: @cindex suppressing conflict warnings ! 2967: @cindex preventing warnings about conflicts ! 2968: @cindex warnings, preventing ! 2969: @cindex conflicts, suppressing warnings of ! 2970: @findex %expect ! 2971: ! 2972: Bison normally warns if there are any conflicts in the grammar ! 2973: (@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars have harmless shift/reduce ! 2974: conflicts which are resolved in a predictable way and would be difficult to ! 2975: eliminate. It is desirable to suppress the warning about these conflicts ! 2976: unless the number of conflicts changes. You can do this with the ! 2977: @code{%expect} declaration. ! 2978: ! 2979: The declaration looks like this: ! 2980: ! 2981: @example ! 2982: %expect @var{n} ! 2983: @end example ! 2984: ! 2985: Here @var{n} is a decimal integer. The declaration says there should be no ! 2986: warning if there are @var{n} shift/reduce conflicts and no reduce/reduce ! 2987: conflicts. The usual warning is given if there are either more or fewer ! 2988: conflicts, or if there are any reduce/reduce conflicts. ! 2989: ! 2990: In general, using @code{%expect} involves these steps: ! 2991: ! 2992: @itemize @bullet ! 2993: @item ! 2994: Compile your grammar without @code{%expect}. Use the @samp{-v} option ! 2995: to get a verbose list of where the conflicts occur. Bison will also ! 2996: print the number of conflicts. ! 2997: ! 2998: @item ! 2999: Check each of the conflicts to make sure that Bison's default ! 3000: resolution is what you really want. If not, rewrite the grammar and ! 3001: go back to the beginning. ! 3002: ! 3003: @item ! 3004: Add an @code{%expect} declaration, copying the number @var{n} from the ! 3005: number which Bison printed. ! 3006: @end itemize ! 3007: ! 3008: Now Bison will stop annoying you about the conflicts you have checked, but ! 3009: it will warn you again if changes in the grammar result in additional ! 3010: conflicts. ! 3011: ! 3012: @node Start Decl, Pure Decl, Expect Decl, Declarations ! 3013: @subsection The Start-Symbol ! 3014: @cindex declaring the start symbol ! 3015: @cindex start symbol, declaring ! 3016: @cindex default start symbol ! 3017: @findex %start ! 3018: ! 3019: Bison assumes by default that the start symbol for the grammar is the first ! 3020: nonterminal specified in the grammar specification section. The programmer ! 3021: may override this restriction with the @code{%start} declaration as follows: ! 3022: ! 3023: @example ! 3024: %start @var{symbol} ! 3025: @end example ! 3026: ! 3027: @node Pure Decl, Decl Summary, Start Decl, Declarations ! 3028: @subsection A Pure (Reentrant) Parser ! 3029: @cindex reentrant parser ! 3030: @cindex pure parser ! 3031: @findex %pure_parser ! 3032: ! 3033: A @dfn{reentrant} program is one which does not alter in the course of ! 3034: execution; in other words, it consists entirely of @dfn{pure} (read-only) ! 3035: code. Reentrancy is important whenever asynchronous execution is possible; ! 3036: for example, a nonreentrant program may not be safe to call from a signal ! 3037: handler. In systems with multiple threads of control, a nonreentrant ! 3038: program must be called only within interlocks. ! 3039: ! 3040: The Bison parser is not normally a reentrant program, because it uses ! 3041: statically allocated variables for communication with @code{yylex}. These ! 3042: variables include @code{yylval} and @code{yylloc}. ! 3043: ! 3044: The Bison declaration @code{%pure_parser} says that you want the parser ! 3045: to be reentrant. It looks like this: ! 3046: ! 3047: @example ! 3048: %pure_parser ! 3049: @end example ! 3050: ! 3051: The effect is that the two communication variables become local ! 3052: variables in @code{yyparse}, and a different calling convention is used for ! 3053: the lexical analyzer function @code{yylex}. @xref{Pure Calling, ,Calling for Pure Parsers}, for the ! 3054: details of this. The variable @code{yynerrs} also becomes local in ! 3055: @code{yyparse} (@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). The convention for calling ! 3056: @code{yyparse} itself is unchanged. ! 3057: ! 3058: @node Decl Summary, , Pure Decl, Declarations ! 3059: @subsection Bison Declaration Summary ! 3060: @cindex Bison declaration summary ! 3061: @cindex declaration summary ! 3062: @cindex summary, Bison declaration ! 3063: ! 3064: Here is a summary of all Bison declarations: ! 3065: ! 3066: @table @code ! 3067: @item %union ! 3068: Declare the collection of data types that semantic values may have ! 3069: (@pxref{Union Decl, ,The Collection of Value Types}). ! 3070: ! 3071: @item %token ! 3072: Declare a terminal symbol (token type name) with no precedence ! 3073: or associativity specified (@pxref{Token Decl, ,Token Type Names}). ! 3074: ! 3075: @item %right ! 3076: Declare a terminal symbol (token type name) that is right-associative ! 3077: (@pxref{Precedence Decl, ,Operator Precedence}). ! 3078: ! 3079: @item %left ! 3080: Declare a terminal symbol (token type name) that is left-associative ! 3081: (@pxref{Precedence Decl, ,Operator Precedence}). ! 3082: ! 3083: @item %nonassoc ! 3084: Declare a terminal symbol (token type name) that is nonassociative ! 3085: (using it in a way that would be associative is a syntax error) ! 3086: (@pxref{Precedence Decl, ,Operator Precedence}). ! 3087: ! 3088: @item %type ! 3089: Declare the type of semantic values for a nonterminal symbol ! 3090: (@pxref{Type Decl, ,Nonterminal Symbols}). ! 3091: ! 3092: @item %start ! 3093: Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}). ! 3094: ! 3095: @item %expect ! 3096: Declare the expected number of shift-reduce conflicts ! 3097: (@pxref{Expect Decl, ,Suppressing Conflict Warnings}). ! 3098: ! 3099: @item %pure_parser ! 3100: Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). ! 3101: @end table ! 3102: ! 3103: @node Multiple Parsers, , Declarations, Grammar File ! 3104: @section Multiple Parsers in the Same Program ! 3105: ! 3106: Most programs that use Bison parse only one language and therefore contain ! 3107: only one Bison parser. But what if you want to parse more than one ! 3108: language with the same program? Then you need to avoid a name conflict ! 3109: between different definitions of @code{yyparse}, @code{yylval}, and so on. ! 3110: ! 3111: The easy way to do this is to use the option @samp{-p @var{prefix}} ! 3112: (@pxref{Invocation, ,Invoking Bison}). This renames the interface functions and ! 3113: variables of the Bison parser to start with @var{prefix} instead of ! 3114: @samp{yy}. You can use this to give each parser distinct names that do ! 3115: not conflict. ! 3116: ! 3117: The precise list of symbols renamed is @code{yyparse}, @code{yylex}, ! 3118: @code{yyerror}, @code{yylval}, @code{yychar} and @code{yydebug}. For ! 3119: example, if you use @samp{-p c}, the names become @code{cparse}, ! 3120: @code{clex}, and so on. ! 3121: ! 3122: @strong{All the other variables and macros associated with Bison are not ! 3123: renamed.} These others are not global; there is no conflict if the same ! 3124: name is used in different parsers. For example, @code{YYSTYPE} is not ! 3125: renamed, but defining this in different ways in different parsers causes ! 3126: no trouble (@pxref{Value Type, ,Data Types of Semantic Values}). ! 3127: ! 3128: The @samp{-p} option works by adding macro definitions to the beginning ! 3129: of the parser source file, defining @code{yyparse} as ! 3130: @code{@var{prefix}parse}, and so on. This effectively substitutes one ! 3131: name for the other in the entire parser file. ! 3132: ! 3133: @node Interface, Algorithm, Grammar File, Top ! 3134: @chapter Parser C-Language Interface ! 3135: @cindex C-language interface ! 3136: @cindex interface ! 3137: ! 3138: The Bison parser is actually a C function named @code{yyparse}. Here we ! 3139: describe the interface conventions of @code{yyparse} and the other ! 3140: functions that it needs to use. ! 3141: ! 3142: Keep in mind that the parser uses many C identifiers starting with ! 3143: @samp{yy} and @samp{YY} for internal purposes. If you use such an ! 3144: identifier (aside from those in this manual) in an action or in additional ! 3145: C code in the grammar file, you are likely to run into trouble. ! 3146: ! 3147: @menu ! 3148: * Parser Function:: How to call @code{yyparse} and what it returns. ! 3149: * Lexical:: You must supply a function @code{yylex} ! 3150: which reads tokens. ! 3151: * Error Reporting:: You must supply a function @code{yyerror}. ! 3152: * Action Features:: Special features for use in actions. ! 3153: @end menu ! 3154: ! 3155: @node Parser Function, Lexical, , Interface ! 3156: @section The Parser Function @code{yyparse} ! 3157: @findex yyparse ! 3158: ! 3159: You call the function @code{yyparse} to cause parsing to occur. This ! 3160: function reads tokens, executes actions, and ultimately returns when it ! 3161: encounters end-of-input or an unrecoverable syntax error. You can also ! 3162: write an action which directs @code{yyparse} to return immediately without ! 3163: reading further. ! 3164: ! 3165: The value returned by @code{yyparse} is 0 if parsing was successful (return ! 3166: is due to end-of-input). ! 3167: ! 3168: The value is 1 if parsing failed (return is due to a syntax error). ! 3169: ! 3170: In an action, you can cause immediate return from @code{yyparse} by using ! 3171: these macros: ! 3172: ! 3173: @table @code ! 3174: @item YYACCEPT ! 3175: @findex YYACCEPT ! 3176: Return immediately with value 0 (to report success). ! 3177: ! 3178: @item YYABORT ! 3179: @findex YYABORT ! 3180: Return immediately with value 1 (to report failure). ! 3181: @end table ! 3182: ! 3183: @node Lexical, Error Reporting, Parser Function, Interface ! 3184: @section The Lexical Analyzer Function @code{yylex} ! 3185: @findex yylex ! 3186: @cindex lexical analyzer ! 3187: ! 3188: The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from ! 3189: the input stream and returns them to the parser. Bison does not create ! 3190: this function automatically; you must write it so that @code{yyparse} can ! 3191: call it. The function is sometimes referred to as a lexical scanner. ! 3192: ! 3193: In simple programs, @code{yylex} is often defined at the end of the Bison ! 3194: grammar file. If @code{yylex} is defined in a separate source file, you ! 3195: need to arrange for the token-type macro definitions to be available there. ! 3196: To do this, use the @samp{-d} option when you run Bison, so that it will ! 3197: write these macro definitions into a separate header file ! 3198: @file{@var{name}.tab.h} which you can include in the other source files ! 3199: that need it. @xref{Invocation, ,Invoking Bison}.@refill ! 3200: ! 3201: @menu ! 3202: * Calling Convention:: How @code{yyparse} calls @code{yylex}. ! 3203: * Token Values:: How @code{yylex} must return the semantic value ! 3204: of the token it has read. ! 3205: * Token Positions:: How @code{yylex} must return the text position ! 3206: (line number, etc.) of the token, if the ! 3207: actions want that. ! 3208: * Pure Calling:: How the calling convention differs ! 3209: in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). ! 3210: @end menu ! 3211: ! 3212: @node Calling Convention, Token Values, , Lexical ! 3213: @subsection Calling Convention for @code{yylex} ! 3214: ! 3215: The value that @code{yylex} returns must be the numeric code for the type ! 3216: of token it has just found, or 0 for end-of-input. ! 3217: ! 3218: When a token is referred to in the grammar rules by a name, that name ! 3219: in the parser file becomes a C macro whose definition is the proper ! 3220: numeric code for that token type. So @code{yylex} can use the name ! 3221: to indicate that type. @xref{Symbols}. ! 3222: ! 3223: When a token is referred to in the grammar rules by a character literal, ! 3224: the numeric code for that character is also the code for the token type. ! 3225: So @code{yylex} can simply return that character code. The null character ! 3226: must not be used this way, because its code is zero and that is what ! 3227: signifies end-of-input. ! 3228: ! 3229: Here is an example showing these things: ! 3230: ! 3231: @example ! 3232: yylex () ! 3233: @{ ! 3234: @dots{} ! 3235: if (c == EOF) /* Detect end of file. */ ! 3236: return 0; ! 3237: @dots{} ! 3238: if (c == '+' || c == '-') ! 3239: return c; /* Assume token type for `+' is '+'. */ ! 3240: @dots{} ! 3241: return INT; /* Return the type of the token. */ ! 3242: @dots{} ! 3243: @} ! 3244: @end example ! 3245: ! 3246: @noindent ! 3247: This interface has been designed so that the output from the @code{lex} ! 3248: utility can be used without change as the definition of @code{yylex}. ! 3249: ! 3250: @node Token Values, Token Positions, Calling Convention, Lexical ! 3251: @subsection Semantic Values of Tokens ! 3252: ! 3253: @vindex yylval ! 3254: In an ordinary (nonreentrant) parser, the semantic value of the token must ! 3255: be stored into the global variable @code{yylval}. When you are using ! 3256: just one data type for semantic values, @code{yylval} has that type. ! 3257: Thus, if the type is @code{int} (the default), you might write this in ! 3258: @code{yylex}: ! 3259: ! 3260: @example ! 3261: @group ! 3262: @dots{} ! 3263: yylval = value; /* Put value onto Bison stack. */ ! 3264: return INT; /* Return the type of the token. */ ! 3265: @dots{} ! 3266: @end group ! 3267: @end example ! 3268: ! 3269: When you are using multiple data types, @code{yylval}'s type is a union ! 3270: made from the @code{%union} declaration (@pxref{Union Decl, ,The Collection of Value Types}). So when ! 3271: you store a token's value, you must use the proper member of the union. ! 3272: If the @code{%union} declaration looks like this: ! 3273: ! 3274: @example ! 3275: @group ! 3276: %union @{ ! 3277: int intval; ! 3278: double val; ! 3279: symrec *tptr; ! 3280: @} ! 3281: @end group ! 3282: @end example ! 3283: ! 3284: @noindent ! 3285: then the code in @code{yylex} might look like this: ! 3286: ! 3287: @example ! 3288: @group ! 3289: @dots{} ! 3290: yylval.intval = value; /* Put value onto Bison stack. */ ! 3291: return INT; /* Return the type of the token. */ ! 3292: @dots{} ! 3293: @end group ! 3294: @end example ! 3295: ! 3296: @node Token Positions, Pure Calling, Token Values, Lexical ! 3297: @subsection Textual Positions of Tokens ! 3298: ! 3299: @vindex yylloc ! 3300: If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in ! 3301: actions to keep track of the textual locations of tokens and groupings, ! 3302: then you must provide this information in @code{yylex}. The function ! 3303: @code{yyparse} expects to find the textual location of a token just parsed ! 3304: in the global variable @code{yylloc}. So @code{yylex} must store the ! 3305: proper data in that variable. The value of @code{yylloc} is a structure ! 3306: and you need only initialize the members that are going to be used by the ! 3307: actions. The four members are called @code{first_line}, ! 3308: @code{first_column}, @code{last_line} and @code{last_column}. Note that ! 3309: the use of this feature makes the parser noticeably slower. ! 3310: ! 3311: @tindex YYLTYPE ! 3312: The data type of @code{yylloc} has the name @code{YYLTYPE}. ! 3313: ! 3314: @node Pure Calling, , Token Positions, Lexical ! 3315: @subsection Calling for Pure Parsers ! 3316: ! 3317: When you use the Bison declaration @code{%pure_parser} to request a pure, ! 3318: reentrant parser, the global communication variables @code{yylval} and ! 3319: @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the ! 3320: two global variables are replaced by pointers passed as arguments to ! 3321: @code{yylex}. You must declare them as shown here, and pass the ! 3322: information back by storing it through those pointers. ! 3323: ! 3324: @example ! 3325: yylex (lvalp, llocp) ! 3326: YYSTYPE *lvalp; ! 3327: YYLTYPE *llocp; ! 3328: @{ ! 3329: @dots{} ! 3330: *lvalp = value; /* Put value onto Bison stack. */ ! 3331: return INT; /* Return the type of the token. */ ! 3332: @dots{} ! 3333: @} ! 3334: @end example ! 3335: ! 3336: If the grammar file does not use the @samp{@@} constructs to refer to ! 3337: textual positions, then the type @code{YYLTYPE} will not be defined. In ! 3338: this case, omit the second argument; @code{yylex} will be called with ! 3339: only one argument. ! 3340: ! 3341: @node Error Reporting, Action Features, Lexical, Interface ! 3342: @section The Error Reporting Function @code{yyerror} ! 3343: @cindex error reporting function ! 3344: @findex yyerror ! 3345: @cindex parse error ! 3346: @cindex syntax error ! 3347: ! 3348: The Bison parser detects a @dfn{parse error} or @dfn{syntax error} ! 3349: whenever it reads a token which cannot satisfy any syntax rule. A ! 3350: action in the grammar can also explicitly proclaim an error, using the ! 3351: macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use in Actions}). ! 3352: ! 3353: The Bison parser expects to report the error by calling an error ! 3354: reporting function named @code{yyerror}, which you must supply. It is ! 3355: called by @code{yyparse} whenever a syntax error is found, and it ! 3356: receives one argument. For a parse error, the string is normally ! 3357: @w{@code{"parse error"}}. ! 3358: ! 3359: @findex YYERROR_VERBOSE ! 3360: If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations ! 3361: section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose ! 3362: and specific error message string instead of just plain @w{@code{"parse ! 3363: error"}}. It doesn't matter what definition you use for ! 3364: @code{YYERROR_VERBOSE}, just whether you define it. ! 3365: ! 3366: The parser can detect one other kind of error: stack overflow. This ! 3367: happens when the input contains constructions that are very deeply ! 3368: nested. It isn't likely you will encounter this, since the Bison ! 3369: parser extends its stack automatically up to a very large limit. But ! 3370: if overflow happens, @code{yyparse} calls @code{yyerror} in the usual ! 3371: fashion, except that the argument string is @w{@code{"parser stack ! 3372: overflow"}}. ! 3373: ! 3374: The following definition suffices in simple programs: ! 3375: ! 3376: @example ! 3377: @group ! 3378: yyerror (s) ! 3379: char *s; ! 3380: @{ ! 3381: @end group ! 3382: @group ! 3383: fprintf (stderr, "%s\n", s); ! 3384: @} ! 3385: @end group ! 3386: @end example ! 3387: ! 3388: After @code{yyerror} returns to @code{yyparse}, the latter will attempt ! 3389: error recovery if you have written suitable error recovery grammar rules ! 3390: (@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will ! 3391: immediately return 1. ! 3392: ! 3393: @vindex yynerrs ! 3394: The variable @code{yynerrs} contains the number of syntax errors ! 3395: encountered so far. Normally this variable is global; but if you ! 3396: request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable ! 3397: which only the actions can access. ! 3398: ! 3399: @node Action Features, , Error Reporting, Interface ! 3400: @section Special Features for Use in Actions ! 3401: @cindex summary, action features ! 3402: @cindex action features summary ! 3403: ! 3404: Here is a table of Bison constructs, variables and macros that ! 3405: are useful in actions. ! 3406: ! 3407: @table @samp ! 3408: @item $$ ! 3409: Acts like a variable that contains the semantic value for the ! 3410: grouping made by the current rule. @xref{Actions}. ! 3411: ! 3412: @item $@var{n} ! 3413: Acts like a variable that contains the semantic value for the ! 3414: @var{n}th component of the current rule. @xref{Actions}. ! 3415: ! 3416: @item $<@var{typealt}>$ ! 3417: Like @code{$$} but specifies alternative @var{typealt} in the union ! 3418: specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}. ! 3419: ! 3420: @item $<@var{typealt}>@var{n} ! 3421: Like @code{$@var{n}} but specifies alternative @var{typealt} in the ! 3422: union specified by the @code{%union} declaration. ! 3423: @xref{Action Types, ,Data Types of Values in Actions}.@refill ! 3424: ! 3425: @item YYABORT; ! 3426: Return immediately from @code{yyparse}, indicating failure. ! 3427: @xref{Parser Function, ,The Parser Function @code{yyparse}}. ! 3428: ! 3429: @item YYACCEPT; ! 3430: Return immediately from @code{yyparse}, indicating success. ! 3431: @xref{Parser Function, ,The Parser Function @code{yyparse}}. ! 3432: ! 3433: @item YYBACKUP (@var{token}, @var{value}); ! 3434: @findex YYBACKUP ! 3435: Unshift a token. This macro is allowed only for rules that reduce ! 3436: a single value, and only when there is no look-ahead token. ! 3437: It installs a look-ahead token with token type @var{token} and ! 3438: semantic value @var{value}; then it discards the value that was ! 3439: going to be reduced by this rule. ! 3440: ! 3441: If the macro is used when it is not valid, such as when there is ! 3442: a look-ahead token already, then it reports a syntax error with ! 3443: a message @samp{cannot back up} and performs ordinary error ! 3444: recovery. ! 3445: ! 3446: In either case, the rest of the action is not executed. ! 3447: ! 3448: @item YYEMPTY ! 3449: @vindex YYEMPTY ! 3450: Value stored in @code{yychar} when there is no look-ahead token. ! 3451: ! 3452: @item YYERROR; ! 3453: @findex YYERROR ! 3454: Cause an immediate syntax error. This statement initiates error ! 3455: recovery just as if the parser itself had detected an error; however, it ! 3456: does not call @code{yyerror}, and does not print any message. If you ! 3457: want to print an error message, call @code{yyerror} explicitly before ! 3458: the @samp{YYERROR;} statement. @xref{Error Recovery}. ! 3459: ! 3460: @item YYRECOVERING ! 3461: This macro stands for an expression that has the value 1 when the parser ! 3462: is recovering from a syntax error, and 0 the rest of the time. ! 3463: @xref{Error Recovery}. ! 3464: ! 3465: @item yychar ! 3466: Variable containing the current look-ahead token. (In a pure parser, ! 3467: this is actually a local variable within @code{yyparse}.) When there is ! 3468: no look-ahead token, the value @code{YYEMPTY} is stored in the variable. ! 3469: @xref{Look-Ahead, ,Look-Ahead Tokens}. ! 3470: ! 3471: @item yyclearin; ! 3472: Discard the current look-ahead token. This is useful primarily in ! 3473: error rules. @xref{Error Recovery}. ! 3474: ! 3475: @item yyerrok; ! 3476: Resume generating error messages immediately for subsequent syntax ! 3477: errors. This is useful primarily in error rules. ! 3478: @xref{Error Recovery}. ! 3479: ! 3480: @item @@@var{n} ! 3481: @findex @@@var{n} ! 3482: Acts like a structure variable containing information on the line ! 3483: numbers and column numbers of the @var{n}th component of the current ! 3484: rule. The structure has four members, like this: ! 3485: ! 3486: @example ! 3487: struct @{ ! 3488: int first_line, last_line; ! 3489: int first_column, last_column; ! 3490: @}; ! 3491: @end example ! 3492: ! 3493: Thus, to get the starting line number of the third component, use ! 3494: @samp{@@3.first_line}. ! 3495: ! 3496: In order for the members of this structure to contain valid information, ! 3497: you must make @code{yylex} supply this information about each token. ! 3498: If you need only certain members, then @code{yylex} need only fill in ! 3499: those members. ! 3500: ! 3501: The use of this feature makes the parser noticeably slower. ! 3502: @end table ! 3503: ! 3504: @node Algorithm, Error Recovery, Interface, Top ! 3505: @chapter The Bison Parser Algorithm ! 3506: @cindex Bison parser algorithm ! 3507: @cindex algorithm of parser ! 3508: @cindex shifting ! 3509: @cindex reduction ! 3510: @cindex parser stack ! 3511: @cindex stack, parser ! 3512: ! 3513: As Bison reads tokens, it pushes them onto a stack along with their ! 3514: semantic values. The stack is called the @dfn{parser stack}. Pushing a ! 3515: token is traditionally called @dfn{shifting}. ! 3516: ! 3517: For example, suppose the infix calculator has read @samp{1 + 5 *}, with a ! 3518: @samp{3} to come. The stack will have four elements, one for each token ! 3519: that was shifted. ! 3520: ! 3521: But the stack does not always have an element for each token read. When ! 3522: the last @var{n} tokens and groupings shifted match the components of a ! 3523: grammar rule, they can be combined according to that rule. This is called ! 3524: @dfn{reduction}. Those tokens and groupings are replaced on the stack by a ! 3525: single grouping whose symbol is the result (left hand side) of that rule. ! 3526: Running the rule's action is part of the process of reduction, because this ! 3527: is what computes the semantic value of the resulting grouping. ! 3528: ! 3529: For example, if the infix calculator's parser stack contains this: ! 3530: ! 3531: @example ! 3532: 1 + 5 * 3 ! 3533: @end example ! 3534: ! 3535: @noindent ! 3536: and the next input token is a newline character, then the last three ! 3537: elements can be reduced to 15 via the rule: ! 3538: ! 3539: @example ! 3540: expr: expr '*' expr; ! 3541: @end example ! 3542: ! 3543: @noindent ! 3544: Then the stack contains just these three elements: ! 3545: ! 3546: @example ! 3547: 1 + 15 ! 3548: @end example ! 3549: ! 3550: @noindent ! 3551: At this point, another reduction can be made, resulting in the single value ! 3552: 16. Then the newline token can be shifted. ! 3553: ! 3554: The parser tries, by shifts and reductions, to reduce the entire input down ! 3555: to a single grouping whose symbol is the grammar's start-symbol ! 3556: (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). ! 3557: ! 3558: This kind of parser is known in the literature as a bottom-up parser. ! 3559: ! 3560: @menu ! 3561: * Look-Ahead:: Parser looks one token ahead when deciding what to do. ! 3562: * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. ! 3563: * Precedence:: Operator precedence works by resolving conflicts. ! 3564: * Contextual Precedence:: When an operator's precedence depends on context. ! 3565: * Parser States:: The parser is a finite-state-machine with stack. ! 3566: * Reduce/Reduce:: When two rules are applicable in the same situation. ! 3567: * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. ! 3568: * Stack Overflow:: What happens when stack gets full. How to avoid it. ! 3569: @end menu ! 3570: ! 3571: @node Look-Ahead, Shift/Reduce, , Algorithm ! 3572: @section Look-Ahead Tokens ! 3573: @cindex look-ahead token ! 3574: ! 3575: The Bison parser does @emph{not} always reduce immediately as soon as the ! 3576: last @var{n} tokens and groupings match a rule. This is because such a ! 3577: simple strategy is inadequate to handle most languages. Instead, when a ! 3578: reduction is possible, the parser sometimes ``looks ahead'' at the next ! 3579: token in order to decide what to do. ! 3580: ! 3581: When a token is read, it is not immediately shifted; first it becomes the ! 3582: @dfn{look-ahead token}, which is not on the stack. Now the parser can ! 3583: perform one or more reductions of tokens and groupings on the stack, while ! 3584: the look-ahead token remains off to the side. When no more reductions ! 3585: should take place, the look-ahead token is shifted onto the stack. This ! 3586: does not mean that all possible reductions have been done; depending on the ! 3587: token type of the look-ahead token, some rules may choose to delay their ! 3588: application. ! 3589: ! 3590: Here is a simple case where look-ahead is needed. These three rules define ! 3591: expressions which contain binary addition operators and postfix unary ! 3592: factorial operators (@samp{!}), and allow parentheses for grouping. ! 3593: ! 3594: @example ! 3595: @group ! 3596: expr: term '+' expr ! 3597: | term ! 3598: ; ! 3599: @end group ! 3600: ! 3601: @group ! 3602: term: '(' expr ')' ! 3603: | term '!' ! 3604: | NUMBER ! 3605: ; ! 3606: @end group ! 3607: @end example ! 3608: ! 3609: Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what ! 3610: should be done? If the following token is @samp{)}, then the first three ! 3611: tokens must be reduced to form an @code{expr}. This is the only valid ! 3612: course, because shifting the @samp{)} would produce a sequence of symbols ! 3613: @w{@code{term ')'}}, and no rule allows this. ! 3614: ! 3615: If the following token is @samp{!}, then it must be shifted immediately so ! 3616: that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the ! 3617: parser were to reduce before shifting, @w{@samp{1 + 2}} would become an ! 3618: @code{expr}. It would then be impossible to shift the @samp{!} because ! 3619: doing so would produce on the stack the sequence of symbols @code{expr ! 3620: '!'}. No rule allows that sequence. ! 3621: ! 3622: @vindex yychar ! 3623: The current look-ahead token is stored in the variable @code{yychar}. ! 3624: @xref{Action Features, ,Special Features for Use in Actions}. ! 3625: ! 3626: @node Shift/Reduce, Precedence, Look-Ahead, Algorithm ! 3627: @section Shift/Reduce Conflicts ! 3628: @cindex conflicts ! 3629: @cindex shift/reduce conflicts ! 3630: @cindex dangling @code{else} ! 3631: @cindex @code{else}, dangling ! 3632: ! 3633: Suppose we are parsing a language which has if-then and if-then-else ! 3634: statements, with a pair of rules like this: ! 3635: ! 3636: @example ! 3637: @group ! 3638: if_stmt: ! 3639: IF expr THEN stmt ! 3640: | IF expr THEN stmt ELSE stmt ! 3641: ; ! 3642: @end group ! 3643: @end example ! 3644: ! 3645: @noindent ! 3646: Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are ! 3647: terminal symbols for specific keyword tokens. ! 3648: ! 3649: When the @code{ELSE} token is read and becomes the look-ahead token, the ! 3650: contents of the stack (assuming the input is valid) are just right for ! 3651: reduction by the first rule. But it is also legitimate to shift the ! 3652: @code{ELSE}, because that would lead to eventual reduction by the second ! 3653: rule. ! 3654: ! 3655: This situation, where either a shift or a reduction would be valid, is ! 3656: called a @dfn{shift/reduce conflict}. Bison is designed to resolve ! 3657: these conflicts by choosing to shift, unless otherwise directed by ! 3658: operator precedence declarations. To see the reason for this, let's ! 3659: contrast it with the other alternative. ! 3660: ! 3661: Since the parser prefers to shift the @code{ELSE}, the result is to attach ! 3662: the else-clause to the innermost if-statement, making these two inputs ! 3663: equivalent: ! 3664: ! 3665: @example ! 3666: if x then if y then win (); else lose; ! 3667: ! 3668: if x then do; if y then win (); else lose; end; ! 3669: @end example ! 3670: ! 3671: But if the parser chose to reduce when possible rather than shift, the ! 3672: result would be to attach the else-clause to the outermost if-statement, ! 3673: making these two inputs equivalent: ! 3674: ! 3675: @example ! 3676: if x then if y then win (); else lose; ! 3677: ! 3678: if x then do; if y then win (); end; else lose; ! 3679: @end example ! 3680: ! 3681: The conflict exists because the grammar as written is ambiguous: either ! 3682: parsing of the simple nested if-statement is legitimate. The established ! 3683: convention is that these ambiguities are resolved by attaching the ! 3684: else-clause to the innermost if-statement; this is what Bison accomplishes ! 3685: by choosing to shift rather than reduce. (It would ideally be cleaner to ! 3686: write an unambiguous grammar, but that is very hard to do in this case.) ! 3687: This particular ambiguity was first encountered in the specifications of ! 3688: Algol 60 and is called the ``dangling @code{else}'' ambiguity. ! 3689: ! 3690: To avoid warnings from Bison about predictable, legitimate shift/reduce ! 3691: conflicts, use the @code{%expect @var{n}} declaration. There will be no ! 3692: warning as long as the number of shift/reduce conflicts is exactly @var{n}. ! 3693: @xref{Expect Decl, ,Suppressing Conflict Warnings}. ! 3694: ! 3695: The definition of @code{if_stmt} above is solely to blame for the ! 3696: conflict, but the conflict does not actually appear without additional ! 3697: rules. Here is a complete Bison input file that actually manifests the ! 3698: conflict: ! 3699: ! 3700: @example ! 3701: @group ! 3702: %token IF THEN ELSE variable ! 3703: %% ! 3704: @end group ! 3705: @group ! 3706: stmt: expr ! 3707: | if_stmt ! 3708: ; ! 3709: @end group ! 3710: ! 3711: @group ! 3712: if_stmt: ! 3713: IF expr THEN stmt ! 3714: | IF expr THEN stmt ELSE stmt ! 3715: ; ! 3716: @end group ! 3717: ! 3718: expr: variable ! 3719: ; ! 3720: @end example ! 3721: ! 3722: @node Precedence, Contextual Precedence, Shift/Reduce, Algorithm ! 3723: @section Operator Precedence ! 3724: @cindex operator precedence ! 3725: @cindex precedence of operators ! 3726: ! 3727: Another situation where shift/reduce conflicts appear is in arithmetic ! 3728: expressions. Here shifting is not always the preferred resolution; the ! 3729: Bison declarations for operator precedence allow you to specify when to ! 3730: shift and when to reduce. ! 3731: ! 3732: @menu ! 3733: * Why Precedence:: An example showing why precedence is needed. ! 3734: * Using Precedence:: How to specify precedence in Bison grammars. ! 3735: * Precedence Examples:: How these features are used in the previous example. ! 3736: * How Precedence:: How they work. ! 3737: @end menu ! 3738: ! 3739: @node Why Precedence, Using Precedence, , Precedence ! 3740: @subsection When Precedence is Needed ! 3741: ! 3742: Consider the following ambiguous grammar fragment (ambiguous because the ! 3743: input @w{@samp{1 - 2 * 3}} can be parsed in two different ways): ! 3744: ! 3745: @example ! 3746: @group ! 3747: expr: expr '-' expr ! 3748: | expr '*' expr ! 3749: | expr '<' expr ! 3750: | '(' expr ')' ! 3751: @dots{} ! 3752: ; ! 3753: @end group ! 3754: @end example ! 3755: ! 3756: @noindent ! 3757: Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; ! 3758: should it reduce them via the rule for the addition operator? It depends ! 3759: on the next token. Of course, if the next token is @samp{)}, we must ! 3760: reduce; shifting is invalid because no single rule can reduce the token ! 3761: sequence @w{@samp{- 2 )}} or anything starting with that. But if the next ! 3762: token is @samp{*} or @samp{<}, we have a choice: either shifting or ! 3763: reduction would allow the parse to complete, but with different ! 3764: results. ! 3765: ! 3766: To decide which one Bison should do, we must consider the ! 3767: results. If the next operator token @var{op} is shifted, then it ! 3768: must be reduced first in order to permit another opportunity to ! 3769: reduce the sum. The result is (in effect) @w{@samp{1 - (2 ! 3770: @var{op} 3)}}. On the other hand, if the subtraction is reduced ! 3771: before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op} ! 3772: 3}}. Clearly, then, the choice of shift or reduce should depend ! 3773: on the relative precedence of the operators @samp{-} and ! 3774: @var{op}: @samp{*} should be shifted first, but not @samp{<}. ! 3775: ! 3776: @cindex associativity ! 3777: What about input such as @w{@samp{1 - 2 - 5}}; should this be ! 3778: @w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For ! 3779: most operators we prefer the former, which is called @dfn{left ! 3780: association}. The latter alternative, @dfn{right association}, is ! 3781: desirable for assignment operators. The choice of left or right ! 3782: association is a matter of whether the parser chooses to shift or ! 3783: reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead ! 3784: token is @samp{-}: shifting makes right-associativity. ! 3785: ! 3786: @node Using Precedence, Precedence Examples, Why Precedence, Precedence ! 3787: @subsection Specifying Operator Precedence ! 3788: @findex %left ! 3789: @findex %right ! 3790: @findex %nonassoc ! 3791: ! 3792: Bison allows you to specify these choices with the operator precedence ! 3793: declarations @code{%left} and @code{%right}. Each such declaration ! 3794: contains a list of tokens, which are operators whose precedence and ! 3795: associativity is being declared. The @code{%left} declaration makes all ! 3796: those operators left-associative and the @code{%right} declaration makes ! 3797: them right-associative. A third alternative is @code{%nonassoc}, which ! 3798: declares that it is a syntax error to find the same operator twice ``in a ! 3799: row''. ! 3800: ! 3801: The relative precedence of different operators is controlled by the ! 3802: order in which they are declared. The first @code{%left} or ! 3803: @code{%right} declaration in the file declares the operators whose ! 3804: precedence is lowest, the next such declaration declares the operators ! 3805: whose precedence is a little higher, and so on. ! 3806: ! 3807: @node Precedence Examples, How Precedence, Using Precedence, Precedence ! 3808: @subsection Precedence Examples ! 3809: ! 3810: In our example, we would want the following declarations: ! 3811: ! 3812: @example ! 3813: %left '<' ! 3814: %left '-' ! 3815: %left '*' ! 3816: @end example ! 3817: ! 3818: In a more complete example, which supports other operators as well, we ! 3819: would declare them in groups of equal precedence. For example, @code{'+'} is ! 3820: declared with @code{'-'}: ! 3821: ! 3822: @example ! 3823: %left '<' '>' '=' NE LE GE ! 3824: %left '+' '-' ! 3825: %left '*' '/' ! 3826: @end example ! 3827: ! 3828: @noindent ! 3829: (Here @code{NE} and so on stand for the operators for ``not equal'' ! 3830: and so on. We assume that these tokens are more than one character long ! 3831: and therefore are represented by names, not character literals.) ! 3832: ! 3833: @node How Precedence, , Precedence Examples, Precedence ! 3834: @subsection How Precedence Works ! 3835: ! 3836: The first effect of the precedence declarations is to assign precedence ! 3837: levels to the terminal symbols declared. The second effect is to assign ! 3838: precedence levels to certain rules: each rule gets its precedence from the ! 3839: last terminal symbol mentioned in the components. (You can also specify ! 3840: explicitly the precedence of a rule. @xref{Contextual Precedence, ,Context-Dependent Precedence}.) ! 3841: ! 3842: Finally, the resolution of conflicts works by comparing the ! 3843: precedence of the rule being considered with that of the ! 3844: look-ahead token. If the token's precedence is higher, the ! 3845: choice is to shift. If the rule's precedence is higher, the ! 3846: choice is to reduce. If they have equal precedence, the choice ! 3847: is made based on the associativity of that precedence level. The ! 3848: verbose output file made by @samp{-v} (@pxref{Invocation, ,Invoking Bison}) says ! 3849: how each conflict was resolved. ! 3850: ! 3851: Not all rules and not all tokens have precedence. If either the rule or ! 3852: the look-ahead token has no precedence, then the default is to shift. ! 3853: ! 3854: @node Contextual Precedence, Parser States, Precedence, Algorithm ! 3855: @section Context-Dependent Precedence ! 3856: @cindex context-dependent precedence ! 3857: @cindex unary operator precedence ! 3858: @cindex precedence, context-dependent ! 3859: @cindex precedence, unary operator ! 3860: @findex %prec ! 3861: ! 3862: Often the precedence of an operator depends on the context. This sounds ! 3863: outlandish at first, but it is really very common. For example, a minus ! 3864: sign typically has a very high precedence as a unary operator, and a ! 3865: somewhat lower precedence (lower than multiplication) as a binary operator. ! 3866: ! 3867: The Bison precedence declarations, @code{%left}, @code{%right} and ! 3868: @code{%nonassoc}, can only be used once for a given token; so a token has ! 3869: only one precedence declared in this way. For context-dependent ! 3870: precedence, you need to use an additional mechanism: the @code{%prec} ! 3871: modifier for rules.@refill ! 3872: ! 3873: The @code{%prec} modifier declares the precedence of a particular rule by ! 3874: specifying a terminal symbol whose precedence should be used for that rule. ! 3875: It's not necessary for that symbol to appear otherwise in the rule. The ! 3876: modifier's syntax is: ! 3877: ! 3878: @example ! 3879: %prec @var{terminal-symbol} ! 3880: @end example ! 3881: ! 3882: @noindent ! 3883: and it is written after the components of the rule. Its effect is to ! 3884: assign the rule the precedence of @var{terminal-symbol}, overriding ! 3885: the precedence that would be deduced for it in the ordinary way. The ! 3886: altered rule precedence then affects how conflicts involving that rule ! 3887: are resolved (@pxref{Precedence, ,Operator Precedence}). ! 3888: ! 3889: Here is how @code{%prec} solves the problem of unary minus. First, declare ! 3890: a precedence for a fictitious terminal symbol named @code{UMINUS}. There ! 3891: are no tokens of this type, but the symbol serves to stand for its ! 3892: precedence: ! 3893: ! 3894: @example ! 3895: @dots{} ! 3896: %left '+' '-' ! 3897: %left '*' ! 3898: %left UMINUS ! 3899: @end example ! 3900: ! 3901: Now the precedence of @code{UMINUS} can be used in specific rules: ! 3902: ! 3903: @example ! 3904: @group ! 3905: exp: @dots{} ! 3906: | exp '-' exp ! 3907: @dots{} ! 3908: | '-' exp %prec UMINUS ! 3909: @end group ! 3910: @end example ! 3911: ! 3912: @node Parser States, Reduce/Reduce, Contextual Precedence, Algorithm ! 3913: @section Parser States ! 3914: @cindex finite-state machine ! 3915: @cindex parser state ! 3916: @cindex state (of parser) ! 3917: ! 3918: The function @code{yyparse} is implemented using a finite-state machine. ! 3919: The values pushed on the parser stack are not simply token type codes; they ! 3920: represent the entire sequence of terminal and nonterminal symbols at or ! 3921: near the top of the stack. The current state collects all the information ! 3922: about previous input which is relevant to deciding what to do next. ! 3923: ! 3924: Each time a look-ahead token is read, the current parser state together ! 3925: with the type of look-ahead token are looked up in a table. This table ! 3926: entry can say, ``Shift the look-ahead token.'' In this case, it also ! 3927: specifies the new parser state, which is pushed onto the top of the ! 3928: parser stack. Or it can say, ``Reduce using rule number @var{n}.'' ! 3929: This means that a certain number of tokens or groupings are taken off ! 3930: the top of the stack, and replaced by one grouping. In other words, ! 3931: that number of states are popped from the stack, and one new state is ! 3932: pushed. ! 3933: ! 3934: There is one other alternative: the table can say that the look-ahead token ! 3935: is erroneous in the current state. This causes error processing to begin ! 3936: (@pxref{Error Recovery}). ! 3937: ! 3938: @node Reduce/Reduce, Mystery Conflicts, Parser States, Algorithm ! 3939: @section Reduce/Reduce Conflicts ! 3940: @cindex reduce/reduce conflict ! 3941: @cindex conflicts, reduce/reduce ! 3942: ! 3943: A reduce/reduce conflict occurs if there are two or more rules that apply ! 3944: to the same sequence of input. This usually indicates a serious error ! 3945: in the grammar. ! 3946: ! 3947: For example, here is an erroneous attempt to define a sequence ! 3948: of zero or more @code{word} groupings. ! 3949: ! 3950: @example ! 3951: sequence: /* empty */ ! 3952: @{ printf ("empty sequence\n"); @} ! 3953: | maybeword ! 3954: | sequence word ! 3955: @{ printf ("added word %s\n", $2); @} ! 3956: ; ! 3957: ! 3958: maybeword: /* empty */ ! 3959: @{ printf ("empty maybeword\n"); @} ! 3960: | word ! 3961: @{ printf ("single word %s\n", $1); @} ! 3962: ; ! 3963: @end example ! 3964: ! 3965: @noindent ! 3966: The error is an ambiguity: there is more than one way to parse a single ! 3967: @code{word} into a @code{sequence}. It could be reduced to a ! 3968: @code{maybeword} and then into a @code{sequence} via the second rule. ! 3969: Alternatively, nothing-at-all could be reduced into a @code{sequence} ! 3970: via the first rule, and this could be combined with the @code{word} ! 3971: using the third rule for @code{sequence}. ! 3972: ! 3973: There is also more than one way to reduce nothing-at-all into a ! 3974: @code{sequence}. This can be done directly via the first rule, ! 3975: or indirectly via @code{maybeword} and then the second rule. ! 3976: ! 3977: You might think that this is a distinction without a difference, because it ! 3978: does not change whether any particular input is valid or not. But it does ! 3979: affect which actions are run. One parsing order runs the second rule's ! 3980: action; the other runs the first rule's action and the third rule's action. ! 3981: In this example, the output of the program changes. ! 3982: ! 3983: Bison resolves a reduce/reduce conflict by choosing to use the rule that ! 3984: appears first in the grammar, but it is very risky to rely on this. Every ! 3985: reduce/reduce conflict must be studied and usually eliminated. Here is the ! 3986: proper way to define @code{sequence}: ! 3987: ! 3988: @example ! 3989: sequence: /* empty */ ! 3990: @{ printf ("empty sequence\n"); @} ! 3991: | sequence word ! 3992: @{ printf ("added word %s\n", $2); @} ! 3993: ; ! 3994: @end example ! 3995: ! 3996: Here is another common error that yields a reduce/reduce conflict: ! 3997: ! 3998: @example ! 3999: sequence: /* empty */ ! 4000: | sequence words ! 4001: | sequence redirects ! 4002: ; ! 4003: ! 4004: words: /* empty */ ! 4005: | words word ! 4006: ; ! 4007: ! 4008: redirects:/* empty */ ! 4009: | redirects redirect ! 4010: ; ! 4011: @end example ! 4012: ! 4013: @noindent ! 4014: The intention here is to define a sequence which can contain either ! 4015: @code{word} or @code{redirect} groupings. The individual definitions of ! 4016: @code{sequence}, @code{words} and @code{redirects} are error-free, but the ! 4017: three together make a subtle ambiguity: even an empty input can be parsed ! 4018: in infinitely many ways! ! 4019: ! 4020: Consider: nothing-at-all could be a @code{words}. Or it could be two ! 4021: @code{words} in a row, or three, or any number. It could equally well be a ! 4022: @code{redirects}, or two, or any number. Or it could be a @code{words} ! 4023: followed by three @code{redirects} and another @code{words}. And so on. ! 4024: ! 4025: Here are two ways to correct these rules. First, to make it a single level ! 4026: of sequence: ! 4027: ! 4028: @example ! 4029: sequence: /* empty */ ! 4030: | sequence word ! 4031: | sequence redirect ! 4032: ; ! 4033: @end example ! 4034: ! 4035: Second, to prevent either a @code{words} or a @code{redirects} ! 4036: from being empty: ! 4037: ! 4038: @example ! 4039: sequence: /* empty */ ! 4040: | sequence words ! 4041: | sequence redirects ! 4042: ; ! 4043: ! 4044: words: word ! 4045: | words word ! 4046: ; ! 4047: ! 4048: redirects:redirect ! 4049: | redirects redirect ! 4050: ; ! 4051: @end example ! 4052: ! 4053: @node Mystery Conflicts, Stack Overflow, Reduce/Reduce, Algorithm ! 4054: @section Mysterious Reduce/Reduce Conflicts ! 4055: ! 4056: Sometimes reduce/reduce conflicts can occur that don't look warranted. ! 4057: Here is an example: ! 4058: ! 4059: @example ! 4060: @group ! 4061: %token ID ! 4062: ! 4063: %% ! 4064: def: param_spec return_spec ',' ! 4065: ; ! 4066: param_spec: ! 4067: type ! 4068: | name_list ':' type ! 4069: ; ! 4070: @end group ! 4071: @group ! 4072: return_spec: ! 4073: type ! 4074: | name ':' type ! 4075: ; ! 4076: @end group ! 4077: @group ! 4078: type: ID ! 4079: ; ! 4080: @end group ! 4081: @group ! 4082: name: ID ! 4083: ; ! 4084: name_list: ! 4085: name ! 4086: | name ',' name_list ! 4087: ; ! 4088: @end group ! 4089: @end example ! 4090: ! 4091: It would seem that this grammar can be parsed with only a single token ! 4092: of look-ahead: when a @code{param_spec} is being read, an @code{ID} is ! 4093: a @code{name} if a comma or colon follows, or a @code{type} if another ! 4094: @code{ID} follows. In other words, this grammar is LR(1). ! 4095: ! 4096: @cindex LR(1) ! 4097: @cindex LALR(1) ! 4098: However, Bison, like most parser generators, cannot actually handle all ! 4099: LR(1) grammars. In this grammar, two contexts, that after an @code{ID} ! 4100: at the beginning of a @code{param_spec} and likewise at the beginning of ! 4101: a @code{return_spec}, are similar enough that Bison assumes they are the ! 4102: same. They appear similar because the same set of rules would be ! 4103: active---the rule for reducing to a @code{name} and that for reducing to ! 4104: a @code{type}. Bison is unable to determine at that stage of processing ! 4105: that the rules would require different look-ahead tokens in the two ! 4106: contexts, so it makes a single parser state for them both. Combining ! 4107: the two contexts causes a conflict later. In parser terminology, this ! 4108: occurrence means that the grammar is not LALR(1). ! 4109: ! 4110: In general, it is better to fix deficiencies than to document them. But ! 4111: this particular deficiency is intrinsically hard to fix; parser ! 4112: generators that can handle LR(1) grammars are hard to write and tend to ! 4113: produce parsers that are very large. In practice, Bison is more useful ! 4114: as it is now. ! 4115: ! 4116: When the problem arises, you can often fix it by identifying the two ! 4117: parser states that are being confused, and adding something to make them ! 4118: look distinct. In the above example, adding one rule to ! 4119: @code{return_spec} as follows makes the problem go away: ! 4120: ! 4121: @example ! 4122: @group ! 4123: %token BOGUS ! 4124: @dots{} ! 4125: %% ! 4126: @dots{} ! 4127: return_spec: ! 4128: type ! 4129: | name ':' type ! 4130: /* This rule is never used. */ ! 4131: | ID BOGUS ! 4132: ; ! 4133: @end group ! 4134: @end example ! 4135: ! 4136: This corrects the problem because it introduces the possibility of an ! 4137: additional active rule in the context after the @code{ID} at the beginning of ! 4138: @code{return_spec}. This rule is not active in the corresponding context ! 4139: in a @code{param_spec}, so the two contexts receive distinct parser states. ! 4140: As long as the token @code{BOGUS} is never generated by @code{yylex}, ! 4141: the added rule cannot alter the way actual input is parsed. ! 4142: ! 4143: In this particular example, there is another way to solve the problem: ! 4144: rewrite the rule for @code{return_spec} to use @code{ID} directly ! 4145: instead of via @code{name}. This also causes the two confusing ! 4146: contexts to have different sets of active rules, because the one for ! 4147: @code{return_spec} activates the altered rule for @code{return_spec} ! 4148: rather than the one for @code{name}. ! 4149: ! 4150: @example ! 4151: param_spec: ! 4152: type ! 4153: | name_list ':' type ! 4154: ; ! 4155: return_spec: ! 4156: type ! 4157: | ID ':' type ! 4158: ; ! 4159: @end example ! 4160: ! 4161: @node Stack Overflow, , Mystery Conflicts, Algorithm ! 4162: @section Stack Overflow, and How to Avoid It ! 4163: @cindex stack overflow ! 4164: @cindex parser stack overflow ! 4165: @cindex overflow of parser stack ! 4166: ! 4167: The Bison parser stack can overflow if too many tokens are shifted and ! 4168: not reduced. When this happens, the parser function @code{yyparse} ! 4169: returns a nonzero value, pausing only to call @code{yyerror} to report ! 4170: the overflow. ! 4171: ! 4172: @vindex YYMAXDEPTH ! 4173: By defining the macro @code{YYMAXDEPTH}, you can control how deep the ! 4174: parser stack can become before a stack overflow occurs. Define the ! 4175: macro with a value that is an integer. This value is the maximum number ! 4176: of tokens that can be shifted (and not reduced) before overflow. ! 4177: It must be a constant expression whose value is known at compile time. ! 4178: ! 4179: The stack space allowed is not necessarily allocated. If you specify a ! 4180: large value for @code{YYMAXDEPTH}, the parser actually allocates a small ! 4181: stack at first, and then makes it bigger by stages as needed. This ! 4182: increasing allocation happens automatically and silently. Therefore, ! 4183: you do not need to make @code{YYMAXDEPTH} painfully small merely to save ! 4184: space for ordinary inputs that do not need much stack. ! 4185: ! 4186: @cindex default stack limit ! 4187: The default value of @code{YYMAXDEPTH}, if you do not define it, is ! 4188: 10000. ! 4189: ! 4190: @vindex YYINITDEPTH ! 4191: You can control how much stack is allocated initially by defining the ! 4192: macro @code{YYINITDEPTH}. This value too must be a compile-time ! 4193: constant integer. The default is 200. ! 4194: ! 4195: @node Error Recovery, Context Dependency, Algorithm, Top ! 4196: @chapter Error Recovery ! 4197: @cindex error recovery ! 4198: @cindex recovery from errors ! 4199: ! 4200: It is not usually acceptable to have a program terminate on a parse ! 4201: error. For example, a compiler should recover sufficiently to parse the ! 4202: rest of the input file and check it for errors; a calculator should accept ! 4203: another expression. ! 4204: ! 4205: In a simple interactive command parser where each input is one line, it may ! 4206: be sufficient to allow @code{yyparse} to return 1 on error and have the ! 4207: caller ignore the rest of the input line when that happens (and then call ! 4208: @code{yyparse} again). But this is inadequate for a compiler, because it ! 4209: forgets all the syntactic context leading up to the error. A syntax error ! 4210: deep within a function in the compiler input should not cause the compiler ! 4211: to treat the following line like the beginning of a source file. ! 4212: ! 4213: @findex error ! 4214: You can define how to recover from a syntax error by writing rules to ! 4215: recognize the special token @code{error}. This is a terminal symbol that ! 4216: is always defined (you need not declare it) and reserved for error ! 4217: handling. The Bison parser generates an @code{error} token whenever a ! 4218: syntax error happens; if you have provided a rule to recognize this token ! 4219: in the current context, the parse can continue. ! 4220: ! 4221: For example: ! 4222: ! 4223: @example ! 4224: stmnts: /* empty string */ ! 4225: | stmnts '\n' ! 4226: | stmnts exp '\n' ! 4227: | stmnts error '\n' ! 4228: @end example ! 4229: ! 4230: The fourth rule in this example says that an error followed by a newline ! 4231: makes a valid addition to any @code{stmnts}. ! 4232: ! 4233: What happens if a syntax error occurs in the middle of an @code{exp}? The ! 4234: error recovery rule, interpreted strictly, applies to the precise sequence ! 4235: of a @code{stmnts}, an @code{error} and a newline. If an error occurs in ! 4236: the middle of an @code{exp}, there will probably be some additional tokens ! 4237: and subexpressions on the stack after the last @code{stmnts}, and there ! 4238: will be tokens to read before the next newline. So the rule is not ! 4239: applicable in the ordinary way. ! 4240: ! 4241: But Bison can force the situation to fit the rule, by discarding part of ! 4242: the semantic context and part of the input. First it discards states and ! 4243: objects from the stack until it gets back to a state in which the ! 4244: @code{error} token is acceptable. (This means that the subexpressions ! 4245: already parsed are discarded, back to the last complete @code{stmnts}.) At ! 4246: this point the @code{error} token can be shifted. Then, if the old ! 4247: look-ahead token is not acceptable to be shifted next, the parser reads ! 4248: tokens and discards them until it finds a token which is acceptable. In ! 4249: this example, Bison reads and discards input until the next newline ! 4250: so that the fourth rule can apply. ! 4251: ! 4252: The choice of error rules in the grammar is a choice of strategies for ! 4253: error recovery. A simple and useful strategy is simply to skip the rest of ! 4254: the current input line or current statement if an error is detected: ! 4255: ! 4256: @example ! 4257: stmnt: error ';' /* on error, skip until ';' is read */ ! 4258: @end example ! 4259: ! 4260: It is also useful to recover to the matching close-delimiter of an ! 4261: opening-delimiter that has already been parsed. Otherwise the ! 4262: close-delimiter will probably appear to be unmatched, and generate another, ! 4263: spurious error message: ! 4264: ! 4265: @example ! 4266: primary: '(' expr ')' ! 4267: | '(' error ')' ! 4268: @dots{} ! 4269: ; ! 4270: @end example ! 4271: ! 4272: Error recovery strategies are necessarily guesses. When they guess wrong, ! 4273: one syntax error often leads to another. In the above example, the error ! 4274: recovery rule guesses that an error is due to bad input within one ! 4275: @code{stmnt}. Suppose that instead a spurious semicolon is inserted in the ! 4276: middle of a valid @code{stmnt}. After the error recovery rule recovers ! 4277: from the first error, another syntax error will be found straightaway, ! 4278: since the text following the spurious semicolon is also an invalid ! 4279: @code{stmnt}. ! 4280: ! 4281: To prevent an outpouring of error messages, the parser will output no error ! 4282: message for another syntax error that happens shortly after the first; only ! 4283: after three consecutive input tokens have been successfully shifted will ! 4284: error messages resume. ! 4285: ! 4286: Note that rules which accept the @code{error} token may have actions, just ! 4287: as any other rules can. ! 4288: ! 4289: @findex yyerrok ! 4290: You can make error messages resume immediately by using the macro ! 4291: @code{yyerrok} in an action. If you do this in the error rule's action, no ! 4292: error messages will be suppressed. This macro requires no arguments; ! 4293: @samp{yyerrok;} is a valid C statement. ! 4294: ! 4295: @findex yyclearin ! 4296: The previous look-ahead token is reanalyzed immediately after an error. If ! 4297: this is unacceptable, then the macro @code{yyclearin} may be used to clear ! 4298: this token. Write the statement @samp{yyclearin;} in the error rule's ! 4299: action. ! 4300: ! 4301: For example, suppose that on a parse error, an error handling routine is ! 4302: called that advances the input stream to some point where parsing should ! 4303: once again commence. The next symbol returned by the lexical scanner is ! 4304: probably correct. The previous look-ahead token ought to be discarded ! 4305: with @samp{yyclearin;}. ! 4306: ! 4307: @vindex YYRECOVERING ! 4308: The macro @code{YYRECOVERING} stands for an expression that has the ! 4309: value 1 when the parser is recovering from a syntax error, and 0 the ! 4310: rest of the time. A value of 1 indicates that error messages are ! 4311: currently suppressed for new syntax errors. ! 4312: ! 4313: @node Context Dependency, Debugging, Error Recovery, Top ! 4314: @chapter Handling Context Dependencies ! 4315: ! 4316: The Bison paradigm is to parse tokens first, then group them into larger ! 4317: syntactic units. In many languages, the meaning of a token is affected by ! 4318: its context. Although this violates the Bison paradigm, certain techniques ! 4319: (known as @dfn{kludges}) may enable you to write Bison parsers for such ! 4320: languages. ! 4321: ! 4322: @menu ! 4323: * Semantic Tokens:: Token parsing can depend on the semantic context. ! 4324: * Lexical Tie-ins:: Token parsing can depend on the syntactic context. ! 4325: * Tie-in Recovery:: Lexical tie-ins have implications for how ! 4326: error recovery rules must be written. ! 4327: @end menu ! 4328: ! 4329: (Actually, ``kludge'' means any technique that gets its job done but is ! 4330: neither clean nor robust.) ! 4331: ! 4332: @node Semantic Tokens, Lexical Tie-ins, , Context Dependency ! 4333: @section Semantic Info in Token Types ! 4334: ! 4335: The C language has a context dependency: the way an identifier is used ! 4336: depends on what its current meaning is. For example, consider this: ! 4337: ! 4338: @example ! 4339: foo (x); ! 4340: @end example ! 4341: ! 4342: This looks like a function call statement, but if @code{foo} is a typedef ! 4343: name, then this is actually a declaration of @code{x}. How can a Bison ! 4344: parser for C decide how to parse this input? ! 4345: ! 4346: The method used in GNU C is to have two different token types, ! 4347: @code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an ! 4348: identifier, it looks up the current declaration of the identifier in order ! 4349: to decide which token type to return: @code{TYPENAME} if the identifier is ! 4350: declared as a typedef, @code{IDENTIFIER} otherwise. ! 4351: ! 4352: The grammar rules can then express the context dependency by the choice of ! 4353: token type to recognize. @code{IDENTIFIER} is accepted as an expression, ! 4354: but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but ! 4355: @code{IDENTIFIER} cannot. In contexts where the meaning of the identifier ! 4356: is @emph{not} significant, such as in declarations that can shadow a ! 4357: typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is ! 4358: accepted---there is one rule for each of the two token types. ! 4359: ! 4360: This technique is simple to use if the decision of which kinds of ! 4361: identifiers to allow is made at a place close to where the identifier is ! 4362: parsed. But in C this is not always so: C allows a declaration to ! 4363: redeclare a typedef name provided an explicit type has been specified ! 4364: earlier: ! 4365: ! 4366: @example ! 4367: typedef int foo, bar, lose; ! 4368: static foo (bar); /* @r{redeclare @code{bar} as static variable} */ ! 4369: static int foo (lose); /* @r{redeclare @code{foo} as function} */ ! 4370: @end example ! 4371: ! 4372: Unfortunately, the name being declared is separated from the declaration ! 4373: construct itself by a complicated syntactic structure---the ``declarator''. ! 4374: ! 4375: As a result, the part of Bison parser for C needs to be duplicated, with ! 4376: all the nonterminal names changed: once for parsing a declaration in which ! 4377: a typedef name can be redefined, and once for parsing a declaration in ! 4378: which that can't be done. Here is a part of the duplication, with actions ! 4379: omitted for brevity: ! 4380: ! 4381: @example ! 4382: initdcl: ! 4383: declarator maybeasm '=' ! 4384: init ! 4385: | declarator maybeasm ! 4386: ; ! 4387: ! 4388: notype_initdcl: ! 4389: notype_declarator maybeasm '=' ! 4390: init ! 4391: | notype_declarator maybeasm ! 4392: ; ! 4393: @end example ! 4394: ! 4395: @noindent ! 4396: Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl} ! 4397: cannot. The distinction between @code{declarator} and ! 4398: @code{notype_declarator} is the same sort of thing. ! 4399: ! 4400: There is some similarity between this technique and a lexical tie-in ! 4401: (described next), in that information which alters the lexical analysis is ! 4402: changed during parsing by other parts of the program. The difference is ! 4403: here the information is global, and is used for other purposes in the ! 4404: program. A true lexical tie-in has a special-purpose flag controlled by ! 4405: the syntactic context. ! 4406: ! 4407: @node Lexical Tie-ins, Tie-in Recovery, Semantic Tokens, Context Dependency ! 4408: @section Lexical Tie-ins ! 4409: @cindex lexical tie-in ! 4410: ! 4411: One way to handle context-dependency is the @dfn{lexical tie-in}: a flag ! 4412: which is set by Bison actions, whose purpose is to alter the way tokens are ! 4413: parsed. ! 4414: ! 4415: For example, suppose we have a language vaguely like C, but with a special ! 4416: construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes ! 4417: an expression in parentheses in which all integers are hexadecimal. In ! 4418: particular, the token @samp{a1b} must be treated as an integer rather than ! 4419: as an identifier if it appears in that context. Here is how you can do it: ! 4420: ! 4421: @example ! 4422: @group ! 4423: %@{ ! 4424: int hexflag; ! 4425: %@} ! 4426: %% ! 4427: @dots{} ! 4428: @end group ! 4429: @group ! 4430: expr: IDENTIFIER ! 4431: | constant ! 4432: | HEX '(' ! 4433: @{ hexflag = 1; @} ! 4434: expr ')' ! 4435: @{ hexflag = 0; ! 4436: $$ = $4; @} ! 4437: | expr '+' expr ! 4438: @{ $$ = make_sum ($1, $3); @} ! 4439: @dots{} ! 4440: ; ! 4441: @end group ! 4442: ! 4443: @group ! 4444: constant: ! 4445: INTEGER ! 4446: | STRING ! 4447: ; ! 4448: @end group ! 4449: @end example ! 4450: ! 4451: @noindent ! 4452: Here we assume that @code{yylex} looks at the value of @code{hexflag}; when ! 4453: it is nonzero, all integers are parsed in hexadecimal, and tokens starting ! 4454: with letters are parsed as integers if possible. ! 4455: ! 4456: The declaration of @code{hexflag} shown in the C declarations section of ! 4457: the parser file is needed to make it accessible to the actions ! 4458: (@pxref{C Declarations, ,The C Declarations Section}). You must also write the code in @code{yylex} ! 4459: to obey the flag. ! 4460: ! 4461: @node Tie-in Recovery, , Lexical Tie-ins, Context Dependency ! 4462: @section Lexical Tie-ins and Error Recovery ! 4463: ! 4464: Lexical tie-ins make strict demands on any error recovery rules you have. ! 4465: @xref{Error Recovery}. ! 4466: ! 4467: The reason for this is that the purpose of an error recovery rule is to ! 4468: abort the parsing of one construct and resume in some larger construct. ! 4469: For example, in C-like languages, a typical error recovery rule is to skip ! 4470: tokens until the next semicolon, and then start a new statement, like this: ! 4471: ! 4472: @example ! 4473: stmt: expr ';' ! 4474: | IF '(' expr ')' stmt @{ @dots{} @} ! 4475: @dots{} ! 4476: error ';' ! 4477: @{ hexflag = 0; @} ! 4478: ; ! 4479: @end example ! 4480: ! 4481: If there is a syntax error in the middle of a @samp{hex (@var{expr})} ! 4482: construct, this error rule will apply, and then the action for the ! 4483: completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would ! 4484: remain set for the entire rest of the input, or until the next @code{hex} ! 4485: keyword, causing identifiers to be misinterpreted as integers. ! 4486: ! 4487: To avoid this problem the error recovery rule itself clears @code{hexflag}. ! 4488: ! 4489: There may also be an error recovery rule that works within expressions. ! 4490: For example, there could be a rule which applies within parentheses ! 4491: and skips to the close-parenthesis: ! 4492: ! 4493: @example ! 4494: @group ! 4495: expr: @dots{} ! 4496: | '(' expr ')' ! 4497: @{ $$ = $2; @} ! 4498: | '(' error ')' ! 4499: @dots{} ! 4500: @end group ! 4501: @end example ! 4502: ! 4503: If this rule acts within the @code{hex} construct, it is not going to abort ! 4504: that construct (since it applies to an inner level of parentheses within ! 4505: the construct). Therefore, it should not clear the flag: the rest of ! 4506: the @code{hex} construct should be parsed with the flag still in effect. ! 4507: ! 4508: What if there is an error recovery rule which might abort out of the ! 4509: @code{hex} construct or might not, depending on circumstances? There is no ! 4510: way you can write the action to determine whether a @code{hex} construct is ! 4511: being aborted or not. So if you are using a lexical tie-in, you had better ! 4512: make sure your error recovery rules are not of this kind. Each rule must ! 4513: be such that you can be sure that it always will, or always won't, have to ! 4514: clear the flag. ! 4515: ! 4516: @node Debugging, Invocation, Context Dependency, Top ! 4517: @chapter Debugging Your Parser ! 4518: @findex YYDEBUG ! 4519: @findex yydebug ! 4520: @cindex debugging ! 4521: @cindex tracing the parser ! 4522: ! 4523: If a Bison grammar compiles properly but doesn't do what you want when it ! 4524: runs, the @code{yydebug} parser-trace feature can help you figure out why. ! 4525: ! 4526: To enable compilation of trace facilities, you must define the macro ! 4527: @code{YYDEBUG} when you compile the parser. You could use ! 4528: @samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define ! 4529: YYDEBUG 1} in the C declarations section of the grammar file ! 4530: (@pxref{C Declarations, ,The C Declarations Section}). Alternatively, use the @samp{-t} option when ! 4531: you run Bison (@pxref{Invocation, ,Invoking Bison}). We always define @code{YYDEBUG} so that ! 4532: debugging is always possible. ! 4533: ! 4534: The trace facility uses @code{stderr}, so you must add @w{@code{#include ! 4535: <stdio.h>}} to the C declarations section unless it is already there. ! 4536: ! 4537: Once you have compiled the program with trace facilities, the way to ! 4538: request a trace is to store a nonzero value in the variable @code{yydebug}. ! 4539: You can do this by making the C code do it (in @code{main}, perhaps), or ! 4540: you can alter the value with a C debugger. ! 4541: ! 4542: Each step taken by the parser when @code{yydebug} is nonzero produces a ! 4543: line or two of trace information, written on @code{stderr}. The trace ! 4544: messages tell you these things: ! 4545: ! 4546: @itemize @bullet ! 4547: @item ! 4548: Each time the parser calls @code{yylex}, what kind of token was read. ! 4549: ! 4550: @item ! 4551: Each time a token is shifted, the depth and complete contents of the ! 4552: state stack (@pxref{Parser States}). ! 4553: ! 4554: @item ! 4555: Each time a rule is reduced, which rule it is, and the complete contents ! 4556: of the state stack afterward. ! 4557: @end itemize ! 4558: ! 4559: To make sense of this information, it helps to refer to the listing file ! 4560: produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking Bison}). This file ! 4561: shows the meaning of each state in terms of positions in various rules, and ! 4562: also what each state will do with each possible input token. As you read ! 4563: the successive trace messages, you can see that the parser is functioning ! 4564: according to its specification in the listing file. Eventually you will ! 4565: arrive at the place where something undesirable happens, and you will see ! 4566: which parts of the grammar are to blame. ! 4567: ! 4568: The parser file is a C program and you can use C debuggers on it, but it's ! 4569: not easy to interpret what it is doing. The parser function is a ! 4570: finite-state machine interpreter, and aside from the actions it executes ! 4571: the same code over and over. Only the values of variables show where in ! 4572: the grammar it is working. ! 4573: ! 4574: @findex YYPRINT ! 4575: The debugging information normally gives the token type of each token ! 4576: read, but not its semantic value. You can optionally define a macro ! 4577: named @code{YYPRINT} to provide a way to print the value. If you define ! 4578: @code{YYPRINT}, it should take three arguments. The parser will pass a ! 4579: standard I/O stream, the numeric code for the token type, and the token ! 4580: value (from @code{yylval}). ! 4581: ! 4582: Here is an example of @code{YYPRINT} suitable for the multi-function ! 4583: calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}): ! 4584: ! 4585: @smallexample ! 4586: #define YYPRINT(file, type, value) yyprint (file, type, value) ! 4587: ! 4588: static void ! 4589: yyprint (file, type, value) ! 4590: FILE *file; ! 4591: int type; ! 4592: YYSTYPE value; ! 4593: @{ ! 4594: if (type == VAR) ! 4595: fprintf (file, " %s", value.tptr->name); ! 4596: else if (type == NUM) ! 4597: fprintf (file, " %d", value.val); ! 4598: @} ! 4599: @end smallexample ! 4600: ! 4601: @node Invocation, Table of Symbols, Debugging, Top ! 4602: @chapter Invoking Bison ! 4603: @cindex invoking Bison ! 4604: @cindex Bison invocation ! 4605: @cindex options for invoking Bison ! 4606: ! 4607: The usual way to invoke Bison is as follows: ! 4608: ! 4609: @example ! 4610: bison @var{infile} ! 4611: @end example ! 4612: ! 4613: Here @var{infile} is the grammar file name, which usually ends in ! 4614: @samp{.y}. The parser file's name is made by replacing the @samp{.y} ! 4615: with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields ! 4616: @file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields ! 4617: @file{hack/foo.tab.c}.@refill ! 4618: ! 4619: @menu ! 4620: * Bison Options:: All the options described in detail, ! 4621: in alphabetical order by short options. ! 4622: * Option Cross Key:: Alphabetical list of long options. ! 4623: * VMS Invocation:: Bison command syntax on VMS. ! 4624: @end menu ! 4625: ! 4626: @node Bison Options, Option Cross Key, , Invocation ! 4627: @section Bison Options ! 4628: ! 4629: Bison supports both traditional single-letter options and mnemonic long ! 4630: option names. Long option names are indicated with @samp{--} instead of ! 4631: @samp{-}. Abbreviations for option names are allowed as long as they ! 4632: are unique. When a long option takes an argument, like ! 4633: @samp{--file-prefix}, connect the option name and the argument with ! 4634: @samp{=}. ! 4635: ! 4636: Here is a list of options that can be used with Bison, alphabetized by ! 4637: short option. It is followed by a cross key alphabetized by long ! 4638: option. ! 4639: ! 4640: @table @samp ! 4641: @item -b @var{file-prefix} ! 4642: @itemx --file-prefix=@var{prefix} ! 4643: Specify a prefix to use for all Bison output file names. The names are ! 4644: chosen as if the input file were named @file{@var{prefix}.c}. ! 4645: ! 4646: @item -d ! 4647: @itemx --defines ! 4648: Write an extra output file containing macro definitions for the token ! 4649: type names defined in the grammar and the semantic value type ! 4650: @code{YYSTYPE}, as well as a few @code{extern} variable declarations. ! 4651: ! 4652: If the parser output file is named @file{@var{name}.c} then this file ! 4653: is named @file{@var{name}.h}.@refill ! 4654: ! 4655: This output file is essential if you wish to put the definition of ! 4656: @code{yylex} in a separate source file, because @code{yylex} needs to ! 4657: be able to refer to token type codes and the variable ! 4658: @code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill ! 4659: ! 4660: @item -l ! 4661: @itemx --no-lines ! 4662: Don't put any @code{#line} preprocessor commands in the parser file. ! 4663: Ordinarily Bison puts them in the parser file so that the C compiler ! 4664: and debuggers will associate errors with your source file, the ! 4665: grammar file. This option causes them to associate errors with the ! 4666: parser file, treating it an independent source file in its own right. ! 4667: ! 4668: @item -o @var{outfile} ! 4669: @itemx --output-file=@var{outfile} ! 4670: Specify the name @var{outfile} for the parser file. ! 4671: ! 4672: The other output files' names are constructed from @var{outfile} ! 4673: as described under the @samp{-v} and @samp{-d} switches. ! 4674: ! 4675: @item -p @var{prefix} ! 4676: @itemx --name-prefix=@var{prefix} ! 4677: Rename the external symbols used in the parser so that they start with ! 4678: @var{prefix} instead of @samp{yy}. The precise list of symbols renamed ! 4679: is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yylval}, ! 4680: @code{yychar} and @code{yydebug}. ! 4681: ! 4682: For example, if you use @samp{-p c}, the names become @code{cparse}, ! 4683: @code{clex}, and so on. ! 4684: ! 4685: @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. ! 4686: ! 4687: @item -t ! 4688: @itemx --debug ! 4689: Output a definition of the macro @code{YYDEBUG} into the parser file, ! 4690: so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}. ! 4691: ! 4692: @item -v ! 4693: @itemx --verbose ! 4694: Write an extra output file containing verbose descriptions of the ! 4695: parser states and what is done for each type of look-ahead token in ! 4696: that state. ! 4697: ! 4698: This file also describes all the conflicts, both those resolved by ! 4699: operator precedence and the unresolved ones. ! 4700: ! 4701: The file's name is made by removing @samp{.tab.c} or @samp{.c} from ! 4702: the parser output file name, and adding @samp{.output} instead.@refill ! 4703: ! 4704: Therefore, if the input file is @file{foo.y}, then the parser file is ! 4705: called @file{foo.tab.c} by default. As a consequence, the verbose ! 4706: output file is called @file{foo.output}.@refill ! 4707: ! 4708: @item -V ! 4709: @itemx --version ! 4710: Print the version number of Bison and exit. ! 4711: ! 4712: @item -h ! 4713: @itemx --help ! 4714: Print a summary of the command-line options to Bison and exit. ! 4715: ! 4716: @need 1750 ! 4717: @item -y ! 4718: @itemx --yacc ! 4719: @itemx --fixed-output-files ! 4720: Equivalent to @samp{-o y.tab.c}; the parser output file is called ! 4721: @file{y.tab.c}, and the other outputs are called @file{y.output} and ! 4722: @file{y.tab.h}. The purpose of this switch is to imitate Yacc's output ! 4723: file name conventions. Thus, the following shell script can substitute ! 4724: for Yacc:@refill ! 4725: ! 4726: @example ! 4727: bison -y $* ! 4728: @end example ! 4729: @end table ! 4730: ! 4731: @node Option Cross Key, VMS Invocation, Bison Options, Invocation ! 4732: @section Option Cross Key ! 4733: ! 4734: Here is a list of options, alphabetized by long option, to help you find ! 4735: the corresponding short option. ! 4736: ! 4737: @tex ! 4738: \def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill} ! 4739: ! 4740: {\tt ! 4741: \line{ --debug \leaderfill -t} ! 4742: \line{ --defines \leaderfill -d} ! 4743: \line{ --file-prefix \leaderfill -b} ! 4744: \line{ --fixed-output-files \leaderfill -y} ! 4745: \line{ --help \leaderfill -h} ! 4746: \line{ --name-prefix \leaderfill -p} ! 4747: \line{ --no-lines \leaderfill -l} ! 4748: \line{ --output-file \leaderfill -o} ! 4749: \line{ --verbose \leaderfill -v} ! 4750: \line{ --version \leaderfill -V} ! 4751: \line{ --yacc \leaderfill -y} ! 4752: } ! 4753: @end tex ! 4754: ! 4755: @ifinfo ! 4756: @example ! 4757: --debug -t ! 4758: --defines -d ! 4759: --file-prefix=@var{prefix} -b @var{file-prefix} ! 4760: --fixed-output-files --yacc -y ! 4761: --help -h ! 4762: --name-prefix -p ! 4763: --no-lines -l ! 4764: --output-file=@var{outfile} -o @var{outfile} ! 4765: --verbose -v ! 4766: --version -V ! 4767: @end example ! 4768: @end ifinfo ! 4769: ! 4770: @node VMS Invocation, , Option Cross Key, Invocation ! 4771: @section Invoking Bison under VMS ! 4772: @cindex invoking Bison under VMS ! 4773: @cindex VMS ! 4774: ! 4775: The command line syntax for Bison on VMS is a variant of the usual ! 4776: Bison command syntax---adapted to fit VMS conventions. ! 4777: ! 4778: To find the VMS equivalent for any Bison option, start with the long ! 4779: option, and substitute a @samp{/} for the leading @samp{--}, and ! 4780: substitute a @samp{_} for each @samp{-} in the name of the long option. ! 4781: For example, the following invocation under VMS: ! 4782: ! 4783: @example ! 4784: bison /debug/name_prefix=bar foo.y ! 4785: @end example ! 4786: ! 4787: @noindent ! 4788: is equivalent to the following command under POSIX. ! 4789: ! 4790: @example ! 4791: bison --debug --name-prefix=bar foo.y ! 4792: @end example ! 4793: ! 4794: The VMS file system does not permit filenames such as ! 4795: @file{foo.tab.c}. In the above example, the output file ! 4796: would instead be named @file{foo_tab.c}. ! 4797: ! 4798: @node Table of Symbols, Glossary, Invocation, Top ! 4799: @appendix Bison Symbols ! 4800: @cindex Bison symbols, table of ! 4801: @cindex symbols in Bison, table of ! 4802: ! 4803: @table @code ! 4804: @item error ! 4805: A token name reserved for error recovery. This token may be used in ! 4806: grammar rules so as to allow the Bison parser to recognize an error in ! 4807: the grammar without halting the process. In effect, a sentence ! 4808: containing an error may be recognized as valid. On a parse error, the ! 4809: token @code{error} becomes the current look-ahead token. Actions ! 4810: corresponding to @code{error} are then executed, and the look-ahead ! 4811: token is reset to the token that originally caused the violation. ! 4812: @xref{Error Recovery}. ! 4813: ! 4814: @item YYABORT ! 4815: Macro to pretend that an unrecoverable syntax error has occurred, by ! 4816: making @code{yyparse} return 1 immediately. The error reporting ! 4817: function @code{yyerror} is not called. @xref{Parser Function, ,The Parser Function @code{yyparse}}. ! 4818: ! 4819: @item YYACCEPT ! 4820: Macro to pretend that a complete utterance of the language has been ! 4821: read, by making @code{yyparse} return 0 immediately. ! 4822: @xref{Parser Function, ,The Parser Function @code{yyparse}}. ! 4823: ! 4824: @item YYBACKUP ! 4825: Macro to discard a value from the parser stack and fake a look-ahead ! 4826: token. @xref{Action Features, ,Special Features for Use in Actions}. ! 4827: ! 4828: @item YYERROR ! 4829: Macro to pretend that a syntax error has just been detected: call ! 4830: @code{yyerror} and then perform normal error recovery if possible ! 4831: (@pxref{Error Recovery}), or (if recovery is impossible) make ! 4832: @code{yyparse} return 1. @xref{Error Recovery}. ! 4833: ! 4834: @item YYERROR_VERBOSE ! 4835: Macro that you define with @code{#define} in the Bison declarations ! 4836: section to request verbose, specific error message strings when ! 4837: @code{yyerror} is called. ! 4838: ! 4839: @item YYINITDEPTH ! 4840: Macro for specifying the initial size of the parser stack. ! 4841: @xref{Stack Overflow}. ! 4842: ! 4843: @item YYLTYPE ! 4844: Macro for the data type of @code{yylloc}; a structure with four ! 4845: members. @xref{Token Positions, ,Textual Positions of Tokens}. ! 4846: ! 4847: @item YYMAXDEPTH ! 4848: Macro for specifying the maximum size of the parser stack. ! 4849: @xref{Stack Overflow}. ! 4850: ! 4851: @item YYRECOVERING ! 4852: Macro whose value indicates whether the parser is recovering from a ! 4853: syntax error. @xref{Action Features, ,Special Features for Use in Actions}. ! 4854: ! 4855: @item YYSTYPE ! 4856: Macro for the data type of semantic values; @code{int} by default. ! 4857: @xref{Value Type, ,Data Types of Semantic Values}. ! 4858: ! 4859: @item yychar ! 4860: External integer variable that contains the integer value of the ! 4861: current look-ahead token. (In a pure parser, it is a local variable ! 4862: within @code{yyparse}.) Error-recovery rule actions may examine this ! 4863: variable. @xref{Action Features, ,Special Features for Use in Actions}. ! 4864: ! 4865: @item yyclearin ! 4866: Macro used in error-recovery rule actions. It clears the previous ! 4867: look-ahead token. @xref{Error Recovery}. ! 4868: ! 4869: @item yydebug ! 4870: External integer variable set to zero by default. If @code{yydebug} ! 4871: is given a nonzero value, the parser will output information on input ! 4872: symbols and parser action. @xref{Debugging, ,Debugging Your Parser}. ! 4873: ! 4874: @item yyerrok ! 4875: Macro to cause parser to recover immediately to its normal mode ! 4876: after a parse error. @xref{Error Recovery}. ! 4877: ! 4878: @item yyerror ! 4879: User-supplied function to be called by @code{yyparse} on error. The ! 4880: function receives one argument, a pointer to a character string ! 4881: containing an error message. @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. ! 4882: ! 4883: @item yylex ! 4884: User-supplied lexical analyzer function, called with no arguments ! 4885: to get the next token. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. ! 4886: ! 4887: @item yylval ! 4888: External variable in which @code{yylex} should place the semantic ! 4889: value associated with a token. (In a pure parser, it is a local ! 4890: variable within @code{yyparse}, and its address is passed to ! 4891: @code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}. ! 4892: ! 4893: @item yylloc ! 4894: External variable in which @code{yylex} should place the line and ! 4895: column numbers associated with a token. (In a pure parser, it is a ! 4896: local variable within @code{yyparse}, and its address is passed to ! 4897: @code{yylex}.) You can ignore this variable if you don't use the ! 4898: @samp{@@} feature in the grammar actions. @xref{Token Positions, ,Textual Positions of Tokens}. ! 4899: ! 4900: @item yynerrs ! 4901: Global variable which Bison increments each time there is a parse ! 4902: error. (In a pure parser, it is a local variable within ! 4903: @code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. ! 4904: ! 4905: @item yyparse ! 4906: The parser function produced by Bison; call this function to start ! 4907: parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. ! 4908: ! 4909: @item %left ! 4910: Bison declaration to assign left associativity to token(s). ! 4911: @xref{Precedence Decl, ,Operator Precedence}. ! 4912: ! 4913: @item %nonassoc ! 4914: Bison declaration to assign nonassociativity to token(s). ! 4915: @xref{Precedence Decl, ,Operator Precedence}. ! 4916: ! 4917: @item %prec ! 4918: Bison declaration to assign a precedence to a specific rule. ! 4919: @xref{Contextual Precedence, ,Context-Dependent Precedence}. ! 4920: ! 4921: @item %pure_parser ! 4922: Bison declaration to request a pure (reentrant) parser. ! 4923: @xref{Pure Decl, ,A Pure (Reentrant) Parser}. ! 4924: ! 4925: @item %right ! 4926: Bison declaration to assign right associativity to token(s). ! 4927: @xref{Precedence Decl, ,Operator Precedence}. ! 4928: ! 4929: @item %start ! 4930: Bison declaration to specify the start symbol. @xref{Start Decl, ,The Start-Symbol}. ! 4931: ! 4932: @item %token ! 4933: Bison declaration to declare token(s) without specifying precedence. ! 4934: @xref{Token Decl, ,Token Type Names}. ! 4935: ! 4936: @item %type ! 4937: Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}. ! 4938: ! 4939: @item %union ! 4940: Bison declaration to specify several possible data types for semantic ! 4941: values. @xref{Union Decl, ,The Collection of Value Types}. ! 4942: @end table ! 4943: ! 4944: These are the punctuation and delimiters used in Bison input: ! 4945: ! 4946: @table @samp ! 4947: @item %% ! 4948: Delimiter used to separate the grammar rule section from the ! 4949: Bison declarations section or the additional C code section. ! 4950: @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. ! 4951: ! 4952: @item %@{ %@} ! 4953: All code listed between @samp{%@{} and @samp{%@}} is copied directly ! 4954: to the output file uninterpreted. Such code forms the ``C ! 4955: declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}. ! 4956: ! 4957: @item /*@dots{}*/ ! 4958: Comment delimiters, as in C. ! 4959: ! 4960: @item : ! 4961: Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}. ! 4962: ! 4963: @item ; ! 4964: Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. ! 4965: ! 4966: @item | ! 4967: Separates alternate rules for the same result nonterminal. ! 4968: @xref{Rules, ,Syntax of Grammar Rules}. ! 4969: @end table ! 4970: ! 4971: @node Glossary, Index, Table of Symbols, Top ! 4972: @appendix Glossary ! 4973: @cindex glossary ! 4974: ! 4975: @table @asis ! 4976: @item Backus-Naur Form (BNF) ! 4977: Formal method of specifying context-free grammars. BNF was first used ! 4978: in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. ! 4979: ! 4980: @item Context-free grammars ! 4981: Grammars specified as rules that can be applied regardless of context. ! 4982: Thus, if there is a rule which says that an integer can be used as an ! 4983: expression, integers are allowed @emph{anywhere} an expression is ! 4984: permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. ! 4985: ! 4986: @item Dynamic allocation ! 4987: Allocation of memory that occurs during execution, rather than at ! 4988: compile time or on entry to a function. ! 4989: ! 4990: @item Empty string ! 4991: Analogous to the empty set in set theory, the empty string is a ! 4992: character string of length zero. ! 4993: ! 4994: @item Finite-state stack machine ! 4995: A ``machine'' that has discrete states in which it is said to exist at ! 4996: each instant in time. As input to the machine is processed, the ! 4997: machine moves from state to state as specified by the logic of the ! 4998: machine. In the case of the parser, the input is the language being ! 4999: parsed, and the states correspond to various stages in the grammar ! 5000: rules. @xref{Algorithm, ,The Bison Parser Algorithm }. ! 5001: ! 5002: @item Grouping ! 5003: A language construct that is (in general) grammatically divisible; ! 5004: for example, `expression' or `declaration' in C. ! 5005: @xref{Language and Grammar, ,Languages and Context-Free Grammars}. ! 5006: ! 5007: @item Infix operator ! 5008: An arithmetic operator that is placed between the operands on which it ! 5009: performs some operation. ! 5010: ! 5011: @item Input stream ! 5012: A continuous flow of data between devices or programs. ! 5013: ! 5014: @item Language construct ! 5015: One of the typical usage schemas of the language. For example, one of ! 5016: the constructs of the C language is the @code{if} statement. ! 5017: @xref{Language and Grammar, ,Languages and Context-Free Grammars}. ! 5018: ! 5019: @item Left associativity ! 5020: Operators having left associativity are analyzed from left to right: ! 5021: @samp{a+b+c} first computes @samp{a+b} and then combines with ! 5022: @samp{c}. @xref{Precedence, ,Operator Precedence}. ! 5023: ! 5024: @item Left recursion ! 5025: A rule whose result symbol is also its first component symbol; ! 5026: for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}. ! 5027: ! 5028: @item Left-to-right parsing ! 5029: Parsing a sentence of a language by analyzing it token by token from ! 5030: left to right. @xref{Algorithm, ,The Bison Parser Algorithm }. ! 5031: ! 5032: @item Lexical analyzer (scanner) ! 5033: A function that reads an input stream and returns tokens one by one. ! 5034: @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. ! 5035: ! 5036: @item Lexical tie-in ! 5037: A flag, set by actions in the grammar rules, which alters the way ! 5038: tokens are parsed. @xref{Lexical Tie-ins}. ! 5039: ! 5040: @item Look-ahead token ! 5041: A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}. ! 5042: ! 5043: @item LALR(1) ! 5044: The class of context-free grammars that Bison (like most other parser ! 5045: generators) can handle; a subset of LR(1). @xref{Mystery Conflicts, , ! 5046: Mysterious Reduce/Reduce Conflicts}. ! 5047: ! 5048: @item LR(1) ! 5049: The class of context-free grammars in which at most one token of ! 5050: look-ahead is needed to disambiguate the parsing of any piece of input. ! 5051: ! 5052: @item Nonterminal symbol ! 5053: A grammar symbol standing for a grammatical construct that can ! 5054: be expressed through rules in terms of smaller constructs; in other ! 5055: words, a construct that is not a token. @xref{Symbols}. ! 5056: ! 5057: @item Parse error ! 5058: An error encountered during parsing of an input stream due to invalid ! 5059: syntax. @xref{Error Recovery}. ! 5060: ! 5061: @item Parser ! 5062: A function that recognizes valid sentences of a language by analyzing ! 5063: the syntax structure of a set of tokens passed to it from a lexical ! 5064: analyzer. ! 5065: ! 5066: @item Postfix operator ! 5067: An arithmetic operator that is placed after the operands upon which it ! 5068: performs some operation. ! 5069: ! 5070: @item Reduction ! 5071: Replacing a string of nonterminals and/or terminals with a single ! 5072: nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }. ! 5073: ! 5074: @item Reentrant ! 5075: A reentrant subprogram is a subprogram which can be in invoked any ! 5076: number of times in parallel, without interference between the various ! 5077: invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. ! 5078: ! 5079: @item Reverse polish notation ! 5080: A language in which all operators are postfix operators. ! 5081: ! 5082: @item Right recursion ! 5083: A rule whose result symbol is also its last component symbol; ! 5084: for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}. ! 5085: ! 5086: @item Semantics ! 5087: In computer languages, the semantics are specified by the actions ! 5088: taken for each instance of the language, i.e., the meaning of ! 5089: each statement. @xref{Semantics, ,Defining Language Semantics}. ! 5090: ! 5091: @item Shift ! 5092: A parser is said to shift when it makes the choice of analyzing ! 5093: further input from the stream rather than reducing immediately some ! 5094: already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm }. ! 5095: ! 5096: @item Single-character literal ! 5097: A single character that is recognized and interpreted as is. ! 5098: @xref{Grammar in Bison, ,From Formal Rules to Bison Input}. ! 5099: ! 5100: @item Start symbol ! 5101: The nonterminal symbol that stands for a complete valid utterance in ! 5102: the language being parsed. The start symbol is usually listed as the ! 5103: first nonterminal symbol in a language specification. ! 5104: @xref{Start Decl, ,The Start-Symbol}. ! 5105: ! 5106: @item Symbol table ! 5107: A data structure where symbol names and associated data are stored ! 5108: during parsing to allow for recognition and use of existing ! 5109: information in repeated uses of a symbol. @xref{Multi-function Calc}. ! 5110: ! 5111: @item Token ! 5112: A basic, grammatically indivisible unit of a language. The symbol ! 5113: that describes a token in the grammar is a terminal symbol. ! 5114: The input of the Bison parser is a stream of tokens which comes from ! 5115: the lexical analyzer. @xref{Symbols}. ! 5116: ! 5117: @item Terminal symbol ! 5118: A grammar symbol that has no rules in the grammar and therefore ! 5119: is grammatically indivisible. The piece of text it represents ! 5120: is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. ! 5121: @end table ! 5122: ! 5123: @node Index, , Glossary, Top ! 5124: @unnumbered Index ! 5125: ! 5126: @printindex cp ! 5127: ! 5128: @contents ! 5129: ! 5130: @bye ! 5131: ! 5132: ! 5133: ! 5134: ! 5135: @c old menu ! 5136: ! 5137: * Introduction:: ! 5138: * Conditions:: ! 5139: * Copying:: The GNU General Public License says ! 5140: how you can copy and share Bison ! 5141: ! 5142: Tutorial sections: ! 5143: * Concepts:: Basic concepts for understanding Bison. ! 5144: * Examples:: Three simple explained examples of using Bison. ! 5145: ! 5146: Reference sections: ! 5147: * Grammar File:: Writing Bison declarations and rules. ! 5148: * Interface:: C-language interface to the parser function @code{yyparse}. ! 5149: * Algorithm:: How the Bison parser works at run-time. ! 5150: * Error Recovery:: Writing rules for error recovery. ! 5151: * Context Dependency::What to do if your language syntax is too ! 5152: messy for Bison to handle straightforwardly. ! 5153: * Debugging:: Debugging Bison parsers that parse wrong. ! 5154: * Invocation:: How to run Bison (to produce the parser source file). ! 5155: * Table of Symbols:: All the keywords of the Bison language are explained. ! 5156: * Glossary:: Basic concepts are explained. ! 5157: * Index:: Cross-references to the text. ! 5158:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.