|
|
1.1 root 1: \input texinfo @c -*-texinfo-*-
2: @comment %**start of header
3: @setfilename bison.info
4: @settitle Bison 1.20
5: @setchapternewpage odd
6:
7: @c SMALL BOOK version
8: @c This edition has been formatted so that you can format and print it in
9: @c the smallbook format.
10: @c @smallbook
11:
12: @c next time, consider using @set for edition number, etc...
13:
14: @c Set following if you have the new `shorttitlepage' command
15: @c @clear shorttitlepage-enabled
16: @c @set shorttitlepage-enabled
17:
18: @c ISPELL CHECK: done, 14 Jan 1993 --bob
19:
20: @c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo
21: @c titlepage; should NOT be changed in the GPL. --mew
22:
23: @iftex
24: @syncodeindex fn cp
25: @syncodeindex vr cp
26: @syncodeindex tp cp
27: @end iftex
28: @ifinfo
29: @synindex fn cp
30: @synindex vr cp
31: @synindex tp cp
32: @end ifinfo
33: @comment %**end of header
34:
35: @ifinfo
36: This file documents the Bison parser generator.
37:
38: Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
39:
40: Permission is granted to make and distribute verbatim copies of
41: this manual provided the copyright notice and this permission notice
42: are preserved on all copies.
43:
44: @ignore
45: Permission is granted to process this file through Tex and print the
46: results, provided the printed document carries copying permission
47: notice identical to this one except for the removal of this paragraph
48: (this paragraph not being relevant to the printed manual).
49:
50: @end ignore
51: Permission is granted to copy and distribute modified versions of this
52: manual under the conditions for verbatim copying, provided also that the
53: sections entitled ``GNU General Public License'' and ``Conditions for
54: Using Bison'' are included exactly as in the original, and provided that
55: the entire resulting derived work is distributed under the terms of a
56: permission notice identical to this one.
57:
58: Permission is granted to copy and distribute translations of this manual
59: into another language, under the above conditions for modified versions,
60: except that the sections entitled ``GNU General Public License'',
61: ``Conditions for Using Bison'' and this permission notice may be
62: included in translations approved by the Free Software Foundation
63: instead of in the original English.
64: @end ifinfo
65:
66: @ifset shorttitlepage-enabled
67: @shorttitlepage Bison
68: @end ifset
69: @titlepage
70: @title Bison
71: @subtitle The YACC-compatible Parser Generator
72: @subtitle December 1992, Bison Version 1.20
73:
74: @author by Charles Donnelly and Richard Stallman
75:
76: @page
77: @vskip 0pt plus 1filll
78: Copyright @copyright{} 1988, 1989, 1990, 1991, 1992 Free Software
79: Foundation
80:
81: @sp 2
82: Published by the Free Software Foundation @*
83: 675 Massachusetts Avenue @*
84: Cambridge, MA 02139 USA @*
85: Printed copies are available for $15 each.@*
86: ISBN-1-882114-30-2
87:
88: Permission is granted to make and distribute verbatim copies of
89: this manual provided the copyright notice and this permission notice
90: are preserved on all copies.
91:
92: @ignore
93: Permission is granted to process this file through TeX and print the
94: results, provided the printed document carries copying permission
95: notice identical to this one except for the removal of this paragraph
96: (this paragraph not being relevant to the printed manual).
97:
98: @end ignore
99: Permission is granted to copy and distribute modified versions of this
100: manual under the conditions for verbatim copying, provided also that the
101: sections entitled ``GNU General Public License'' and ``Conditions for
102: Using Bison'' are included exactly as in the original, and provided that
103: the entire resulting derived work is distributed under the terms of a
104: permission notice identical to this one.
105:
106: Permission is granted to copy and distribute translations of this manual
107: into another language, under the above conditions for modified versions,
108: except that the sections entitled ``GNU General Public License'',
109: ``Conditions for Using Bison'' and this permission notice may be
110: included in translations approved by the Free Software Foundation
111: instead of in the original English.
112: @sp 2
113: Cover art by Etienne Suvasa.
114: @end titlepage
115: @page
116:
117: @node Top, Introduction, (dir), (dir)
118:
119: @ifinfo
120: This manual documents version 1.20 of Bison.
121: @end ifinfo
122:
123: @menu
124: * Introduction::
125: * Conditions::
126: * Copying:: The GNU General Public License says
127: how you can copy and share Bison
128:
129: Tutorial sections:
130: * Concepts:: Basic concepts for understanding Bison.
131: * Examples:: Three simple explained examples of using Bison.
132:
133: Reference sections:
134: * Grammar File:: Writing Bison declarations and rules.
135: * Interface:: C-language interface to the parser function @code{yyparse}.
136: * Algorithm:: How the Bison parser works at run-time.
137: * Error Recovery:: Writing rules for error recovery.
138: * Context Dependency:: What to do if your language syntax is too
139: messy for Bison to handle straightforwardly.
140: * Debugging:: Debugging Bison parsers that parse wrong.
141: * Invocation:: How to run Bison (to produce the parser source file).
142: * Table of Symbols:: All the keywords of the Bison language are explained.
143: * Glossary:: Basic concepts are explained.
144: * Index:: Cross-references to the text.
145:
146: --- The Detailed Node Listing ---
147:
148: The Concepts of Bison
149:
150: * Language and Grammar:: Languages and context-free grammars,
151: as mathematical ideas.
152: * Grammar in Bison:: How we represent grammars for Bison's sake.
153: * Semantic Values:: Each token or syntactic grouping can have
154: a semantic value (the value of an integer,
155: the name of an identifier, etc.).
156: * Semantic Actions:: Each rule can have an action containing C code.
157: * Bison Parser:: What are Bison's input and output,
158: how is the output used?
159: * Stages:: Stages in writing and running Bison grammars.
160: * Grammar Layout:: Overall structure of a Bison grammar file.
161:
162: Examples
163:
164: * RPN Calc:: Reverse polish notation calculator;
165: a first example with no operator precedence.
166: * Infix Calc:: Infix (algebraic) notation calculator.
167: Operator precedence is introduced.
168: * Simple Error Recovery:: Continuing after syntax errors.
169: * Multi-function Calc:: Calculator with memory and trig functions.
170: It uses multiple data-types for semantic values.
171: * Exercises:: Ideas for improving the multi-function calculator.
172:
173: Reverse Polish Notation Calculator
174:
175: * Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
176: * Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
177: * Lexer: Rpcalc Lexer. The lexical analyzer.
178: * Main: Rpcalc Main. The controlling function.
179: * Error: Rpcalc Error. The error reporting function.
180: * Gen: Rpcalc Gen. Running Bison on the grammar file.
181: * Comp: Rpcalc Compile. Run the C compiler on the output code.
182:
183: Grammar Rules for @code{rpcalc}
184:
185: * Rpcalc Input::
186: * Rpcalc Line::
187: * Rpcalc Expr::
188:
189: Multi-Function Calculator: @code{mfcalc}
190:
191: * Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
192: * Rules: Mfcalc Rules. Grammar rules for the calculator.
193: * Symtab: Mfcalc Symtab. Symbol table management subroutines.
194:
195: Bison Grammar Files
196:
197: * Grammar Outline:: Overall layout of the grammar file.
198: * Symbols:: Terminal and nonterminal symbols.
199: * Rules:: How to write grammar rules.
200: * Recursion:: Writing recursive rules.
201: * Semantics:: Semantic values and actions.
202: * Declarations:: All kinds of Bison declarations are described here.
203: * Multiple Parsers:: Putting more than one Bison parser in one program.
204:
205: Outline of a Bison Grammar
206:
207: * C Declarations:: Syntax and usage of the C declarations section.
208: * Bison Declarations:: Syntax and usage of the Bison declarations section.
209: * Grammar Rules:: Syntax and usage of the grammar rules section.
210: * C Code:: Syntax and usage of the additional C code section.
211:
212: Defining Language Semantics
213:
214: * Value Type:: Specifying one data type for all semantic values.
215: * Multiple Types:: Specifying several alternative data types.
216: * Actions:: An action is the semantic definition of a grammar rule.
217: * Action Types:: Specifying data types for actions to operate on.
218: * Mid-Rule Actions:: Most actions go at the end of a rule.
219: This says when, why and how to use the exceptional
220: action in the middle of a rule.
221:
222: Bison Declarations
223:
224: * Token Decl:: Declaring terminal symbols.
225: * Precedence Decl:: Declaring terminals with precedence and associativity.
226: * Union Decl:: Declaring the set of all semantic value types.
227: * Type Decl:: Declaring the choice of type for a nonterminal symbol.
228: * Expect Decl:: Suppressing warnings about shift/reduce conflicts.
229: * Start Decl:: Specifying the start symbol.
230: * Pure Decl:: Requesting a reentrant parser.
231: * Decl Summary:: Table of all Bison declarations.
232:
233: Parser C-Language Interface
234:
235: * Parser Function:: How to call @code{yyparse} and what it returns.
236: * Lexical:: You must supply a function @code{yylex}
237: which reads tokens.
238: * Error Reporting:: You must supply a function @code{yyerror}.
239: * Action Features:: Special features for use in actions.
240:
241: The Lexical Analyzer Function @code{yylex}
242:
243: * Calling Convention:: How @code{yyparse} calls @code{yylex}.
244: * Token Values:: How @code{yylex} must return the semantic value
245: of the token it has read.
246: * Token Positions:: How @code{yylex} must return the text position
247: (line number, etc.) of the token, if the
248: actions want that.
249: * Pure Calling:: How the calling convention differs
250: in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
251:
252: The Bison Parser Algorithm
253:
254: * Look-Ahead:: Parser looks one token ahead when deciding what to do.
255: * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
256: * Precedence:: Operator precedence works by resolving conflicts.
257: * Contextual Precedence:: When an operator's precedence depends on context.
258: * Parser States:: The parser is a finite-state-machine with stack.
259: * Reduce/Reduce:: When two rules are applicable in the same situation.
260: * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
261: * Stack Overflow:: What happens when stack gets full. How to avoid it.
262:
263: Operator Precedence
264:
265: * Why Precedence:: An example showing why precedence is needed.
266: * Using Precedence:: How to specify precedence in Bison grammars.
267: * Precedence Examples:: How these features are used in the previous example.
268: * How Precedence:: How they work.
269:
270: Handling Context Dependencies
271:
272: * Semantic Tokens:: Token parsing can depend on the semantic context.
273: * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
274: * Tie-in Recovery:: Lexical tie-ins have implications for how
275: error recovery rules must be written.
276:
277: Invoking Bison
278:
279: * Bison Options:: All the options described in detail,
280: in alphabetical order by short options.
281: * Option Cross Key:: Alphabetical list of long options.
282: * VMS Invocation:: Bison command syntax on VMS.
283: @end menu
284:
285: @node Introduction, Conditions, Top, Top
286: @unnumbered Introduction
287: @cindex introduction
288:
289: @dfn{Bison} is a general-purpose parser generator that converts a
290: grammar description for an LALR(1) context-free grammar into a C
291: program to parse that grammar. Once you are proficient with Bison,
292: you may use it to develop a wide range of language parsers, from those
293: used in simple desk calculators to complex programming languages.
294:
295: Bison is upward compatible with Yacc: all properly-written Yacc grammars
296: ought to work with Bison with no change. Anyone familiar with Yacc
297: should be able to use Bison with little trouble. You need to be fluent in
298: C programming in order to use Bison or to understand this manual.
299:
300: We begin with tutorial chapters that explain the basic concepts of using
301: Bison and show three explained examples, each building on the last. If you
302: don't know Bison or Yacc, start by reading these chapters. Reference
303: chapters follow which describe specific aspects of Bison in detail.
304:
305: Bison was written primarily by Robert Corbett; Richard Stallman made
306: it Yacc-compatible. This edition corresponds to version 1.20 of Bison.
307:
308: @node Conditions, Copying, Introduction, Top
309: @unnumbered Conditions for Using Bison
310:
311: Bison grammars can be used only in programs that are free software. This
312: is in contrast to what happens with the GNU C compiler and the other
313: GNU programming tools.
314:
315: The reason Bison is special is that the output of the Bison utility---the
316: Bison parser file---contains a verbatim copy of a sizable piece of Bison,
317: which is the code for the @code{yyparse} function. (The actions from your
318: grammar are inserted into this function at one point, but the rest of the
319: function is not changed.)
320:
321: As a result, the Bison parser file is covered by the same copying
322: conditions that cover Bison itself and the rest of the GNU system: any
323: program containing it has to be distributed under the standard GNU copying
324: conditions.
325:
326: Occasionally people who would like to use Bison to develop proprietary
327: programs complain about this.
328:
329: We don't particularly sympathize with their complaints. The purpose of the
330: GNU project is to promote the right to share software and the practice of
331: sharing software; it is a means of changing society. The people who
332: complain are planning to be uncooperative toward the rest of the world; why
333: should they deserve our help in doing so?
334:
335: However, it's possible that a change in these conditions might encourage
336: computer companies to use and distribute the GNU system. If so, then we
337: might decide to change the terms on @code{yyparse} as a matter of the
338: strategy of promoting the right to share. Such a change would be
339: irrevocable. Since we stand by the copying permissions we have announced,
340: we cannot withdraw them once given.
341:
342: We mustn't make an irrevocable change hastily. We have to wait until there
343: is a complete GNU system and there has been time to learn how this issue
344: affects its reception.
345:
346: @node Copying, Concepts, Conditions, Top
347: @unnumbered GNU GENERAL PUBLIC LICENSE
348: @center Version 2, June 1991
349:
350: @display
351: Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
352: 675 Mass Ave, Cambridge, MA 02139, USA
353:
354: Everyone is permitted to copy and distribute verbatim copies
355: of this license document, but changing it is not allowed.
356: @end display
357:
358: @unnumberedsec Preamble
359:
360: The licenses for most software are designed to take away your
361: freedom to share and change it. By contrast, the GNU General Public
362: License is intended to guarantee your freedom to share and change free
363: software---to make sure the software is free for all its users. This
364: General Public License applies to most of the Free Software
365: Foundation's software and to any other program whose authors commit to
366: using it. (Some other Free Software Foundation software is covered by
367: the GNU Library General Public License instead.) You can apply it to
368: your programs, too.
369:
370: When we speak of free software, we are referring to freedom, not
371: price. Our General Public Licenses are designed to make sure that you
372: have the freedom to distribute copies of free software (and charge for
373: this service if you wish), that you receive source code or can get it
374: if you want it, that you can change the software or use pieces of it
375: in new free programs; and that you know you can do these things.
376:
377: To protect your rights, we need to make restrictions that forbid
378: anyone to deny you these rights or to ask you to surrender the rights.
379: These restrictions translate to certain responsibilities for you if you
380: distribute copies of the software, or if you modify it.
381:
382: For example, if you distribute copies of such a program, whether
383: gratis or for a fee, you must give the recipients all the rights that
384: you have. You must make sure that they, too, receive or can get the
385: source code. And you must show them these terms so they know their
386: rights.
387:
388: We protect your rights with two steps: (1) copyright the software, and
389: (2) offer you this license which gives you legal permission to copy,
390: distribute and/or modify the software.
391:
392: Also, for each author's protection and ours, we want to make certain
393: that everyone understands that there is no warranty for this free
394: software. If the software is modified by someone else and passed on, we
395: want its recipients to know that what they have is not the original, so
396: that any problems introduced by others will not reflect on the original
397: authors' reputations.
398:
399: Finally, any free program is threatened constantly by software
400: patents. We wish to avoid the danger that redistributors of a free
401: program will individually obtain patent licenses, in effect making the
402: program proprietary. To prevent this, we have made it clear that any
403: patent must be licensed for everyone's free use or not licensed at all.
404:
405: The precise terms and conditions for copying, distribution and
406: modification follow.
407:
408: @iftex
409: @unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
410: @end iftex
411: @ifinfo
412: @center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
413: @end ifinfo
414:
415: @enumerate 0
416: @item
417: This License applies to any program or other work which contains
418: a notice placed by the copyright holder saying it may be distributed
419: under the terms of this General Public License. The ``Program'', below,
420: refers to any such program or work, and a ``work based on the Program''
421: means either the Program or any derivative work under copyright law:
422: that is to say, a work containing the Program or a portion of it,
423: either verbatim or with modifications and/or translated into another
424: language. (Hereinafter, translation is included without limitation in
425: the term ``modification''.) Each licensee is addressed as ``you''.
426:
427: Activities other than copying, distribution and modification are not
428: covered by this License; they are outside its scope. The act of
429: running the Program is not restricted, and the output from the Program
430: is covered only if its contents constitute a work based on the
431: Program (independent of having been made by running the Program).
432: Whether that is true depends on what the Program does.
433:
434: @item
435: You may copy and distribute verbatim copies of the Program's
436: source code as you receive it, in any medium, provided that you
437: conspicuously and appropriately publish on each copy an appropriate
438: copyright notice and disclaimer of warranty; keep intact all the
439: notices that refer to this License and to the absence of any warranty;
440: and give any other recipients of the Program a copy of this License
441: along with the Program.
442:
443: You may charge a fee for the physical act of transferring a copy, and
444: you may at your option offer warranty protection in exchange for a fee.
445:
446: @item
447: You may modify your copy or copies of the Program or any portion
448: of it, thus forming a work based on the Program, and copy and
449: distribute such modifications or work under the terms of Section 1
450: above, provided that you also meet all of these conditions:
451:
452: @enumerate a
453: @item
454: You must cause the modified files to carry prominent notices
455: stating that you changed the files and the date of any change.
456:
457: @item
458: You must cause any work that you distribute or publish, that in
459: whole or in part contains or is derived from the Program or any
460: part thereof, to be licensed as a whole at no charge to all third
461: parties under the terms of this License.
462:
463: @item
464: If the modified program normally reads commands interactively
465: when run, you must cause it, when started running for such
466: interactive use in the most ordinary way, to print or display an
467: announcement including an appropriate copyright notice and a
468: notice that there is no warranty (or else, saying that you provide
469: a warranty) and that users may redistribute the program under
470: these conditions, and telling the user how to view a copy of this
471: License. (Exception: if the Program itself is interactive but
472: does not normally print such an announcement, your work based on
473: the Program is not required to print an announcement.)
474: @end enumerate
475:
476: These requirements apply to the modified work as a whole. If
477: identifiable sections of that work are not derived from the Program,
478: and can be reasonably considered independent and separate works in
479: themselves, then this License, and its terms, do not apply to those
480: sections when you distribute them as separate works. But when you
481: distribute the same sections as part of a whole which is a work based
482: on the Program, the distribution of the whole must be on the terms of
483: this License, whose permissions for other licensees extend to the
484: entire whole, and thus to each and every part regardless of who wrote it.
485:
486: Thus, it is not the intent of this section to claim rights or contest
487: your rights to work written entirely by you; rather, the intent is to
488: exercise the right to control the distribution of derivative or
489: collective works based on the Program.
490:
491: In addition, mere aggregation of another work not based on the Program
492: with the Program (or with a work based on the Program) on a volume of
493: a storage or distribution medium does not bring the other work under
494: the scope of this License.
495:
496: @item
497: You may copy and distribute the Program (or a work based on it,
498: under Section 2) in object code or executable form under the terms of
499: Sections 1 and 2 above provided that you also do one of the following:
500:
501: @enumerate a
502: @item
503: Accompany it with the complete corresponding machine-readable
504: source code, which must be distributed under the terms of Sections
505: 1 and 2 above on a medium customarily used for software interchange; or,
506:
507: @item
508: Accompany it with a written offer, valid for at least three
509: years, to give any third party, for a charge no more than your
510: cost of physically performing source distribution, a complete
511: machine-readable copy of the corresponding source code, to be
512: distributed under the terms of Sections 1 and 2 above on a medium
513: customarily used for software interchange; or,
514:
515: @item
516: Accompany it with the information you received as to the offer
517: to distribute corresponding source code. (This alternative is
518: allowed only for noncommercial distribution and only if you
519: received the program in object code or executable form with such
520: an offer, in accord with Subsection b above.)
521: @end enumerate
522:
523: The source code for a work means the preferred form of the work for
524: making modifications to it. For an executable work, complete source
525: code means all the source code for all modules it contains, plus any
526: associated interface definition files, plus the scripts used to
527: control compilation and installation of the executable. However, as a
528: special exception, the source code distributed need not include
529: anything that is normally distributed (in either source or binary
530: form) with the major components (compiler, kernel, and so on) of the
531: operating system on which the executable runs, unless that component
532: itself accompanies the executable.
533:
534: If distribution of executable or object code is made by offering
535: access to copy from a designated place, then offering equivalent
536: access to copy the source code from the same place counts as
537: distribution of the source code, even though third parties are not
538: compelled to copy the source along with the object code.
539:
540: @item
541: You may not copy, modify, sublicense, or distribute the Program
542: except as expressly provided under this License. Any attempt
543: otherwise to copy, modify, sublicense or distribute the Program is
544: void, and will automatically terminate your rights under this License.
545: However, parties who have received copies, or rights, from you under
546: this License will not have their licenses terminated so long as such
547: parties remain in full compliance.
548:
549: @item
550: You are not required to accept this License, since you have not
551: signed it. However, nothing else grants you permission to modify or
552: distribute the Program or its derivative works. These actions are
553: prohibited by law if you do not accept this License. Therefore, by
554: modifying or distributing the Program (or any work based on the
555: Program), you indicate your acceptance of this License to do so, and
556: all its terms and conditions for copying, distributing or modifying
557: the Program or works based on it.
558:
559: @item
560: Each time you redistribute the Program (or any work based on the
561: Program), the recipient automatically receives a license from the
562: original licensor to copy, distribute or modify the Program subject to
563: these terms and conditions. You may not impose any further
564: restrictions on the recipients' exercise of the rights granted herein.
565: You are not responsible for enforcing compliance by third parties to
566: this License.
567:
568: @item
569: If, as a consequence of a court judgment or allegation of patent
570: infringement or for any other reason (not limited to patent issues),
571: conditions are imposed on you (whether by court order, agreement or
572: otherwise) that contradict the conditions of this License, they do not
573: excuse you from the conditions of this License. If you cannot
574: distribute so as to satisfy simultaneously your obligations under this
575: License and any other pertinent obligations, then as a consequence you
576: may not distribute the Program at all. For example, if a patent
577: license would not permit royalty-free redistribution of the Program by
578: all those who receive copies directly or indirectly through you, then
579: the only way you could satisfy both it and this License would be to
580: refrain entirely from distribution of the Program.
581:
582: If any portion of this section is held invalid or unenforceable under
583: any particular circumstance, the balance of the section is intended to
584: apply and the section as a whole is intended to apply in other
585: circumstances.
586:
587: It is not the purpose of this section to induce you to infringe any
588: patents or other property right claims or to contest validity of any
589: such claims; this section has the sole purpose of protecting the
590: integrity of the free software distribution system, which is
591: implemented by public license practices. Many people have made
592: generous contributions to the wide range of software distributed
593: through that system in reliance on consistent application of that
594: system; it is up to the author/donor to decide if he or she is willing
595: to distribute software through any other system and a licensee cannot
596: impose that choice.
597:
598: This section is intended to make thoroughly clear what is believed to
599: be a consequence of the rest of this License.
600:
601: @item
602: If the distribution and/or use of the Program is restricted in
603: certain countries either by patents or by copyrighted interfaces, the
604: original copyright holder who places the Program under this License
605: may add an explicit geographical distribution limitation excluding
606: those countries, so that distribution is permitted only in or among
607: countries not thus excluded. In such case, this License incorporates
608: the limitation as if written in the body of this License.
609:
610: @item
611: The Free Software Foundation may publish revised and/or new versions
612: of the General Public License from time to time. Such new versions will
613: be similar in spirit to the present version, but may differ in detail to
614: address new problems or concerns.
615:
616: Each version is given a distinguishing version number. If the Program
617: specifies a version number of this License which applies to it and ``any
618: later version'', you have the option of following the terms and conditions
619: either of that version or of any later version published by the Free
620: Software Foundation. If the Program does not specify a version number of
621: this License, you may choose any version ever published by the Free Software
622: Foundation.
623:
624: @item
625: If you wish to incorporate parts of the Program into other free
626: programs whose distribution conditions are different, write to the author
627: to ask for permission. For software which is copyrighted by the Free
628: Software Foundation, write to the Free Software Foundation; we sometimes
629: make exceptions for this. Our decision will be guided by the two goals
630: of preserving the free status of all derivatives of our free software and
631: of promoting the sharing and reuse of software generally.
632:
633: @iftex
634: @heading NO WARRANTY
635: @end iftex
636: @ifinfo
637: @center NO WARRANTY
638: @end ifinfo
639:
640: @item
641: BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
642: FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
643: OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
644: PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
645: OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
646: MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
647: TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
648: PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
649: REPAIR OR CORRECTION.
650:
651: @item
652: IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
653: WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
654: REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
655: INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
656: OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
657: TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
658: YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
659: PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
660: POSSIBILITY OF SUCH DAMAGES.
661: @end enumerate
662:
663: @iftex
664: @heading END OF TERMS AND CONDITIONS
665: @end iftex
666: @ifinfo
667: @center END OF TERMS AND CONDITIONS
668: @end ifinfo
669:
670: @page
671: @unnumberedsec How to Apply These Terms to Your New Programs
672:
673: If you develop a new program, and you want it to be of the greatest
674: possible use to the public, the best way to achieve this is to make it
675: free software which everyone can redistribute and change under these terms.
676:
677: To do so, attach the following notices to the program. It is safest
678: to attach them to the start of each source file to most effectively
679: convey the exclusion of warranty; and each file should have at least
680: the ``copyright'' line and a pointer to where the full notice is found.
681:
682: @smallexample
683: @var{one line to give the program's name and a brief idea of what it does.}
684: Copyright (C) 19@var{yy} @var{name of author}
685:
686: This program is free software; you can redistribute it and/or modify
687: it under the terms of the GNU General Public License as published by
688: the Free Software Foundation; either version 2 of the License, or
689: (at your option) any later version.
690:
691: This program is distributed in the hope that it will be useful,
692: but WITHOUT ANY WARRANTY; without even the implied warranty of
693: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
694: GNU General Public License for more details.
695:
696: You should have received a copy of the GNU General Public License
697: along with this program; if not, write to the Free Software
698: Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
699: @end smallexample
700:
701: Also add information on how to contact you by electronic and paper mail.
702:
703: If the program is interactive, make it output a short notice like this
704: when it starts in an interactive mode:
705:
706: @smallexample
707: Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
708: Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
709: type `show w'.
710: This is free software, and you are welcome to redistribute it
711: under certain conditions; type `show c' for details.
712: @end smallexample
713:
714: The hypothetical commands @samp{show w} and @samp{show c} should show
715: the appropriate parts of the General Public License. Of course, the
716: commands you use may be called something other than @samp{show w} and
717: @samp{show c}; they could even be mouse-clicks or menu items---whatever
718: suits your program.
719:
720: You should also get your employer (if you work as a programmer) or your
721: school, if any, to sign a ``copyright disclaimer'' for the program, if
722: necessary. Here is a sample; alter the names:
723:
724: @smallexample
725: Yoyodyne, Inc., hereby disclaims all copyright interest in the program
726: `Gnomovision' (which makes passes at compilers) written by James Hacker.
727:
728: @var{signature of Ty Coon}, 1 April 1989
729: Ty Coon, President of Vice
730: @end smallexample
731:
732: This General Public License does not permit incorporating your program into
733: proprietary programs. If your program is a subroutine library, you may
734: consider it more useful to permit linking proprietary applications with the
735: library. If this is what you want to do, use the GNU Library General
736: Public License instead of this License.
737:
738: @node Concepts, Examples, Copying, Top
739: @chapter The Concepts of Bison
740:
741: This chapter introduces many of the basic concepts without which the
742: details of Bison will not make sense. If you do not already know how to
743: use Bison or Yacc, we suggest you start by reading this chapter carefully.
744:
745: @menu
746: * Language and Grammar:: Languages and context-free grammars,
747: as mathematical ideas.
748: * Grammar in Bison:: How we represent grammars for Bison's sake.
749: * Semantic Values:: Each token or syntactic grouping can have
750: a semantic value (the value of an integer,
751: the name of an identifier, etc.).
752: * Semantic Actions:: Each rule can have an action containing C code.
753: * Bison Parser:: What are Bison's input and output,
754: how is the output used?
755: * Stages:: Stages in writing and running Bison grammars.
756: * Grammar Layout:: Overall structure of a Bison grammar file.
757: @end menu
758:
759: @node Language and Grammar, Grammar in Bison, , Concepts
760: @section Languages and Context-Free Grammars
761:
762: @c !!! ``An expression can be an integer'' is not a valid Bison
763: @c expression---Bison cannot read English! --rjc 6 Feb 1992
764: @cindex context-free grammar
765: @cindex grammar, context-free
766: In order for Bison to parse a language, it must be described by a
767: @dfn{context-free grammar}. This means that you specify one or more
768: @dfn{syntactic groupings} and give rules for constructing them from their
769: parts. For example, in the C language, one kind of grouping is called an
770: `expression'. One rule for making an expression might be, ``An expression
771: can be made of a minus sign and another expression''. Another would be,
772: ``An expression can be an integer''. As you can see, rules are often
773: recursive, but there must be at least one rule which leads out of the
774: recursion.
775:
776: @cindex BNF
777: @cindex Backus-Naur form
778: The most common formal system for presenting such rules for humans to read
779: is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to
780: specify the language Algol 60. Any grammar expressed in BNF is a
781: context-free grammar. The input to Bison is essentially machine-readable
782: BNF.
783:
784: Not all context-free languages can be handled by Bison, only those
785: that are LALR(1). In brief, this means that it must be possible to
786: tell how to parse any portion of an input string with just a single
787: token of look-ahead. Strictly speaking, that is a description of an
788: LR(1) grammar, and LALR(1) involves additional restrictions that are
789: hard to explain simply; but it is rare in actual practice to find an
790: LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, ,
791: Mysterious Reduce/Reduce Conflicts}, for more information on this.
792:
793: @cindex symbols (abstract)
794: @cindex token
795: @cindex syntactic grouping
796: @cindex grouping, syntactic
797: In the formal grammatical rules for a language, each kind of syntactic unit
798: or grouping is named by a @dfn{symbol}. Those which are built by grouping
799: smaller constructs according to grammatical rules are called
800: @dfn{nonterminal symbols}; those which can't be subdivided are called
801: @dfn{terminal symbols} or @dfn{token types}. We call a piece of input
802: corresponding to a single terminal symbol a @dfn{token}, and a piece
803: corresponding to a single nonterminal symbol a @dfn{grouping}.@refill
804:
805: We can use the C language as an example of what symbols, terminal and
806: nonterminal, mean. The tokens of C are identifiers, constants (numeric and
807: string), and the various keywords, arithmetic operators and punctuation
808: marks. So the terminal symbols of a grammar for C include `identifier',
809: `number', `string', plus one symbol for each keyword, operator or
810: punctuation mark: `if', `return', `const', `static', `int', `char',
811: `plus-sign', `open-brace', `close-brace', `comma' and many more. (These
812: tokens can be subdivided into characters, but that is a matter of
813: lexicography, not grammar.)
814:
815: Here is a simple C function subdivided into tokens:
816:
817: @example
818: int /* @r{keyword `int'} */
819: square (x) /* @r{identifier, open-paren,} */
820: /* @r{identifier, close-paren} */
821: int x; /* @r{keyword `int', identifier, semicolon} */
822: @{ /* @r{open-brace} */
823: return x * x; /* @r{keyword `return', identifier,} */
824: /* @r{asterisk, identifier, semicolon} */
825: @} /* @r{close-brace} */
826: @end example
827:
828: The syntactic groupings of C include the expression, the statement, the
829: declaration, and the function definition. These are represented in the
830: grammar of C by nonterminal symbols `expression', `statement',
831: `declaration' and `function definition'. The full grammar uses dozens of
832: additional language constructs, each with its own nonterminal symbol, in
833: order to express the meanings of these four. The example above is a
834: function definition; it contains one declaration, and one statement. In
835: the statement, each @samp{x} is an expression and so is @samp{x * x}.
836:
837: Each nonterminal symbol must have grammatical rules showing how it is made
838: out of simpler constructs. For example, one kind of C statement is the
839: @code{return} statement; this would be described with a grammar rule which
840: reads informally as follows:
841:
842: @quotation
843: A `statement' can be made of a `return' keyword, an `expression' and a
844: `semicolon'.
845: @end quotation
846:
847: @noindent
848: There would be many other rules for `statement', one for each kind of
849: statement in C.
850:
851: @cindex start symbol
852: One nonterminal symbol must be distinguished as the special one which
853: defines a complete utterance in the language. It is called the @dfn{start
854: symbol}. In a compiler, this means a complete input program. In the C
855: language, the nonterminal symbol `sequence of definitions and declarations'
856: plays this role.
857:
858: For example, @samp{1 + 2} is a valid C expression---a valid part of a C
859: program---but it is not valid as an @emph{entire} C program. In the
860: context-free grammar of C, this follows from the fact that `expression' is
861: not the start symbol.
862:
863: The Bison parser reads a sequence of tokens as its input, and groups the
864: tokens using the grammar rules. If the input is valid, the end result is
865: that the entire token sequence reduces to a single grouping whose symbol is
866: the grammar's start symbol. If we use a grammar for C, the entire input
867: must be a `sequence of definitions and declarations'. If not, the parser
868: reports a syntax error.
869:
870: @node Grammar in Bison, Semantic Values, Language and Grammar, Concepts
871: @section From Formal Rules to Bison Input
872: @cindex Bison grammar
873: @cindex grammar, Bison
874: @cindex formal grammar
875:
876: A formal grammar is a mathematical construct. To define the language
877: for Bison, you must write a file expressing the grammar in Bison syntax:
878: a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.
879:
880: A nonterminal symbol in the formal grammar is represented in Bison input
881: as an identifier, like an identifier in C. By convention, it should be
882: in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
883:
884: The Bison representation for a terminal symbol is also called a @dfn{token
885: type}. Token types as well can be represented as C-like identifiers. By
886: convention, these identifiers should be upper case to distinguish them from
887: nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
888: @code{RETURN}. A terminal symbol that stands for a particular keyword in
889: the language should be named after that keyword converted to upper case.
890: The terminal symbol @code{error} is reserved for error recovery.
891: @xref{Symbols}.@refill
892:
893: A terminal symbol can also be represented as a character literal, just like
894: a C character constant. You should do this whenever a token is just a
895: single character (parenthesis, plus-sign, etc.): use that same character in
896: a literal as the terminal symbol for that token.
897:
898: The grammar rules also have an expression in Bison syntax. For example,
899: here is the Bison rule for a C @code{return} statement. The semicolon in
900: quotes is a literal character token, representing part of the C syntax for
901: the statement; the naked semicolon, and the colon, are Bison punctuation
902: used in every rule.
903:
904: @example
905: stmt: RETURN expr ';'
906: ;
907: @end example
908:
909: @noindent
910: @xref{Rules, ,Syntax of Grammar Rules}.
911:
912: @node Semantic Values, Semantic Actions, Grammar in Bison, Concepts
913: @section Semantic Values
914: @cindex semantic value
915: @cindex value, semantic
916:
917: A formal grammar selects tokens only by their classifications: for example,
918: if a rule mentions the terminal symbol `integer constant', it means that
919: @emph{any} integer constant is grammatically valid in that position. The
920: precise value of the constant is irrelevant to how to parse the input: if
921: @samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
922: grammatical.@refill
923:
924: But the precise value is very important for what the input means once it is
925: parsed. A compiler is useless if it fails to distinguish between 4, 1 and
926: 3989 as constants in the program! Therefore, each token in a Bison grammar
927: has both a token type and a @dfn{semantic value}. @xref{Semantics, ,Defining Language Semantics},
928: for details.
929:
930: The token type is a terminal symbol defined in the grammar, such as
931: @code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
932: you need to know to decide where the token may validly appear and how to
933: group it with other tokens. The grammar rules know nothing about tokens
934: except their types.@refill
935:
936: The semantic value has all the rest of the information about the
937: meaning of the token, such as the value of an integer, or the name of an
938: identifier. (A token such as @code{','} which is just punctuation doesn't
939: need to have any semantic value.)
940:
941: For example, an input token might be classified as token type
942: @code{INTEGER} and have the semantic value 4. Another input token might
943: have the same token type @code{INTEGER} but value 3989. When a grammar
944: rule says that @code{INTEGER} is allowed, either of these tokens is
945: acceptable because each is an @code{INTEGER}. When the parser accepts the
946: token, it keeps track of the token's semantic value.
947:
948: Each grouping can also have a semantic value as well as its nonterminal
949: symbol. For example, in a calculator, an expression typically has a
950: semantic value that is a number. In a compiler for a programming
951: language, an expression typically has a semantic value that is a tree
952: structure describing the meaning of the expression.
953:
954: @node Semantic Actions, Bison Parser, Semantic Values, Concepts
955: @section Semantic Actions
956: @cindex semantic actions
957: @cindex actions, semantic
958:
959: In order to be useful, a program must do more than parse input; it must
960: also produce some output based on the input. In a Bison grammar, a grammar
961: rule can have an @dfn{action} made up of C statements. Each time the
962: parser recognizes a match for that rule, the action is executed.
963: @xref{Actions}.
964:
965: Most of the time, the purpose of an action is to compute the semantic value
966: of the whole construct from the semantic values of its parts. For example,
967: suppose we have a rule which says an expression can be the sum of two
968: expressions. When the parser recognizes such a sum, each of the
969: subexpressions has a semantic value which describes how it was built up.
970: The action for this rule should create a similar sort of value for the
971: newly recognized larger expression.
972:
973: For example, here is a rule that says an expression can be the sum of
974: two subexpressions:
975:
976: @example
977: expr: expr '+' expr @{ $$ = $1 + $3; @}
978: ;
979: @end example
980:
981: @noindent
982: The action says how to produce the semantic value of the sum expression
983: from the values of the two subexpressions.
984:
985: @node Bison Parser, Stages, Semantic Actions, Concepts
986: @section Bison Output: the Parser File
987: @cindex Bison parser
988: @cindex Bison utility
989: @cindex lexical analyzer, purpose
990: @cindex parser
991:
992: When you run Bison, you give it a Bison grammar file as input. The output
993: is a C source file that parses the language described by the grammar.
994: This file is called a @dfn{Bison parser}. Keep in mind that the Bison
995: utility and the Bison parser are two distinct programs: the Bison utility
996: is a program whose output is the Bison parser that becomes part of your
997: program.
998:
999: The job of the Bison parser is to group tokens into groupings according to
1000: the grammar rules---for example, to build identifiers and operators into
1001: expressions. As it does this, it runs the actions for the grammar rules it
1002: uses.
1003:
1004: The tokens come from a function called the @dfn{lexical analyzer} that you
1005: must supply in some fashion (such as by writing it in C). The Bison parser
1006: calls the lexical analyzer each time it wants a new token. It doesn't know
1007: what is ``inside'' the tokens (though their semantic values may reflect
1008: this). Typically the lexical analyzer makes the tokens by parsing
1009: characters of text, but Bison does not depend on this. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
1010:
1011: The Bison parser file is C code which defines a function named
1012: @code{yyparse} which implements that grammar. This function does not make
1013: a complete C program: you must supply some additional functions. One is
1014: the lexical analyzer. Another is an error-reporting function which the
1015: parser calls to report an error. In addition, a complete C program must
1016: start with a function called @code{main}; you have to provide this, and
1017: arrange for it to call @code{yyparse} or the parser will never run.
1018: @xref{Interface, ,Parser C-Language Interface}.
1019:
1020: Aside from the token type names and the symbols in the actions you
1021: write, all variable and function names used in the Bison parser file
1022: begin with @samp{yy} or @samp{YY}. This includes interface functions
1023: such as the lexical analyzer function @code{yylex}, the error reporting
1024: function @code{yyerror} and the parser function @code{yyparse} itself.
1025: This also includes numerous identifiers used for internal purposes.
1026: Therefore, you should avoid using C identifiers starting with @samp{yy}
1027: or @samp{YY} in the Bison grammar file except for the ones defined in
1028: this manual.
1029:
1030: @node Stages, Grammar Layout, Bison Parser, Concepts
1031: @section Stages in Using Bison
1032: @cindex stages in using Bison
1033: @cindex using Bison
1034:
1035: The actual language-design process using Bison, from grammar specification
1036: to a working compiler or interpreter, has these parts:
1037:
1038: @enumerate
1039: @item
1040: Formally specify the grammar in a form recognized by Bison
1041: (@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule in the language,
1042: describe the action that is to be taken when an instance of that rule
1043: is recognized. The action is described by a sequence of C statements.
1044:
1045: @item
1046: Write a lexical analyzer to process input and pass tokens to the
1047: parser. The lexical analyzer may be written by hand in C
1048: (@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}). It could also be produced using Lex, but the use
1049: of Lex is not discussed in this manual.
1050:
1051: @item
1052: Write a controlling function that calls the Bison-produced parser.
1053:
1054: @item
1055: Write error-reporting routines.
1056: @end enumerate
1057:
1058: To turn this source code as written into a runnable program, you
1059: must follow these steps:
1060:
1061: @enumerate
1062: @item
1063: Run Bison on the grammar to produce the parser.
1064:
1065: @item
1066: Compile the code output by Bison, as well as any other source files.
1067:
1068: @item
1069: Link the object files to produce the finished product.
1070: @end enumerate
1071:
1072: @node Grammar Layout, , Stages, Concepts
1073: @section The Overall Layout of a Bison Grammar
1074: @cindex grammar file
1075: @cindex file format
1076: @cindex format of grammar file
1077: @cindex layout of Bison grammar
1078:
1079: The input file for the Bison utility is a @dfn{Bison grammar file}. The
1080: general form of a Bison grammar file is as follows:
1081:
1082: @example
1083: %@{
1084: @var{C declarations}
1085: %@}
1086:
1087: @var{Bison declarations}
1088:
1089: %%
1090: @var{Grammar rules}
1091: %%
1092: @var{Additional C code}
1093: @end example
1094:
1095: @noindent
1096: The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
1097: in every Bison grammar file to separate the sections.
1098:
1099: The C declarations may define types and variables used in the actions.
1100: You can also use preprocessor commands to define macros used there, and use
1101: @code{#include} to include header files that do any of these things.
1102:
1103: The Bison declarations declare the names of the terminal and nonterminal
1104: symbols, and may also describe operator precedence and the data types of
1105: semantic values of various symbols.
1106:
1107: The grammar rules define how to construct each nonterminal symbol from its
1108: parts.
1109:
1110: The additional C code can contain any C code you want to use. Often the
1111: definition of the lexical analyzer @code{yylex} goes here, plus subroutines
1112: called by the actions in the grammar rules. In a simple program, all the
1113: rest of the program can go here.
1114:
1115: @node Examples, Grammar File, Concepts, Top
1116: @chapter Examples
1117: @cindex simple examples
1118: @cindex examples, simple
1119:
1120: Now we show and explain three sample programs written using Bison: a
1121: reverse polish notation calculator, an algebraic (infix) notation
1122: calculator, and a multi-function calculator. All three have been tested
1123: under BSD Unix 4.3; each produces a usable, though limited, interactive
1124: desk-top calculator.
1125:
1126: These examples are simple, but Bison grammars for real programming
1127: languages are written the same way.
1128: @ifinfo
1129: You can copy these examples out of the Info file and into a source file
1130: to try them.
1131: @end ifinfo
1132:
1133: @menu
1134: * RPN Calc:: Reverse polish notation calculator;
1135: a first example with no operator precedence.
1136: * Infix Calc:: Infix (algebraic) notation calculator.
1137: Operator precedence is introduced.
1138: * Simple Error Recovery:: Continuing after syntax errors.
1139: * Multi-function Calc:: Calculator with memory and trig functions.
1140: It uses multiple data-types for semantic values.
1141: * Exercises:: Ideas for improving the multi-function calculator.
1142: @end menu
1143:
1144: @node RPN Calc, Infix Calc, , Examples
1145: @section Reverse Polish Notation Calculator
1146: @cindex reverse polish notation
1147: @cindex polish notation calculator
1148: @cindex @code{rpcalc}
1149: @cindex calculator, simple
1150:
1151: The first example is that of a simple double-precision @dfn{reverse polish
1152: notation} calculator (a calculator using postfix operators). This example
1153: provides a good starting point, since operator precedence is not an issue.
1154: The second example will illustrate how operator precedence is handled.
1155:
1156: The source code for this calculator is named @file{rpcalc.y}. The
1157: @samp{.y} extension is a convention used for Bison input files.
1158:
1159: @menu
1160: * Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
1161: * Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
1162: * Lexer: Rpcalc Lexer. The lexical analyzer.
1163: * Main: Rpcalc Main. The controlling function.
1164: * Error: Rpcalc Error. The error reporting function.
1165: * Gen: Rpcalc Gen. Running Bison on the grammar file.
1166: * Comp: Rpcalc Compile. Run the C compiler on the output code.
1167: @end menu
1168:
1169: @node Rpcalc Decls, Rpcalc Rules, , RPN Calc
1170: @subsection Declarations for @code{rpcalc}
1171:
1172: Here are the C and Bison declarations for the reverse polish notation
1173: calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
1174:
1175: @example
1176: /* Reverse polish notation calculator. */
1177:
1178: %@{
1179: #define YYSTYPE double
1180: #include <math.h>
1181: %@}
1182:
1183: %token NUM
1184:
1185: %% /* Grammar rules and actions follow */
1186: @end example
1187:
1188: The C declarations section (@pxref{C Declarations, ,The C Declarations Section}) contains two
1189: preprocessor directives.
1190:
1191: The @code{#define} directive defines the macro @code{YYSTYPE}, thus
1192: specifying the C data type for semantic values of both tokens and groupings
1193: (@pxref{Value Type, ,Data Types of Semantic Values}). The Bison parser will use whatever type
1194: @code{YYSTYPE} is defined as; if you don't define it, @code{int} is the
1195: default. Because we specify @code{double}, each token and each expression
1196: has an associated value, which is a floating point number.
1197:
1198: The @code{#include} directive is used to declare the exponentiation
1199: function @code{pow}.
1200:
1201: The second section, Bison declarations, provides information to Bison about
1202: the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}). Each terminal symbol that is
1203: not a single-character literal must be declared here. (Single-character
1204: literals normally don't need to be declared.) In this example, all the
1205: arithmetic operators are designated by single-character literals, so the
1206: only terminal symbol that needs to be declared is @code{NUM}, the token
1207: type for numeric constants.
1208:
1209: @node Rpcalc Rules, Rpcalc Lexer, Rpcalc Decls, RPN Calc
1210: @subsection Grammar Rules for @code{rpcalc}
1211:
1212: Here are the grammar rules for the reverse polish notation calculator.
1213:
1214: @example
1215: input: /* empty */
1216: | input line
1217: ;
1218:
1219: line: '\n'
1220: | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1221: ;
1222:
1223: exp: NUM @{ $$ = $1; @}
1224: | exp exp '+' @{ $$ = $1 + $2; @}
1225: | exp exp '-' @{ $$ = $1 - $2; @}
1226: | exp exp '*' @{ $$ = $1 * $2; @}
1227: | exp exp '/' @{ $$ = $1 / $2; @}
1228: /* Exponentiation */
1229: | exp exp '^' @{ $$ = pow ($1, $2); @}
1230: /* Unary minus */
1231: | exp 'n' @{ $$ = -$1; @}
1232: ;
1233: %%
1234: @end example
1235:
1236: The groupings of the rpcalc ``language'' defined here are the expression
1237: (given the name @code{exp}), the line of input (@code{line}), and the
1238: complete input transcript (@code{input}). Each of these nonterminal
1239: symbols has several alternate rules, joined by the @samp{|} punctuator
1240: which is read as ``or''. The following sections explain what these rules
1241: mean.
1242:
1243: The semantics of the language is determined by the actions taken when a
1244: grouping is recognized. The actions are the C code that appears inside
1245: braces. @xref{Actions}.
1246:
1247: You must specify these actions in C, but Bison provides the means for
1248: passing semantic values between the rules. In each action, the
1249: pseudo-variable @code{$$} stands for the semantic value for the grouping
1250: that the rule is going to construct. Assigning a value to @code{$$} is the
1251: main job of most actions. The semantic values of the components of the
1252: rule are referred to as @code{$1}, @code{$2}, and so on.
1253:
1254: @menu
1255: * Rpcalc Input::
1256: * Rpcalc Line::
1257: * Rpcalc Expr::
1258: @end menu
1259:
1260: @node Rpcalc Input, Rpcalc Line, , Rpcalc Rules
1261: @subsubsection Explanation of @code{input}
1262:
1263: Consider the definition of @code{input}:
1264:
1265: @example
1266: input: /* empty */
1267: | input line
1268: ;
1269: @end example
1270:
1271: This definition reads as follows: ``A complete input is either an empty
1272: string, or a complete input followed by an input line''. Notice that
1273: ``complete input'' is defined in terms of itself. This definition is said
1274: to be @dfn{left recursive} since @code{input} appears always as the
1275: leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}.
1276:
1277: The first alternative is empty because there are no symbols between the
1278: colon and the first @samp{|}; this means that @code{input} can match an
1279: empty string of input (no tokens). We write the rules this way because it
1280: is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
1281: It's conventional to put an empty alternative first and write the comment
1282: @samp{/* empty */} in it.
1283:
1284: The second alternate rule (@code{input line}) handles all nontrivial input.
1285: It means, ``After reading any number of lines, read one more line if
1286: possible.'' The left recursion makes this rule into a loop. Since the
1287: first alternative matches empty input, the loop can be executed zero or
1288: more times.
1289:
1290: The parser function @code{yyparse} continues to process input until a
1291: grammatical error is seen or the lexical analyzer says there are no more
1292: input tokens; we will arrange for the latter to happen at end of file.
1293:
1294: @node Rpcalc Line, Rpcalc Expr, Rpcalc Input, Rpcalc Rules
1295: @subsubsection Explanation of @code{line}
1296:
1297: Now consider the definition of @code{line}:
1298:
1299: @example
1300: line: '\n'
1301: | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1302: ;
1303: @end example
1304:
1305: The first alternative is a token which is a newline character; this means
1306: that rpcalc accepts a blank line (and ignores it, since there is no
1307: action). The second alternative is an expression followed by a newline.
1308: This is the alternative that makes rpcalc useful. The semantic value of
1309: the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
1310: question is the first symbol in the alternative. The action prints this
1311: value, which is the result of the computation the user asked for.
1312:
1313: This action is unusual because it does not assign a value to @code{$$}. As
1314: a consequence, the semantic value associated with the @code{line} is
1315: uninitialized (its value will be unpredictable). This would be a bug if
1316: that value were ever used, but we don't use it: once rpcalc has printed the
1317: value of the user's input line, that value is no longer needed.
1318:
1319: @node Rpcalc Expr, , Rpcalc Line, Rpcalc Rules
1320: @subsubsection Explanation of @code{expr}
1321:
1322: The @code{exp} grouping has several rules, one for each kind of expression.
1323: The first rule handles the simplest expressions: those that are just numbers.
1324: The second handles an addition-expression, which looks like two expressions
1325: followed by a plus-sign. The third handles subtraction, and so on.
1326:
1327: @example
1328: exp: NUM
1329: | exp exp '+' @{ $$ = $1 + $2; @}
1330: | exp exp '-' @{ $$ = $1 - $2; @}
1331: @dots{}
1332: ;
1333: @end example
1334:
1335: We have used @samp{|} to join all the rules for @code{exp}, but we could
1336: equally well have written them separately:
1337:
1338: @example
1339: exp: NUM ;
1340: exp: exp exp '+' @{ $$ = $1 + $2; @} ;
1341: exp: exp exp '-' @{ $$ = $1 - $2; @} ;
1342: @dots{}
1343: @end example
1344:
1345: Most of the rules have actions that compute the value of the expression in
1346: terms of the value of its parts. For example, in the rule for addition,
1347: @code{$1} refers to the first component @code{exp} and @code{$2} refers to
1348: the second one. The third component, @code{'+'}, has no meaningful
1349: associated semantic value, but if it had one you could refer to it as
1350: @code{$3}. When @code{yyparse} recognizes a sum expression using this
1351: rule, the sum of the two subexpressions' values is produced as the value of
1352: the entire expression. @xref{Actions}.
1353:
1354: You don't have to give an action for every rule. When a rule has no
1355: action, Bison by default copies the value of @code{$1} into @code{$$}.
1356: This is what happens in the first rule (the one that uses @code{NUM}).
1357:
1358: The formatting shown here is the recommended convention, but Bison does
1359: not require it. You can add or change whitespace as much as you wish.
1360: For example, this:
1361:
1362: @example
1363: exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
1364: @end example
1365:
1366: @noindent
1367: means the same thing as this:
1368:
1369: @example
1370: exp: NUM
1371: | exp exp '+' @{ $$ = $1 + $2; @}
1372: | @dots{}
1373: @end example
1374:
1375: @noindent
1376: The latter, however, is much more readable.
1377:
1378: @node Rpcalc Lexer, Rpcalc Main, Rpcalc Rules, RPN Calc
1379: @subsection The @code{rpcalc} Lexical Analyzer
1380: @cindex writing a lexical analyzer
1381: @cindex lexical analyzer, writing
1382:
1383: The lexical analyzer's job is low-level parsing: converting characters or
1384: sequences of characters into tokens. The Bison parser gets its tokens by
1385: calling the lexical analyzer. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
1386:
1387: Only a simple lexical analyzer is needed for the RPN calculator. This
1388: lexical analyzer skips blanks and tabs, then reads in numbers as
1389: @code{double} and returns them as @code{NUM} tokens. Any other character
1390: that isn't part of a number is a separate token. Note that the token-code
1391: for such a single-character token is the character itself.
1392:
1393: The return value of the lexical analyzer function is a numeric code which
1394: represents a token type. The same text used in Bison rules to stand for
1395: this token type is also a C expression for the numeric code for the type.
1396: This works in two ways. If the token type is a character literal, then its
1397: numeric code is the ASCII code for that character; you can use the same
1398: character literal in the lexical analyzer to express the number. If the
1399: token type is an identifier, that identifier is defined by Bison as a C
1400: macro whose definition is the appropriate number. In this example,
1401: therefore, @code{NUM} becomes a macro for @code{yylex} to use.
1402:
1403: The semantic value of the token (if it has one) is stored into the global
1404: variable @code{yylval}, which is where the Bison parser will look for it.
1405: (The C data type of @code{yylval} is @code{YYSTYPE}, which was defined
1406: at the beginning of the grammar; @pxref{Rpcalc Decls, ,Declarations for @code{rpcalc}}.)
1407:
1408: A token type code of zero is returned if the end-of-file is encountered.
1409: (Bison recognizes any nonpositive value as indicating the end of the
1410: input.)
1411:
1412: Here is the code for the lexical analyzer:
1413:
1414: @example
1415: @group
1416: /* Lexical analyzer returns a double floating point
1417: number on the stack and the token NUM, or the ASCII
1418: character read if not a number. Skips all blanks
1419: and tabs, returns 0 for EOF. */
1420:
1421: #include <ctype.h>
1422: @end group
1423:
1424: @group
1425: yylex ()
1426: @{
1427: int c;
1428:
1429: /* skip white space */
1430: while ((c = getchar ()) == ' ' || c == '\t')
1431: ;
1432: @end group
1433: @group
1434: /* process numbers */
1435: if (c == '.' || isdigit (c))
1436: @{
1437: ungetc (c, stdin);
1438: scanf ("%lf", &yylval);
1439: return NUM;
1440: @}
1441: @end group
1442: @group
1443: /* return end-of-file */
1444: if (c == EOF)
1445: return 0;
1446: /* return single chars */
1447: return c;
1448: @}
1449: @end group
1450: @end example
1451:
1452: @node Rpcalc Main, Rpcalc Error, Rpcalc Lexer, RPN Calc
1453: @subsection The Controlling Function
1454: @cindex controlling function
1455: @cindex main function in simple example
1456:
1457: In keeping with the spirit of this example, the controlling function is
1458: kept to the bare minimum. The only requirement is that it call
1459: @code{yyparse} to start the process of parsing.
1460:
1461: @example
1462: @group
1463: main ()
1464: @{
1465: yyparse ();
1466: @}
1467: @end group
1468: @end example
1469:
1470: @node Rpcalc Error, Rpcalc Gen, Rpcalc Main, RPN Calc
1471: @subsection The Error Reporting Routine
1472: @cindex error reporting routine
1473:
1474: When @code{yyparse} detects a syntax error, it calls the error reporting
1475: function @code{yyerror} to print an error message (usually but not always
1476: @code{"parse error"}). It is up to the programmer to supply @code{yyerror}
1477: (@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use:
1478:
1479: @example
1480: @group
1481: #include <stdio.h>
1482:
1483: yyerror (s) /* Called by yyparse on error */
1484: char *s;
1485: @{
1486: printf ("%s\n", s);
1487: @}
1488: @end group
1489: @end example
1490:
1491: After @code{yyerror} returns, the Bison parser may recover from the error
1492: and continue parsing if the grammar contains a suitable error rule
1493: (@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
1494: have not written any error rules in this example, so any invalid input will
1495: cause the calculator program to exit. This is not clean behavior for a
1496: real calculator, but it is adequate in the first example.
1497:
1498: @node Rpcalc Gen, Rpcalc Compile, Rpcalc Error, RPN Calc
1499: @subsection Running Bison to Make the Parser
1500: @cindex running Bison (introduction)
1501:
1502: Before running Bison to produce a parser, we need to decide how to arrange
1503: all the source code in one or more source files. For such a simple example,
1504: the easiest thing is to put everything in one file. The definitions of
1505: @code{yylex}, @code{yyerror} and @code{main} go at the end, in the
1506: ``additional C code'' section of the file (@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
1507:
1508: For a large project, you would probably have several source files, and use
1509: @code{make} to arrange to recompile them.
1510:
1511: With all the source in a single file, you use the following command to
1512: convert it into a parser file:
1513:
1514: @example
1515: bison @var{file_name}.y
1516: @end example
1517:
1518: @noindent
1519: In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
1520: CALCulator''). Bison produces a file named @file{@var{file_name}.tab.c},
1521: removing the @samp{.y} from the original file name. The file output by
1522: Bison contains the source code for @code{yyparse}. The additional
1523: functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
1524: are copied verbatim to the output.
1525:
1526: @node Rpcalc Compile, , Rpcalc Gen, RPN Calc
1527: @subsection Compiling the Parser File
1528: @cindex compiling the parser
1529:
1530: Here is how to compile and run the parser file:
1531:
1532: @example
1533: @group
1534: # @r{List files in current directory.}
1535: % ls
1536: rpcalc.tab.c rpcalc.y
1537: @end group
1538:
1539: @group
1540: # @r{Compile the Bison parser.}
1541: # @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
1542: % cc rpcalc.tab.c -lm -o rpcalc
1543: @end group
1544:
1545: @group
1546: # @r{List files again.}
1547: % ls
1548: rpcalc rpcalc.tab.c rpcalc.y
1549: @end group
1550: @end example
1551:
1552: The file @file{rpcalc} now contains the executable code. Here is an
1553: example session using @code{rpcalc}.
1554:
1555: @example
1556: % rpcalc
1557: 4 9 +
1558: 13
1559: 3 7 + 3 4 5 *+-
1560: -13
1561: 3 7 + 3 4 5 * + - n @r{Note the unary minus, @samp{n}}
1562: 13
1563: 5 6 / 4 n +
1564: -3.166666667
1565: 3 4 ^ @r{Exponentiation}
1566: 81
1567: ^D @r{End-of-file indicator}
1568: %
1569: @end example
1570:
1571: @node Infix Calc, Simple Error Recovery, RPN Calc, Examples
1572: @section Infix Notation Calculator: @code{calc}
1573: @cindex infix notation calculator
1574: @cindex @code{calc}
1575: @cindex calculator, infix notation
1576:
1577: We now modify rpcalc to handle infix operators instead of postfix. Infix
1578: notation involves the concept of operator precedence and the need for
1579: parentheses nested to arbitrary depth. Here is the Bison code for
1580: @file{calc.y}, an infix desk-top calculator.
1581:
1582: @example
1583: /* Infix notation calculator--calc */
1584:
1585: %@{
1586: #define YYSTYPE double
1587: #include <math.h>
1588: %@}
1589:
1590: /* BISON Declarations */
1591: %token NUM
1592: %left '-' '+'
1593: %left '*' '/'
1594: %left NEG /* negation--unary minus */
1595: %right '^' /* exponentiation */
1596:
1597: /* Grammar follows */
1598: %%
1599: input: /* empty string */
1600: | input line
1601: ;
1602:
1603: line: '\n'
1604: | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1605: ;
1606:
1607: exp: NUM @{ $$ = $1; @}
1608: | exp '+' exp @{ $$ = $1 + $3; @}
1609: | exp '-' exp @{ $$ = $1 - $3; @}
1610: | exp '*' exp @{ $$ = $1 * $3; @}
1611: | exp '/' exp @{ $$ = $1 / $3; @}
1612: | '-' exp %prec NEG @{ $$ = -$2; @}
1613: | exp '^' exp @{ $$ = pow ($1, $3); @}
1614: | '(' exp ')' @{ $$ = $2; @}
1615: ;
1616: %%
1617: @end example
1618:
1619: @noindent
1620: The functions @code{yylex}, @code{yyerror} and @code{main} can be the same
1621: as before.
1622:
1623: There are two important new features shown in this code.
1624:
1625: In the second section (Bison declarations), @code{%left} declares token
1626: types and says they are left-associative operators. The declarations
1627: @code{%left} and @code{%right} (right associativity) take the place of
1628: @code{%token} which is used to declare a token type name without
1629: associativity. (These tokens are single-character literals, which
1630: ordinarily don't need to be declared. We declare them here to specify
1631: the associativity.)
1632:
1633: Operator precedence is determined by the line ordering of the
1634: declarations; the higher the line number of the declaration (lower on
1635: the page or screen), the higher the precedence. Hence, exponentiation
1636: has the highest precedence, unary minus (@code{NEG}) is next, followed
1637: by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator Precedence}.
1638:
1639: The other important new feature is the @code{%prec} in the grammar section
1640: for the unary minus operator. The @code{%prec} simply instructs Bison that
1641: the rule @samp{| '-' exp} has the same precedence as @code{NEG}---in this
1642: case the next-to-highest. @xref{Contextual Precedence, ,Context-Dependent Precedence}.
1643:
1644: Here is a sample run of @file{calc.y}:
1645:
1646: @need 500
1647: @example
1648: % calc
1649: 4 + 4.5 - (34/(8*3+-3))
1650: 6.880952381
1651: -56 + 2
1652: -54
1653: 3 ^ 2
1654: 9
1655: @end example
1656:
1657: @node Simple Error Recovery, Multi-function Calc, Infix Calc, Examples
1658: @section Simple Error Recovery
1659: @cindex error recovery, simple
1660:
1661: Up to this point, this manual has not addressed the issue of @dfn{error
1662: recovery}---how to continue parsing after the parser detects a syntax
1663: error. All we have handled is error reporting with @code{yyerror}. Recall
1664: that by default @code{yyparse} returns after calling @code{yyerror}. This
1665: means that an erroneous input line causes the calculator program to exit.
1666: Now we show how to rectify this deficiency.
1667:
1668: The Bison language itself includes the reserved word @code{error}, which
1669: may be included in the grammar rules. In the example below it has
1670: been added to one of the alternatives for @code{line}:
1671:
1672: @example
1673: @group
1674: line: '\n'
1675: | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1676: | error '\n' @{ yyerrok; @}
1677: ;
1678: @end group
1679: @end example
1680:
1681: This addition to the grammar allows for simple error recovery in the event
1682: of a parse error. If an expression that cannot be evaluated is read, the
1683: error will be recognized by the third rule for @code{line}, and parsing
1684: will continue. (The @code{yyerror} function is still called upon to print
1685: its message as well.) The action executes the statement @code{yyerrok}, a
1686: macro defined automatically by Bison; its meaning is that error recovery is
1687: complete (@pxref{Error Recovery}). Note the difference between
1688: @code{yyerrok} and @code{yyerror}; neither one is a misprint.@refill
1689:
1690: This form of error recovery deals with syntax errors. There are other
1691: kinds of errors; for example, division by zero, which raises an exception
1692: signal that is normally fatal. A real calculator program must handle this
1693: signal and use @code{longjmp} to return to @code{main} and resume parsing
1694: input lines; it would also have to discard the rest of the current line of
1695: input. We won't discuss this issue further because it is not specific to
1696: Bison programs.
1697:
1698: @node Multi-function Calc, Exercises, Simple Error Recovery, Examples
1699: @section Multi-Function Calculator: @code{mfcalc}
1700: @cindex multi-function calculator
1701: @cindex @code{mfcalc}
1702: @cindex calculator, multi-function
1703:
1704: Now that the basics of Bison have been discussed, it is time to move on to
1705: a more advanced problem. The above calculators provided only five
1706: functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
1707: be nice to have a calculator that provides other mathematical functions such
1708: as @code{sin}, @code{cos}, etc.
1709:
1710: It is easy to add new operators to the infix calculator as long as they are
1711: only single-character literals. The lexical analyzer @code{yylex} passes
1712: back all non-number characters as tokens, so new grammar rules suffice for
1713: adding a new operator. But we want something more flexible: built-in
1714: functions whose syntax has this form:
1715:
1716: @example
1717: @var{function_name} (@var{argument})
1718: @end example
1719:
1720: @noindent
1721: At the same time, we will add memory to the calculator, by allowing you
1722: to create named variables, store values in them, and use them later.
1723: Here is a sample session with the multi-function calculator:
1724:
1725: @example
1726: % acalc
1727: pi = 3.141592653589
1728: 3.1415926536
1729: sin(pi)
1730: 0.0000000000
1731: alpha = beta1 = 2.3
1732: 2.3000000000
1733: alpha
1734: 2.3000000000
1735: ln(alpha)
1736: 0.8329091229
1737: exp(ln(beta1))
1738: 2.3000000000
1739: %
1740: @end example
1741:
1742: Note that multiple assignment and nested function calls are permitted.
1743:
1744: @menu
1745: * Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
1746: * Rules: Mfcalc Rules. Grammar rules for the calculator.
1747: * Symtab: Mfcalc Symtab. Symbol table management subroutines.
1748: @end menu
1749:
1750: @node Mfcalc Decl, Mfcalc Rules, , Multi-function Calc
1751: @subsection Declarations for @code{mfcalc}
1752:
1753: Here are the C and Bison declarations for the multi-function calculator.
1754:
1755: @smallexample
1756: %@{
1757: #include <math.h> /* For math functions, cos(), sin(), etc. */
1758: #include "calc.h" /* Contains definition of `symrec' */
1759: %@}
1760: %union @{
1761: double val; /* For returning numbers. */
1762: symrec *tptr; /* For returning symbol-table pointers */
1763: @}
1764:
1765: %token <val> NUM /* Simple double precision number */
1766: %token <tptr> VAR FNCT /* Variable and Function */
1767: %type <val> exp
1768:
1769: %right '='
1770: %left '-' '+'
1771: %left '*' '/'
1772: %left NEG /* Negation--unary minus */
1773: %right '^' /* Exponentiation */
1774:
1775: /* Grammar follows */
1776:
1777: %%
1778: @end smallexample
1779:
1780: The above grammar introduces only two new features of the Bison language.
1781: These features allow semantic values to have various data types
1782: (@pxref{Multiple Types, ,More Than One Value Type}).
1783:
1784: The @code{%union} declaration specifies the entire list of possible types;
1785: this is instead of defining @code{YYSTYPE}. The allowable types are now
1786: double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
1787: the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
1788:
1789: Since values can now have various types, it is necessary to associate a
1790: type with each grammar symbol whose semantic value is used. These symbols
1791: are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
1792: declarations are augmented with information about their data type (placed
1793: between angle brackets).
1794:
1795: The Bison construct @code{%type} is used for declaring nonterminal symbols,
1796: just as @code{%token} is used for declaring token types. We have not used
1797: @code{%type} before because nonterminal symbols are normally declared
1798: implicitly by the rules that define them. But @code{exp} must be declared
1799: explicitly so we can specify its value type. @xref{Type Decl, ,Nonterminal Symbols}.
1800:
1801: @node Mfcalc Rules, Mfcalc Symtab, Mfcalc Decl, Multi-function Calc
1802: @subsection Grammar Rules for @code{mfcalc}
1803:
1804: Here are the grammar rules for the multi-function calculator.
1805: Most of them are copied directly from @code{calc}; three rules,
1806: those which mention @code{VAR} or @code{FNCT}, are new.
1807:
1808: @smallexample
1809: input: /* empty */
1810: | input line
1811: ;
1812:
1813: line:
1814: '\n'
1815: | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1816: | error '\n' @{ yyerrok; @}
1817: ;
1818:
1819: exp: NUM @{ $$ = $1; @}
1820: | VAR @{ $$ = $1->value.var; @}
1821: | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
1822: | FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
1823: | exp '+' exp @{ $$ = $1 + $3; @}
1824: | exp '-' exp @{ $$ = $1 - $3; @}
1825: | exp '*' exp @{ $$ = $1 * $3; @}
1826: | exp '/' exp @{ $$ = $1 / $3; @}
1827: | '-' exp %prec NEG @{ $$ = -$2; @}
1828: | exp '^' exp @{ $$ = pow ($1, $3); @}
1829: | '(' exp ')' @{ $$ = $2; @}
1830: ;
1831: /* End of grammar */
1832: %%
1833: @end smallexample
1834:
1835: @node Mfcalc Symtab, , Mfcalc Rules, Multi-function Calc
1836: @subsection The @code{mfcalc} Symbol Table
1837: @cindex symbol table example
1838:
1839: The multi-function calculator requires a symbol table to keep track of the
1840: names and meanings of variables and functions. This doesn't affect the
1841: grammar rules (except for the actions) or the Bison declarations, but it
1842: requires some additional C functions for support.
1843:
1844: The symbol table itself consists of a linked list of records. Its
1845: definition, which is kept in the header @file{calc.h}, is as follows. It
1846: provides for either functions or variables to be placed in the table.
1847:
1848: @smallexample
1849: @group
1850: /* Data type for links in the chain of symbols. */
1851: struct symrec
1852: @{
1853: char *name; /* name of symbol */
1854: int type; /* type of symbol: either VAR or FNCT */
1855: union @{
1856: double var; /* value of a VAR */
1857: double (*fnctptr)(); /* value of a FNCT */
1858: @} value;
1859: struct symrec *next; /* link field */
1860: @};
1861: @end group
1862:
1863: @group
1864: typedef struct symrec symrec;
1865:
1866: /* The symbol table: a chain of `struct symrec'. */
1867: extern symrec *sym_table;
1868:
1869: symrec *putsym ();
1870: symrec *getsym ();
1871: @end group
1872: @end smallexample
1873:
1874: The new version of @code{main} includes a call to @code{init_table}, a
1875: function that initializes the symbol table. Here it is, and
1876: @code{init_table} as well:
1877:
1878: @smallexample
1879: @group
1880: #include <stdio.h>
1881:
1882: main ()
1883: @{
1884: init_table ();
1885: yyparse ();
1886: @}
1887: @end group
1888:
1889: @group
1890: yyerror (s) /* Called by yyparse on error */
1891: char *s;
1892: @{
1893: printf ("%s\n", s);
1894: @}
1895:
1896: struct init
1897: @{
1898: char *fname;
1899: double (*fnct)();
1900: @};
1901: @end group
1902:
1903: @group
1904: struct init arith_fncts[]
1905: = @{
1906: "sin", sin,
1907: "cos", cos,
1908: "atan", atan,
1909: "ln", log,
1910: "exp", exp,
1911: "sqrt", sqrt,
1912: 0, 0
1913: @};
1914:
1915: /* The symbol table: a chain of `struct symrec'. */
1916: symrec *sym_table = (symrec *)0;
1917: @end group
1918:
1919: @group
1920: init_table () /* puts arithmetic functions in table. */
1921: @{
1922: int i;
1923: symrec *ptr;
1924: for (i = 0; arith_fncts[i].fname != 0; i++)
1925: @{
1926: ptr = putsym (arith_fncts[i].fname, FNCT);
1927: ptr->value.fnctptr = arith_fncts[i].fnct;
1928: @}
1929: @}
1930: @end group
1931: @end smallexample
1932:
1933: By simply editing the initialization list and adding the necessary include
1934: files, you can add additional functions to the calculator.
1935:
1936: Two important functions allow look-up and installation of symbols in the
1937: symbol table. The function @code{putsym} is passed a name and the type
1938: (@code{VAR} or @code{FNCT}) of the object to be installed. The object is
1939: linked to the front of the list, and a pointer to the object is returned.
1940: The function @code{getsym} is passed the name of the symbol to look up. If
1941: found, a pointer to that symbol is returned; otherwise zero is returned.
1942:
1943: @smallexample
1944: symrec *
1945: putsym (sym_name,sym_type)
1946: char *sym_name;
1947: int sym_type;
1948: @{
1949: symrec *ptr;
1950: ptr = (symrec *) malloc (sizeof (symrec));
1951: ptr->name = (char *) malloc (strlen (sym_name) + 1);
1952: strcpy (ptr->name,sym_name);
1953: ptr->type = sym_type;
1954: ptr->value.var = 0; /* set value to 0 even if fctn. */
1955: ptr->next = (struct symrec *)sym_table;
1956: sym_table = ptr;
1957: return ptr;
1958: @}
1959:
1960: symrec *
1961: getsym (sym_name)
1962: char *sym_name;
1963: @{
1964: symrec *ptr;
1965: for (ptr = sym_table; ptr != (symrec *) 0;
1966: ptr = (symrec *)ptr->next)
1967: if (strcmp (ptr->name,sym_name) == 0)
1968: return ptr;
1969: return 0;
1970: @}
1971: @end smallexample
1972:
1973: The function @code{yylex} must now recognize variables, numeric values, and
1974: the single-character arithmetic operators. Strings of alphanumeric
1975: characters with a leading nondigit are recognized as either variables or
1976: functions depending on what the symbol table says about them.
1977:
1978: The string is passed to @code{getsym} for look up in the symbol table. If
1979: the name appears in the table, a pointer to its location and its type
1980: (@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not
1981: already in the table, then it is installed as a @code{VAR} using
1982: @code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is
1983: returned to @code{yyparse}.@refill
1984:
1985: No change is needed in the handling of numeric values and arithmetic
1986: operators in @code{yylex}.
1987:
1988: @smallexample
1989: @group
1990: #include <ctype.h>
1991: yylex ()
1992: @{
1993: int c;
1994:
1995: /* Ignore whitespace, get first nonwhite character. */
1996: while ((c = getchar ()) == ' ' || c == '\t');
1997:
1998: if (c == EOF)
1999: return 0;
2000: @end group
2001:
2002: @group
2003: /* Char starts a number => parse the number. */
2004: if (c == '.' || isdigit (c))
2005: @{
2006: ungetc (c, stdin);
2007: scanf ("%lf", &yylval.val);
2008: return NUM;
2009: @}
2010: @end group
2011:
2012: @group
2013: /* Char starts an identifier => read the name. */
2014: if (isalpha (c))
2015: @{
2016: symrec *s;
2017: static char *symbuf = 0;
2018: static int length = 0;
2019: int i;
2020: @end group
2021:
2022: @group
2023: /* Initially make the buffer long enough
2024: for a 40-character symbol name. */
2025: if (length == 0)
2026: length = 40, symbuf = (char *)malloc (length + 1);
2027:
2028: i = 0;
2029: do
2030: @end group
2031: @group
2032: @{
2033: /* If buffer is full, make it bigger. */
2034: if (i == length)
2035: @{
2036: length *= 2;
2037: symbuf = (char *)realloc (symbuf, length + 1);
2038: @}
2039: /* Add this character to the buffer. */
2040: symbuf[i++] = c;
2041: /* Get another character. */
2042: c = getchar ();
2043: @}
2044: @end group
2045: @group
2046: while (c != EOF && isalnum (c));
2047:
2048: ungetc (c, stdin);
2049: symbuf[i] = '\0';
2050: @end group
2051:
2052: @group
2053: s = getsym (symbuf);
2054: if (s == 0)
2055: s = putsym (symbuf, VAR);
2056: yylval.tptr = s;
2057: return s->type;
2058: @}
2059:
2060: /* Any other character is a token by itself. */
2061: return c;
2062: @}
2063: @end group
2064: @end smallexample
2065:
2066: This program is both powerful and flexible. You may easily add new
2067: functions, and it is a simple job to modify this code to install predefined
2068: variables such as @code{pi} or @code{e} as well.
2069:
2070: @node Exercises, , Multi-function Calc, Examples
2071: @section Exercises
2072: @cindex exercises
2073:
2074: @enumerate
2075: @item
2076: Add some new functions from @file{math.h} to the initialization list.
2077:
2078: @item
2079: Add another array that contains constants and their values. Then
2080: modify @code{init_table} to add these constants to the symbol table.
2081: It will be easiest to give the constants type @code{VAR}.
2082:
2083: @item
2084: Make the program report an error if the user refers to an
2085: uninitialized variable in any way except to store a value in it.
2086: @end enumerate
2087:
2088: @node Grammar File, Interface, Examples, Top
2089: @chapter Bison Grammar Files
2090:
2091: Bison takes as input a context-free grammar specification and produces a
2092: C-language function that recognizes correct instances of the grammar.
2093:
2094: The Bison grammar input file conventionally has a name ending in @samp{.y}.
2095:
2096: @menu
2097: * Grammar Outline:: Overall layout of the grammar file.
2098: * Symbols:: Terminal and nonterminal symbols.
2099: * Rules:: How to write grammar rules.
2100: * Recursion:: Writing recursive rules.
2101: * Semantics:: Semantic values and actions.
2102: * Declarations:: All kinds of Bison declarations are described here.
2103: * Multiple Parsers:: Putting more than one Bison parser in one program.
2104: @end menu
2105:
2106: @node Grammar Outline, Symbols, , Grammar File
2107: @section Outline of a Bison Grammar
2108:
2109: A Bison grammar file has four main sections, shown here with the
2110: appropriate delimiters:
2111:
2112: @example
2113: %@{
2114: @var{C declarations}
2115: %@}
2116:
2117: @var{Bison declarations}
2118:
2119: %%
2120: @var{Grammar rules}
2121: %%
2122:
2123: @var{Additional C code}
2124: @end example
2125:
2126: Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
2127:
2128: @menu
2129: * C Declarations:: Syntax and usage of the C declarations section.
2130: * Bison Declarations:: Syntax and usage of the Bison declarations section.
2131: * Grammar Rules:: Syntax and usage of the grammar rules section.
2132: * C Code:: Syntax and usage of the additional C code section.
2133: @end menu
2134:
2135: @node C Declarations, Bison Declarations, , Grammar Outline
2136: @subsection The C Declarations Section
2137: @cindex C declarations section
2138: @cindex declarations, C
2139:
2140: The @var{C declarations} section contains macro definitions and
2141: declarations of functions and variables that are used in the actions in the
2142: grammar rules. These are copied to the beginning of the parser file so
2143: that they precede the definition of @code{yyparse}. You can use
2144: @samp{#include} to get the declarations from a header file. If you don't
2145: need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
2146: delimiters that bracket this section.
2147:
2148: @node Bison Declarations, Grammar Rules, C Declarations, Grammar Outline
2149: @subsection The Bison Declarations Section
2150: @cindex Bison declarations (introduction)
2151: @cindex declarations, Bison (introduction)
2152:
2153: The @var{Bison declarations} section contains declarations that define
2154: terminal and nonterminal symbols, specify precedence, and so on.
2155: In some simple grammars you may not need any declarations.
2156: @xref{Declarations, ,Bison Declarations}.
2157:
2158: @node Grammar Rules, C Code, Bison Declarations, Grammar Outline
2159: @subsection The Grammar Rules Section
2160: @cindex grammar rules section
2161: @cindex rules section for grammar
2162:
2163: The @dfn{grammar rules} section contains one or more Bison grammar
2164: rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
2165:
2166: There must always be at least one grammar rule, and the first
2167: @samp{%%} (which precedes the grammar rules) may never be omitted even
2168: if it is the first thing in the file.
2169:
2170: @node C Code, , Grammar Rules, Grammar Outline
2171: @subsection The Additional C Code Section
2172: @cindex additional C code section
2173: @cindex C code, section for additional
2174:
2175: The @var{additional C code} section is copied verbatim to the end of
2176: the parser file, just as the @var{C declarations} section is copied to
2177: the beginning. This is the most convenient place to put anything
2178: that you want to have in the parser file but which need not come before
2179: the definition of @code{yyparse}. For example, the definitions of
2180: @code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser C-Language Interface}.
2181:
2182: If the last section is empty, you may omit the @samp{%%} that separates it
2183: from the grammar rules.
2184:
2185: The Bison parser itself contains many static variables whose names start
2186: with @samp{yy} and many macros whose names start with @samp{YY}. It is a
2187: good idea to avoid using any such names (except those documented in this
2188: manual) in the additional C code section of the grammar file.
2189:
2190: @node Symbols, Rules, Grammar Outline, Grammar File
2191: @section Symbols, Terminal and Nonterminal
2192: @cindex nonterminal symbol
2193: @cindex terminal symbol
2194: @cindex token type
2195: @cindex symbol
2196:
2197: @dfn{Symbols} in Bison grammars represent the grammatical classifications
2198: of the language.
2199:
2200: A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
2201: class of syntactically equivalent tokens. You use the symbol in grammar
2202: rules to mean that a token in that class is allowed. The symbol is
2203: represented in the Bison parser by a numeric code, and the @code{yylex}
2204: function returns a token type code to indicate what kind of token has been
2205: read. You don't need to know what the code value is; you can use the
2206: symbol to stand for it.
2207:
2208: A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
2209: groupings. The symbol name is used in writing grammar rules. By convention,
2210: it should be all lower case.
2211:
2212: Symbol names can contain letters, digits (not at the beginning),
2213: underscores and periods. Periods make sense only in nonterminals.
2214:
2215: There are two ways of writing terminal symbols in the grammar:
2216:
2217: @itemize @bullet
2218: @item
2219: A @dfn{named token type} is written with an identifier, like an
2220: identifier in C. By convention, it should be all upper case. Each
2221: such name must be defined with a Bison declaration such as
2222: @code{%token}. @xref{Token Decl, ,Token Type Names}.
2223:
2224: @item
2225: @cindex character token
2226: @cindex literal token
2227: @cindex single-character literal
2228: A @dfn{character token type} (or @dfn{literal token}) is written in
2229: the grammar using the same syntax used in C for character constants;
2230: for example, @code{'+'} is a character token type. A character token
2231: type doesn't need to be declared unless you need to specify its
2232: semantic value data type (@pxref{Value Type, ,Data Types of Semantic Values}), associativity, or
2233: precedence (@pxref{Precedence, ,Operator Precedence}).
2234:
2235: By convention, a character token type is used only to represent a
2236: token that consists of that particular character. Thus, the token
2237: type @code{'+'} is used to represent the character @samp{+} as a
2238: token. Nothing enforces this convention, but if you depart from it,
2239: your program will confuse other readers.
2240:
2241: All the usual escape sequences used in character literals in C can be
2242: used in Bison as well, but you must not use the null character as a
2243: character literal because its ASCII code, zero, is the code
2244: @code{yylex} returns for end-of-input (@pxref{Calling Convention, ,Calling Convention for @code{yylex}}).
2245: @end itemize
2246:
2247: How you choose to write a terminal symbol has no effect on its
2248: grammatical meaning. That depends only on where it appears in rules and
2249: on when the parser function returns that symbol.
2250:
2251: The value returned by @code{yylex} is always one of the terminal symbols
2252: (or 0 for end-of-input). Whichever way you write the token type in the
2253: grammar rules, you write it the same way in the definition of @code{yylex}.
2254: The numeric code for a character token type is simply the ASCII code for
2255: the character, so @code{yylex} can use the identical character constant to
2256: generate the requisite code. Each named token type becomes a C macro in
2257: the parser file, so @code{yylex} can use the name to stand for the code.
2258: (This is why periods don't make sense in terminal symbols.)
2259: @xref{Calling Convention, ,Calling Convention for @code{yylex}}.
2260:
2261: If @code{yylex} is defined in a separate file, you need to arrange for the
2262: token-type macro definitions to be available there. Use the @samp{-d}
2263: option when you run Bison, so that it will write these macro definitions
2264: into a separate header file @file{@var{name}.tab.h} which you can include
2265: in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
2266:
2267: The symbol @code{error} is a terminal symbol reserved for error recovery
2268: (@pxref{Error Recovery}); you shouldn't use it for any other purpose.
2269: In particular, @code{yylex} should never return this value.
2270:
2271: @node Rules, Recursion, Symbols, Grammar File
2272: @section Syntax of Grammar Rules
2273: @cindex rule syntax
2274: @cindex grammar rule syntax
2275: @cindex syntax of grammar rules
2276:
2277: A Bison grammar rule has the following general form:
2278:
2279: @example
2280: @var{result}: @var{components}@dots{}
2281: ;
2282: @end example
2283:
2284: @noindent
2285: where @var{result} is the nonterminal symbol that this rule describes
2286: and @var{components} are various terminal and nonterminal symbols that
2287: are put together by this rule (@pxref{Symbols}).
2288:
2289: For example,
2290:
2291: @example
2292: @group
2293: exp: exp '+' exp
2294: ;
2295: @end group
2296: @end example
2297:
2298: @noindent
2299: says that two groupings of type @code{exp}, with a @samp{+} token in between,
2300: can be combined into a larger grouping of type @code{exp}.
2301:
2302: Whitespace in rules is significant only to separate symbols. You can add
2303: extra whitespace as you wish.
2304:
2305: Scattered among the components can be @var{actions} that determine
2306: the semantics of the rule. An action looks like this:
2307:
2308: @example
2309: @{@var{C statements}@}
2310: @end example
2311:
2312: @noindent
2313: Usually there is only one action and it follows the components.
2314: @xref{Actions}.
2315:
2316: @findex |
2317: Multiple rules for the same @var{result} can be written separately or can
2318: be joined with the vertical-bar character @samp{|} as follows:
2319:
2320: @ifinfo
2321: @example
2322: @var{result}: @var{rule1-components}@dots{}
2323: | @var{rule2-components}@dots{}
2324: @dots{}
2325: ;
2326: @end example
2327: @end ifinfo
2328: @iftex
2329: @example
2330: @group
2331: @var{result}: @var{rule1-components}@dots{}
2332: | @var{rule2-components}@dots{}
2333: @dots{}
2334: ;
2335: @end group
2336: @end example
2337: @end iftex
2338:
2339: @noindent
2340: They are still considered distinct rules even when joined in this way.
2341:
2342: If @var{components} in a rule is empty, it means that @var{result} can
2343: match the empty string. For example, here is how to define a
2344: comma-separated sequence of zero or more @code{exp} groupings:
2345:
2346: @example
2347: @group
2348: expseq: /* empty */
2349: | expseq1
2350: ;
2351: @end group
2352:
2353: @group
2354: expseq1: exp
2355: | expseq1 ',' exp
2356: ;
2357: @end group
2358: @end example
2359:
2360: @noindent
2361: It is customary to write a comment @samp{/* empty */} in each rule
2362: with no components.
2363:
2364: @node Recursion, Semantics, Rules, Grammar File
2365: @section Recursive Rules
2366: @cindex recursive rule
2367:
2368: A rule is called @dfn{recursive} when its @var{result} nonterminal appears
2369: also on its right hand side. Nearly all Bison grammars need to use
2370: recursion, because that is the only way to define a sequence of any number
2371: of somethings. Consider this recursive definition of a comma-separated
2372: sequence of one or more expressions:
2373:
2374: @example
2375: @group
2376: expseq1: exp
2377: | expseq1 ',' exp
2378: ;
2379: @end group
2380: @end example
2381:
2382: @cindex left recursion
2383: @cindex right recursion
2384: @noindent
2385: Since the recursive use of @code{expseq1} is the leftmost symbol in the
2386: right hand side, we call this @dfn{left recursion}. By contrast, here
2387: the same construct is defined using @dfn{right recursion}:
2388:
2389: @example
2390: @group
2391: expseq1: exp
2392: | exp ',' expseq1
2393: ;
2394: @end group
2395: @end example
2396:
2397: @noindent
2398: Any kind of sequence can be defined using either left recursion or
2399: right recursion, but you should always use left recursion, because it
2400: can parse a sequence of any number of elements with bounded stack
2401: space. Right recursion uses up space on the Bison stack in proportion
2402: to the number of elements in the sequence, because all the elements
2403: must be shifted onto the stack before the rule can be applied even
2404: once. @xref{Algorithm, ,The Bison Parser Algorithm }, for
2405: further explanation of this.
2406:
2407: @cindex mutual recursion
2408: @dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
2409: rule does not appear directly on its right hand side, but does appear
2410: in rules for other nonterminals which do appear on its right hand
2411: side.
2412:
2413: For example:
2414:
2415: @example
2416: @group
2417: expr: primary
2418: | primary '+' primary
2419: ;
2420: @end group
2421:
2422: @group
2423: primary: constant
2424: | '(' expr ')'
2425: ;
2426: @end group
2427: @end example
2428:
2429: @noindent
2430: defines two mutually-recursive nonterminals, since each refers to the
2431: other.
2432:
2433: @node Semantics, Declarations, Recursion, Grammar File
2434: @section Defining Language Semantics
2435: @cindex defining language semantics
2436: @cindex language semantics, defining
2437:
2438: The grammar rules for a language determine only the syntax. The semantics
2439: are determined by the semantic values associated with various tokens and
2440: groupings, and by the actions taken when various groupings are recognized.
2441:
2442: For example, the calculator calculates properly because the value
2443: associated with each expression is the proper number; it adds properly
2444: because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
2445: the numbers associated with @var{x} and @var{y}.
2446:
2447: @menu
2448: * Value Type:: Specifying one data type for all semantic values.
2449: * Multiple Types:: Specifying several alternative data types.
2450: * Actions:: An action is the semantic definition of a grammar rule.
2451: * Action Types:: Specifying data types for actions to operate on.
2452: * Mid-Rule Actions:: Most actions go at the end of a rule.
2453: This says when, why and how to use the exceptional
2454: action in the middle of a rule.
2455: @end menu
2456:
2457: @node Value Type, Multiple Types, , Semantics
2458: @subsection Data Types of Semantic Values
2459: @cindex semantic value type
2460: @cindex value type, semantic
2461: @cindex data types of semantic values
2462: @cindex default data type
2463:
2464: In a simple program it may be sufficient to use the same data type for
2465: the semantic values of all language constructs. This was true in the
2466: RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish Notation Calculator}).
2467:
2468: Bison's default is to use type @code{int} for all semantic values. To
2469: specify some other type, define @code{YYSTYPE} as a macro, like this:
2470:
2471: @example
2472: #define YYSTYPE double
2473: @end example
2474:
2475: @noindent
2476: This macro definition must go in the C declarations section of the grammar
2477: file (@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
2478:
2479: @node Multiple Types, Actions, Value Type, Semantics
2480: @subsection More Than One Value Type
2481:
2482: In most programs, you will need different data types for different kinds
2483: of tokens and groupings. For example, a numeric constant may need type
2484: @code{int} or @code{long}, while a string constant needs type @code{char *},
2485: and an identifier might need a pointer to an entry in the symbol table.
2486:
2487: To use more than one data type for semantic values in one parser, Bison
2488: requires you to do two things:
2489:
2490: @itemize @bullet
2491: @item
2492: Specify the entire collection of possible data types, with the
2493: @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}).
2494:
2495: @item
2496: Choose one of those types for each symbol (terminal or nonterminal)
2497: for which semantic values are used. This is done for tokens with the
2498: @code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings
2499: with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}).
2500: @end itemize
2501:
2502: @node Actions, Action Types, Multiple Types, Semantics
2503: @subsection Actions
2504: @cindex action
2505: @vindex $$
2506: @vindex $@var{n}
2507:
2508: An action accompanies a syntactic rule and contains C code to be executed
2509: each time an instance of that rule is recognized. The task of most actions
2510: is to compute a semantic value for the grouping built by the rule from the
2511: semantic values associated with tokens or smaller groupings.
2512:
2513: An action consists of C statements surrounded by braces, much like a
2514: compound statement in C. It can be placed at any position in the rule; it
2515: is executed at that position. Most rules have just one action at the end
2516: of the rule, following all the components. Actions in the middle of a rule
2517: are tricky and used only for special purposes (@pxref{Mid-Rule Actions, ,Actions in Mid-Rule}).
2518:
2519: The C code in an action can refer to the semantic values of the components
2520: matched by the rule with the construct @code{$@var{n}}, which stands for
2521: the value of the @var{n}th component. The semantic value for the grouping
2522: being constructed is @code{$$}. (Bison translates both of these constructs
2523: into array element references when it copies the actions into the parser
2524: file.)
2525:
2526: Here is a typical example:
2527:
2528: @example
2529: @group
2530: exp: @dots{}
2531: | exp '+' exp
2532: @{ $$ = $1 + $3; @}
2533: @end group
2534: @end example
2535:
2536: @noindent
2537: This rule constructs an @code{exp} from two smaller @code{exp} groupings
2538: connected by a plus-sign token. In the action, @code{$1} and @code{$3}
2539: refer to the semantic values of the two component @code{exp} groupings,
2540: which are the first and third symbols on the right hand side of the rule.
2541: The sum is stored into @code{$$} so that it becomes the semantic value of
2542: the addition-expression just recognized by the rule. If there were a
2543: useful semantic value associated with the @samp{+} token, it could be
2544: referred to as @code{$2}.@refill
2545:
2546: @cindex default action
2547: If you don't specify an action for a rule, Bison supplies a default:
2548: @w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule becomes
2549: the value of the whole rule. Of course, the default rule is valid only
2550: if the two data types match. There is no meaningful default action for
2551: an empty rule; every empty rule must have an explicit action unless the
2552: rule's value does not matter.
2553:
2554: @code{$@var{n}} with @var{n} zero or negative is allowed for reference
2555: to tokens and groupings on the stack @emph{before} those that match the
2556: current rule. This is a very risky practice, and to use it reliably
2557: you must be certain of the context in which the rule is applied. Here
2558: is a case in which you can use this reliably:
2559:
2560: @example
2561: @group
2562: foo: expr bar '+' expr @{ @dots{} @}
2563: | expr bar '-' expr @{ @dots{} @}
2564: ;
2565: @end group
2566:
2567: @group
2568: bar: /* empty */
2569: @{ previous_expr = $0; @}
2570: ;
2571: @end group
2572: @end example
2573:
2574: As long as @code{bar} is used only in the fashion shown here, @code{$0}
2575: always refers to the @code{expr} which precedes @code{bar} in the
2576: definition of @code{foo}.
2577:
2578: @node Action Types, Mid-Rule Actions, Actions, Semantics
2579: @subsection Data Types of Values in Actions
2580: @cindex action data types
2581: @cindex data types in actions
2582:
2583: If you have chosen a single data type for semantic values, the @code{$$}
2584: and @code{$@var{n}} constructs always have that data type.
2585:
2586: If you have used @code{%union} to specify a variety of data types, then you
2587: must declare a choice among these types for each terminal or nonterminal
2588: symbol that can have a semantic value. Then each time you use @code{$$} or
2589: @code{$@var{n}}, its data type is determined by which symbol it refers to
2590: in the rule. In this example,@refill
2591:
2592: @example
2593: @group
2594: exp: @dots{}
2595: | exp '+' exp
2596: @{ $$ = $1 + $3; @}
2597: @end group
2598: @end example
2599:
2600: @noindent
2601: @code{$1} and @code{$3} refer to instances of @code{exp}, so they all
2602: have the data type declared for the nonterminal symbol @code{exp}. If
2603: @code{$2} were used, it would have the data type declared for the
2604: terminal symbol @code{'+'}, whatever that might be.@refill
2605:
2606: Alternatively, you can specify the data type when you refer to the value,
2607: by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
2608: reference. For example, if you have defined types as shown here:
2609:
2610: @example
2611: @group
2612: %union @{
2613: int itype;
2614: double dtype;
2615: @}
2616: @end group
2617: @end example
2618:
2619: @noindent
2620: then you can write @code{$<itype>1} to refer to the first subunit of the
2621: rule as an integer, or @code{$<dtype>1} to refer to it as a double.
2622:
2623: @node Mid-Rule Actions, , Action Types, Semantics
2624: @subsection Actions in Mid-Rule
2625: @cindex actions in mid-rule
2626: @cindex mid-rule actions
2627:
2628: Occasionally it is useful to put an action in the middle of a rule.
2629: These actions are written just like usual end-of-rule actions, but they
2630: are executed before the parser even recognizes the following components.
2631:
2632: A mid-rule action may refer to the components preceding it using
2633: @code{$@var{n}}, but it may not refer to subsequent components because
2634: it is run before they are parsed.
2635:
2636: The mid-rule action itself counts as one of the components of the rule.
2637: This makes a difference when there is another action later in the same rule
2638: (and usually there is another at the end): you have to count the actions
2639: along with the symbols when working out which number @var{n} to use in
2640: @code{$@var{n}}.
2641:
2642: The mid-rule action can also have a semantic value. The action can set
2643: its value with an assignment to @code{$$}, and actions later in the rule
2644: can refer to the value using @code{$@var{n}}. Since there is no symbol
2645: to name the action, there is no way to declare a data type for the value
2646: in advance, so you must use the @samp{$<@dots{}>} construct to specify a
2647: data type each time you refer to this value.
2648:
2649: There is no way to set the value of the entire rule with a mid-rule
2650: action, because assignments to @code{$$} do not have that effect. The
2651: only way to set the value for the entire rule is with an ordinary action
2652: at the end of the rule.
2653:
2654: Here is an example from a hypothetical compiler, handling a @code{let}
2655: statement that looks like @samp{let (@var{variable}) @var{statement}} and
2656: serves to create a variable named @var{variable} temporarily for the
2657: duration of @var{statement}. To parse this construct, we must put
2658: @var{variable} into the symbol table while @var{statement} is parsed, then
2659: remove it afterward. Here is how it is done:
2660:
2661: @example
2662: @group
2663: stmt: LET '(' var ')'
2664: @{ $<context>$ = push_context ();
2665: declare_variable ($3); @}
2666: stmt @{ $$ = $6;
2667: pop_context ($<context>5); @}
2668: @end group
2669: @end example
2670:
2671: @noindent
2672: As soon as @samp{let (@var{variable})} has been recognized, the first
2673: action is run. It saves a copy of the current semantic context (the
2674: list of accessible variables) as its semantic value, using alternative
2675: @code{context} in the data-type union. Then it calls
2676: @code{declare_variable} to add the new variable to that list. Once the
2677: first action is finished, the embedded statement @code{stmt} can be
2678: parsed. Note that the mid-rule action is component number 5, so the
2679: @samp{stmt} is component number 6.
2680:
2681: After the embedded statement is parsed, its semantic value becomes the
2682: value of the entire @code{let}-statement. Then the semantic value from the
2683: earlier action is used to restore the prior list of variables. This
2684: removes the temporary @code{let}-variable from the list so that it won't
2685: appear to exist while the rest of the program is parsed.
2686:
2687: Taking action before a rule is completely recognized often leads to
2688: conflicts since the parser must commit to a parse in order to execute the
2689: action. For example, the following two rules, without mid-rule actions,
2690: can coexist in a working parser because the parser can shift the open-brace
2691: token and look at what follows before deciding whether there is a
2692: declaration or not:
2693:
2694: @example
2695: @group
2696: compound: '@{' declarations statements '@}'
2697: | '@{' statements '@}'
2698: ;
2699: @end group
2700: @end example
2701:
2702: @noindent
2703: But when we add a mid-rule action as follows, the rules become nonfunctional:
2704:
2705: @example
2706: @group
2707: compound: @{ prepare_for_local_variables (); @}
2708: '@{' declarations statements '@}'
2709: @end group
2710: @group
2711: | '@{' statements '@}'
2712: ;
2713: @end group
2714: @end example
2715:
2716: @noindent
2717: Now the parser is forced to decide whether to run the mid-rule action
2718: when it has read no farther than the open-brace. In other words, it
2719: must commit to using one rule or the other, without sufficient
2720: information to do it correctly. (The open-brace token is what is called
2721: the @dfn{look-ahead} token at this time, since the parser is still
2722: deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
2723:
2724: You might think that you could correct the problem by putting identical
2725: actions into the two rules, like this:
2726:
2727: @example
2728: @group
2729: compound: @{ prepare_for_local_variables (); @}
2730: '@{' declarations statements '@}'
2731: | @{ prepare_for_local_variables (); @}
2732: '@{' statements '@}'
2733: ;
2734: @end group
2735: @end example
2736:
2737: @noindent
2738: But this does not help, because Bison does not realize that the two actions
2739: are identical. (Bison never tries to understand the C code in an action.)
2740:
2741: If the grammar is such that a declaration can be distinguished from a
2742: statement by the first token (which is true in C), then one solution which
2743: does work is to put the action after the open-brace, like this:
2744:
2745: @example
2746: @group
2747: compound: '@{' @{ prepare_for_local_variables (); @}
2748: declarations statements '@}'
2749: | '@{' statements '@}'
2750: ;
2751: @end group
2752: @end example
2753:
2754: @noindent
2755: Now the first token of the following declaration or statement,
2756: which would in any case tell Bison which rule to use, can still do so.
2757:
2758: Another solution is to bury the action inside a nonterminal symbol which
2759: serves as a subroutine:
2760:
2761: @example
2762: @group
2763: subroutine: /* empty */
2764: @{ prepare_for_local_variables (); @}
2765: ;
2766:
2767: @end group
2768:
2769: @group
2770: compound: subroutine
2771: '@{' declarations statements '@}'
2772: | subroutine
2773: '@{' statements '@}'
2774: ;
2775: @end group
2776: @end example
2777:
2778: @noindent
2779: Now Bison can execute the action in the rule for @code{subroutine} without
2780: deciding which rule for @code{compound} it will eventually use. Note that
2781: the action is now at the end of its rule. Any mid-rule action can be
2782: converted to an end-of-rule action in this way, and this is what Bison
2783: actually does to implement mid-rule actions.
2784:
2785: @node Declarations, Multiple Parsers, Semantics, Grammar File
2786: @section Bison Declarations
2787: @cindex declarations, Bison
2788: @cindex Bison declarations
2789:
2790: The @dfn{Bison declarations} section of a Bison grammar defines the symbols
2791: used in formulating the grammar and the data types of semantic values.
2792: @xref{Symbols}.
2793:
2794: All token type names (but not single-character literal tokens such as
2795: @code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
2796: declared if you need to specify which data type to use for the semantic
2797: value (@pxref{Multiple Types, ,More Than One Value Type}).
2798:
2799: The first rule in the file also specifies the start symbol, by default.
2800: If you want some other symbol to be the start symbol, you must declare
2801: it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
2802:
2803: @menu
2804: * Token Decl:: Declaring terminal symbols.
2805: * Precedence Decl:: Declaring terminals with precedence and associativity.
2806: * Union Decl:: Declaring the set of all semantic value types.
2807: * Type Decl:: Declaring the choice of type for a nonterminal symbol.
2808: * Expect Decl:: Suppressing warnings about shift/reduce conflicts.
2809: * Start Decl:: Specifying the start symbol.
2810: * Pure Decl:: Requesting a reentrant parser.
2811: * Decl Summary:: Table of all Bison declarations.
2812: @end menu
2813:
2814: @node Token Decl, Precedence Decl, , Declarations
2815: @subsection Token Type Names
2816: @cindex declaring token type names
2817: @cindex token type names, declaring
2818: @findex %token
2819:
2820: The basic way to declare a token type name (terminal symbol) is as follows:
2821:
2822: @example
2823: %token @var{name}
2824: @end example
2825:
2826: Bison will convert this into a @code{#define} directive in
2827: the parser, so that the function @code{yylex} (if it is in this file)
2828: can use the name @var{name} to stand for this token type's code.
2829:
2830: Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc}
2831: instead of @code{%token}, if you wish to specify precedence.
2832: @xref{Precedence Decl, ,Operator Precedence}.
2833:
2834: You can explicitly specify the numeric code for a token type by appending
2835: an integer value in the field immediately following the token name:
2836:
2837: @example
2838: %token NUM 300
2839: @end example
2840:
2841: @noindent
2842: It is generally best, however, to let Bison choose the numeric codes for
2843: all token types. Bison will automatically select codes that don't conflict
2844: with each other or with ASCII characters.
2845:
2846: In the event that the stack type is a union, you must augment the
2847: @code{%token} or other token declaration to include the data type
2848: alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}).
2849:
2850: For example:
2851:
2852: @example
2853: @group
2854: %union @{ /* define stack type */
2855: double val;
2856: symrec *tptr;
2857: @}
2858: %token <val> NUM /* define token NUM and its type */
2859: @end group
2860: @end example
2861:
2862: @node Precedence Decl, Union Decl, Token Decl, Declarations
2863: @subsection Operator Precedence
2864: @cindex precedence declarations
2865: @cindex declaring operator precedence
2866: @cindex operator precedence, declaring
2867:
2868: Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
2869: declare a token and specify its precedence and associativity, all at
2870: once. These are called @dfn{precedence declarations}.
2871: @xref{Precedence, ,Operator Precedence}, for general information on operator precedence.
2872:
2873: The syntax of a precedence declaration is the same as that of
2874: @code{%token}: either
2875:
2876: @example
2877: %left @var{symbols}@dots{}
2878: @end example
2879:
2880: @noindent
2881: or
2882:
2883: @example
2884: %left <@var{type}> @var{symbols}@dots{}
2885: @end example
2886:
2887: And indeed any of these declarations serves the purposes of @code{%token}.
2888: But in addition, they specify the associativity and relative precedence for
2889: all the @var{symbols}:
2890:
2891: @itemize @bullet
2892: @item
2893: The associativity of an operator @var{op} determines how repeated uses
2894: of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
2895: @var{z}} is parsed by grouping @var{x} with @var{y} first or by
2896: grouping @var{y} with @var{z} first. @code{%left} specifies
2897: left-associativity (grouping @var{x} with @var{y} first) and
2898: @code{%right} specifies right-associativity (grouping @var{y} with
2899: @var{z} first). @code{%nonassoc} specifies no associativity, which
2900: means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
2901: considered a syntax error.
2902:
2903: @item
2904: The precedence of an operator determines how it nests with other operators.
2905: All the tokens declared in a single precedence declaration have equal
2906: precedence and nest together according to their associativity.
2907: When two tokens declared in different precedence declarations associate,
2908: the one declared later has the higher precedence and is grouped first.
2909: @end itemize
2910:
2911: @node Union Decl, Type Decl, Precedence Decl, Declarations
2912: @subsection The Collection of Value Types
2913: @cindex declaring value types
2914: @cindex value types, declaring
2915: @findex %union
2916:
2917: The @code{%union} declaration specifies the entire collection of possible
2918: data types for semantic values. The keyword @code{%union} is followed by a
2919: pair of braces containing the same thing that goes inside a @code{union} in
2920: C.
2921:
2922: For example:
2923:
2924: @example
2925: @group
2926: %union @{
2927: double val;
2928: symrec *tptr;
2929: @}
2930: @end group
2931: @end example
2932:
2933: @noindent
2934: This says that the two alternative types are @code{double} and @code{symrec
2935: *}. They are given names @code{val} and @code{tptr}; these names are used
2936: in the @code{%token} and @code{%type} declarations to pick one of the types
2937: for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
2938:
2939: Note that, unlike making a @code{union} declaration in C, you do not write
2940: a semicolon after the closing brace.
2941:
2942: @node Type Decl, Expect Decl, Union Decl, Declarations
2943: @subsection Nonterminal Symbols
2944: @cindex declaring value types, nonterminals
2945: @cindex value types, nonterminals, declaring
2946: @findex %type
2947:
2948: @noindent
2949: When you use @code{%union} to specify multiple value types, you must
2950: declare the value type of each nonterminal symbol for which values are
2951: used. This is done with a @code{%type} declaration, like this:
2952:
2953: @example
2954: %type <@var{type}> @var{nonterminal}@dots{}
2955: @end example
2956:
2957: @noindent
2958: Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type}
2959: is the name given in the @code{%union} to the alternative that you want
2960: (@pxref{Union Decl, ,The Collection of Value Types}). You can give any number of nonterminal symbols in
2961: the same @code{%type} declaration, if they have the same value type. Use
2962: spaces to separate the symbol names.
2963:
2964: @node Expect Decl, Start Decl, Type Decl, Declarations
2965: @subsection Suppressing Conflict Warnings
2966: @cindex suppressing conflict warnings
2967: @cindex preventing warnings about conflicts
2968: @cindex warnings, preventing
2969: @cindex conflicts, suppressing warnings of
2970: @findex %expect
2971:
2972: Bison normally warns if there are any conflicts in the grammar
2973: (@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars have harmless shift/reduce
2974: conflicts which are resolved in a predictable way and would be difficult to
2975: eliminate. It is desirable to suppress the warning about these conflicts
2976: unless the number of conflicts changes. You can do this with the
2977: @code{%expect} declaration.
2978:
2979: The declaration looks like this:
2980:
2981: @example
2982: %expect @var{n}
2983: @end example
2984:
2985: Here @var{n} is a decimal integer. The declaration says there should be no
2986: warning if there are @var{n} shift/reduce conflicts and no reduce/reduce
2987: conflicts. The usual warning is given if there are either more or fewer
2988: conflicts, or if there are any reduce/reduce conflicts.
2989:
2990: In general, using @code{%expect} involves these steps:
2991:
2992: @itemize @bullet
2993: @item
2994: Compile your grammar without @code{%expect}. Use the @samp{-v} option
2995: to get a verbose list of where the conflicts occur. Bison will also
2996: print the number of conflicts.
2997:
2998: @item
2999: Check each of the conflicts to make sure that Bison's default
3000: resolution is what you really want. If not, rewrite the grammar and
3001: go back to the beginning.
3002:
3003: @item
3004: Add an @code{%expect} declaration, copying the number @var{n} from the
3005: number which Bison printed.
3006: @end itemize
3007:
3008: Now Bison will stop annoying you about the conflicts you have checked, but
3009: it will warn you again if changes in the grammar result in additional
3010: conflicts.
3011:
3012: @node Start Decl, Pure Decl, Expect Decl, Declarations
3013: @subsection The Start-Symbol
3014: @cindex declaring the start symbol
3015: @cindex start symbol, declaring
3016: @cindex default start symbol
3017: @findex %start
3018:
3019: Bison assumes by default that the start symbol for the grammar is the first
3020: nonterminal specified in the grammar specification section. The programmer
3021: may override this restriction with the @code{%start} declaration as follows:
3022:
3023: @example
3024: %start @var{symbol}
3025: @end example
3026:
3027: @node Pure Decl, Decl Summary, Start Decl, Declarations
3028: @subsection A Pure (Reentrant) Parser
3029: @cindex reentrant parser
3030: @cindex pure parser
3031: @findex %pure_parser
3032:
3033: A @dfn{reentrant} program is one which does not alter in the course of
3034: execution; in other words, it consists entirely of @dfn{pure} (read-only)
3035: code. Reentrancy is important whenever asynchronous execution is possible;
3036: for example, a nonreentrant program may not be safe to call from a signal
3037: handler. In systems with multiple threads of control, a nonreentrant
3038: program must be called only within interlocks.
3039:
3040: The Bison parser is not normally a reentrant program, because it uses
3041: statically allocated variables for communication with @code{yylex}. These
3042: variables include @code{yylval} and @code{yylloc}.
3043:
3044: The Bison declaration @code{%pure_parser} says that you want the parser
3045: to be reentrant. It looks like this:
3046:
3047: @example
3048: %pure_parser
3049: @end example
3050:
3051: The effect is that the two communication variables become local
3052: variables in @code{yyparse}, and a different calling convention is used for
3053: the lexical analyzer function @code{yylex}. @xref{Pure Calling, ,Calling for Pure Parsers}, for the
3054: details of this. The variable @code{yynerrs} also becomes local in
3055: @code{yyparse} (@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). The convention for calling
3056: @code{yyparse} itself is unchanged.
3057:
3058: @node Decl Summary, , Pure Decl, Declarations
3059: @subsection Bison Declaration Summary
3060: @cindex Bison declaration summary
3061: @cindex declaration summary
3062: @cindex summary, Bison declaration
3063:
3064: Here is a summary of all Bison declarations:
3065:
3066: @table @code
3067: @item %union
3068: Declare the collection of data types that semantic values may have
3069: (@pxref{Union Decl, ,The Collection of Value Types}).
3070:
3071: @item %token
3072: Declare a terminal symbol (token type name) with no precedence
3073: or associativity specified (@pxref{Token Decl, ,Token Type Names}).
3074:
3075: @item %right
3076: Declare a terminal symbol (token type name) that is right-associative
3077: (@pxref{Precedence Decl, ,Operator Precedence}).
3078:
3079: @item %left
3080: Declare a terminal symbol (token type name) that is left-associative
3081: (@pxref{Precedence Decl, ,Operator Precedence}).
3082:
3083: @item %nonassoc
3084: Declare a terminal symbol (token type name) that is nonassociative
3085: (using it in a way that would be associative is a syntax error)
3086: (@pxref{Precedence Decl, ,Operator Precedence}).
3087:
3088: @item %type
3089: Declare the type of semantic values for a nonterminal symbol
3090: (@pxref{Type Decl, ,Nonterminal Symbols}).
3091:
3092: @item %start
3093: Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}).
3094:
3095: @item %expect
3096: Declare the expected number of shift-reduce conflicts
3097: (@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
3098:
3099: @item %pure_parser
3100: Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
3101: @end table
3102:
3103: @node Multiple Parsers, , Declarations, Grammar File
3104: @section Multiple Parsers in the Same Program
3105:
3106: Most programs that use Bison parse only one language and therefore contain
3107: only one Bison parser. But what if you want to parse more than one
3108: language with the same program? Then you need to avoid a name conflict
3109: between different definitions of @code{yyparse}, @code{yylval}, and so on.
3110:
3111: The easy way to do this is to use the option @samp{-p @var{prefix}}
3112: (@pxref{Invocation, ,Invoking Bison}). This renames the interface functions and
3113: variables of the Bison parser to start with @var{prefix} instead of
3114: @samp{yy}. You can use this to give each parser distinct names that do
3115: not conflict.
3116:
3117: The precise list of symbols renamed is @code{yyparse}, @code{yylex},
3118: @code{yyerror}, @code{yylval}, @code{yychar} and @code{yydebug}. For
3119: example, if you use @samp{-p c}, the names become @code{cparse},
3120: @code{clex}, and so on.
3121:
3122: @strong{All the other variables and macros associated with Bison are not
3123: renamed.} These others are not global; there is no conflict if the same
3124: name is used in different parsers. For example, @code{YYSTYPE} is not
3125: renamed, but defining this in different ways in different parsers causes
3126: no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
3127:
3128: The @samp{-p} option works by adding macro definitions to the beginning
3129: of the parser source file, defining @code{yyparse} as
3130: @code{@var{prefix}parse}, and so on. This effectively substitutes one
3131: name for the other in the entire parser file.
3132:
3133: @node Interface, Algorithm, Grammar File, Top
3134: @chapter Parser C-Language Interface
3135: @cindex C-language interface
3136: @cindex interface
3137:
3138: The Bison parser is actually a C function named @code{yyparse}. Here we
3139: describe the interface conventions of @code{yyparse} and the other
3140: functions that it needs to use.
3141:
3142: Keep in mind that the parser uses many C identifiers starting with
3143: @samp{yy} and @samp{YY} for internal purposes. If you use such an
3144: identifier (aside from those in this manual) in an action or in additional
3145: C code in the grammar file, you are likely to run into trouble.
3146:
3147: @menu
3148: * Parser Function:: How to call @code{yyparse} and what it returns.
3149: * Lexical:: You must supply a function @code{yylex}
3150: which reads tokens.
3151: * Error Reporting:: You must supply a function @code{yyerror}.
3152: * Action Features:: Special features for use in actions.
3153: @end menu
3154:
3155: @node Parser Function, Lexical, , Interface
3156: @section The Parser Function @code{yyparse}
3157: @findex yyparse
3158:
3159: You call the function @code{yyparse} to cause parsing to occur. This
3160: function reads tokens, executes actions, and ultimately returns when it
3161: encounters end-of-input or an unrecoverable syntax error. You can also
3162: write an action which directs @code{yyparse} to return immediately without
3163: reading further.
3164:
3165: The value returned by @code{yyparse} is 0 if parsing was successful (return
3166: is due to end-of-input).
3167:
3168: The value is 1 if parsing failed (return is due to a syntax error).
3169:
3170: In an action, you can cause immediate return from @code{yyparse} by using
3171: these macros:
3172:
3173: @table @code
3174: @item YYACCEPT
3175: @findex YYACCEPT
3176: Return immediately with value 0 (to report success).
3177:
3178: @item YYABORT
3179: @findex YYABORT
3180: Return immediately with value 1 (to report failure).
3181: @end table
3182:
3183: @node Lexical, Error Reporting, Parser Function, Interface
3184: @section The Lexical Analyzer Function @code{yylex}
3185: @findex yylex
3186: @cindex lexical analyzer
3187:
3188: The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
3189: the input stream and returns them to the parser. Bison does not create
3190: this function automatically; you must write it so that @code{yyparse} can
3191: call it. The function is sometimes referred to as a lexical scanner.
3192:
3193: In simple programs, @code{yylex} is often defined at the end of the Bison
3194: grammar file. If @code{yylex} is defined in a separate source file, you
3195: need to arrange for the token-type macro definitions to be available there.
3196: To do this, use the @samp{-d} option when you run Bison, so that it will
3197: write these macro definitions into a separate header file
3198: @file{@var{name}.tab.h} which you can include in the other source files
3199: that need it. @xref{Invocation, ,Invoking Bison}.@refill
3200:
3201: @menu
3202: * Calling Convention:: How @code{yyparse} calls @code{yylex}.
3203: * Token Values:: How @code{yylex} must return the semantic value
3204: of the token it has read.
3205: * Token Positions:: How @code{yylex} must return the text position
3206: (line number, etc.) of the token, if the
3207: actions want that.
3208: * Pure Calling:: How the calling convention differs
3209: in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
3210: @end menu
3211:
3212: @node Calling Convention, Token Values, , Lexical
3213: @subsection Calling Convention for @code{yylex}
3214:
3215: The value that @code{yylex} returns must be the numeric code for the type
3216: of token it has just found, or 0 for end-of-input.
3217:
3218: When a token is referred to in the grammar rules by a name, that name
3219: in the parser file becomes a C macro whose definition is the proper
3220: numeric code for that token type. So @code{yylex} can use the name
3221: to indicate that type. @xref{Symbols}.
3222:
3223: When a token is referred to in the grammar rules by a character literal,
3224: the numeric code for that character is also the code for the token type.
3225: So @code{yylex} can simply return that character code. The null character
3226: must not be used this way, because its code is zero and that is what
3227: signifies end-of-input.
3228:
3229: Here is an example showing these things:
3230:
3231: @example
3232: yylex ()
3233: @{
3234: @dots{}
3235: if (c == EOF) /* Detect end of file. */
3236: return 0;
3237: @dots{}
3238: if (c == '+' || c == '-')
3239: return c; /* Assume token type for `+' is '+'. */
3240: @dots{}
3241: return INT; /* Return the type of the token. */
3242: @dots{}
3243: @}
3244: @end example
3245:
3246: @noindent
3247: This interface has been designed so that the output from the @code{lex}
3248: utility can be used without change as the definition of @code{yylex}.
3249:
3250: @node Token Values, Token Positions, Calling Convention, Lexical
3251: @subsection Semantic Values of Tokens
3252:
3253: @vindex yylval
3254: In an ordinary (nonreentrant) parser, the semantic value of the token must
3255: be stored into the global variable @code{yylval}. When you are using
3256: just one data type for semantic values, @code{yylval} has that type.
3257: Thus, if the type is @code{int} (the default), you might write this in
3258: @code{yylex}:
3259:
3260: @example
3261: @group
3262: @dots{}
3263: yylval = value; /* Put value onto Bison stack. */
3264: return INT; /* Return the type of the token. */
3265: @dots{}
3266: @end group
3267: @end example
3268:
3269: When you are using multiple data types, @code{yylval}'s type is a union
3270: made from the @code{%union} declaration (@pxref{Union Decl, ,The Collection of Value Types}). So when
3271: you store a token's value, you must use the proper member of the union.
3272: If the @code{%union} declaration looks like this:
3273:
3274: @example
3275: @group
3276: %union @{
3277: int intval;
3278: double val;
3279: symrec *tptr;
3280: @}
3281: @end group
3282: @end example
3283:
3284: @noindent
3285: then the code in @code{yylex} might look like this:
3286:
3287: @example
3288: @group
3289: @dots{}
3290: yylval.intval = value; /* Put value onto Bison stack. */
3291: return INT; /* Return the type of the token. */
3292: @dots{}
3293: @end group
3294: @end example
3295:
3296: @node Token Positions, Pure Calling, Token Values, Lexical
3297: @subsection Textual Positions of Tokens
3298:
3299: @vindex yylloc
3300: If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in
3301: actions to keep track of the textual locations of tokens and groupings,
3302: then you must provide this information in @code{yylex}. The function
3303: @code{yyparse} expects to find the textual location of a token just parsed
3304: in the global variable @code{yylloc}. So @code{yylex} must store the
3305: proper data in that variable. The value of @code{yylloc} is a structure
3306: and you need only initialize the members that are going to be used by the
3307: actions. The four members are called @code{first_line},
3308: @code{first_column}, @code{last_line} and @code{last_column}. Note that
3309: the use of this feature makes the parser noticeably slower.
3310:
3311: @tindex YYLTYPE
3312: The data type of @code{yylloc} has the name @code{YYLTYPE}.
3313:
3314: @node Pure Calling, , Token Positions, Lexical
3315: @subsection Calling for Pure Parsers
3316:
3317: When you use the Bison declaration @code{%pure_parser} to request a pure,
3318: reentrant parser, the global communication variables @code{yylval} and
3319: @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the
3320: two global variables are replaced by pointers passed as arguments to
3321: @code{yylex}. You must declare them as shown here, and pass the
3322: information back by storing it through those pointers.
3323:
3324: @example
3325: yylex (lvalp, llocp)
3326: YYSTYPE *lvalp;
3327: YYLTYPE *llocp;
3328: @{
3329: @dots{}
3330: *lvalp = value; /* Put value onto Bison stack. */
3331: return INT; /* Return the type of the token. */
3332: @dots{}
3333: @}
3334: @end example
3335:
3336: If the grammar file does not use the @samp{@@} constructs to refer to
3337: textual positions, then the type @code{YYLTYPE} will not be defined. In
3338: this case, omit the second argument; @code{yylex} will be called with
3339: only one argument.
3340:
3341: @node Error Reporting, Action Features, Lexical, Interface
3342: @section The Error Reporting Function @code{yyerror}
3343: @cindex error reporting function
3344: @findex yyerror
3345: @cindex parse error
3346: @cindex syntax error
3347:
3348: The Bison parser detects a @dfn{parse error} or @dfn{syntax error}
3349: whenever it reads a token which cannot satisfy any syntax rule. A
3350: action in the grammar can also explicitly proclaim an error, using the
3351: macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use in Actions}).
3352:
3353: The Bison parser expects to report the error by calling an error
3354: reporting function named @code{yyerror}, which you must supply. It is
3355: called by @code{yyparse} whenever a syntax error is found, and it
3356: receives one argument. For a parse error, the string is normally
3357: @w{@code{"parse error"}}.
3358:
3359: @findex YYERROR_VERBOSE
3360: If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations
3361: section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose
3362: and specific error message string instead of just plain @w{@code{"parse
3363: error"}}. It doesn't matter what definition you use for
3364: @code{YYERROR_VERBOSE}, just whether you define it.
3365:
3366: The parser can detect one other kind of error: stack overflow. This
3367: happens when the input contains constructions that are very deeply
3368: nested. It isn't likely you will encounter this, since the Bison
3369: parser extends its stack automatically up to a very large limit. But
3370: if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
3371: fashion, except that the argument string is @w{@code{"parser stack
3372: overflow"}}.
3373:
3374: The following definition suffices in simple programs:
3375:
3376: @example
3377: @group
3378: yyerror (s)
3379: char *s;
3380: @{
3381: @end group
3382: @group
3383: fprintf (stderr, "%s\n", s);
3384: @}
3385: @end group
3386: @end example
3387:
3388: After @code{yyerror} returns to @code{yyparse}, the latter will attempt
3389: error recovery if you have written suitable error recovery grammar rules
3390: (@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
3391: immediately return 1.
3392:
3393: @vindex yynerrs
3394: The variable @code{yynerrs} contains the number of syntax errors
3395: encountered so far. Normally this variable is global; but if you
3396: request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable
3397: which only the actions can access.
3398:
3399: @node Action Features, , Error Reporting, Interface
3400: @section Special Features for Use in Actions
3401: @cindex summary, action features
3402: @cindex action features summary
3403:
3404: Here is a table of Bison constructs, variables and macros that
3405: are useful in actions.
3406:
3407: @table @samp
3408: @item $$
3409: Acts like a variable that contains the semantic value for the
3410: grouping made by the current rule. @xref{Actions}.
3411:
3412: @item $@var{n}
3413: Acts like a variable that contains the semantic value for the
3414: @var{n}th component of the current rule. @xref{Actions}.
3415:
3416: @item $<@var{typealt}>$
3417: Like @code{$$} but specifies alternative @var{typealt} in the union
3418: specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}.
3419:
3420: @item $<@var{typealt}>@var{n}
3421: Like @code{$@var{n}} but specifies alternative @var{typealt} in the
3422: union specified by the @code{%union} declaration.
3423: @xref{Action Types, ,Data Types of Values in Actions}.@refill
3424:
3425: @item YYABORT;
3426: Return immediately from @code{yyparse}, indicating failure.
3427: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
3428:
3429: @item YYACCEPT;
3430: Return immediately from @code{yyparse}, indicating success.
3431: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
3432:
3433: @item YYBACKUP (@var{token}, @var{value});
3434: @findex YYBACKUP
3435: Unshift a token. This macro is allowed only for rules that reduce
3436: a single value, and only when there is no look-ahead token.
3437: It installs a look-ahead token with token type @var{token} and
3438: semantic value @var{value}; then it discards the value that was
3439: going to be reduced by this rule.
3440:
3441: If the macro is used when it is not valid, such as when there is
3442: a look-ahead token already, then it reports a syntax error with
3443: a message @samp{cannot back up} and performs ordinary error
3444: recovery.
3445:
3446: In either case, the rest of the action is not executed.
3447:
3448: @item YYEMPTY
3449: @vindex YYEMPTY
3450: Value stored in @code{yychar} when there is no look-ahead token.
3451:
3452: @item YYERROR;
3453: @findex YYERROR
3454: Cause an immediate syntax error. This statement initiates error
3455: recovery just as if the parser itself had detected an error; however, it
3456: does not call @code{yyerror}, and does not print any message. If you
3457: want to print an error message, call @code{yyerror} explicitly before
3458: the @samp{YYERROR;} statement. @xref{Error Recovery}.
3459:
3460: @item YYRECOVERING
3461: This macro stands for an expression that has the value 1 when the parser
3462: is recovering from a syntax error, and 0 the rest of the time.
3463: @xref{Error Recovery}.
3464:
3465: @item yychar
3466: Variable containing the current look-ahead token. (In a pure parser,
3467: this is actually a local variable within @code{yyparse}.) When there is
3468: no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
3469: @xref{Look-Ahead, ,Look-Ahead Tokens}.
3470:
3471: @item yyclearin;
3472: Discard the current look-ahead token. This is useful primarily in
3473: error rules. @xref{Error Recovery}.
3474:
3475: @item yyerrok;
3476: Resume generating error messages immediately for subsequent syntax
3477: errors. This is useful primarily in error rules.
3478: @xref{Error Recovery}.
3479:
3480: @item @@@var{n}
3481: @findex @@@var{n}
3482: Acts like a structure variable containing information on the line
3483: numbers and column numbers of the @var{n}th component of the current
3484: rule. The structure has four members, like this:
3485:
3486: @example
3487: struct @{
3488: int first_line, last_line;
3489: int first_column, last_column;
3490: @};
3491: @end example
3492:
3493: Thus, to get the starting line number of the third component, use
3494: @samp{@@3.first_line}.
3495:
3496: In order for the members of this structure to contain valid information,
3497: you must make @code{yylex} supply this information about each token.
3498: If you need only certain members, then @code{yylex} need only fill in
3499: those members.
3500:
3501: The use of this feature makes the parser noticeably slower.
3502: @end table
3503:
3504: @node Algorithm, Error Recovery, Interface, Top
3505: @chapter The Bison Parser Algorithm
3506: @cindex Bison parser algorithm
3507: @cindex algorithm of parser
3508: @cindex shifting
3509: @cindex reduction
3510: @cindex parser stack
3511: @cindex stack, parser
3512:
3513: As Bison reads tokens, it pushes them onto a stack along with their
3514: semantic values. The stack is called the @dfn{parser stack}. Pushing a
3515: token is traditionally called @dfn{shifting}.
3516:
3517: For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
3518: @samp{3} to come. The stack will have four elements, one for each token
3519: that was shifted.
3520:
3521: But the stack does not always have an element for each token read. When
3522: the last @var{n} tokens and groupings shifted match the components of a
3523: grammar rule, they can be combined according to that rule. This is called
3524: @dfn{reduction}. Those tokens and groupings are replaced on the stack by a
3525: single grouping whose symbol is the result (left hand side) of that rule.
3526: Running the rule's action is part of the process of reduction, because this
3527: is what computes the semantic value of the resulting grouping.
3528:
3529: For example, if the infix calculator's parser stack contains this:
3530:
3531: @example
3532: 1 + 5 * 3
3533: @end example
3534:
3535: @noindent
3536: and the next input token is a newline character, then the last three
3537: elements can be reduced to 15 via the rule:
3538:
3539: @example
3540: expr: expr '*' expr;
3541: @end example
3542:
3543: @noindent
3544: Then the stack contains just these three elements:
3545:
3546: @example
3547: 1 + 15
3548: @end example
3549:
3550: @noindent
3551: At this point, another reduction can be made, resulting in the single value
3552: 16. Then the newline token can be shifted.
3553:
3554: The parser tries, by shifts and reductions, to reduce the entire input down
3555: to a single grouping whose symbol is the grammar's start-symbol
3556: (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
3557:
3558: This kind of parser is known in the literature as a bottom-up parser.
3559:
3560: @menu
3561: * Look-Ahead:: Parser looks one token ahead when deciding what to do.
3562: * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
3563: * Precedence:: Operator precedence works by resolving conflicts.
3564: * Contextual Precedence:: When an operator's precedence depends on context.
3565: * Parser States:: The parser is a finite-state-machine with stack.
3566: * Reduce/Reduce:: When two rules are applicable in the same situation.
3567: * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
3568: * Stack Overflow:: What happens when stack gets full. How to avoid it.
3569: @end menu
3570:
3571: @node Look-Ahead, Shift/Reduce, , Algorithm
3572: @section Look-Ahead Tokens
3573: @cindex look-ahead token
3574:
3575: The Bison parser does @emph{not} always reduce immediately as soon as the
3576: last @var{n} tokens and groupings match a rule. This is because such a
3577: simple strategy is inadequate to handle most languages. Instead, when a
3578: reduction is possible, the parser sometimes ``looks ahead'' at the next
3579: token in order to decide what to do.
3580:
3581: When a token is read, it is not immediately shifted; first it becomes the
3582: @dfn{look-ahead token}, which is not on the stack. Now the parser can
3583: perform one or more reductions of tokens and groupings on the stack, while
3584: the look-ahead token remains off to the side. When no more reductions
3585: should take place, the look-ahead token is shifted onto the stack. This
3586: does not mean that all possible reductions have been done; depending on the
3587: token type of the look-ahead token, some rules may choose to delay their
3588: application.
3589:
3590: Here is a simple case where look-ahead is needed. These three rules define
3591: expressions which contain binary addition operators and postfix unary
3592: factorial operators (@samp{!}), and allow parentheses for grouping.
3593:
3594: @example
3595: @group
3596: expr: term '+' expr
3597: | term
3598: ;
3599: @end group
3600:
3601: @group
3602: term: '(' expr ')'
3603: | term '!'
3604: | NUMBER
3605: ;
3606: @end group
3607: @end example
3608:
3609: Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
3610: should be done? If the following token is @samp{)}, then the first three
3611: tokens must be reduced to form an @code{expr}. This is the only valid
3612: course, because shifting the @samp{)} would produce a sequence of symbols
3613: @w{@code{term ')'}}, and no rule allows this.
3614:
3615: If the following token is @samp{!}, then it must be shifted immediately so
3616: that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
3617: parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
3618: @code{expr}. It would then be impossible to shift the @samp{!} because
3619: doing so would produce on the stack the sequence of symbols @code{expr
3620: '!'}. No rule allows that sequence.
3621:
3622: @vindex yychar
3623: The current look-ahead token is stored in the variable @code{yychar}.
3624: @xref{Action Features, ,Special Features for Use in Actions}.
3625:
3626: @node Shift/Reduce, Precedence, Look-Ahead, Algorithm
3627: @section Shift/Reduce Conflicts
3628: @cindex conflicts
3629: @cindex shift/reduce conflicts
3630: @cindex dangling @code{else}
3631: @cindex @code{else}, dangling
3632:
3633: Suppose we are parsing a language which has if-then and if-then-else
3634: statements, with a pair of rules like this:
3635:
3636: @example
3637: @group
3638: if_stmt:
3639: IF expr THEN stmt
3640: | IF expr THEN stmt ELSE stmt
3641: ;
3642: @end group
3643: @end example
3644:
3645: @noindent
3646: Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
3647: terminal symbols for specific keyword tokens.
3648:
3649: When the @code{ELSE} token is read and becomes the look-ahead token, the
3650: contents of the stack (assuming the input is valid) are just right for
3651: reduction by the first rule. But it is also legitimate to shift the
3652: @code{ELSE}, because that would lead to eventual reduction by the second
3653: rule.
3654:
3655: This situation, where either a shift or a reduction would be valid, is
3656: called a @dfn{shift/reduce conflict}. Bison is designed to resolve
3657: these conflicts by choosing to shift, unless otherwise directed by
3658: operator precedence declarations. To see the reason for this, let's
3659: contrast it with the other alternative.
3660:
3661: Since the parser prefers to shift the @code{ELSE}, the result is to attach
3662: the else-clause to the innermost if-statement, making these two inputs
3663: equivalent:
3664:
3665: @example
3666: if x then if y then win (); else lose;
3667:
3668: if x then do; if y then win (); else lose; end;
3669: @end example
3670:
3671: But if the parser chose to reduce when possible rather than shift, the
3672: result would be to attach the else-clause to the outermost if-statement,
3673: making these two inputs equivalent:
3674:
3675: @example
3676: if x then if y then win (); else lose;
3677:
3678: if x then do; if y then win (); end; else lose;
3679: @end example
3680:
3681: The conflict exists because the grammar as written is ambiguous: either
3682: parsing of the simple nested if-statement is legitimate. The established
3683: convention is that these ambiguities are resolved by attaching the
3684: else-clause to the innermost if-statement; this is what Bison accomplishes
3685: by choosing to shift rather than reduce. (It would ideally be cleaner to
3686: write an unambiguous grammar, but that is very hard to do in this case.)
3687: This particular ambiguity was first encountered in the specifications of
3688: Algol 60 and is called the ``dangling @code{else}'' ambiguity.
3689:
3690: To avoid warnings from Bison about predictable, legitimate shift/reduce
3691: conflicts, use the @code{%expect @var{n}} declaration. There will be no
3692: warning as long as the number of shift/reduce conflicts is exactly @var{n}.
3693: @xref{Expect Decl, ,Suppressing Conflict Warnings}.
3694:
3695: The definition of @code{if_stmt} above is solely to blame for the
3696: conflict, but the conflict does not actually appear without additional
3697: rules. Here is a complete Bison input file that actually manifests the
3698: conflict:
3699:
3700: @example
3701: @group
3702: %token IF THEN ELSE variable
3703: %%
3704: @end group
3705: @group
3706: stmt: expr
3707: | if_stmt
3708: ;
3709: @end group
3710:
3711: @group
3712: if_stmt:
3713: IF expr THEN stmt
3714: | IF expr THEN stmt ELSE stmt
3715: ;
3716: @end group
3717:
3718: expr: variable
3719: ;
3720: @end example
3721:
3722: @node Precedence, Contextual Precedence, Shift/Reduce, Algorithm
3723: @section Operator Precedence
3724: @cindex operator precedence
3725: @cindex precedence of operators
3726:
3727: Another situation where shift/reduce conflicts appear is in arithmetic
3728: expressions. Here shifting is not always the preferred resolution; the
3729: Bison declarations for operator precedence allow you to specify when to
3730: shift and when to reduce.
3731:
3732: @menu
3733: * Why Precedence:: An example showing why precedence is needed.
3734: * Using Precedence:: How to specify precedence in Bison grammars.
3735: * Precedence Examples:: How these features are used in the previous example.
3736: * How Precedence:: How they work.
3737: @end menu
3738:
3739: @node Why Precedence, Using Precedence, , Precedence
3740: @subsection When Precedence is Needed
3741:
3742: Consider the following ambiguous grammar fragment (ambiguous because the
3743: input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
3744:
3745: @example
3746: @group
3747: expr: expr '-' expr
3748: | expr '*' expr
3749: | expr '<' expr
3750: | '(' expr ')'
3751: @dots{}
3752: ;
3753: @end group
3754: @end example
3755:
3756: @noindent
3757: Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
3758: should it reduce them via the rule for the addition operator? It depends
3759: on the next token. Of course, if the next token is @samp{)}, we must
3760: reduce; shifting is invalid because no single rule can reduce the token
3761: sequence @w{@samp{- 2 )}} or anything starting with that. But if the next
3762: token is @samp{*} or @samp{<}, we have a choice: either shifting or
3763: reduction would allow the parse to complete, but with different
3764: results.
3765:
3766: To decide which one Bison should do, we must consider the
3767: results. If the next operator token @var{op} is shifted, then it
3768: must be reduced first in order to permit another opportunity to
3769: reduce the sum. The result is (in effect) @w{@samp{1 - (2
3770: @var{op} 3)}}. On the other hand, if the subtraction is reduced
3771: before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op}
3772: 3}}. Clearly, then, the choice of shift or reduce should depend
3773: on the relative precedence of the operators @samp{-} and
3774: @var{op}: @samp{*} should be shifted first, but not @samp{<}.
3775:
3776: @cindex associativity
3777: What about input such as @w{@samp{1 - 2 - 5}}; should this be
3778: @w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For
3779: most operators we prefer the former, which is called @dfn{left
3780: association}. The latter alternative, @dfn{right association}, is
3781: desirable for assignment operators. The choice of left or right
3782: association is a matter of whether the parser chooses to shift or
3783: reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead
3784: token is @samp{-}: shifting makes right-associativity.
3785:
3786: @node Using Precedence, Precedence Examples, Why Precedence, Precedence
3787: @subsection Specifying Operator Precedence
3788: @findex %left
3789: @findex %right
3790: @findex %nonassoc
3791:
3792: Bison allows you to specify these choices with the operator precedence
3793: declarations @code{%left} and @code{%right}. Each such declaration
3794: contains a list of tokens, which are operators whose precedence and
3795: associativity is being declared. The @code{%left} declaration makes all
3796: those operators left-associative and the @code{%right} declaration makes
3797: them right-associative. A third alternative is @code{%nonassoc}, which
3798: declares that it is a syntax error to find the same operator twice ``in a
3799: row''.
3800:
3801: The relative precedence of different operators is controlled by the
3802: order in which they are declared. The first @code{%left} or
3803: @code{%right} declaration in the file declares the operators whose
3804: precedence is lowest, the next such declaration declares the operators
3805: whose precedence is a little higher, and so on.
3806:
3807: @node Precedence Examples, How Precedence, Using Precedence, Precedence
3808: @subsection Precedence Examples
3809:
3810: In our example, we would want the following declarations:
3811:
3812: @example
3813: %left '<'
3814: %left '-'
3815: %left '*'
3816: @end example
3817:
3818: In a more complete example, which supports other operators as well, we
3819: would declare them in groups of equal precedence. For example, @code{'+'} is
3820: declared with @code{'-'}:
3821:
3822: @example
3823: %left '<' '>' '=' NE LE GE
3824: %left '+' '-'
3825: %left '*' '/'
3826: @end example
3827:
3828: @noindent
3829: (Here @code{NE} and so on stand for the operators for ``not equal''
3830: and so on. We assume that these tokens are more than one character long
3831: and therefore are represented by names, not character literals.)
3832:
3833: @node How Precedence, , Precedence Examples, Precedence
3834: @subsection How Precedence Works
3835:
3836: The first effect of the precedence declarations is to assign precedence
3837: levels to the terminal symbols declared. The second effect is to assign
3838: precedence levels to certain rules: each rule gets its precedence from the
3839: last terminal symbol mentioned in the components. (You can also specify
3840: explicitly the precedence of a rule. @xref{Contextual Precedence, ,Context-Dependent Precedence}.)
3841:
3842: Finally, the resolution of conflicts works by comparing the
3843: precedence of the rule being considered with that of the
3844: look-ahead token. If the token's precedence is higher, the
3845: choice is to shift. If the rule's precedence is higher, the
3846: choice is to reduce. If they have equal precedence, the choice
3847: is made based on the associativity of that precedence level. The
3848: verbose output file made by @samp{-v} (@pxref{Invocation, ,Invoking Bison}) says
3849: how each conflict was resolved.
3850:
3851: Not all rules and not all tokens have precedence. If either the rule or
3852: the look-ahead token has no precedence, then the default is to shift.
3853:
3854: @node Contextual Precedence, Parser States, Precedence, Algorithm
3855: @section Context-Dependent Precedence
3856: @cindex context-dependent precedence
3857: @cindex unary operator precedence
3858: @cindex precedence, context-dependent
3859: @cindex precedence, unary operator
3860: @findex %prec
3861:
3862: Often the precedence of an operator depends on the context. This sounds
3863: outlandish at first, but it is really very common. For example, a minus
3864: sign typically has a very high precedence as a unary operator, and a
3865: somewhat lower precedence (lower than multiplication) as a binary operator.
3866:
3867: The Bison precedence declarations, @code{%left}, @code{%right} and
3868: @code{%nonassoc}, can only be used once for a given token; so a token has
3869: only one precedence declared in this way. For context-dependent
3870: precedence, you need to use an additional mechanism: the @code{%prec}
3871: modifier for rules.@refill
3872:
3873: The @code{%prec} modifier declares the precedence of a particular rule by
3874: specifying a terminal symbol whose precedence should be used for that rule.
3875: It's not necessary for that symbol to appear otherwise in the rule. The
3876: modifier's syntax is:
3877:
3878: @example
3879: %prec @var{terminal-symbol}
3880: @end example
3881:
3882: @noindent
3883: and it is written after the components of the rule. Its effect is to
3884: assign the rule the precedence of @var{terminal-symbol}, overriding
3885: the precedence that would be deduced for it in the ordinary way. The
3886: altered rule precedence then affects how conflicts involving that rule
3887: are resolved (@pxref{Precedence, ,Operator Precedence}).
3888:
3889: Here is how @code{%prec} solves the problem of unary minus. First, declare
3890: a precedence for a fictitious terminal symbol named @code{UMINUS}. There
3891: are no tokens of this type, but the symbol serves to stand for its
3892: precedence:
3893:
3894: @example
3895: @dots{}
3896: %left '+' '-'
3897: %left '*'
3898: %left UMINUS
3899: @end example
3900:
3901: Now the precedence of @code{UMINUS} can be used in specific rules:
3902:
3903: @example
3904: @group
3905: exp: @dots{}
3906: | exp '-' exp
3907: @dots{}
3908: | '-' exp %prec UMINUS
3909: @end group
3910: @end example
3911:
3912: @node Parser States, Reduce/Reduce, Contextual Precedence, Algorithm
3913: @section Parser States
3914: @cindex finite-state machine
3915: @cindex parser state
3916: @cindex state (of parser)
3917:
3918: The function @code{yyparse} is implemented using a finite-state machine.
3919: The values pushed on the parser stack are not simply token type codes; they
3920: represent the entire sequence of terminal and nonterminal symbols at or
3921: near the top of the stack. The current state collects all the information
3922: about previous input which is relevant to deciding what to do next.
3923:
3924: Each time a look-ahead token is read, the current parser state together
3925: with the type of look-ahead token are looked up in a table. This table
3926: entry can say, ``Shift the look-ahead token.'' In this case, it also
3927: specifies the new parser state, which is pushed onto the top of the
3928: parser stack. Or it can say, ``Reduce using rule number @var{n}.''
3929: This means that a certain number of tokens or groupings are taken off
3930: the top of the stack, and replaced by one grouping. In other words,
3931: that number of states are popped from the stack, and one new state is
3932: pushed.
3933:
3934: There is one other alternative: the table can say that the look-ahead token
3935: is erroneous in the current state. This causes error processing to begin
3936: (@pxref{Error Recovery}).
3937:
3938: @node Reduce/Reduce, Mystery Conflicts, Parser States, Algorithm
3939: @section Reduce/Reduce Conflicts
3940: @cindex reduce/reduce conflict
3941: @cindex conflicts, reduce/reduce
3942:
3943: A reduce/reduce conflict occurs if there are two or more rules that apply
3944: to the same sequence of input. This usually indicates a serious error
3945: in the grammar.
3946:
3947: For example, here is an erroneous attempt to define a sequence
3948: of zero or more @code{word} groupings.
3949:
3950: @example
3951: sequence: /* empty */
3952: @{ printf ("empty sequence\n"); @}
3953: | maybeword
3954: | sequence word
3955: @{ printf ("added word %s\n", $2); @}
3956: ;
3957:
3958: maybeword: /* empty */
3959: @{ printf ("empty maybeword\n"); @}
3960: | word
3961: @{ printf ("single word %s\n", $1); @}
3962: ;
3963: @end example
3964:
3965: @noindent
3966: The error is an ambiguity: there is more than one way to parse a single
3967: @code{word} into a @code{sequence}. It could be reduced to a
3968: @code{maybeword} and then into a @code{sequence} via the second rule.
3969: Alternatively, nothing-at-all could be reduced into a @code{sequence}
3970: via the first rule, and this could be combined with the @code{word}
3971: using the third rule for @code{sequence}.
3972:
3973: There is also more than one way to reduce nothing-at-all into a
3974: @code{sequence}. This can be done directly via the first rule,
3975: or indirectly via @code{maybeword} and then the second rule.
3976:
3977: You might think that this is a distinction without a difference, because it
3978: does not change whether any particular input is valid or not. But it does
3979: affect which actions are run. One parsing order runs the second rule's
3980: action; the other runs the first rule's action and the third rule's action.
3981: In this example, the output of the program changes.
3982:
3983: Bison resolves a reduce/reduce conflict by choosing to use the rule that
3984: appears first in the grammar, but it is very risky to rely on this. Every
3985: reduce/reduce conflict must be studied and usually eliminated. Here is the
3986: proper way to define @code{sequence}:
3987:
3988: @example
3989: sequence: /* empty */
3990: @{ printf ("empty sequence\n"); @}
3991: | sequence word
3992: @{ printf ("added word %s\n", $2); @}
3993: ;
3994: @end example
3995:
3996: Here is another common error that yields a reduce/reduce conflict:
3997:
3998: @example
3999: sequence: /* empty */
4000: | sequence words
4001: | sequence redirects
4002: ;
4003:
4004: words: /* empty */
4005: | words word
4006: ;
4007:
4008: redirects:/* empty */
4009: | redirects redirect
4010: ;
4011: @end example
4012:
4013: @noindent
4014: The intention here is to define a sequence which can contain either
4015: @code{word} or @code{redirect} groupings. The individual definitions of
4016: @code{sequence}, @code{words} and @code{redirects} are error-free, but the
4017: three together make a subtle ambiguity: even an empty input can be parsed
4018: in infinitely many ways!
4019:
4020: Consider: nothing-at-all could be a @code{words}. Or it could be two
4021: @code{words} in a row, or three, or any number. It could equally well be a
4022: @code{redirects}, or two, or any number. Or it could be a @code{words}
4023: followed by three @code{redirects} and another @code{words}. And so on.
4024:
4025: Here are two ways to correct these rules. First, to make it a single level
4026: of sequence:
4027:
4028: @example
4029: sequence: /* empty */
4030: | sequence word
4031: | sequence redirect
4032: ;
4033: @end example
4034:
4035: Second, to prevent either a @code{words} or a @code{redirects}
4036: from being empty:
4037:
4038: @example
4039: sequence: /* empty */
4040: | sequence words
4041: | sequence redirects
4042: ;
4043:
4044: words: word
4045: | words word
4046: ;
4047:
4048: redirects:redirect
4049: | redirects redirect
4050: ;
4051: @end example
4052:
4053: @node Mystery Conflicts, Stack Overflow, Reduce/Reduce, Algorithm
4054: @section Mysterious Reduce/Reduce Conflicts
4055:
4056: Sometimes reduce/reduce conflicts can occur that don't look warranted.
4057: Here is an example:
4058:
4059: @example
4060: @group
4061: %token ID
4062:
4063: %%
4064: def: param_spec return_spec ','
4065: ;
4066: param_spec:
4067: type
4068: | name_list ':' type
4069: ;
4070: @end group
4071: @group
4072: return_spec:
4073: type
4074: | name ':' type
4075: ;
4076: @end group
4077: @group
4078: type: ID
4079: ;
4080: @end group
4081: @group
4082: name: ID
4083: ;
4084: name_list:
4085: name
4086: | name ',' name_list
4087: ;
4088: @end group
4089: @end example
4090:
4091: It would seem that this grammar can be parsed with only a single token
4092: of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
4093: a @code{name} if a comma or colon follows, or a @code{type} if another
4094: @code{ID} follows. In other words, this grammar is LR(1).
4095:
4096: @cindex LR(1)
4097: @cindex LALR(1)
4098: However, Bison, like most parser generators, cannot actually handle all
4099: LR(1) grammars. In this grammar, two contexts, that after an @code{ID}
4100: at the beginning of a @code{param_spec} and likewise at the beginning of
4101: a @code{return_spec}, are similar enough that Bison assumes they are the
4102: same. They appear similar because the same set of rules would be
4103: active---the rule for reducing to a @code{name} and that for reducing to
4104: a @code{type}. Bison is unable to determine at that stage of processing
4105: that the rules would require different look-ahead tokens in the two
4106: contexts, so it makes a single parser state for them both. Combining
4107: the two contexts causes a conflict later. In parser terminology, this
4108: occurrence means that the grammar is not LALR(1).
4109:
4110: In general, it is better to fix deficiencies than to document them. But
4111: this particular deficiency is intrinsically hard to fix; parser
4112: generators that can handle LR(1) grammars are hard to write and tend to
4113: produce parsers that are very large. In practice, Bison is more useful
4114: as it is now.
4115:
4116: When the problem arises, you can often fix it by identifying the two
4117: parser states that are being confused, and adding something to make them
4118: look distinct. In the above example, adding one rule to
4119: @code{return_spec} as follows makes the problem go away:
4120:
4121: @example
4122: @group
4123: %token BOGUS
4124: @dots{}
4125: %%
4126: @dots{}
4127: return_spec:
4128: type
4129: | name ':' type
4130: /* This rule is never used. */
4131: | ID BOGUS
4132: ;
4133: @end group
4134: @end example
4135:
4136: This corrects the problem because it introduces the possibility of an
4137: additional active rule in the context after the @code{ID} at the beginning of
4138: @code{return_spec}. This rule is not active in the corresponding context
4139: in a @code{param_spec}, so the two contexts receive distinct parser states.
4140: As long as the token @code{BOGUS} is never generated by @code{yylex},
4141: the added rule cannot alter the way actual input is parsed.
4142:
4143: In this particular example, there is another way to solve the problem:
4144: rewrite the rule for @code{return_spec} to use @code{ID} directly
4145: instead of via @code{name}. This also causes the two confusing
4146: contexts to have different sets of active rules, because the one for
4147: @code{return_spec} activates the altered rule for @code{return_spec}
4148: rather than the one for @code{name}.
4149:
4150: @example
4151: param_spec:
4152: type
4153: | name_list ':' type
4154: ;
4155: return_spec:
4156: type
4157: | ID ':' type
4158: ;
4159: @end example
4160:
4161: @node Stack Overflow, , Mystery Conflicts, Algorithm
4162: @section Stack Overflow, and How to Avoid It
4163: @cindex stack overflow
4164: @cindex parser stack overflow
4165: @cindex overflow of parser stack
4166:
4167: The Bison parser stack can overflow if too many tokens are shifted and
4168: not reduced. When this happens, the parser function @code{yyparse}
4169: returns a nonzero value, pausing only to call @code{yyerror} to report
4170: the overflow.
4171:
4172: @vindex YYMAXDEPTH
4173: By defining the macro @code{YYMAXDEPTH}, you can control how deep the
4174: parser stack can become before a stack overflow occurs. Define the
4175: macro with a value that is an integer. This value is the maximum number
4176: of tokens that can be shifted (and not reduced) before overflow.
4177: It must be a constant expression whose value is known at compile time.
4178:
4179: The stack space allowed is not necessarily allocated. If you specify a
4180: large value for @code{YYMAXDEPTH}, the parser actually allocates a small
4181: stack at first, and then makes it bigger by stages as needed. This
4182: increasing allocation happens automatically and silently. Therefore,
4183: you do not need to make @code{YYMAXDEPTH} painfully small merely to save
4184: space for ordinary inputs that do not need much stack.
4185:
4186: @cindex default stack limit
4187: The default value of @code{YYMAXDEPTH}, if you do not define it, is
4188: 10000.
4189:
4190: @vindex YYINITDEPTH
4191: You can control how much stack is allocated initially by defining the
4192: macro @code{YYINITDEPTH}. This value too must be a compile-time
4193: constant integer. The default is 200.
4194:
4195: @node Error Recovery, Context Dependency, Algorithm, Top
4196: @chapter Error Recovery
4197: @cindex error recovery
4198: @cindex recovery from errors
4199:
4200: It is not usually acceptable to have a program terminate on a parse
4201: error. For example, a compiler should recover sufficiently to parse the
4202: rest of the input file and check it for errors; a calculator should accept
4203: another expression.
4204:
4205: In a simple interactive command parser where each input is one line, it may
4206: be sufficient to allow @code{yyparse} to return 1 on error and have the
4207: caller ignore the rest of the input line when that happens (and then call
4208: @code{yyparse} again). But this is inadequate for a compiler, because it
4209: forgets all the syntactic context leading up to the error. A syntax error
4210: deep within a function in the compiler input should not cause the compiler
4211: to treat the following line like the beginning of a source file.
4212:
4213: @findex error
4214: You can define how to recover from a syntax error by writing rules to
4215: recognize the special token @code{error}. This is a terminal symbol that
4216: is always defined (you need not declare it) and reserved for error
4217: handling. The Bison parser generates an @code{error} token whenever a
4218: syntax error happens; if you have provided a rule to recognize this token
4219: in the current context, the parse can continue.
4220:
4221: For example:
4222:
4223: @example
4224: stmnts: /* empty string */
4225: | stmnts '\n'
4226: | stmnts exp '\n'
4227: | stmnts error '\n'
4228: @end example
4229:
4230: The fourth rule in this example says that an error followed by a newline
4231: makes a valid addition to any @code{stmnts}.
4232:
4233: What happens if a syntax error occurs in the middle of an @code{exp}? The
4234: error recovery rule, interpreted strictly, applies to the precise sequence
4235: of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
4236: the middle of an @code{exp}, there will probably be some additional tokens
4237: and subexpressions on the stack after the last @code{stmnts}, and there
4238: will be tokens to read before the next newline. So the rule is not
4239: applicable in the ordinary way.
4240:
4241: But Bison can force the situation to fit the rule, by discarding part of
4242: the semantic context and part of the input. First it discards states and
4243: objects from the stack until it gets back to a state in which the
4244: @code{error} token is acceptable. (This means that the subexpressions
4245: already parsed are discarded, back to the last complete @code{stmnts}.) At
4246: this point the @code{error} token can be shifted. Then, if the old
4247: look-ahead token is not acceptable to be shifted next, the parser reads
4248: tokens and discards them until it finds a token which is acceptable. In
4249: this example, Bison reads and discards input until the next newline
4250: so that the fourth rule can apply.
4251:
4252: The choice of error rules in the grammar is a choice of strategies for
4253: error recovery. A simple and useful strategy is simply to skip the rest of
4254: the current input line or current statement if an error is detected:
4255:
4256: @example
4257: stmnt: error ';' /* on error, skip until ';' is read */
4258: @end example
4259:
4260: It is also useful to recover to the matching close-delimiter of an
4261: opening-delimiter that has already been parsed. Otherwise the
4262: close-delimiter will probably appear to be unmatched, and generate another,
4263: spurious error message:
4264:
4265: @example
4266: primary: '(' expr ')'
4267: | '(' error ')'
4268: @dots{}
4269: ;
4270: @end example
4271:
4272: Error recovery strategies are necessarily guesses. When they guess wrong,
4273: one syntax error often leads to another. In the above example, the error
4274: recovery rule guesses that an error is due to bad input within one
4275: @code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
4276: middle of a valid @code{stmnt}. After the error recovery rule recovers
4277: from the first error, another syntax error will be found straightaway,
4278: since the text following the spurious semicolon is also an invalid
4279: @code{stmnt}.
4280:
4281: To prevent an outpouring of error messages, the parser will output no error
4282: message for another syntax error that happens shortly after the first; only
4283: after three consecutive input tokens have been successfully shifted will
4284: error messages resume.
4285:
4286: Note that rules which accept the @code{error} token may have actions, just
4287: as any other rules can.
4288:
4289: @findex yyerrok
4290: You can make error messages resume immediately by using the macro
4291: @code{yyerrok} in an action. If you do this in the error rule's action, no
4292: error messages will be suppressed. This macro requires no arguments;
4293: @samp{yyerrok;} is a valid C statement.
4294:
4295: @findex yyclearin
4296: The previous look-ahead token is reanalyzed immediately after an error. If
4297: this is unacceptable, then the macro @code{yyclearin} may be used to clear
4298: this token. Write the statement @samp{yyclearin;} in the error rule's
4299: action.
4300:
4301: For example, suppose that on a parse error, an error handling routine is
4302: called that advances the input stream to some point where parsing should
4303: once again commence. The next symbol returned by the lexical scanner is
4304: probably correct. The previous look-ahead token ought to be discarded
4305: with @samp{yyclearin;}.
4306:
4307: @vindex YYRECOVERING
4308: The macro @code{YYRECOVERING} stands for an expression that has the
4309: value 1 when the parser is recovering from a syntax error, and 0 the
4310: rest of the time. A value of 1 indicates that error messages are
4311: currently suppressed for new syntax errors.
4312:
4313: @node Context Dependency, Debugging, Error Recovery, Top
4314: @chapter Handling Context Dependencies
4315:
4316: The Bison paradigm is to parse tokens first, then group them into larger
4317: syntactic units. In many languages, the meaning of a token is affected by
4318: its context. Although this violates the Bison paradigm, certain techniques
4319: (known as @dfn{kludges}) may enable you to write Bison parsers for such
4320: languages.
4321:
4322: @menu
4323: * Semantic Tokens:: Token parsing can depend on the semantic context.
4324: * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
4325: * Tie-in Recovery:: Lexical tie-ins have implications for how
4326: error recovery rules must be written.
4327: @end menu
4328:
4329: (Actually, ``kludge'' means any technique that gets its job done but is
4330: neither clean nor robust.)
4331:
4332: @node Semantic Tokens, Lexical Tie-ins, , Context Dependency
4333: @section Semantic Info in Token Types
4334:
4335: The C language has a context dependency: the way an identifier is used
4336: depends on what its current meaning is. For example, consider this:
4337:
4338: @example
4339: foo (x);
4340: @end example
4341:
4342: This looks like a function call statement, but if @code{foo} is a typedef
4343: name, then this is actually a declaration of @code{x}. How can a Bison
4344: parser for C decide how to parse this input?
4345:
4346: The method used in GNU C is to have two different token types,
4347: @code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
4348: identifier, it looks up the current declaration of the identifier in order
4349: to decide which token type to return: @code{TYPENAME} if the identifier is
4350: declared as a typedef, @code{IDENTIFIER} otherwise.
4351:
4352: The grammar rules can then express the context dependency by the choice of
4353: token type to recognize. @code{IDENTIFIER} is accepted as an expression,
4354: but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
4355: @code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
4356: is @emph{not} significant, such as in declarations that can shadow a
4357: typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
4358: accepted---there is one rule for each of the two token types.
4359:
4360: This technique is simple to use if the decision of which kinds of
4361: identifiers to allow is made at a place close to where the identifier is
4362: parsed. But in C this is not always so: C allows a declaration to
4363: redeclare a typedef name provided an explicit type has been specified
4364: earlier:
4365:
4366: @example
4367: typedef int foo, bar, lose;
4368: static foo (bar); /* @r{redeclare @code{bar} as static variable} */
4369: static int foo (lose); /* @r{redeclare @code{foo} as function} */
4370: @end example
4371:
4372: Unfortunately, the name being declared is separated from the declaration
4373: construct itself by a complicated syntactic structure---the ``declarator''.
4374:
4375: As a result, the part of Bison parser for C needs to be duplicated, with
4376: all the nonterminal names changed: once for parsing a declaration in which
4377: a typedef name can be redefined, and once for parsing a declaration in
4378: which that can't be done. Here is a part of the duplication, with actions
4379: omitted for brevity:
4380:
4381: @example
4382: initdcl:
4383: declarator maybeasm '='
4384: init
4385: | declarator maybeasm
4386: ;
4387:
4388: notype_initdcl:
4389: notype_declarator maybeasm '='
4390: init
4391: | notype_declarator maybeasm
4392: ;
4393: @end example
4394:
4395: @noindent
4396: Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
4397: cannot. The distinction between @code{declarator} and
4398: @code{notype_declarator} is the same sort of thing.
4399:
4400: There is some similarity between this technique and a lexical tie-in
4401: (described next), in that information which alters the lexical analysis is
4402: changed during parsing by other parts of the program. The difference is
4403: here the information is global, and is used for other purposes in the
4404: program. A true lexical tie-in has a special-purpose flag controlled by
4405: the syntactic context.
4406:
4407: @node Lexical Tie-ins, Tie-in Recovery, Semantic Tokens, Context Dependency
4408: @section Lexical Tie-ins
4409: @cindex lexical tie-in
4410:
4411: One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
4412: which is set by Bison actions, whose purpose is to alter the way tokens are
4413: parsed.
4414:
4415: For example, suppose we have a language vaguely like C, but with a special
4416: construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
4417: an expression in parentheses in which all integers are hexadecimal. In
4418: particular, the token @samp{a1b} must be treated as an integer rather than
4419: as an identifier if it appears in that context. Here is how you can do it:
4420:
4421: @example
4422: @group
4423: %@{
4424: int hexflag;
4425: %@}
4426: %%
4427: @dots{}
4428: @end group
4429: @group
4430: expr: IDENTIFIER
4431: | constant
4432: | HEX '('
4433: @{ hexflag = 1; @}
4434: expr ')'
4435: @{ hexflag = 0;
4436: $$ = $4; @}
4437: | expr '+' expr
4438: @{ $$ = make_sum ($1, $3); @}
4439: @dots{}
4440: ;
4441: @end group
4442:
4443: @group
4444: constant:
4445: INTEGER
4446: | STRING
4447: ;
4448: @end group
4449: @end example
4450:
4451: @noindent
4452: Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
4453: it is nonzero, all integers are parsed in hexadecimal, and tokens starting
4454: with letters are parsed as integers if possible.
4455:
4456: The declaration of @code{hexflag} shown in the C declarations section of
4457: the parser file is needed to make it accessible to the actions
4458: (@pxref{C Declarations, ,The C Declarations Section}). You must also write the code in @code{yylex}
4459: to obey the flag.
4460:
4461: @node Tie-in Recovery, , Lexical Tie-ins, Context Dependency
4462: @section Lexical Tie-ins and Error Recovery
4463:
4464: Lexical tie-ins make strict demands on any error recovery rules you have.
4465: @xref{Error Recovery}.
4466:
4467: The reason for this is that the purpose of an error recovery rule is to
4468: abort the parsing of one construct and resume in some larger construct.
4469: For example, in C-like languages, a typical error recovery rule is to skip
4470: tokens until the next semicolon, and then start a new statement, like this:
4471:
4472: @example
4473: stmt: expr ';'
4474: | IF '(' expr ')' stmt @{ @dots{} @}
4475: @dots{}
4476: error ';'
4477: @{ hexflag = 0; @}
4478: ;
4479: @end example
4480:
4481: If there is a syntax error in the middle of a @samp{hex (@var{expr})}
4482: construct, this error rule will apply, and then the action for the
4483: completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
4484: remain set for the entire rest of the input, or until the next @code{hex}
4485: keyword, causing identifiers to be misinterpreted as integers.
4486:
4487: To avoid this problem the error recovery rule itself clears @code{hexflag}.
4488:
4489: There may also be an error recovery rule that works within expressions.
4490: For example, there could be a rule which applies within parentheses
4491: and skips to the close-parenthesis:
4492:
4493: @example
4494: @group
4495: expr: @dots{}
4496: | '(' expr ')'
4497: @{ $$ = $2; @}
4498: | '(' error ')'
4499: @dots{}
4500: @end group
4501: @end example
4502:
4503: If this rule acts within the @code{hex} construct, it is not going to abort
4504: that construct (since it applies to an inner level of parentheses within
4505: the construct). Therefore, it should not clear the flag: the rest of
4506: the @code{hex} construct should be parsed with the flag still in effect.
4507:
4508: What if there is an error recovery rule which might abort out of the
4509: @code{hex} construct or might not, depending on circumstances? There is no
4510: way you can write the action to determine whether a @code{hex} construct is
4511: being aborted or not. So if you are using a lexical tie-in, you had better
4512: make sure your error recovery rules are not of this kind. Each rule must
4513: be such that you can be sure that it always will, or always won't, have to
4514: clear the flag.
4515:
4516: @node Debugging, Invocation, Context Dependency, Top
4517: @chapter Debugging Your Parser
4518: @findex YYDEBUG
4519: @findex yydebug
4520: @cindex debugging
4521: @cindex tracing the parser
4522:
4523: If a Bison grammar compiles properly but doesn't do what you want when it
4524: runs, the @code{yydebug} parser-trace feature can help you figure out why.
4525:
4526: To enable compilation of trace facilities, you must define the macro
4527: @code{YYDEBUG} when you compile the parser. You could use
4528: @samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
4529: YYDEBUG 1} in the C declarations section of the grammar file
4530: (@pxref{C Declarations, ,The C Declarations Section}). Alternatively, use the @samp{-t} option when
4531: you run Bison (@pxref{Invocation, ,Invoking Bison}). We always define @code{YYDEBUG} so that
4532: debugging is always possible.
4533:
4534: The trace facility uses @code{stderr}, so you must add @w{@code{#include
4535: <stdio.h>}} to the C declarations section unless it is already there.
4536:
4537: Once you have compiled the program with trace facilities, the way to
4538: request a trace is to store a nonzero value in the variable @code{yydebug}.
4539: You can do this by making the C code do it (in @code{main}, perhaps), or
4540: you can alter the value with a C debugger.
4541:
4542: Each step taken by the parser when @code{yydebug} is nonzero produces a
4543: line or two of trace information, written on @code{stderr}. The trace
4544: messages tell you these things:
4545:
4546: @itemize @bullet
4547: @item
4548: Each time the parser calls @code{yylex}, what kind of token was read.
4549:
4550: @item
4551: Each time a token is shifted, the depth and complete contents of the
4552: state stack (@pxref{Parser States}).
4553:
4554: @item
4555: Each time a rule is reduced, which rule it is, and the complete contents
4556: of the state stack afterward.
4557: @end itemize
4558:
4559: To make sense of this information, it helps to refer to the listing file
4560: produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking Bison}). This file
4561: shows the meaning of each state in terms of positions in various rules, and
4562: also what each state will do with each possible input token. As you read
4563: the successive trace messages, you can see that the parser is functioning
4564: according to its specification in the listing file. Eventually you will
4565: arrive at the place where something undesirable happens, and you will see
4566: which parts of the grammar are to blame.
4567:
4568: The parser file is a C program and you can use C debuggers on it, but it's
4569: not easy to interpret what it is doing. The parser function is a
4570: finite-state machine interpreter, and aside from the actions it executes
4571: the same code over and over. Only the values of variables show where in
4572: the grammar it is working.
4573:
4574: @findex YYPRINT
4575: The debugging information normally gives the token type of each token
4576: read, but not its semantic value. You can optionally define a macro
4577: named @code{YYPRINT} to provide a way to print the value. If you define
4578: @code{YYPRINT}, it should take three arguments. The parser will pass a
4579: standard I/O stream, the numeric code for the token type, and the token
4580: value (from @code{yylval}).
4581:
4582: Here is an example of @code{YYPRINT} suitable for the multi-function
4583: calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
4584:
4585: @smallexample
4586: #define YYPRINT(file, type, value) yyprint (file, type, value)
4587:
4588: static void
4589: yyprint (file, type, value)
4590: FILE *file;
4591: int type;
4592: YYSTYPE value;
4593: @{
4594: if (type == VAR)
4595: fprintf (file, " %s", value.tptr->name);
4596: else if (type == NUM)
4597: fprintf (file, " %d", value.val);
4598: @}
4599: @end smallexample
4600:
4601: @node Invocation, Table of Symbols, Debugging, Top
4602: @chapter Invoking Bison
4603: @cindex invoking Bison
4604: @cindex Bison invocation
4605: @cindex options for invoking Bison
4606:
4607: The usual way to invoke Bison is as follows:
4608:
4609: @example
4610: bison @var{infile}
4611: @end example
4612:
4613: Here @var{infile} is the grammar file name, which usually ends in
4614: @samp{.y}. The parser file's name is made by replacing the @samp{.y}
4615: with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields
4616: @file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields
4617: @file{hack/foo.tab.c}.@refill
4618:
4619: @menu
4620: * Bison Options:: All the options described in detail,
4621: in alphabetical order by short options.
4622: * Option Cross Key:: Alphabetical list of long options.
4623: * VMS Invocation:: Bison command syntax on VMS.
4624: @end menu
4625:
4626: @node Bison Options, Option Cross Key, , Invocation
4627: @section Bison Options
4628:
4629: Bison supports both traditional single-letter options and mnemonic long
4630: option names. Long option names are indicated with @samp{--} instead of
4631: @samp{-}. Abbreviations for option names are allowed as long as they
4632: are unique. When a long option takes an argument, like
4633: @samp{--file-prefix}, connect the option name and the argument with
4634: @samp{=}.
4635:
4636: Here is a list of options that can be used with Bison, alphabetized by
4637: short option. It is followed by a cross key alphabetized by long
4638: option.
4639:
4640: @table @samp
4641: @item -b @var{file-prefix}
4642: @itemx --file-prefix=@var{prefix}
4643: Specify a prefix to use for all Bison output file names. The names are
4644: chosen as if the input file were named @file{@var{prefix}.c}.
4645:
4646: @item -d
4647: @itemx --defines
4648: Write an extra output file containing macro definitions for the token
4649: type names defined in the grammar and the semantic value type
4650: @code{YYSTYPE}, as well as a few @code{extern} variable declarations.
4651:
4652: If the parser output file is named @file{@var{name}.c} then this file
4653: is named @file{@var{name}.h}.@refill
4654:
4655: This output file is essential if you wish to put the definition of
4656: @code{yylex} in a separate source file, because @code{yylex} needs to
4657: be able to refer to token type codes and the variable
4658: @code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill
4659:
4660: @item -l
4661: @itemx --no-lines
4662: Don't put any @code{#line} preprocessor commands in the parser file.
4663: Ordinarily Bison puts them in the parser file so that the C compiler
4664: and debuggers will associate errors with your source file, the
4665: grammar file. This option causes them to associate errors with the
4666: parser file, treating it an independent source file in its own right.
4667:
4668: @item -o @var{outfile}
4669: @itemx --output-file=@var{outfile}
4670: Specify the name @var{outfile} for the parser file.
4671:
4672: The other output files' names are constructed from @var{outfile}
4673: as described under the @samp{-v} and @samp{-d} switches.
4674:
4675: @item -p @var{prefix}
4676: @itemx --name-prefix=@var{prefix}
4677: Rename the external symbols used in the parser so that they start with
4678: @var{prefix} instead of @samp{yy}. The precise list of symbols renamed
4679: is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yylval},
4680: @code{yychar} and @code{yydebug}.
4681:
4682: For example, if you use @samp{-p c}, the names become @code{cparse},
4683: @code{clex}, and so on.
4684:
4685: @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
4686:
4687: @item -t
4688: @itemx --debug
4689: Output a definition of the macro @code{YYDEBUG} into the parser file,
4690: so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}.
4691:
4692: @item -v
4693: @itemx --verbose
4694: Write an extra output file containing verbose descriptions of the
4695: parser states and what is done for each type of look-ahead token in
4696: that state.
4697:
4698: This file also describes all the conflicts, both those resolved by
4699: operator precedence and the unresolved ones.
4700:
4701: The file's name is made by removing @samp{.tab.c} or @samp{.c} from
4702: the parser output file name, and adding @samp{.output} instead.@refill
4703:
4704: Therefore, if the input file is @file{foo.y}, then the parser file is
4705: called @file{foo.tab.c} by default. As a consequence, the verbose
4706: output file is called @file{foo.output}.@refill
4707:
4708: @item -V
4709: @itemx --version
4710: Print the version number of Bison and exit.
4711:
4712: @item -h
4713: @itemx --help
4714: Print a summary of the command-line options to Bison and exit.
4715:
4716: @need 1750
4717: @item -y
4718: @itemx --yacc
4719: @itemx --fixed-output-files
4720: Equivalent to @samp{-o y.tab.c}; the parser output file is called
4721: @file{y.tab.c}, and the other outputs are called @file{y.output} and
4722: @file{y.tab.h}. The purpose of this switch is to imitate Yacc's output
4723: file name conventions. Thus, the following shell script can substitute
4724: for Yacc:@refill
4725:
4726: @example
4727: bison -y $*
4728: @end example
4729: @end table
4730:
4731: @node Option Cross Key, VMS Invocation, Bison Options, Invocation
4732: @section Option Cross Key
4733:
4734: Here is a list of options, alphabetized by long option, to help you find
4735: the corresponding short option.
4736:
4737: @tex
4738: \def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
4739:
4740: {\tt
4741: \line{ --debug \leaderfill -t}
4742: \line{ --defines \leaderfill -d}
4743: \line{ --file-prefix \leaderfill -b}
4744: \line{ --fixed-output-files \leaderfill -y}
4745: \line{ --help \leaderfill -h}
4746: \line{ --name-prefix \leaderfill -p}
4747: \line{ --no-lines \leaderfill -l}
4748: \line{ --output-file \leaderfill -o}
4749: \line{ --verbose \leaderfill -v}
4750: \line{ --version \leaderfill -V}
4751: \line{ --yacc \leaderfill -y}
4752: }
4753: @end tex
4754:
4755: @ifinfo
4756: @example
4757: --debug -t
4758: --defines -d
4759: --file-prefix=@var{prefix} -b @var{file-prefix}
4760: --fixed-output-files --yacc -y
4761: --help -h
4762: --name-prefix -p
4763: --no-lines -l
4764: --output-file=@var{outfile} -o @var{outfile}
4765: --verbose -v
4766: --version -V
4767: @end example
4768: @end ifinfo
4769:
4770: @node VMS Invocation, , Option Cross Key, Invocation
4771: @section Invoking Bison under VMS
4772: @cindex invoking Bison under VMS
4773: @cindex VMS
4774:
4775: The command line syntax for Bison on VMS is a variant of the usual
4776: Bison command syntax---adapted to fit VMS conventions.
4777:
4778: To find the VMS equivalent for any Bison option, start with the long
4779: option, and substitute a @samp{/} for the leading @samp{--}, and
4780: substitute a @samp{_} for each @samp{-} in the name of the long option.
4781: For example, the following invocation under VMS:
4782:
4783: @example
4784: bison /debug/name_prefix=bar foo.y
4785: @end example
4786:
4787: @noindent
4788: is equivalent to the following command under POSIX.
4789:
4790: @example
4791: bison --debug --name-prefix=bar foo.y
4792: @end example
4793:
4794: The VMS file system does not permit filenames such as
4795: @file{foo.tab.c}. In the above example, the output file
4796: would instead be named @file{foo_tab.c}.
4797:
4798: @node Table of Symbols, Glossary, Invocation, Top
4799: @appendix Bison Symbols
4800: @cindex Bison symbols, table of
4801: @cindex symbols in Bison, table of
4802:
4803: @table @code
4804: @item error
4805: A token name reserved for error recovery. This token may be used in
4806: grammar rules so as to allow the Bison parser to recognize an error in
4807: the grammar without halting the process. In effect, a sentence
4808: containing an error may be recognized as valid. On a parse error, the
4809: token @code{error} becomes the current look-ahead token. Actions
4810: corresponding to @code{error} are then executed, and the look-ahead
4811: token is reset to the token that originally caused the violation.
4812: @xref{Error Recovery}.
4813:
4814: @item YYABORT
4815: Macro to pretend that an unrecoverable syntax error has occurred, by
4816: making @code{yyparse} return 1 immediately. The error reporting
4817: function @code{yyerror} is not called. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
4818:
4819: @item YYACCEPT
4820: Macro to pretend that a complete utterance of the language has been
4821: read, by making @code{yyparse} return 0 immediately.
4822: @xref{Parser Function, ,The Parser Function @code{yyparse}}.
4823:
4824: @item YYBACKUP
4825: Macro to discard a value from the parser stack and fake a look-ahead
4826: token. @xref{Action Features, ,Special Features for Use in Actions}.
4827:
4828: @item YYERROR
4829: Macro to pretend that a syntax error has just been detected: call
4830: @code{yyerror} and then perform normal error recovery if possible
4831: (@pxref{Error Recovery}), or (if recovery is impossible) make
4832: @code{yyparse} return 1. @xref{Error Recovery}.
4833:
4834: @item YYERROR_VERBOSE
4835: Macro that you define with @code{#define} in the Bison declarations
4836: section to request verbose, specific error message strings when
4837: @code{yyerror} is called.
4838:
4839: @item YYINITDEPTH
4840: Macro for specifying the initial size of the parser stack.
4841: @xref{Stack Overflow}.
4842:
4843: @item YYLTYPE
4844: Macro for the data type of @code{yylloc}; a structure with four
4845: members. @xref{Token Positions, ,Textual Positions of Tokens}.
4846:
4847: @item YYMAXDEPTH
4848: Macro for specifying the maximum size of the parser stack.
4849: @xref{Stack Overflow}.
4850:
4851: @item YYRECOVERING
4852: Macro whose value indicates whether the parser is recovering from a
4853: syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
4854:
4855: @item YYSTYPE
4856: Macro for the data type of semantic values; @code{int} by default.
4857: @xref{Value Type, ,Data Types of Semantic Values}.
4858:
4859: @item yychar
4860: External integer variable that contains the integer value of the
4861: current look-ahead token. (In a pure parser, it is a local variable
4862: within @code{yyparse}.) Error-recovery rule actions may examine this
4863: variable. @xref{Action Features, ,Special Features for Use in Actions}.
4864:
4865: @item yyclearin
4866: Macro used in error-recovery rule actions. It clears the previous
4867: look-ahead token. @xref{Error Recovery}.
4868:
4869: @item yydebug
4870: External integer variable set to zero by default. If @code{yydebug}
4871: is given a nonzero value, the parser will output information on input
4872: symbols and parser action. @xref{Debugging, ,Debugging Your Parser}.
4873:
4874: @item yyerrok
4875: Macro to cause parser to recover immediately to its normal mode
4876: after a parse error. @xref{Error Recovery}.
4877:
4878: @item yyerror
4879: User-supplied function to be called by @code{yyparse} on error. The
4880: function receives one argument, a pointer to a character string
4881: containing an error message. @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
4882:
4883: @item yylex
4884: User-supplied lexical analyzer function, called with no arguments
4885: to get the next token. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
4886:
4887: @item yylval
4888: External variable in which @code{yylex} should place the semantic
4889: value associated with a token. (In a pure parser, it is a local
4890: variable within @code{yyparse}, and its address is passed to
4891: @code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
4892:
4893: @item yylloc
4894: External variable in which @code{yylex} should place the line and
4895: column numbers associated with a token. (In a pure parser, it is a
4896: local variable within @code{yyparse}, and its address is passed to
4897: @code{yylex}.) You can ignore this variable if you don't use the
4898: @samp{@@} feature in the grammar actions. @xref{Token Positions, ,Textual Positions of Tokens}.
4899:
4900: @item yynerrs
4901: Global variable which Bison increments each time there is a parse
4902: error. (In a pure parser, it is a local variable within
4903: @code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
4904:
4905: @item yyparse
4906: The parser function produced by Bison; call this function to start
4907: parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
4908:
4909: @item %left
4910: Bison declaration to assign left associativity to token(s).
4911: @xref{Precedence Decl, ,Operator Precedence}.
4912:
4913: @item %nonassoc
4914: Bison declaration to assign nonassociativity to token(s).
4915: @xref{Precedence Decl, ,Operator Precedence}.
4916:
4917: @item %prec
4918: Bison declaration to assign a precedence to a specific rule.
4919: @xref{Contextual Precedence, ,Context-Dependent Precedence}.
4920:
4921: @item %pure_parser
4922: Bison declaration to request a pure (reentrant) parser.
4923: @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
4924:
4925: @item %right
4926: Bison declaration to assign right associativity to token(s).
4927: @xref{Precedence Decl, ,Operator Precedence}.
4928:
4929: @item %start
4930: Bison declaration to specify the start symbol. @xref{Start Decl, ,The Start-Symbol}.
4931:
4932: @item %token
4933: Bison declaration to declare token(s) without specifying precedence.
4934: @xref{Token Decl, ,Token Type Names}.
4935:
4936: @item %type
4937: Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}.
4938:
4939: @item %union
4940: Bison declaration to specify several possible data types for semantic
4941: values. @xref{Union Decl, ,The Collection of Value Types}.
4942: @end table
4943:
4944: These are the punctuation and delimiters used in Bison input:
4945:
4946: @table @samp
4947: @item %%
4948: Delimiter used to separate the grammar rule section from the
4949: Bison declarations section or the additional C code section.
4950: @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
4951:
4952: @item %@{ %@}
4953: All code listed between @samp{%@{} and @samp{%@}} is copied directly
4954: to the output file uninterpreted. Such code forms the ``C
4955: declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}.
4956:
4957: @item /*@dots{}*/
4958: Comment delimiters, as in C.
4959:
4960: @item :
4961: Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}.
4962:
4963: @item ;
4964: Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
4965:
4966: @item |
4967: Separates alternate rules for the same result nonterminal.
4968: @xref{Rules, ,Syntax of Grammar Rules}.
4969: @end table
4970:
4971: @node Glossary, Index, Table of Symbols, Top
4972: @appendix Glossary
4973: @cindex glossary
4974:
4975: @table @asis
4976: @item Backus-Naur Form (BNF)
4977: Formal method of specifying context-free grammars. BNF was first used
4978: in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
4979:
4980: @item Context-free grammars
4981: Grammars specified as rules that can be applied regardless of context.
4982: Thus, if there is a rule which says that an integer can be used as an
4983: expression, integers are allowed @emph{anywhere} an expression is
4984: permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
4985:
4986: @item Dynamic allocation
4987: Allocation of memory that occurs during execution, rather than at
4988: compile time or on entry to a function.
4989:
4990: @item Empty string
4991: Analogous to the empty set in set theory, the empty string is a
4992: character string of length zero.
4993:
4994: @item Finite-state stack machine
4995: A ``machine'' that has discrete states in which it is said to exist at
4996: each instant in time. As input to the machine is processed, the
4997: machine moves from state to state as specified by the logic of the
4998: machine. In the case of the parser, the input is the language being
4999: parsed, and the states correspond to various stages in the grammar
5000: rules. @xref{Algorithm, ,The Bison Parser Algorithm }.
5001:
5002: @item Grouping
5003: A language construct that is (in general) grammatically divisible;
5004: for example, `expression' or `declaration' in C.
5005: @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5006:
5007: @item Infix operator
5008: An arithmetic operator that is placed between the operands on which it
5009: performs some operation.
5010:
5011: @item Input stream
5012: A continuous flow of data between devices or programs.
5013:
5014: @item Language construct
5015: One of the typical usage schemas of the language. For example, one of
5016: the constructs of the C language is the @code{if} statement.
5017: @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5018:
5019: @item Left associativity
5020: Operators having left associativity are analyzed from left to right:
5021: @samp{a+b+c} first computes @samp{a+b} and then combines with
5022: @samp{c}. @xref{Precedence, ,Operator Precedence}.
5023:
5024: @item Left recursion
5025: A rule whose result symbol is also its first component symbol;
5026: for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}.
5027:
5028: @item Left-to-right parsing
5029: Parsing a sentence of a language by analyzing it token by token from
5030: left to right. @xref{Algorithm, ,The Bison Parser Algorithm }.
5031:
5032: @item Lexical analyzer (scanner)
5033: A function that reads an input stream and returns tokens one by one.
5034: @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
5035:
5036: @item Lexical tie-in
5037: A flag, set by actions in the grammar rules, which alters the way
5038: tokens are parsed. @xref{Lexical Tie-ins}.
5039:
5040: @item Look-ahead token
5041: A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}.
5042:
5043: @item LALR(1)
5044: The class of context-free grammars that Bison (like most other parser
5045: generators) can handle; a subset of LR(1). @xref{Mystery Conflicts, ,
5046: Mysterious Reduce/Reduce Conflicts}.
5047:
5048: @item LR(1)
5049: The class of context-free grammars in which at most one token of
5050: look-ahead is needed to disambiguate the parsing of any piece of input.
5051:
5052: @item Nonterminal symbol
5053: A grammar symbol standing for a grammatical construct that can
5054: be expressed through rules in terms of smaller constructs; in other
5055: words, a construct that is not a token. @xref{Symbols}.
5056:
5057: @item Parse error
5058: An error encountered during parsing of an input stream due to invalid
5059: syntax. @xref{Error Recovery}.
5060:
5061: @item Parser
5062: A function that recognizes valid sentences of a language by analyzing
5063: the syntax structure of a set of tokens passed to it from a lexical
5064: analyzer.
5065:
5066: @item Postfix operator
5067: An arithmetic operator that is placed after the operands upon which it
5068: performs some operation.
5069:
5070: @item Reduction
5071: Replacing a string of nonterminals and/or terminals with a single
5072: nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }.
5073:
5074: @item Reentrant
5075: A reentrant subprogram is a subprogram which can be in invoked any
5076: number of times in parallel, without interference between the various
5077: invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5078:
5079: @item Reverse polish notation
5080: A language in which all operators are postfix operators.
5081:
5082: @item Right recursion
5083: A rule whose result symbol is also its last component symbol;
5084: for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}.
5085:
5086: @item Semantics
5087: In computer languages, the semantics are specified by the actions
5088: taken for each instance of the language, i.e., the meaning of
5089: each statement. @xref{Semantics, ,Defining Language Semantics}.
5090:
5091: @item Shift
5092: A parser is said to shift when it makes the choice of analyzing
5093: further input from the stream rather than reducing immediately some
5094: already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm }.
5095:
5096: @item Single-character literal
5097: A single character that is recognized and interpreted as is.
5098: @xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
5099:
5100: @item Start symbol
5101: The nonterminal symbol that stands for a complete valid utterance in
5102: the language being parsed. The start symbol is usually listed as the
5103: first nonterminal symbol in a language specification.
5104: @xref{Start Decl, ,The Start-Symbol}.
5105:
5106: @item Symbol table
5107: A data structure where symbol names and associated data are stored
5108: during parsing to allow for recognition and use of existing
5109: information in repeated uses of a symbol. @xref{Multi-function Calc}.
5110:
5111: @item Token
5112: A basic, grammatically indivisible unit of a language. The symbol
5113: that describes a token in the grammar is a terminal symbol.
5114: The input of the Bison parser is a stream of tokens which comes from
5115: the lexical analyzer. @xref{Symbols}.
5116:
5117: @item Terminal symbol
5118: A grammar symbol that has no rules in the grammar and therefore
5119: is grammatically indivisible. The piece of text it represents
5120: is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5121: @end table
5122:
5123: @node Index, , Glossary, Top
5124: @unnumbered Index
5125:
5126: @printindex cp
5127:
5128: @contents
5129:
5130: @bye
5131:
5132:
5133:
5134:
5135: @c old menu
5136:
5137: * Introduction::
5138: * Conditions::
5139: * Copying:: The GNU General Public License says
5140: how you can copy and share Bison
5141:
5142: Tutorial sections:
5143: * Concepts:: Basic concepts for understanding Bison.
5144: * Examples:: Three simple explained examples of using Bison.
5145:
5146: Reference sections:
5147: * Grammar File:: Writing Bison declarations and rules.
5148: * Interface:: C-language interface to the parser function @code{yyparse}.
5149: * Algorithm:: How the Bison parser works at run-time.
5150: * Error Recovery:: Writing rules for error recovery.
5151: * Context Dependency::What to do if your language syntax is too
5152: messy for Bison to handle straightforwardly.
5153: * Debugging:: Debugging Bison parsers that parse wrong.
5154: * Invocation:: How to run Bison (to produce the parser source file).
5155: * Table of Symbols:: All the keywords of the Bison language are explained.
5156: * Glossary:: Basic concepts are explained.
5157: * Index:: Cross-references to the text.
5158:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.