bison/src/bison/2.4.1/bison-2.4.1-src/doc/bison.info - Issue 10807020: Add native Windows binary for bison.

Unified Diff: bison/src/bison/2.4.1/bison-2.4.1-src/doc/bison.info

Issue 10807020: Add native Windows binary for bison. (Closed)

Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/

Patch Set: Created 8 years, 5 months ago

Index: bison/src/bison/2.4.1/bison-2.4.1-src/doc/bison.info
===================================================================
--- bison/src/bison/2.4.1/bison-2.4.1-src/doc/bison.info (revision 0)
+++ bison/src/bison/2.4.1/bison-2.4.1-src/doc/bison.info (revision 0)
@@ -0,0 +1,11009 @@

+This is ../../bison-2.4.1-src/doc/bison.info, produced by makeinfo

+version 4.8 from ../../bison-2.4.1-src/doc/bison.texinfo.

+ This manual (19 November 2008) is for GNU Bison (version 2.4.1), the

+GNU parser generator.

+2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software

+Foundation, Inc.

+ Permission is granted to copy, distribute and/or modify this

+ document under the terms of the GNU Free Documentation License,

+ Version 1.2 or any later version published by the Free Software

+ Foundation; with no Invariant Sections, with the Front-Cover texts

+ being "A GNU Manual," and with the Back-Cover Texts as in (a)

+ below. A copy of the license is included in the section entitled

+ "GNU Free Documentation License."

+ (a) The FSF's Back-Cover Text is: "You have the freedom to copy and

+ modify this GNU manual. Buying copies from the FSF supports it in

+ developing GNU and promoting software freedom."

+INFO-DIR-SECTION Software development

+START-INFO-DIR-ENTRY

+* bison: (bison). GNU parser generator (Yacc replacement).

+END-INFO-DIR-ENTRY

+File: bison.info, Node: Top, Next: Introduction, Up: (dir)

+Bison

+*****

+This manual (19 November 2008) is for GNU Bison (version 2.4.1), the

+GNU parser generator.

+2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software

+Foundation, Inc.

+ Permission is granted to copy, distribute and/or modify this

+ document under the terms of the GNU Free Documentation License,

+ Version 1.2 or any later version published by the Free Software

+ Foundation; with no Invariant Sections, with the Front-Cover texts

+ being "A GNU Manual," and with the Back-Cover Texts as in (a)

+ below. A copy of the license is included in the section entitled

+ "GNU Free Documentation License."

+ (a) The FSF's Back-Cover Text is: "You have the freedom to copy and

+ modify this GNU manual. Buying copies from the FSF supports it in

+ developing GNU and promoting software freedom."

+* Menu:

+* Introduction::

+* Conditions::

+* Copying:: The GNU General Public License says

+ how you can copy and share Bison.

+Tutorial sections:

+* Concepts:: Basic concepts for understanding Bison.

+* Examples:: Three simple examples of using Bison, with explanations.

+Reference sections:

+* Grammar File:: Writing Bison declarations and rules.

+* Interface:: C-language interface to the parser function `yyparse'.

+* Algorithm:: How the Bison parser works at run-time.

+* Error Recovery:: Writing rules for error recovery.

+* Context Dependency:: What to do if your language syntax is too

+ messy for Bison to handle straightforwardly.

+* Debugging:: Understanding or debugging Bison parsers.

+* Invocation:: How to run Bison (to produce the parser source file).

+* Other Languages:: Creating C++ and Java parsers.

+* FAQ:: Frequently Asked Questions

+* Table of Symbols:: All the keywords of the Bison language are explained.

+* Glossary:: Basic concepts are explained.

+* Copying This Manual:: License for copying this manual.

+* Index:: Cross-references to the text.

+ --- The Detailed Node Listing ---

+The Concepts of Bison

+* Language and Grammar:: Languages and context-free grammars,

+ as mathematical ideas.

+* Grammar in Bison:: How we represent grammars for Bison's sake.

+* Semantic Values:: Each token or syntactic grouping can have

+ a semantic value (the value of an integer,

+ the name of an identifier, etc.).

+* Semantic Actions:: Each rule can have an action containing C code.

+* GLR Parsers:: Writing parsers for general context-free languages.

+* Locations Overview:: Tracking Locations.

+* Bison Parser:: What are Bison's input and output,

+ how is the output used?

+* Stages:: Stages in writing and running Bison grammars.

+* Grammar Layout:: Overall structure of a Bison grammar file.

+Writing GLR Parsers

+* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.

+* Merging GLR Parses:: Using GLR parsers to resolve ambiguities.

+* GLR Semantic Actions:: Deferred semantic actions have special concerns.

+* Compiler Requirements:: GLR parsers require a modern C compiler.

+Examples

+* RPN Calc:: Reverse Polish notation calculator;

+ a first example with no operator precedence.

+* Infix Calc:: Infix (algebraic) notation calculator.

+ Operator precedence is introduced.

+* Simple Error Recovery:: Continuing after syntax errors.

+* Location Tracking Calc:: Demonstrating the use of @N and @$.

+* Multi-function Calc:: Calculator with memory and trig functions.

+ It uses multiple data-types for semantic values.

+* Exercises:: Ideas for improving the multi-function calculator.

+Reverse Polish Notation Calculator

+* Rpcalc Declarations:: Prologue (declarations) for rpcalc.

+* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation.

+* Rpcalc Lexer:: The lexical analyzer.

+* Rpcalc Main:: The controlling function.

+* Rpcalc Error:: The error reporting function.

+* Rpcalc Generate:: Running Bison on the grammar file.

+* Rpcalc Compile:: Run the C compiler on the output code.

+Grammar Rules for `rpcalc'

+* Rpcalc Input::

+* Rpcalc Line::

+* Rpcalc Expr::

+Location Tracking Calculator: `ltcalc'

+* Ltcalc Declarations:: Bison and C declarations for ltcalc.

+* Ltcalc Rules:: Grammar rules for ltcalc, with explanations.

+* Ltcalc Lexer:: The lexical analyzer.

+Multi-Function Calculator: `mfcalc'

+* Mfcalc Declarations:: Bison declarations for multi-function calculator.

+* Mfcalc Rules:: Grammar rules for the calculator.

+* Mfcalc Symbol Table:: Symbol table management subroutines.

+Bison Grammar Files

+* Grammar Outline:: Overall layout of the grammar file.

+* Symbols:: Terminal and nonterminal symbols.

+* Rules:: How to write grammar rules.

+* Recursion:: Writing recursive rules.

+* Semantics:: Semantic values and actions.

+* Locations:: Locations and actions.

+* Declarations:: All kinds of Bison declarations are described here.

+* Multiple Parsers:: Putting more than one Bison parser in one program.

+Outline of a Bison Grammar

+* Prologue:: Syntax and usage of the prologue.

+* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.

+* Bison Declarations:: Syntax and usage of the Bison declarations section.

+* Grammar Rules:: Syntax and usage of the grammar rules section.

+* Epilogue:: Syntax and usage of the epilogue.

+Defining Language Semantics

+* Value Type:: Specifying one data type for all semantic values.

+* Multiple Types:: Specifying several alternative data types.

+* Actions:: An action is the semantic definition of a grammar rule.

+* Action Types:: Specifying data types for actions to operate on.

+* Mid-Rule Actions:: Most actions go at the end of a rule.

+ This says when, why and how to use the exceptional

+ action in the middle of a rule.

+Tracking Locations

+* Location Type:: Specifying a data type for locations.

+* Actions and Locations:: Using locations in actions.

+* Location Default Action:: Defining a general way to compute locations.

+Bison Declarations

+* Require Decl:: Requiring a Bison version.

+* Token Decl:: Declaring terminal symbols.

+* Precedence Decl:: Declaring terminals with precedence and associativity.

+* Union Decl:: Declaring the set of all semantic value types.

+* Type Decl:: Declaring the choice of type for a nonterminal symbol.

+* Initial Action Decl:: Code run before parsing starts.

+* Destructor Decl:: Declaring how symbols are freed.

+* Expect Decl:: Suppressing warnings about parsing conflicts.

+* Start Decl:: Specifying the start symbol.

+* Pure Decl:: Requesting a reentrant parser.

+* Push Decl:: Requesting a push parser.

+* Decl Summary:: Table of all Bison declarations.

+Parser C-Language Interface

+* Parser Function:: How to call `yyparse' and what it returns.

+* Push Parser Function:: How to call `yypush_parse' and what it returns.

+* Pull Parser Function:: How to call `yypull_parse' and what it returns.

+* Parser Create Function:: How to call `yypstate_new' and what it returns.

+* Parser Delete Function:: How to call `yypstate_delete' and what it returns.

+* Lexical:: You must supply a function `yylex'

+ which reads tokens.

+* Error Reporting:: You must supply a function `yyerror'.

+* Action Features:: Special features for use in actions.

+* Internationalization:: How to let the parser speak in the user's

+ native language.

+The Lexical Analyzer Function `yylex'

+* Calling Convention:: How `yyparse' calls `yylex'.

+* Token Values:: How `yylex' must return the semantic value

+ of the token it has read.

+* Token Locations:: How `yylex' must return the text location

+ (line number, etc.) of the token, if the

+ actions want that.

+* Pure Calling:: How the calling convention differs in a pure parser

+ (*note A Pure (Reentrant) Parser: Pure Decl.).

+The Bison Parser Algorithm

+* Lookahead:: Parser looks one token ahead when deciding what to do.

+* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.

+* Precedence:: Operator precedence works by resolving conflicts.

+* Contextual Precedence:: When an operator's precedence depends on context.

+* Parser States:: The parser is a finite-state-machine with stack.

+* Reduce/Reduce:: When two rules are applicable in the same situation.

+* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.

+* Generalized LR Parsing:: Parsing arbitrary context-free grammars.

+* Memory Management:: What happens when memory is exhausted. How to avoid it.

+Operator Precedence

+* Why Precedence:: An example showing why precedence is needed.

+* Using Precedence:: How to specify precedence in Bison grammars.

+* Precedence Examples:: How these features are used in the previous example.

+* How Precedence:: How they work.

+Handling Context Dependencies

+* Semantic Tokens:: Token parsing can depend on the semantic context.

+* Lexical Tie-ins:: Token parsing can depend on the syntactic context.

+* Tie-in Recovery:: Lexical tie-ins have implications for how

+ error recovery rules must be written.

+Debugging Your Parser

+* Understanding:: Understanding the structure of your parser.

+* Tracing:: Tracing the execution of your parser.

+Invoking Bison

+* Bison Options:: All the options described in detail,

+ in alphabetical order by short options.

+* Option Cross Key:: Alphabetical list of long options.

+* Yacc Library:: Yacc-compatible `yylex' and `main'.

+Parsers Written In Other Languages

+* C++ Parsers:: The interface to generate C++ parser classes

+* Java Parsers:: The interface to generate Java parser classes

+C++ Parsers

+* C++ Bison Interface:: Asking for C++ parser generation

+* C++ Semantic Values:: %union vs. C++

+* C++ Location Values:: The position and location classes

+* C++ Parser Interface:: Instantiating and running the parser

+* C++ Scanner Interface:: Exchanges between yylex and parse

+* A Complete C++ Example:: Demonstrating their use

+A Complete C++ Example

+* Calc++ --- C++ Calculator:: The specifications

+* Calc++ Parsing Driver:: An active parsing context

+* Calc++ Parser:: A parser class

+* Calc++ Scanner:: A pure C++ Flex scanner

+* Calc++ Top Level:: Conducting the band

+Java Parsers

+* Java Bison Interface:: Asking for Java parser generation

+* Java Semantic Values:: %type and %token vs. Java

+* Java Location Values:: The position and location classes

+* Java Parser Interface:: Instantiating and running the parser

+* Java Scanner Interface:: Specifying the scanner for the parser

+* Java Action Features:: Special features for use in actions

+* Java Differences:: Differences between C/C++ and Java Grammars

+* Java Declarations Summary:: List of Bison declarations used with Java

+Frequently Asked Questions

+* Memory Exhausted:: Breaking the Stack Limits

+* How Can I Reset the Parser:: `yyparse' Keeps some State

+* Strings are Destroyed:: `yylval' Loses Track of Strings

+* Implementing Gotos/Loops:: Control Flow in the Calculator

+* Multiple start-symbols:: Factoring closely related grammars

+* Secure? Conform?:: Is Bison POSIX safe?

+* I can't build Bison:: Troubleshooting

+* Where can I find help?:: Troubleshouting

+* Bug Reports:: Troublereporting

+* More Languages:: Parsers in C++, Java, and so on

+* Beta Testing:: Experimenting with development versions

+* Mailing Lists:: Meeting other Bison users

+Copying This Manual

+* Copying This Manual:: License for copying this manual.

+File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top

+Introduction

+************

+"Bison" is a general-purpose parser generator that converts an

+annotated context-free grammar into an LALR(1) or GLR parser for that

+grammar. Once you are proficient with Bison, you can use it to develop

+a wide range of language parsers, from those used in simple desk

+calculators to complex programming languages.

+ Bison is upward compatible with Yacc: all properly-written Yacc

+grammars ought to work with Bison with no change. Anyone familiar with

+Yacc should be able to use Bison with little trouble. You need to be

+fluent in C or C++ programming in order to use Bison or to understand

+this manual.

+ We begin with tutorial chapters that explain the basic concepts of

+using Bison and show three examples with explanations, each building on the

+last. If you don't know Bison or Yacc, start by reading these

+chapters. Reference chapters follow which describe specific aspects of

+Bison in detail.

+ Bison was written primarily by Robert Corbett; Richard Stallman made

+it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added

+multi-character string literals and other features.

+ This edition corresponds to version 2.4.1 of Bison.

+File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top

+Conditions for Using Bison

+**************************

+The distribution terms for Bison-generated parsers permit using the

+parsers in nonfree programs. Before Bison version 2.2, these extra

+permissions applied only when Bison was generating LALR(1) parsers in

+C. And before Bison version 1.24, Bison-generated parsers could be

+used only in programs that were free software.

+ The other GNU programming tools, such as the GNU C compiler, have

+never had such a requirement. They could always be used for nonfree

+software. The reason Bison was different was not due to a special

+policy decision; it resulted from applying the usual General Public

+License to all of the Bison source code.

+ The output of the Bison utility--the Bison parser file--contains a

+verbatim copy of a sizable piece of Bison, which is the code for the

+parser's implementation. (The actions from your grammar are inserted

+into this implementation at one point, but most of the rest of the

+implementation is not changed.) When we applied the GPL terms to the

+skeleton code for the parser's implementation, the effect was to

+restrict the use of Bison output to free software.

+ We didn't change the terms because of sympathy for people who want to

+make software proprietary. *Software should be free.* But we

+concluded that limiting Bison's use to free software was doing little to

+encourage people to make other software free. So we decided to make the

+practical conditions for using Bison match the practical conditions for

+using the other GNU tools.

+ This exception applies when Bison is generating code for a parser.

+You can tell whether the exception applies to a Bison output file by

+inspecting the file for text beginning with "As a special

+exception...". The text spells out the exact terms of the exception.

+File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top

+GNU GENERAL PUBLIC LICENSE

+**************************

+ Version 3, 29 June 2007

+ Everyone is permitted to copy and distribute verbatim copies of this

+ license document, but changing it is not allowed.

+Preamble

+========

+The GNU General Public License is a free, copyleft license for software

+and other kinds of works.

+ The licenses for most software and other practical works are designed

+to take away your freedom to share and change the works. By contrast,

+the GNU General Public License is intended to guarantee your freedom to

+share and change all versions of a program--to make sure it remains

+free software for all its users. We, the Free Software Foundation, use

+the GNU General Public License for most of our software; it applies

+also to any other work released this way by its authors. You can apply

+it to your programs, too.

+ When we speak of free software, we are referring to freedom, not

+price. Our General Public Licenses are designed to make sure that you

+have the freedom to distribute copies of free software (and charge for

+them if you wish), that you receive source code or can get it if you

+want it, that you can change the software or use pieces of it in new

+free programs, and that you know you can do these things.

+ To protect your rights, we need to prevent others from denying you

+these rights or asking you to surrender the rights. Therefore, you

+have certain responsibilities if you distribute copies of the software,

+or if you modify it: responsibilities to respect the freedom of others.

+ For example, if you distribute copies of such a program, whether

+gratis or for a fee, you must pass on to the recipients the same

+freedoms that you received. You must make sure that they, too, receive

+or can get the source code. And you must show them these terms so they

+know their rights.

+ Developers that use the GNU GPL protect your rights with two steps:

+(1) assert copyright on the software, and (2) offer you this License

+giving you legal permission to copy, distribute and/or modify it.

+ For the developers' and authors' protection, the GPL clearly explains

+that there is no warranty for this free software. For both users' and

+authors' sake, the GPL requires that modified versions be marked as

+changed, so that their problems will not be attributed erroneously to

+authors of previous versions.

+ Some devices are designed to deny users access to install or run

+modified versions of the software inside them, although the

+manufacturer can do so. This is fundamentally incompatible with the

+aim of protecting users' freedom to change the software. The

+systematic pattern of such abuse occurs in the area of products for

+individuals to use, which is precisely where it is most unacceptable.

+Therefore, we have designed this version of the GPL to prohibit the

+practice for those products. If such problems arise substantially in

+other domains, we stand ready to extend this provision to those domains

+in future versions of the GPL, as needed to protect the freedom of

+users.

+ Finally, every program is threatened constantly by software patents.

+States should not allow patents to restrict development and use of

+software on general-purpose computers, but in those that do, we wish to

+avoid the special danger that patents applied to a free program could

+make it effectively proprietary. To prevent this, the GPL assures that

+patents cannot be used to render the program non-free.

+ The precise terms and conditions for copying, distribution and

+modification follow.

+TERMS AND CONDITIONS

+====================

+ 0. Definitions.

+ "This License" refers to version 3 of the GNU General Public

+ License.

+ "Copyright" also means copyright-like laws that apply to other

+ kinds of works, such as semiconductor masks.

+ "The Program" refers to any copyrightable work licensed under this

+ License. Each licensee is addressed as "you". "Licensees" and

+ "recipients" may be individuals or organizations.

+ To "modify" a work means to copy from or adapt all or part of the

+ work in a fashion requiring copyright permission, other than the

+ making of an exact copy. The resulting work is called a "modified

+ version" of the earlier work or a work "based on" the earlier work.

+ A "covered work" means either the unmodified Program or a work

+ based on the Program.

+ To "propagate" a work means to do anything with it that, without

+ permission, would make you directly or secondarily liable for

+ infringement under applicable copyright law, except executing it

+ on a computer or modifying a private copy. Propagation includes

+ copying, distribution (with or without modification), making

+ available to the public, and in some countries other activities as

+ well.

+ To "convey" a work means any kind of propagation that enables other

+ parties to make or receive copies. Mere interaction with a user

+ through a computer network, with no transfer of a copy, is not

+ conveying.

+ An interactive user interface displays "Appropriate Legal Notices"

+ to the extent that it includes a convenient and prominently visible

+ feature that (1) displays an appropriate copyright notice, and (2)

+ tells the user that there is no warranty for the work (except to

+ the extent that warranties are provided), that licensees may

+ convey the work under this License, and how to view a copy of this

+ License. If the interface presents a list of user commands or

+ options, such as a menu, a prominent item in the list meets this

+ criterion.

+ 1. Source Code.

+ The "source code" for a work means the preferred form of the work

+ for making modifications to it. "Object code" means any

+ non-source form of a work.

+ A "Standard Interface" means an interface that either is an

+ official standard defined by a recognized standards body, or, in

+ the case of interfaces specified for a particular programming

+ language, one that is widely used among developers working in that

+ language.

+ The "System Libraries" of an executable work include anything,

+ other than the work as a whole, that (a) is included in the normal

+ form of packaging a Major Component, but which is not part of that

+ Major Component, and (b) serves only to enable use of the work

+ with that Major Component, or to implement a Standard Interface

+ for which an implementation is available to the public in source

+ code form. A "Major Component", in this context, means a major

+ essential component (kernel, window system, and so on) of the

+ specific operating system (if any) on which the executable work

+ runs, or a compiler used to produce the work, or an object code

+ interpreter used to run it.

+ The "Corresponding Source" for a work in object code form means all

+ the source code needed to generate, install, and (for an executable

+ work) run the object code and to modify the work, including

+ scripts to control those activities. However, it does not include

+ the work's System Libraries, or general-purpose tools or generally

+ available free programs which are used unmodified in performing

+ those activities but which are not part of the work. For example,

+ Corresponding Source includes interface definition files

+ associated with source files for the work, and the source code for

+ shared libraries and dynamically linked subprograms that the work

+ is specifically designed to require, such as by intimate data

+ communication or control flow between those subprograms and other

+ parts of the work.

+ The Corresponding Source need not include anything that users can

+ regenerate automatically from other parts of the Corresponding

+ Source.

+ The Corresponding Source for a work in source code form is that

+ same work.

+ 2. Basic Permissions.

+ All rights granted under this License are granted for the term of

+ copyright on the Program, and are irrevocable provided the stated

+ conditions are met. This License explicitly affirms your unlimited

+ permission to run the unmodified Program. The output from running

+ a covered work is covered by this License only if the output,

+ given its content, constitutes a covered work. This License

+ acknowledges your rights of fair use or other equivalent, as

+ provided by copyright law.

+ You may make, run and propagate covered works that you do not

+ convey, without conditions so long as your license otherwise

+ remains in force. You may convey covered works to others for the

+ sole purpose of having them make modifications exclusively for

+ you, or provide you with facilities for running those works,

+ provided that you comply with the terms of this License in

+ conveying all material for which you do not control copyright.

+ Those thus making or running the covered works for you must do so

+ exclusively on your behalf, under your direction and control, on

+ terms that prohibit them from making any copies of your

+ copyrighted material outside their relationship with you.

+ Conveying under any other circumstances is permitted solely under

+ the conditions stated below. Sublicensing is not allowed; section

+ 10 makes it unnecessary.

+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.

+ No covered work shall be deemed part of an effective technological

+ measure under any applicable law fulfilling obligations under

+ article 11 of the WIPO copyright treaty adopted on 20 December

+ 1996, or similar laws prohibiting or restricting circumvention of

+ such measures.

+ When you convey a covered work, you waive any legal power to forbid

+ circumvention of technological measures to the extent such

+ circumvention is effected by exercising rights under this License

+ with respect to the covered work, and you disclaim any intention

+ to limit operation or modification of the work as a means of

+ enforcing, against the work's users, your or third parties' legal

+ rights to forbid circumvention of technological measures.

+ 4. Conveying Verbatim Copies.

+ You may convey verbatim copies of the Program's source code as you

+ receive it, in any medium, provided that you conspicuously and

+ appropriately publish on each copy an appropriate copyright notice;

+ keep intact all notices stating that this License and any

+ non-permissive terms added in accord with section 7 apply to the

+ code; keep intact all notices of the absence of any warranty; and

+ give all recipients a copy of this License along with the Program.

+ You may charge any price or no price for each copy that you convey,

+ and you may offer support or warranty protection for a fee.

+ 5. Conveying Modified Source Versions.

+ You may convey a work based on the Program, or the modifications to

+ produce it from the Program, in the form of source code under the

+ terms of section 4, provided that you also meet all of these

+ conditions:

+ a. The work must carry prominent notices stating that you

+ modified it, and giving a relevant date.

+ b. The work must carry prominent notices stating that it is

+ released under this License and any conditions added under

+ section 7. This requirement modifies the requirement in

+ section 4 to "keep intact all notices".

+ c. You must license the entire work, as a whole, under this

+ License to anyone who comes into possession of a copy. This

+ License will therefore apply, along with any applicable

+ section 7 additional terms, to the whole of the work, and all

+ its parts, regardless of how they are packaged. This License

+ gives no permission to license the work in any other way, but

+ it does not invalidate such permission if you have separately

+ received it.

+ d. If the work has interactive user interfaces, each must display

+ Appropriate Legal Notices; however, if the Program has

+ interactive interfaces that do not display Appropriate Legal

+ Notices, your work need not make them do so.

+ A compilation of a covered work with other separate and independent

+ works, which are not by their nature extensions of the covered

+ work, and which are not combined with it such as to form a larger

+ program, in or on a volume of a storage or distribution medium, is

+ called an "aggregate" if the compilation and its resulting

+ copyright are not used to limit the access or legal rights of the

+ compilation's users beyond what the individual works permit.

+ Inclusion of a covered work in an aggregate does not cause this

+ License to apply to the other parts of the aggregate.

+ 6. Conveying Non-Source Forms.

+ You may convey a covered work in object code form under the terms

+ of sections 4 and 5, provided that you also convey the

+ machine-readable Corresponding Source under the terms of this

+ License, in one of these ways:

+ a. Convey the object code in, or embodied in, a physical product

+ (including a physical distribution medium), accompanied by the

+ Corresponding Source fixed on a durable physical medium

+ customarily used for software interchange.

+ b. Convey the object code in, or embodied in, a physical product

+ (including a physical distribution medium), accompanied by a

+ written offer, valid for at least three years and valid for

+ as long as you offer spare parts or customer support for that

+ product model, to give anyone who possesses the object code

+ either (1) a copy of the Corresponding Source for all the

+ software in the product that is covered by this License, on a

+ durable physical medium customarily used for software

+ interchange, for a price no more than your reasonable cost of

+ physically performing this conveying of source, or (2) access

+ to copy the Corresponding Source from a network server at no

+ charge.

+ c. Convey individual copies of the object code with a copy of

+ the written offer to provide the Corresponding Source. This

+ alternative is allowed only occasionally and noncommercially,

+ and only if you received the object code with such an offer,

+ in accord with subsection 6b.

+ d. Convey the object code by offering access from a designated

+ place (gratis or for a charge), and offer equivalent access

+ to the Corresponding Source in the same way through the same

+ place at no further charge. You need not require recipients

+ to copy the Corresponding Source along with the object code.

+ If the place to copy the object code is a network server, the

+ Corresponding Source may be on a different server (operated

+ by you or a third party) that supports equivalent copying

+ facilities, provided you maintain clear directions next to

+ the object code saying where to find the Corresponding Source.

+ Regardless of what server hosts the Corresponding Source, you

+ remain obligated to ensure that it is available for as long

+ as needed to satisfy these requirements.

+ e. Convey the object code using peer-to-peer transmission,

+ provided you inform other peers where the object code and

+ Corresponding Source of the work are being offered to the

+ general public at no charge under subsection 6d.

+ A separable portion of the object code, whose source code is

+ excluded from the Corresponding Source as a System Library, need

+ not be included in conveying the object code work.

+ A "User Product" is either (1) a "consumer product", which means

+ any tangible personal property which is normally used for personal,

+ family, or household purposes, or (2) anything designed or sold for

+ incorporation into a dwelling. In determining whether a product

+ is a consumer product, doubtful cases shall be resolved in favor of

+ coverage. For a particular product received by a particular user,

+ "normally used" refers to a typical or common use of that class of

+ product, regardless of the status of the particular user or of the

+ way in which the particular user actually uses, or expects or is

+ expected to use, the product. A product is a consumer product

+ regardless of whether the product has substantial commercial,

+ industrial or non-consumer uses, unless such uses represent the

+ only significant mode of use of the product.

+ "Installation Information" for a User Product means any methods,

+ procedures, authorization keys, or other information required to

+ install and execute modified versions of a covered work in that

+ User Product from a modified version of its Corresponding Source.

+ The information must suffice to ensure that the continued

+ functioning of the modified object code is in no case prevented or

+ interfered with solely because modification has been made.

+ If you convey an object code work under this section in, or with,

+ or specifically for use in, a User Product, and the conveying

+ occurs as part of a transaction in which the right of possession

+ and use of the User Product is transferred to the recipient in

+ perpetuity or for a fixed term (regardless of how the transaction

+ is characterized), the Corresponding Source conveyed under this

+ section must be accompanied by the Installation Information. But

+ this requirement does not apply if neither you nor any third party

+ retains the ability to install modified object code on the User

+ Product (for example, the work has been installed in ROM).

+ The requirement to provide Installation Information does not

+ include a requirement to continue to provide support service,

+ warranty, or updates for a work that has been modified or

+ installed by the recipient, or for the User Product in which it

+ has been modified or installed. Access to a network may be denied

+ when the modification itself materially and adversely affects the

+ operation of the network or violates the rules and protocols for

+ communication across the network.

+ Corresponding Source conveyed, and Installation Information

+ provided, in accord with this section must be in a format that is

+ publicly documented (and with an implementation available to the

+ public in source code form), and must require no special password

+ or key for unpacking, reading or copying.

+ 7. Additional Terms.

+ "Additional permissions" are terms that supplement the terms of

+ this License by making exceptions from one or more of its

+ conditions. Additional permissions that are applicable to the

+ entire Program shall be treated as though they were included in

+ this License, to the extent that they are valid under applicable

+ law. If additional permissions apply only to part of the Program,

+ that part may be used separately under those permissions, but the

+ entire Program remains governed by this License without regard to

+ the additional permissions.

+ When you convey a copy of a covered work, you may at your option

+ remove any additional permissions from that copy, or from any part

+ of it. (Additional permissions may be written to require their own

+ removal in certain cases when you modify the work.) You may place

+ additional permissions on material, added by you to a covered work,

+ for which you have or can give appropriate copyright permission.

+ Notwithstanding any other provision of this License, for material

+ you add to a covered work, you may (if authorized by the copyright

+ holders of that material) supplement the terms of this License

+ with terms:

+ a. Disclaiming warranty or limiting liability differently from

+ the terms of sections 15 and 16 of this License; or

+ b. Requiring preservation of specified reasonable legal notices

+ or author attributions in that material or in the Appropriate

+ Legal Notices displayed by works containing it; or

+ c. Prohibiting misrepresentation of the origin of that material,

+ or requiring that modified versions of such material be

+ marked in reasonable ways as different from the original

+ version; or

+ d. Limiting the use for publicity purposes of names of licensors

+ or authors of the material; or

+ e. Declining to grant rights under trademark law for use of some

+ trade names, trademarks, or service marks; or

+ f. Requiring indemnification of licensors and authors of that

+ material by anyone who conveys the material (or modified

+ versions of it) with contractual assumptions of liability to

+ the recipient, for any liability that these contractual

+ assumptions directly impose on those licensors and authors.

+ All other non-permissive additional terms are considered "further

+ restrictions" within the meaning of section 10. If the Program as

+ you received it, or any part of it, contains a notice stating that

+ it is governed by this License along with a term that is a further

+ restriction, you may remove that term. If a license document

+ contains a further restriction but permits relicensing or

+ conveying under this License, you may add to a covered work

+ material governed by the terms of that license document, provided

+ that the further restriction does not survive such relicensing or

+ conveying.

+ If you add terms to a covered work in accord with this section, you

+ must place, in the relevant source files, a statement of the

+ additional terms that apply to those files, or a notice indicating

+ where to find the applicable terms.

+ Additional terms, permissive or non-permissive, may be stated in

+ the form of a separately written license, or stated as exceptions;

+ the above requirements apply either way.

+ 8. Termination.

+ You may not propagate or modify a covered work except as expressly

+ provided under this License. Any attempt otherwise to propagate or

+ modify it is void, and will automatically terminate your rights

+ under this License (including any patent licenses granted under

+ the third paragraph of section 11).

+ However, if you cease all violation of this License, then your

+ license from a particular copyright holder is reinstated (a)

+ provisionally, unless and until the copyright holder explicitly

+ and finally terminates your license, and (b) permanently, if the

+ copyright holder fails to notify you of the violation by some

+ reasonable means prior to 60 days after the cessation.

+ Moreover, your license from a particular copyright holder is

+ reinstated permanently if the copyright holder notifies you of the

+ violation by some reasonable means, this is the first time you have

+ received notice of violation of this License (for any work) from

+ that copyright holder, and you cure the violation prior to 30 days

+ after your receipt of the notice.

+ Termination of your rights under this section does not terminate

+ the licenses of parties who have received copies or rights from

+ you under this License. If your rights have been terminated and

+ not permanently reinstated, you do not qualify to receive new

+ licenses for the same material under section 10.

+ 9. Acceptance Not Required for Having Copies.

+ You are not required to accept this License in order to receive or

+ run a copy of the Program. Ancillary propagation of a covered work

+ occurring solely as a consequence of using peer-to-peer

+ transmission to receive a copy likewise does not require

+ acceptance. However, nothing other than this License grants you

+ permission to propagate or modify any covered work. These actions

+ infringe copyright if you do not accept this License. Therefore,

+ by modifying or propagating a covered work, you indicate your

+ acceptance of this License to do so.

+ 10. Automatic Licensing of Downstream Recipients.

+ Each time you convey a covered work, the recipient automatically

+ receives a license from the original licensors, to run, modify and

+ propagate that work, subject to this License. You are not

+ responsible for enforcing compliance by third parties with this

+ License.

+ An "entity transaction" is a transaction transferring control of an

+ organization, or substantially all assets of one, or subdividing an

+ organization, or merging organizations. If propagation of a

+ covered work results from an entity transaction, each party to that

+ transaction who receives a copy of the work also receives whatever

+ licenses to the work the party's predecessor in interest had or

+ could give under the previous paragraph, plus a right to

+ possession of the Corresponding Source of the work from the

+ predecessor in interest, if the predecessor has it or can get it

+ with reasonable efforts.

+ You may not impose any further restrictions on the exercise of the

+ rights granted or affirmed under this License. For example, you

+ may not impose a license fee, royalty, or other charge for

+ exercise of rights granted under this License, and you may not

+ initiate litigation (including a cross-claim or counterclaim in a

+ lawsuit) alleging that any patent claim is infringed by making,

+ using, selling, offering for sale, or importing the Program or any

+ portion of it.

+ 11. Patents.

+ A "contributor" is a copyright holder who authorizes use under this

+ License of the Program or a work on which the Program is based.

+ The work thus licensed is called the contributor's "contributor

+ version".

+ A contributor's "essential patent claims" are all patent claims

+ owned or controlled by the contributor, whether already acquired or

+ hereafter acquired, that would be infringed by some manner,

+ permitted by this License, of making, using, or selling its

+ contributor version, but do not include claims that would be

+ infringed only as a consequence of further modification of the

+ contributor version. For purposes of this definition, "control"

+ includes the right to grant patent sublicenses in a manner

+ consistent with the requirements of this License.

+ Each contributor grants you a non-exclusive, worldwide,

+ royalty-free patent license under the contributor's essential

+ patent claims, to make, use, sell, offer for sale, import and

+ otherwise run, modify and propagate the contents of its

+ contributor version.

+ In the following three paragraphs, a "patent license" is any

+ express agreement or commitment, however denominated, not to

+ enforce a patent (such as an express permission to practice a

+ patent or covenant not to sue for patent infringement). To

+ "grant" such a patent license to a party means to make such an

+ agreement or commitment not to enforce a patent against the party.

+ If you convey a covered work, knowingly relying on a patent

+ license, and the Corresponding Source of the work is not available

+ for anyone to copy, free of charge and under the terms of this

+ License, through a publicly available network server or other

+ readily accessible means, then you must either (1) cause the

+ Corresponding Source to be so available, or (2) arrange to deprive

+ yourself of the benefit of the patent license for this particular

+ work, or (3) arrange, in a manner consistent with the requirements

+ of this License, to extend the patent license to downstream

+ recipients. "Knowingly relying" means you have actual knowledge

+ that, but for the patent license, your conveying the covered work

+ in a country, or your recipient's use of the covered work in a

+ country, would infringe one or more identifiable patents in that

+ country that you have reason to believe are valid.

+ If, pursuant to or in connection with a single transaction or

+ arrangement, you convey, or propagate by procuring conveyance of, a

+ covered work, and grant a patent license to some of the parties

+ receiving the covered work authorizing them to use, propagate,

+ modify or convey a specific copy of the covered work, then the

+ patent license you grant is automatically extended to all

+ recipients of the covered work and works based on it.

+ A patent license is "discriminatory" if it does not include within

+ the scope of its coverage, prohibits the exercise of, or is

+ conditioned on the non-exercise of one or more of the rights that

+ are specifically granted under this License. You may not convey a

+ covered work if you are a party to an arrangement with a third

+ party that is in the business of distributing software, under

+ which you make payment to the third party based on the extent of

+ your activity of conveying the work, and under which the third

+ party grants, to any of the parties who would receive the covered

+ work from you, a discriminatory patent license (a) in connection

+ with copies of the covered work conveyed by you (or copies made

+ from those copies), or (b) primarily for and in connection with

+ specific products or compilations that contain the covered work,

+ unless you entered into that arrangement, or that patent license

+ was granted, prior to 28 March 2007.

+ Nothing in this License shall be construed as excluding or limiting

+ any implied license or other defenses to infringement that may

+ otherwise be available to you under applicable patent law.

+ 12. No Surrender of Others' Freedom.

+ If conditions are imposed on you (whether by court order,

+ agreement or otherwise) that contradict the conditions of this

+ License, they do not excuse you from the conditions of this

+ License. If you cannot convey a covered work so as to satisfy

+ simultaneously your obligations under this License and any other

+ pertinent obligations, then as a consequence you may not convey it

+ at all. For example, if you agree to terms that obligate you to

+ collect a royalty for further conveying from those to whom you

+ convey the Program, the only way you could satisfy both those

+ terms and this License would be to refrain entirely from conveying

+ the Program.

+ 13. Use with the GNU Affero General Public License.

+ Notwithstanding any other provision of this License, you have

+ permission to link or combine any covered work with a work licensed

+ under version 3 of the GNU Affero General Public License into a

+ single combined work, and to convey the resulting work. The terms

+ of this License will continue to apply to the part which is the

+ covered work, but the special requirements of the GNU Affero

+ General Public License, section 13, concerning interaction through

+ a network will apply to the combination as such.

+ 14. Revised Versions of this License.

+ The Free Software Foundation may publish revised and/or new

+ versions of the GNU General Public License from time to time.

+ Such new versions will be similar in spirit to the present

+ version, but may differ in detail to address new problems or

+ concerns.

+ Each version is given a distinguishing version number. If the

+ Program specifies that a certain numbered version of the GNU

+ General Public License "or any later version" applies to it, you

+ have the option of following the terms and conditions either of

+ that numbered version or of any later version published by the

+ Free Software Foundation. If the Program does not specify a

+ version number of the GNU General Public License, you may choose

+ any version ever published by the Free Software Foundation.

+ If the Program specifies that a proxy can decide which future

+ versions of the GNU General Public License can be used, that

+ proxy's public statement of acceptance of a version permanently

+ authorizes you to choose that version for the Program.

+ Later license versions may give you additional or different

+ permissions. However, no additional obligations are imposed on any

+ author or copyright holder as a result of your choosing to follow a

+ later version.

+ 15. Disclaimer of Warranty.

+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY

+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE

+ COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS"

+ WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,

+ INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF

+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE

+ RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.

+ SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL

+ NECESSARY SERVICING, REPAIR OR CORRECTION.

+ 16. Limitation of Liability.

+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN

+ WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES

+ AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU

+ FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR

+ CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE

+ THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA

+ BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD

+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER

+ PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF

+ THE POSSIBILITY OF SUCH DAMAGES.

+ 17. Interpretation of Sections 15 and 16.

+ If the disclaimer of warranty and limitation of liability provided

+ above cannot be given local legal effect according to their terms,

+ reviewing courts shall apply local law that most closely

+ approximates an absolute waiver of all civil liability in

+ connection with the Program, unless a warranty or assumption of

+ liability accompanies a copy of the Program in return for a fee.

+END OF TERMS AND CONDITIONS

+===========================

+How to Apply These Terms to Your New Programs

+=============================================

+If you develop a new program, and you want it to be of the greatest

+possible use to the public, the best way to achieve this is to make it

+free software which everyone can redistribute and change under these

+terms.

+ To do so, attach the following notices to the program. It is safest

+to attach them to the start of each source file to most effectively

+state the exclusion of warranty; and each file should have at least the

+"copyright" line and a pointer to where the full notice is found.

+ ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.

+ Copyright (C) YEAR NAME OF AUTHOR

+ This program is free software: you can redistribute it and/or modify

+ it under the terms of the GNU General Public License as published by

+ the Free Software Foundation, either version 3 of the License, or (at

+ your option) any later version.

+ This program is distributed in the hope that it will be useful, but

+ WITHOUT ANY WARRANTY; without even the implied warranty of

+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU

+ General Public License for more details.

+ You should have received a copy of the GNU General Public License

+ along with this program. If not, see `http://www.gnu.org/licenses/'.

+ Also add information on how to contact you by electronic and paper

+mail.

+ If the program does terminal interaction, make it output a short

+notice like this when it starts in an interactive mode:

+ PROGRAM Copyright (C) YEAR NAME OF AUTHOR

+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.

+ This is free software, and you are welcome to redistribute it

+ under certain conditions; type `show c' for details.

+ The hypothetical commands `show w' and `show c' should show the

+appropriate parts of the General Public License. Of course, your

+program's commands might be different; for a GUI interface, you would

+use an "about box".

+ You should also get your employer (if you work as a programmer) or

+school, if any, to sign a "copyright disclaimer" for the program, if

+necessary. For more information on this, and how to apply and follow

+the GNU GPL, see `http://www.gnu.org/licenses/'.

+ The GNU General Public License does not permit incorporating your

+program into proprietary programs. If your program is a subroutine

+library, you may consider it more useful to permit linking proprietary

+applications with the library. If this is what you want to do, use the

+GNU Lesser General Public License instead of this License. But first,

+please read `http://www.gnu.org/philosophy/why-not-lgpl.html'.

+File: bison.info, Node: Concepts, Next: Examples, Prev: Copying, Up: Top

+1 The Concepts of Bison

+***********************

+This chapter introduces many of the basic concepts without which the

+details of Bison will not make sense. If you do not already know how to

+use Bison or Yacc, we suggest you start by reading this chapter

+carefully.

+* Menu:

+* Language and Grammar:: Languages and context-free grammars,

+ as mathematical ideas.

+* Grammar in Bison:: How we represent grammars for Bison's sake.

+* Semantic Values:: Each token or syntactic grouping can have

+ a semantic value (the value of an integer,

+ the name of an identifier, etc.).

+* Semantic Actions:: Each rule can have an action containing C code.

+* GLR Parsers:: Writing parsers for general context-free languages.

+* Locations Overview:: Tracking Locations.

+* Bison Parser:: What are Bison's input and output,

+ how is the output used?

+* Stages:: Stages in writing and running Bison grammars.

+* Grammar Layout:: Overall structure of a Bison grammar file.

+File: bison.info, Node: Language and Grammar, Next: Grammar in Bison, Up: Concepts

+1.1 Languages and Context-Free Grammars

+=======================================

+In order for Bison to parse a language, it must be described by a

+"context-free grammar". This means that you specify one or more

+"syntactic groupings" and give rules for constructing them from their

+parts. For example, in the C language, one kind of grouping is called

+an `expression'. One rule for making an expression might be, "An

+expression can be made of a minus sign and another expression".

+Another would be, "An expression can be an integer". As you can see,

+rules are often recursive, but there must be at least one rule which

+leads out of the recursion.
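
+   As an illustration only, the two informal rules above can be sketched
+as a small hand-written recognizer in C.  The function names
+`match_expression' and `is_expression' are hypothetical and are not part
+of Bison; the sketch assumes an expression is either a minus sign
+followed by an expression, or a run of decimal digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: try to match one `expression' at the start of S.
   Returns a pointer just past the matched text, or NULL on failure.
   Assumed grammar, taken from the two informal rules in the text:
     expression: '-' expression     (the recursive rule)
               | integer            (the rule that leads out of the recursion)  */
const char *
match_expression (const char *s)
{
  assert (s != NULL);
  if (*s == '-')
    return match_expression (s + 1);   /* minus sign, then another expression */
  if (isdigit ((unsigned char) *s))
    {
      while (isdigit ((unsigned char) *s))
        s++;
      return s;                        /* an integer ends the recursion */
    }
  return NULL;                         /* neither rule applies */
}

/* Nonzero if the whole string S is exactly one expression.  */
int
is_expression (const char *s)
{
  const char *end = match_expression (s);
  return end != NULL && *end == '\0';
}
```

+   Under these assumptions, `is_expression ("--7")' succeeds by applying
+the recursive rule twice and then the integer rule, while
+`is_expression ("-")' fails because no rule leads out of the recursion.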

+ The most common formal system for presenting such rules for humans

+to read is "Backus-Naur Form" or "BNF", which was developed in order to

+specify the language Algol 60. Any grammar expressed in BNF is a

+context-free grammar. The input to Bison is essentially

+machine-readable BNF.

+ There are various important subclasses of context-free grammar.

+Although it can handle almost all context-free grammars, Bison is

+optimized for what are called LALR(1) grammars. In brief, in these

+grammars, it must be possible to tell how to parse any portion of an

+input string with just a single token of lookahead. Strictly speaking,

+that is a description of an LR(1) grammar, and LALR(1) involves

+additional restrictions that are hard to explain simply; but it is rare

+in actual practice to find an LR(1) grammar that fails to be LALR(1).

+*Note Mysterious Reduce/Reduce Conflicts: Mystery Conflicts, for more

+information on this.

+ Parsers for LALR(1) grammars are "deterministic", meaning roughly

+that the next grammar rule to apply at any point in the input is

+uniquely determined by the preceding input and a fixed, finite portion

+(called a "lookahead") of the remaining input. A context-free grammar

+can be "ambiguous", meaning that there are multiple ways to apply the

+grammar rules to derive the same input. Even unambiguous grammars can be

+"nondeterministic", meaning that no fixed lookahead always suffices to

+determine the next grammar rule to apply. With the proper

+declarations, Bison is also able to parse these more general

+context-free grammars, using a technique known as GLR parsing (for

+Generalized LR). Bison's GLR parsers are able to handle any

+context-free grammar for which the number of possible parses of any

+given string is finite.

+ In the formal grammatical rules for a language, each kind of

+syntactic unit or grouping is named by a "symbol". Those which are

+built by grouping smaller constructs according to grammatical rules are

+called "nonterminal symbols"; those which can't be subdivided are called

+"terminal symbols" or "token types". We call a piece of input

+corresponding to a single terminal symbol a "token", and a piece

+corresponding to a single nonterminal symbol a "grouping".

+ We can use the C language as an example of what symbols, terminal and

+nonterminal, mean. The tokens of C are identifiers, constants (numeric

+and string), and the various keywords, arithmetic operators and

+punctuation marks. So the terminal symbols of a grammar for C include

+`identifier', `number', `string', plus one symbol for each keyword,

+operator or punctuation mark: `if', `return', `const', `static', `int',

+`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.

+(These tokens can be subdivided into characters, but that is a matter of

+lexical analysis, not grammar.)

+ Here is a simple C function subdivided into tokens:

+     int             /* keyword `int' */

+     square (int x)  /* identifier, open-paren, keyword `int',

+                        identifier, close-paren */

+     {               /* open-brace */

+       return x * x; /* keyword `return', identifier, asterisk,

+                        identifier, semicolon */

+     }               /* close-brace */
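
+   The subdivision shown above can be sketched as a small scanner in C.
+This is only a rough illustration: the names `token_kind' and
+`next_token' are hypothetical, the keyword table is deliberately limited
+to the two keywords used in the example, and comments and multi-character
+operators are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token classes for this sketch only.  */
enum token_kind { TOK_END, TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_PUNCT };

/* Read one token from *P into TEXT (capacity CAP bytes) and advance *P
   past it.  Over-long tokens are truncated in TEXT but fully consumed.  */
enum token_kind
next_token (const char **p, char *text, size_t cap)
{
  static const char *const keywords[] = { "int", "return" };
  const char *s;
  size_t len = 0, i;

  assert (p != NULL && *p != NULL && text != NULL && cap >= 2);
  s = *p;
  while (isspace ((unsigned char) *s))
    s++;                               /* white space separates tokens */
  if (*s == '\0')
    {
      text[0] = '\0';
      *p = s;
      return TOK_END;
    }
  if (isalpha ((unsigned char) *s) || *s == '_')
    {
      while (isalnum ((unsigned char) *s) || *s == '_')
        {
          if (len + 1 < cap)
            text[len++] = *s;
          s++;
        }
      text[len] = '\0';
      *p = s;
      for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp (text, keywords[i]) == 0)
          return TOK_KEYWORD;          /* a reserved word such as `int' */
      return TOK_IDENTIFIER;
    }
  if (isdigit ((unsigned char) *s))
    {
      while (isdigit ((unsigned char) *s))
        {
          if (len + 1 < cap)
            text[len++] = *s;
          s++;
        }
      text[len] = '\0';
      *p = s;
      return TOK_NUMBER;
    }
  text[0] = *s;                        /* any other character is one token */
  text[1] = '\0';
  *p = s + 1;
  return TOK_PUNCT;
}
```

+   Calling `next_token' repeatedly on the text of the `square' function
+yields the keyword `int', the identifier `square', an open-paren, and so
+on, in the same order as the comments in the example.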

+ The syntactic groupings of C include the expression, the statement,

+the declaration, and the function definition. These are represented in

+the grammar of C by nonterminal symbols `expression', `statement',

+`declaration' and `function definition'. The full grammar uses dozens

+of additional language constructs, each with its own nonterminal

+symbol, in order to express the meanings of these four. The example

+above is a function definition; it contains one declaration, and one

+statement. In the statement, each `x' is an expression and so is `x *

+x'.

+ Each nonterminal symbol must have grammatical rules showing how it

+is made out of simpler constructs. For example, one kind of C

+statement is the `return' statement; this would be described with a

+grammar rule which reads informally as follows:

+ A `statement' can be made of a `return' keyword, an `expression'

+ and a `semicolon'.

+There would be many other rules for `statement', one for each kind of

+statement in C.

+ One nonterminal symbol must be distinguished as the special one which

+defines a complete utterance in the language. It is called the "start

+symbol". In a compiler, this means a complete input program. In the C

+language, the nonterminal symbol `sequence of definitions and

+declarations' plays this role.

+ For example, `1 + 2' is a valid C expression--a valid part of a C

+program--but it is not valid as an _entire_ C program. In the

+context-free grammar of C, this follows from the fact that `expression'

+is not the start symbol.

+ The Bison parser reads a sequence of tokens as its input, and groups

+the tokens using the grammar rules. If the input is valid, the end

+result is that the entire token sequence reduces to a single grouping

+whose symbol is the grammar's start symbol. If we use a grammar for C,

+the entire input must be a `sequence of definitions and declarations'.

+If not, the parser reports a syntax error.

+File: bison.info, Node: Grammar in Bison, Next: Semantic Values, Prev: Language and Grammar, Up: Concepts

+1.2 From Formal Rules to Bison Input

+====================================

+A formal grammar is a mathematical construct. To define the language

+for Bison, you must write a file expressing the grammar in Bison syntax:

+a "Bison grammar" file. *Note Bison Grammar Files: Grammar File.

+ A nonterminal symbol in the formal grammar is represented in Bison

+input as an identifier, like an identifier in C. By convention, it

+should be in lower case, such as `expr', `stmt' or `declaration'.

+ The Bison representation for a terminal symbol is also called a

+"token type". Token types can also be represented as C-like

+identifiers. By convention, these identifiers should be upper case to

+distinguish them from nonterminals: for example, `INTEGER',

+`IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a

+particular keyword in the language should be named after that keyword

+converted to upper case. The terminal symbol `error' is reserved for

+error recovery. *Note Symbols::.

+ A terminal symbol can also be represented as a character literal,

+just like a C character constant. You should do this whenever a token

+is just a single character (parenthesis, plus-sign, etc.): use that

+same character in a literal as the terminal symbol for that token.

+ A third way to represent a terminal symbol is with a C string

+constant containing several characters. *Note Symbols::, for more

+information.

+ Grammar rules are likewise expressed in Bison syntax. For

+example, here is the Bison rule for a C `return' statement. The

+semicolon in quotes is a literal character token, representing part of

+the C syntax for the statement; the naked semicolon, and the colon, are

+Bison punctuation used in every rule.

+     stmt:   RETURN expr ';'

+             ;

+*Note Syntax of Grammar Rules: Rules.

+File: bison.info, Node: Semantic Values, Next: Semantic Actions, Prev: Grammar in Bison, Up: Concepts

+1.3 Semantic Values

+===================

+A formal grammar selects tokens only by their classifications: for

+example, if a rule mentions the terminal symbol `integer constant', it

+means that _any_ integer constant is grammatically valid in that

+position. The precise value of the constant is irrelevant to how to

+parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is

+equally grammatical.

+ But the precise value is very important for what the input means

+once it is parsed. A compiler is useless if it fails to distinguish

+between 4, 1 and 3989 as constants in the program! Therefore, each

+token in a Bison grammar has both a token type and a "semantic value".

+*Note Defining Language Semantics: Semantics, for details.

+ The token type is a terminal symbol defined in the grammar, such as

+`INTEGER', `IDENTIFIER' or `',''. It tells everything you need to know

+to decide where the token may validly appear and how to group it with

+other tokens. The grammar rules know nothing about tokens except their

+types.

+ The semantic value has all the rest of the information about the

+meaning of the token, such as the value of an integer, or the name of an

+identifier. (A token such as `','' which is just punctuation doesn't

+need to have any semantic value.)

+ For example, an input token might be classified as token type

+`INTEGER' and have the semantic value 4. Another input token might

+have the same token type `INTEGER' but value 3989. When a grammar rule

+says that `INTEGER' is allowed, either of these tokens is acceptable

+because each is an `INTEGER'. When the parser accepts the token, it

+keeps track of the token's semantic value.

+ Each grouping can have a semantic value as well as its

+nonterminal symbol. For example, in a calculator, an expression

+typically has a semantic value that is a number. In a compiler for a

+programming language, an expression typically has a semantic value that

+is a tree structure describing the meaning of the expression.

+File: bison.info, Node: Semantic Actions, Next: GLR Parsers, Prev: Semantic Values, Up: Concepts

+1.4 Semantic Actions

+====================

+In order to be useful, a program must do more than parse input; it must

+also produce some output based on the input. In a Bison grammar, a

+grammar rule can have an "action" made up of C statements. Each time

+the parser recognizes a match for that rule, the action is executed.

+*Note Actions::.

+ Most of the time, the purpose of an action is to compute the

+semantic value of the whole construct from the semantic values of its

+parts. For example, suppose we have a rule which says an expression

+can be the sum of two expressions. When the parser recognizes such a

+sum, each of the subexpressions has a semantic value which describes

+how it was built up. The action for this rule should create a similar

+sort of value for the newly recognized larger expression.

+ For example, here is a rule that says an expression can be the sum of

+two subexpressions:

+ expr: expr '+' expr { $$ = $1 + $3; }

+ ;

+The action says how to produce the semantic value of the sum expression

+from the values of the two subexpressions.
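
+ To see what such an action amounts to, here is a minimal C sketch of

+a value stack, written by hand for illustration. The names

+`value_stack', `push' and `reduce_sum' are hypothetical; the real stack

+is internal to the generated parser and is laid out differently.

```c
#include <assert.h>

#define STACK_MAX 16

/* A toy stack of semantic values.  */
struct value_stack
{
  double item[STACK_MAX];
  int depth;
};

static void
push (struct value_stack *s, double v)
{
  assert (s->depth < STACK_MAX);
  s->item[s->depth++] = v;
}

/* Reduce by `expr: expr '+' expr'.  The top three entries play the
   roles of $1, $2 and $3; the value pushed back plays the role of $$.  */
static void
reduce_sum (struct value_stack *s)
{
  double *rhs;
  double result;
  assert (s->depth >= 3);
  rhs = &s->item[s->depth - 3];
  result = rhs[0] + rhs[2];     /* $$ = $1 + $3; */
  s->depth -= 3;                /* Pop the three components.  */
  push (s, result);             /* Push the value of the new grouping.  */
}
```

+The middle entry stands for the `'+'' token, whose semantic value is

+never read by the action.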

+File: bison.info, Node: GLR Parsers, Next: Locations Overview, Prev: Semantic Actions, Up: Concepts

+1.5 Writing GLR Parsers

+=======================

+In some grammars, Bison's standard LALR(1) parsing algorithm cannot

+decide whether to apply a certain grammar rule at a given point. That

+is, it may not be able to decide (on the basis of the input read so

+far) which of two possible reductions (applications of a grammar rule)

+applies, or whether to apply a reduction or read more of the input and

+apply a reduction later in the input. These are known respectively as

+"reduce/reduce" conflicts (*note Reduce/Reduce::), and "shift/reduce"

+conflicts (*note Shift/Reduce::).

+ To use a grammar that is not easily modified to be LALR(1), a more

+general parsing algorithm is sometimes necessary. If you include

+`%glr-parser' among the Bison declarations in your file (*note Grammar

+Outline::), the result is a Generalized LR (GLR) parser. These parsers

+handle Bison grammars that contain no unresolved conflicts (i.e., after

+applying precedence declarations) identically to LALR(1) parsers.

+However, when faced with unresolved shift/reduce and reduce/reduce

+conflicts, GLR parsers use the simple expedient of doing both,

+effectively cloning the parser to follow both possibilities. Each of

+the resulting parsers can again split, so that at any given time, there

+can be any number of possible parses being explored. The parsers

+proceed in lockstep; that is, all of them consume (shift) a given input

+symbol before any of them proceed to the next. Each of the cloned

+parsers eventually meets one of two possible fates: either it runs into

+a parsing error, in which case it simply vanishes, or it merges with

+another parser, because the two of them have reduced the input to an

+identical set of symbols.

+ During the time that there are multiple parsers, semantic actions are

+recorded, but not performed. When a parser disappears, its recorded

+semantic actions disappear as well, and are never performed. When a

+reduction makes two parsers identical, causing them to merge, Bison

+records both sets of semantic actions. Whenever the last two parsers

+merge, reverting to the single-parser case, Bison resolves all the

+outstanding actions either by precedences given to the grammar rules

+involved, or by performing both actions, and then calling a designated

+user-defined function on the resulting values to produce an arbitrary

+merged result.

+* Menu:

+* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.

+* Merging GLR Parses:: Using GLR parsers to resolve ambiguities.

+* GLR Semantic Actions:: Deferred semantic actions have special concerns.

+* Compiler Requirements:: GLR parsers require a modern C compiler.

+File: bison.info, Node: Simple GLR Parsers, Next: Merging GLR Parses, Up: GLR Parsers

+1.5.1 Using GLR on Unambiguous Grammars

+---------------------------------------

+In the simplest cases, you can use the GLR algorithm to parse grammars

+that are unambiguous, but fail to be LALR(1). Such grammars typically

+require more than one symbol of lookahead, or (in rare cases) fall into

+the category of grammars in which the LALR(1) algorithm throws away too

+much information (they are in LR(1), but not LALR(1), *Note Mystery

+Conflicts::).

+ Consider a problem that arises in the declaration of enumerated and

+subrange types in the programming language Pascal. Here are some

+examples:

+ type subrange = lo .. hi;

+ type enum = (a, b, c);

+The original language standard allows only numeric literals and

+constant identifiers for the subrange bounds (`lo' and `hi'), but

+Extended Pascal (ISO/IEC 10206) and many other Pascal implementations

+allow arbitrary expressions there. This gives rise to the following

+situation, containing a superfluous pair of parentheses:

+ type subrange = (a) .. b;

+Compare this to the following declaration of an enumerated type with

+only one value:

+ type enum = (a);

+(These declarations are contrived, but they are syntactically valid,

+and more-complicated cases can come up in practical programs.)

+ These two declarations look identical until the `..' token. With

+normal LALR(1) one-token lookahead it is not possible to decide between

+the two forms when the identifier `a' is parsed. It is, however,

+desirable for a parser to decide this, since in the latter case `a'

+must become a new identifier to represent the enumeration value, while

+in the former case `a' must be evaluated with its current meaning,

+which may be a constant or even a function call.

+ You could parse `(a)' as an "unspecified identifier in parentheses",

+to be resolved later, but this typically requires substantial

+contortions in both semantic actions and large parts of the grammar,

+where the parentheses are nested in the recursive rules for expressions.

+ You might think of using the lexer to distinguish between the two

+forms by returning different tokens for currently defined and undefined

+identifiers. But if these declarations occur in a local scope, and `a'

+is defined in an outer scope, then both forms are possible--either

+locally redefining `a', or using the value of `a' from the outer scope.

+So this approach cannot work.

+ A simple solution to this problem is to declare the parser to use

+the GLR algorithm. When the GLR parser reaches the critical state, it

+merely splits into two branches and pursues both syntax rules

+simultaneously. Sooner or later, one of them runs into a parsing

+error. If there is a `..' token before the next `;', the rule for

+enumerated types fails since it cannot accept `..' anywhere; otherwise,

+the subrange type rule fails since it requires a `..' token. So one of

+the branches fails silently, and the other one continues normally,

+performing all the intermediate actions that were postponed during the

+split.

+ If the input is syntactically incorrect, both branches fail and the

+parser reports a syntax error as usual.

+ The effect of all this is that the parser seems to "guess" the

+correct branch to take, or in other words, it seems to use more

+lookahead than the underlying LALR(1) algorithm actually allows for.

+In this example, LALR(2) would suffice, but GLR can also handle some

+cases that are not LALR(k) for any k.
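
+ The lockstep behavior can be imitated by a toy C sketch. It is not

+how Bison implements GLR. It assumes a drastically reduced syntax in

+which a subrange bound is either an identifier or a parenthesized

+identifier, and it encodes tokens as characters: `i' for an identifier

+and `.' for the `..' token. All names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_ERROR, KIND_ENUM, KIND_SUBRANGE };

#define DEAD (-1)
#define ENUM_ACCEPT 4
#define SUBRANGE_ACCEPT 8

/* Branch 1: '(' ID { ',' ID } ')' ';'  */
static int
enum_step (int state, char tok)
{
  switch (state)
    {
    case 0: return tok == '(' ? 1 : DEAD;
    case 1: return tok == 'i' ? 2 : DEAD;
    case 2: return tok == ',' ? 1 : tok == ')' ? 3 : DEAD;
    case 3: return tok == ';' ? ENUM_ACCEPT : DEAD;
    default: return DEAD;
    }
}

/* Branch 2: bound '..' bound ';' where bound is ID or '(' ID ')'.  */
static int
subrange_step (int state, char tok)
{
  switch (state)
    {
    case 0: return tok == 'i' ? 3 : tok == '(' ? 1 : DEAD;
    case 1: return tok == 'i' ? 2 : DEAD;
    case 2: return tok == ')' ? 3 : DEAD;
    case 3: return tok == '.' ? 4 : DEAD;
    case 4: return tok == 'i' ? 7 : tok == '(' ? 5 : DEAD;
    case 5: return tok == 'i' ? 6 : DEAD;
    case 6: return tok == ')' ? 7 : DEAD;
    case 7: return tok == ';' ? SUBRANGE_ACCEPT : DEAD;
    default: return DEAD;
    }
}

/* Feed every token to each live branch before moving to the next token.
   A branch that cannot accept a token dies silently.  */
static enum kind
classify_type (char const *tokens)
{
  int e = 0, s = 0;
  size_t i;
  assert (tokens != NULL);
  for (i = 0; tokens[i] != '\0'; i++)
    {
      if (e != DEAD)
        e = enum_step (e, tokens[i]);
      if (s != DEAD)
        s = subrange_step (s, tokens[i]);
      if (e == DEAD && s == DEAD)
        return KIND_ERROR;      /* Both branches failed: syntax error.  */
    }
  if (e == ENUM_ACCEPT)
    return KIND_ENUM;
  if (s == SUBRANGE_ACCEPT)
    return KIND_SUBRANGE;
  return KIND_ERROR;            /* Input ended too early.  */
}
```

+For the token string `(i).i;' the enumeration branch dies at `.' and

+the subrange branch survives; for `(i);' the opposite happens.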

+ In general, a GLR parser can take quadratic or cubic worst-case time,

+and the current Bison parser even takes exponential time and space for

+some grammars. In practice, this rarely happens, and for many grammars

+it is possible to prove that it cannot happen. The present example

+contains only one conflict between two rules, and the type-declaration

+context containing the conflict cannot be nested. So the number of

+branches that can exist at any time is limited by the constant 2, and

+the parsing time is still linear.

+ Here is a Bison grammar corresponding to the example above. It

+parses a vastly simplified form of Pascal type declarations.

+ %token TYPE DOTDOT ID

+ %left '+' '-'

+ %left '*' '/'

+ %%

+ type_decl : TYPE ID '=' type ';'

+ ;

+ type : '(' id_list ')'

+ | expr DOTDOT expr

+ ;

+ id_list : ID

+ | id_list ',' ID

+ ;

+ expr : '(' expr ')'

+ | expr '+' expr

+ | expr '-' expr

+ | expr '*' expr

+ | expr '/' expr

+ | ID

+ ;

+ When used as a normal LALR(1) grammar, Bison correctly complains

+about one reduce/reduce conflict. In the conflicting situation the

+parser chooses one of the alternatives, arbitrarily the one declared

+first. Therefore the following correct input is not recognized:

+ type t = (a) .. b;

+ The parser can be turned into a GLR parser, while also telling Bison

+to be silent about the one known reduce/reduce conflict, by adding

+these two declarations to the Bison input file (before the first `%%'):

+ %glr-parser

+ %expect-rr 1

+No change in the grammar itself is required. Now the parser recognizes

+all valid declarations, according to the limited syntax above,

+transparently. In fact, the user does not even notice when the parser

+splits.

+ So here we have a case where we can use the benefits of GLR, almost

+without disadvantages. Even in simple cases like this, however, there

+are at least two potential problems to beware of. First, always analyze

+the conflicts reported by Bison to make sure that GLR splitting is only

+done where it is intended. A GLR parser splitting inadvertently may

+cause problems less obvious than an LALR parser statically choosing the

+wrong alternative in a conflict. Second, consider interactions with

+the lexer (*note Semantic Tokens::) with great care. Since a split

+parser consumes tokens without performing any actions during the split,

+the lexer cannot obtain information via parser actions. Some cases of

+lexer interactions can be eliminated by using GLR to shift the

+complications from the lexer to the parser. You must check the

+remaining cases for correctness.

+ In our example, it would be safe for the lexer to return tokens

+based on their current meanings in some symbol table, because no new

+symbols are defined in the middle of a type declaration. Though it is

+possible for a parser to define the enumeration constants as they are

+parsed, before the type declaration is completed, it actually makes no

+difference since they cannot be used within the same enumerated type

+declaration.

+File: bison.info, Node: Merging GLR Parses, Next: GLR Semantic Actions, Prev: Simple GLR Parsers, Up: GLR Parsers

+1.5.2 Using GLR to Resolve Ambiguities

+--------------------------------------

+Let's consider an example, vastly simplified from a C++ grammar.

+ %{

+ #include <stdio.h>

+ #define YYSTYPE char const *

+ int yylex (void);

+ void yyerror (char const *);

+ %}

+ %token TYPENAME ID

+ %right '='

+ %left '+'

+ %glr-parser

+ %%

+ prog :

+ | prog stmt { printf ("\n"); }

+ ;

+ stmt : expr ';' %dprec 1

+ | decl %dprec 2

+ ;

+ expr : ID { printf ("%s ", $1); }

+ | TYPENAME '(' expr ')'

+ { printf ("%s <cast> ", $1); }

+ | expr '+' expr { printf ("+ "); }

+ | expr '=' expr { printf ("= "); }

+ ;

+ decl : TYPENAME declarator ';'

+ { printf ("%s <declare> ", $1); }

+ | TYPENAME declarator '=' expr ';'

+ { printf ("%s <init-declare> ", $1); }

+ ;

+ declarator : ID { printf ("\"%s\" ", $1); }

+ | '(' declarator ')'

+ ;

+This models a problematic part of the C++ grammar--the ambiguity between

+certain declarations and statements. For example,

+ T (x) = y+z;

+parses as either an `expr' or a `decl' (assuming that `T' is recognized

+as a `TYPENAME' and `x' as an `ID'). Bison detects this as a

+reduce/reduce conflict between the rules `expr : ID' and `declarator :

+ID', which it cannot resolve at the time it encounters `x' in the

+example above. Since this is a GLR parser, it therefore splits the

+problem into two parses, one for each choice of resolving the

+reduce/reduce conflict. Unlike the example from the previous section

+(*note Simple GLR Parsers::), however, neither of these parses "dies,"

+because the grammar as it stands is ambiguous. One of the parsers

+eventually reduces `stmt : expr ';'' and the other reduces `stmt :

+decl', after which both parsers are in an identical state: they've seen

+`prog stmt' and have the same unprocessed input remaining. We say that

+these parses have "merged."

+ At this point, the GLR parser requires a specification in the

+grammar of how to choose between the competing parses. In the example

+above, the two `%dprec' declarations specify that Bison is to give

+precedence to the parse that interprets the example as a `decl', which

+implies that `x' is a declarator. The parser therefore prints

+ "x" y z + T <init-declare>

+ The `%dprec' declarations only come into play when more than one

+parse survives. Consider a different input string for this parser:

+ T (x) + y;

+This is another example of using GLR to parse an unambiguous construct,

+as shown in the previous section (*note Simple GLR Parsers::). Here,

+there is no ambiguity (this cannot be parsed as a declaration).

+However, at the time the Bison parser encounters `x', it does not have

+enough information to resolve the reduce/reduce conflict (again,

+between `x' as an `expr' or a `declarator'). In this case, no

+precedence declaration is used. Again, the parser splits into two, one

+assuming that `x' is an `expr', and the other assuming `x' is a

+`declarator'. The second of these parsers then vanishes when it sees

+`+', and the parser prints

+ x T <cast> y +

+ Suppose that instead of resolving the ambiguity, you wanted to see

+all the possibilities. For this purpose, you must merge the semantic

+actions of the two possible parsers, rather than choosing one over the

+other. To do so, you could change the declaration of `stmt' as follows:

+ stmt : expr ';' %merge <stmtMerge>

+ | decl %merge <stmtMerge>

+ ;

+and define the `stmtMerge' function as:

+ static YYSTYPE

+ stmtMerge (YYSTYPE x0, YYSTYPE x1)

+ {

+ printf ("<OR> ");

+ return "";

+ }

+with an accompanying forward declaration in the C declarations at the

+beginning of the file:

+ %{

+ #define YYSTYPE char const *

+ static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);

+ %}

+With these declarations, the resulting parser parses the first example

+as both an `expr' and a `decl', and prints

+ "x" y z + T <init-declare> x T <cast> y z + = <OR>

+ Bison requires that all of the productions that participate in any

+particular merge have identical `%merge' clauses. Otherwise, the

+ambiguity would be unresolvable, and the parser will report an error

+during any parse that results in the offending merge.

+File: bison.info, Node: GLR Semantic Actions, Next: Compiler Requirements, Prev: Merging GLR Parses, Up: GLR Parsers

+1.5.3 GLR Semantic Actions

+--------------------------

+By definition, a deferred semantic action is not performed at the same

+time as the associated reduction. This raises caveats for several

+Bison features you might use in a semantic action in a GLR parser.

+ In any semantic action, you can examine `yychar' to determine the

+type of the lookahead token present at the time of the associated

+reduction. After checking that `yychar' is not set to `YYEMPTY' or

+`YYEOF', you can then examine `yylval' and `yylloc' to determine the

+lookahead token's semantic value and location, if any. In a

+nondeferred semantic action, you can also modify any of these variables

+to influence syntax analysis. *Note Lookahead Tokens: Lookahead.

+ In a deferred semantic action, it's too late to influence syntax

+analysis. In this case, `yychar', `yylval', and `yylloc' are set to

+shallow copies of the values they had at the time of the associated

+reduction. For this reason alone, modifying them is dangerous.

+Moreover, the result of modifying them is undefined and subject to

+change with future versions of Bison. For example, if a semantic

+action might be deferred, you should never write it to invoke

+`yyclearin' (*note Action Features::) or to attempt to free memory

+referenced by `yylval'.

+ Another Bison feature requiring special consideration is `YYERROR'

+(*note Action Features::), which you can invoke in a semantic action to

+initiate error recovery. During deterministic GLR operation, the

+effect of `YYERROR' is the same as its effect in an LALR(1) parser. In

+a deferred semantic action, its effect is undefined.

+ Also, see *Note Default Action for Locations: Location Default

+Action, which describes a special usage of `YYLLOC_DEFAULT' in GLR

+parsers.

+File: bison.info, Node: Compiler Requirements, Prev: GLR Semantic Actions, Up: GLR Parsers

+1.5.4 Considerations when Compiling GLR Parsers

+-----------------------------------------------

+The GLR parsers require a compiler for ISO C89 or later. In addition,

+they use the `inline' keyword, which is not C89, but is C99 and is a

+common extension in pre-C99 compilers. It is up to the user of these

+parsers to handle portability issues. For instance, if using Autoconf

+and the Autoconf macro `AC_C_INLINE', a mere

+ %{

+ #include <config.h>

+ %}

+will suffice. Otherwise, we suggest

+ %{

+ #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline

+ #define inline

+ #endif

+ %}
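
+ As a small sketch of the effect, a helper declared with `inline' is

+then meant to compile both where the keyword is supported and where the

+guard has defined it away. The function `clamp_nonnegative' is made up

+for this example and is not part of Bison.

```c
/* The guard suggested above: on a pre-C99, non-GNU compiler that has
   no `inline' macro, make the keyword expand to nothing.  */
#if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
# define inline
#endif

/* With the guard in place this is either a static inline function or
   a plain static function.  */
static inline int
clamp_nonnegative (int n)
{
  return n < 0 ? 0 : n;
}
```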

+File: bison.info, Node: Locations Overview, Next: Bison Parser, Prev: GLR Parsers, Up: Concepts

+1.6 Locations

+=============

+Many applications, like interpreters or compilers, have to produce

+verbose and useful error messages. To achieve this, one must be able

+to keep track of the "textual location", or "location", of each

+syntactic construct. Bison provides a mechanism for handling these

+locations.

+ Each token has a semantic value. In a similar fashion, each token

+has an associated location, but the type of locations is the same for

+all tokens and groupings. Moreover, the output parser is equipped with

+a default data structure for storing locations (*note Locations::, for

+more details).

+ Like semantic values, locations can be reached in actions using a

+dedicated set of constructs. In the sum rule `expr: expr '+' expr'

+(*note Semantic Actions::), the location of the whole grouping is `@$',

+while the locations of the subexpressions are `@1' and `@3'.

+ When a rule is matched, a default action is used to compute the

+semantic value of its left hand side (*note Actions::). In the same

+way, another default action is used for locations. However, the action

+for locations is general enough for most cases, meaning there is

+usually no need to describe for each rule how `@$' should be formed.

+When building a new location for a given grouping, the default behavior

+of the output parser is to take the beginning of the first symbol, and

+the end of the last symbol.
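
+ The default behavior just described can be sketched in C as follows.

+This is an illustration, not the code Bison emits: `struct location'

+and `span' are invented names, and the sketch leaves out the case of an

+empty rule, which has no first or last symbol.

```c
#include <assert.h>

/* Four fields: where a construct begins and where it ends.  */
struct location
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
};

/* Compute the location of a grouping from the locations RHS[0..N-1]
   of its components: begin where the first one begins, end where the
   last one ends.  */
static struct location
span (struct location const *rhs, int n)
{
  struct location loc;
  assert (rhs != 0 && n >= 1);
  loc.first_line = rhs[0].first_line;
  loc.first_column = rhs[0].first_column;
  loc.last_line = rhs[n - 1].last_line;
  loc.last_column = rhs[n - 1].last_column;
  return loc;
}
```

+For `1 + 22' on line 1, with components at columns 1, 3 and 5-6, the

+grouping runs from column 1 to column 6.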

+File: bison.info, Node: Bison Parser, Next: Stages, Prev: Locations Overview, Up: Concepts

+1.7 Bison Output: the Parser File

+=================================

+When you run Bison, you give it a Bison grammar file as input. The

+output is a C source file that parses the language described by the

+grammar. This file is called a "Bison parser". Keep in mind that the

+Bison utility and the Bison parser are two distinct programs: the Bison

+utility is a program whose output is the Bison parser that becomes part

+of your program.

+ The job of the Bison parser is to group tokens into groupings

+according to the grammar rules--for example, to build identifiers and

+operators into expressions. As it does this, it runs the actions for

+the grammar rules it uses.

+ The tokens come from a function called the "lexical analyzer" that

+you must supply in some fashion (such as by writing it in C). The Bison

+parser calls the lexical analyzer each time it wants a new token. It

+doesn't know what is "inside" the tokens (though their semantic values

+may reflect this). Typically the lexical analyzer makes the tokens by

+parsing characters of text, but Bison does not depend on this. *Note

+The Lexical Analyzer Function `yylex': Lexical.

+ The Bison parser file is C code which defines a function named

+`yyparse' which implements that grammar. This function does not make a

+complete C program: you must supply some additional functions. One is

+the lexical analyzer. Another is an error-reporting function which the

+parser calls to report an error. In addition, a complete C program must

+start with a function called `main'; you have to provide this, and

+arrange for it to call `yyparse' or the parser will never run. *Note

+Parser C-Language Interface: Interface.

+ Aside from the token type names and the symbols in the actions you

+write, all symbols defined in the Bison parser file itself begin with

+`yy' or `YY'. This includes interface functions such as the lexical

+analyzer function `yylex', the error reporting function `yyerror' and

+the parser function `yyparse' itself. This also includes numerous

+identifiers used for internal purposes. Therefore, you should avoid

+using C identifiers starting with `yy' or `YY' in the Bison grammar

+file except for the ones defined in this manual. Also, you should

+avoid using the C identifiers `malloc' and `free' for anything other

+than their usual meanings.

+ In some cases the Bison parser file includes system headers, and in

+those cases your code should respect the identifiers reserved by those

+headers. On some non-GNU hosts, `<alloca.h>', `<malloc.h>',

+`<stddef.h>', and `<stdlib.h>' are included as needed to declare memory

+allocators and related types. `<libintl.h>' is included if message

+translation is in use (*note Internationalization::). Other system

+headers may be included if you define `YYDEBUG' to a nonzero value

+(*note Tracing Your Parser: Tracing.).

+File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts

+1.8 Stages in Using Bison

+=========================

+The actual language-design process using Bison, from grammar

+specification to a working compiler or interpreter, has these parts:

+ 1. Formally specify the grammar in a form recognized by Bison (*note

+ Bison Grammar Files: Grammar File.). For each grammatical rule in

+ the language, describe the action that is to be taken when an

+ instance of that rule is recognized. The action is described by a

+ sequence of C statements.

+ 2. Write a lexical analyzer to process input and pass tokens to the

+ parser. The lexical analyzer may be written by hand in C (*note

+ The Lexical Analyzer Function `yylex': Lexical.). It could also

+ be produced using Lex, but the use of Lex is not discussed in this

+ manual.

+ 3. Write a controlling function that calls the Bison-produced parser.

+ 4. Write error-reporting routines.

+ To turn this source code as written into a runnable program, you

+must follow these steps:

+ 1. Run Bison on the grammar to produce the parser.

+ 2. Compile the code output by Bison, as well as any other source

+ files.

+ 3. Link the object files to produce the finished product.

+File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts

+1.9 The Overall Layout of a Bison Grammar

+=========================================

+The input file for the Bison utility is a "Bison grammar file". The

+general form of a Bison grammar file is as follows:

+ %{

+ PROLOGUE

+ %}

+ BISON DECLARATIONS

+ %%

+ GRAMMAR RULES

+ %%

+ EPILOGUE

+The `%%', `%{' and `%}' are punctuation that appears in every Bison

+grammar file to separate the sections.

+ The prologue may define types and variables used in the actions.

+You can also use preprocessor commands to define macros used there, and

+use `#include' to include header files that do any of these things.

+You need to declare the lexical analyzer `yylex' and the error printer

+`yyerror' here, along with any other global identifiers used by the

+actions in the grammar rules.

+ The Bison declarations declare the names of the terminal and

+nonterminal symbols, and may also describe operator precedence and the

+data types of semantic values of various symbols.

+ The grammar rules define how to construct each nonterminal symbol

+from its parts.

+ The epilogue can contain any code you want to use. Often the

+definitions of functions declared in the prologue go here. In a simple

+program, all the rest of the program can go here.

+File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top

+2 Examples

+**********

+Now we show and explain three sample programs written using Bison: a

+reverse polish notation calculator, an algebraic (infix) notation

+calculator, and a multi-function calculator. All three have been tested

+under BSD Unix 4.3; each produces a usable, though limited, interactive

+desk-top calculator.

+ These examples are simple, but Bison grammars for real programming

+languages are written the same way. You can copy these examples into a

+source file to try them.

+* Menu:

+* RPN Calc:: Reverse polish notation calculator;

+ a first example with no operator precedence.

+* Infix Calc:: Infix (algebraic) notation calculator.

+ Operator precedence is introduced.

+* Simple Error Recovery:: Continuing after syntax errors.

+* Location Tracking Calc:: Demonstrating the use of @N and @$.

+* Multi-function Calc:: Calculator with memory and trig functions.

+ It uses multiple data-types for semantic values.

+* Exercises:: Ideas for improving the multi-function calculator.

+File: bison.info, Node: RPN Calc, Next: Infix Calc, Up: Examples

+2.1 Reverse Polish Notation Calculator

+======================================

+The first example is that of a simple double-precision "reverse polish

+notation" calculator (a calculator using postfix operators). This

+example provides a good starting point, since operator precedence is

+not an issue. The second example will illustrate how operator

+precedence is handled.

+ The source code for this calculator is named `rpcalc.y'. The `.y'

+extension is a convention used for Bison input files.

+* Menu:

+* Rpcalc Declarations:: Prologue (declarations) for rpcalc.

+* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation.

+* Rpcalc Lexer:: The lexical analyzer.

+* Rpcalc Main:: The controlling function.

+* Rpcalc Error:: The error reporting function.

+* Rpcalc Generate:: Running Bison on the grammar file.

+* Rpcalc Compile:: Run the C compiler on the output code.

+File: bison.info, Node: Rpcalc Declarations, Next: Rpcalc Rules, Up: RPN Calc

+2.1.1 Declarations for `rpcalc'

+-------------------------------

+Here are the C and Bison declarations for the reverse polish notation

+calculator. As in C, comments are placed between `/*...*/'.

+ /* Reverse polish notation calculator. */

+ %{

+ #define YYSTYPE double

+ #include <math.h>

+ int yylex (void);

+ void yyerror (char const *);

+ %}

+ %token NUM

+ %% /* Grammar rules and actions follow. */

+ The declarations section (*note The prologue: Prologue.) contains two

+preprocessor directives and two forward declarations.

+ The `#define' directive defines the macro `YYSTYPE', thus specifying

+the C data type for semantic values of both tokens and groupings (*note

+Data Types of Semantic Values: Value Type.). The Bison parser will use

+whatever type `YYSTYPE' is defined as; if you don't define it, `int' is

+the default. Because we specify `double', each token and each

+expression has an associated value, which is a floating point number.

+ The `#include' directive is used to declare the exponentiation

+function `pow'.

+ The forward declarations for `yylex' and `yyerror' are needed

+because the C language requires that functions be declared before they

+are used. These functions will be defined in the epilogue, but the

+parser calls them so they must be declared in the prologue.

+ The second section, Bison declarations, provides information to Bison

+about the token types (*note The Bison Declarations Section: Bison

+Declarations.). Each terminal symbol that is not a single-character

+literal must be declared here. (Single-character literals normally

+don't need to be declared.) In this example, all the arithmetic

+operators are designated by single-character literals, so the only

+terminal symbol that needs to be declared is `NUM', the token type for

+numeric constants.

+File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Declarations, Up: RPN Calc

+2.1.2 Grammar Rules for `rpcalc'

+--------------------------------

+Here are the grammar rules for the reverse polish notation calculator.

+ input: /* empty */

+ | input line

+ ;

+ line: '\n'

+ | exp '\n' { printf ("\t%.10g\n", $1); }

+ ;

+ exp: NUM { $$ = $1; }

+ | exp exp '+' { $$ = $1 + $2; }

+ | exp exp '-' { $$ = $1 - $2; }

+ | exp exp '*' { $$ = $1 * $2; }

+ | exp exp '/' { $$ = $1 / $2; }

+ /* Exponentiation */

+ | exp exp '^' { $$ = pow ($1, $2); }

+ /* Unary minus */

+ | exp 'n' { $$ = -$1; }

+ ;

+ %%

+ The groupings of the rpcalc "language" defined here are the

+expression (given the name `exp'), the line of input (`line'), and the

+complete input transcript (`input'). Each of these nonterminal symbols

+has several alternate rules, joined by the vertical bar `|' which is

+read as "or". The following sections explain what these rules mean.

+ The semantics of the language is determined by the actions taken

+when a grouping is recognized. The actions are the C code that appears

+inside braces. *Note Actions::.

+ You must specify these actions in C, but Bison provides the means for

+passing semantic values between the rules. In each action, the

+pseudo-variable `$$' stands for the semantic value for the grouping

+that the rule is going to construct. Assigning a value to `$$' is the

+main job of most actions. The semantic values of the components of the

+rule are referred to as `$1', `$2', and so on.

+* Menu:

+* Rpcalc Input::

+* Rpcalc Line::

+* Rpcalc Expr::

+File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules

+2.1.2.1 Explanation of `input'

+..............................

+Consider the definition of `input':

+ input: /* empty */

+ | input line

+ ;

+ This definition reads as follows: "A complete input is either an

+empty string, or a complete input followed by an input line". Notice

+that "complete input" is defined in terms of itself. This definition

+is said to be "left recursive" since `input' always appears as the

+leftmost symbol in the sequence. *Note Recursive Rules: Recursion.

+ The first alternative is empty because there are no symbols between

+the colon and the first `|'; this means that `input' can match an empty

+string of input (no tokens). We write the rules this way because it is

+legitimate to type `Ctrl-d' right after you start the calculator. It's

+conventional to put an empty alternative first and write the comment

+`/* empty */' in it.

+ The second alternate rule (`input line') handles all nontrivial

+input. It means, "After reading any number of lines, read one more

+line if possible." The left recursion makes this rule into a loop.

+Since the first alternative matches empty input, the loop can be

+executed zero or more times.

+ The parser function `yyparse' continues to process input until a

+grammatical error is seen or the lexical analyzer says there are no more

+input tokens; we will arrange for the latter to happen at end-of-input.

+File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules

+2.1.2.2 Explanation of `line'

+.............................

+Now consider the definition of `line':

+ line: '\n'

+ | exp '\n' { printf ("\t%.10g\n", $1); }

+ ;

+ The first alternative is a token which is a newline character; this

+means that rpcalc accepts a blank line (and ignores it, since there is

+no action). The second alternative is an expression followed by a

+newline. This is the alternative that makes rpcalc useful. The

+semantic value of the `exp' grouping is the value of `$1' because the

+`exp' in question is the first symbol in the alternative. The action

+prints this value, which is the result of the computation the user

+asked for.

+ This action is unusual because it does not assign a value to `$$'.

+As a consequence, the semantic value associated with the `line' is

+uninitialized (its value will be unpredictable). This would be a bug if

+that value were ever used, but we don't use it: once rpcalc has printed

+the value of the user's input line, that value is no longer needed.

+File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules

+2.1.2.3 Explanation of `exp'

+............................

+The `exp' grouping has several rules, one for each kind of expression.

+The first rule handles the simplest expressions: those that are just

+numbers. The second handles an addition-expression, which looks like

+two expressions followed by a plus-sign. The third handles

+subtraction, and so on.

+ exp: NUM

+ | exp exp '+' { $$ = $1 + $2; }

+ | exp exp '-' { $$ = $1 - $2; }

+ ...

+ ;

+ We have used `|' to join all the rules for `exp', but we could

+equally well have written them separately:

+ exp: NUM ;

+ exp: exp exp '+' { $$ = $1 + $2; } ;

+ exp: exp exp '-' { $$ = $1 - $2; } ;

+ ...

+ Most of the rules have actions that compute the value of the

+expression in terms of the value of its parts. For example, in the

+rule for addition, `$1' refers to the first component `exp' and `$2'

+refers to the second one. The third component, `'+'', has no meaningful

+associated semantic value, but if it had one you could refer to it as

+`$3'. When `yyparse' recognizes a sum expression using this rule, the

+sum of the two subexpressions' values is produced as the value of the

+entire expression. *Note Actions::.

+ You don't have to give an action for every rule. When a rule has no

+action, Bison by default copies the value of `$1' into `$$'. This is

+what happens in the first rule (the one that uses `NUM').

+ The formatting shown here is the recommended convention, but Bison

+does not require it. You can add or change white space as much as you

+wish. For example, this:

+ exp : NUM | exp exp '+' {$$ = $1 + $2; } | ... ;

+means the same thing as this:

+ exp: NUM

+ | exp exp '+' { $$ = $1 + $2; }

+ | ...

+ ;

+The latter, however, is much more readable.
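+ The actions above can be pictured as operations on a value stack.

+The following C function is only a sketch of that idea, not the parser

+that Bison generates: `rpn_eval' and its fixed 64-entry stack are

+hypothetical, and only `+' and `-' are handled.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not part of rpcalc: evaluate an RPN string
   with an explicit stack.  Return 0 on success, -1 on malformed
   input or stack overflow.  */
int
rpn_eval (char const *s, double *result)
{
  double stack[64];
  int depth = 0;

  while (*s != '\0')
    {
      if (*s == ' ' || *s == '\t')
        {
          s++;
          continue;
        }
      if (*s == '.' || isdigit ((unsigned char) *s))
        {
          char *end;
          double v = strtod (s, &end);
          if (end == s || depth == 64)
            return -1;
          stack[depth++] = v;           /* exp: NUM */
          s = end;
          continue;
        }
      if (depth < 2)
        return -1;
      {
        /* exp: exp exp OP -- $1 lies below $2 on the stack.  */
        double rhs = stack[--depth];
        double lhs = stack[--depth];
        switch (*s)
          {
          case '+': stack[depth++] = lhs + rhs; break;
          case '-': stack[depth++] = lhs - rhs; break;
          default: return -1;
          }
      }
      s++;
    }
  if (depth != 1)
    return -1;
  *result = stack[0];
  return 0;
}
```

+ For example, `rpn_eval ("4 9 +", &r)' returns 0 and stores 13 in `r'.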

+File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc

+2.1.3 The `rpcalc' Lexical Analyzer

+-----------------------------------

+The lexical analyzer's job is low-level parsing: converting characters

+or sequences of characters into tokens. The Bison parser gets its

+tokens by calling the lexical analyzer. *Note The Lexical Analyzer

+Function `yylex': Lexical.

+ Only a simple lexical analyzer is needed for the RPN calculator.

+This lexical analyzer skips blanks and tabs, then reads in numbers as

+`double' and returns them as `NUM' tokens. Any other character that

+isn't part of a number is a separate token. Note that the token-code

+for such a single-character token is the character itself.

+ The return value of the lexical analyzer function is a numeric code

+which represents a token type. The same text used in Bison rules to

+stand for this token type is also a C expression for the numeric code

+for the type. This works in two ways. If the token type is a

+character literal, then its numeric code is that of the character; you

+can use the same character literal in the lexical analyzer to express

+the number. If the token type is an identifier, that identifier is

+defined by Bison as a C macro whose definition is the appropriate

+number. In this example, therefore, `NUM' becomes a macro for `yylex'

+to use.

+ The semantic value of the token (if it has one) is stored into the

+global variable `yylval', which is where the Bison parser will look for

+it. (The C data type of `yylval' is `YYSTYPE', which was defined at

+the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc

+Declarations.)

+ A token type code of zero is returned if the end-of-input is

+encountered. (Bison recognizes any nonpositive value as indicating

+end-of-input.)

+ Here is the code for the lexical analyzer:

+ /* The lexical analyzer returns a double floating point

+ number on the stack and the token NUM, or the numeric code

+ of the character read if not a number. It skips all blanks

+ and tabs, and returns 0 for end-of-input. */

+ #include <ctype.h>

+ int

+ yylex (void)

+ {

+ int c;

+ /* Skip white space. */

+ while ((c = getchar ()) == ' ' || c == '\t')

+ ;

+ /* Process numbers. */

+ if (c == '.' || isdigit (c))

+ {

+ ungetc (c, stdin);

+ scanf ("%lf", &yylval);

+ return NUM;

+ }

+ /* Return end-of-input. */

+ if (c == EOF)

+ return 0;

+ /* Return a single char. */

+ return c;

+ }
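+ The same token conventions can be tried without `stdin'. The sketch

+below reads from a string instead. `next_token' and `TOK_NUM' are

+hypothetical names (the real `NUM' macro is defined by Bison), and

+`strtod' stands in for `scanf' so that a lone `.' is returned as a

+single-character token rather than as a number.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for Bison's NUM; the value only needs to
   lie outside the range of character codes.  */
enum { TOK_NUM = 258 };

/* Return the next token code from *P and advance *P past it.
   A number yields TOK_NUM with its value stored in *VAL; any other
   character is its own token code; 0 marks end of input.  */
int
next_token (char const **p, double *val)
{
  char const *s = *p;

  /* Skip white space.  */
  while (*s == ' ' || *s == '\t')
    s++;

  /* Return end-of-input.  */
  if (*s == '\0')
    {
      *p = s;
      return 0;
    }

  /* Process numbers.  */
  if (*s == '.' || isdigit ((unsigned char) *s))
    {
      char *end;
      double v = strtod (s, &end);
      if (end != s)
        {
          *val = v;
          *p = end;
          return TOK_NUM;
        }
    }

  /* Return a single char.  */
  *p = s + 1;
  return (unsigned char) *s;
}
```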

+File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc

+2.1.4 The Controlling Function

+------------------------------

+In keeping with the spirit of this example, the controlling function is

+kept to the bare minimum. The only requirement is that it call

+`yyparse' to start the process of parsing.

+ int

+ main (void)

+ {

+ return yyparse ();

+ }

+File: bison.info, Node: Rpcalc Error, Next: Rpcalc Generate, Prev: Rpcalc Main, Up: RPN Calc

+2.1.5 The Error Reporting Routine

+---------------------------------

+When `yyparse' detects a syntax error, it calls the error reporting

+function `yyerror' to print an error message (usually but not always

+`"syntax error"'). It is up to the programmer to supply `yyerror'

+(*note Parser C-Language Interface: Interface.), so here is the

+definition we will use:

+ #include <stdio.h>

+ /* Called by yyparse on error. */

+ void

+ yyerror (char const *s)

+ {

+ fprintf (stderr, "%s\n", s);

+ }

+ After `yyerror' returns, the Bison parser may recover from the error

+and continue parsing if the grammar contains a suitable error rule

+(*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We

+have not written any error rules in this example, so any invalid input

+will cause the calculator program to exit. This is not clean behavior

+for a real calculator, but it is adequate for the first example.

+File: bison.info, Node: Rpcalc Generate, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc

+2.1.6 Running Bison to Make the Parser

+--------------------------------------

+Before running Bison to produce a parser, we need to decide how to

+arrange all the source code in one or more source files. For such a

+simple example, the easiest thing is to put everything in one file. The

+definitions of `yylex', `yyerror' and `main' go at the end, in the

+epilogue of the file (*note The Overall Layout of a Bison Grammar:

+Grammar Layout.).

+ For a large project, you would probably have several source files,

+and use `make' to arrange to recompile them.

+ With all the source in a single file, you use the following command

+to convert it into a parser file:

+ bison FILE.y

+In this example the file was called `rpcalc.y' (for "Reverse Polish

+CALCulator"). Bison produces a file named `FILE.tab.c', removing the

+`.y' from the original file name. The file output by Bison contains

+the source code for `yyparse'. The additional functions in the input

+file (`yylex', `yyerror' and `main') are copied verbatim to the output.

+File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Generate, Up: RPN Calc

+2.1.7 Compiling the Parser File

+-------------------------------

+Here is how to compile and run the parser file:

+ # List files in current directory.

+ $ ls

+ rpcalc.tab.c rpcalc.y

+ # Compile the Bison parser.

+ # `-lm' tells compiler to search math library for `pow'.

+ $ cc rpcalc.tab.c -lm -o rpcalc

+ # List files again.

+ $ ls

+ rpcalc rpcalc.tab.c rpcalc.y

+ The file `rpcalc' now contains the executable code. Here is an

+example session using `rpcalc'.

+ $ rpcalc

+ 4 9 +

+ 13

+ 3 7 + 3 4 5 *+-

+ -13

+ 3 7 + 3 4 5 * + - n Note the unary minus, `n'

+ 13

+ 5 6 / 4 n +

+ -3.166666667

+ 3 4 ^ Exponentiation

+ 81

+ ^D End-of-file indicator

+ $

+File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples

+2.2 Infix Notation Calculator: `calc'

+=====================================

+We now modify rpcalc to handle infix operators instead of postfix.

+Infix notation involves the concept of operator precedence and the need

+for parentheses nested to arbitrary depth. Here is the Bison code for

+`calc.y', an infix desk-top calculator.

+ /* Infix notation calculator. */

+ %{

+ #define YYSTYPE double

+ #include <math.h>

+ #include <stdio.h>

+ int yylex (void);

+ void yyerror (char const *);

+ %}

+ /* Bison declarations. */

+ %token NUM

+ %left '-' '+'

+ %left '*' '/'

+ %left NEG /* negation--unary minus */

+ %right '^' /* exponentiation */

+ %% /* The grammar follows. */

+ input: /* empty */

+ | input line

+ ;

+ line: '\n'

+ | exp '\n' { printf ("\t%.10g\n", $1); }

+ ;

+ exp: NUM { $$ = $1; }

+ | exp '+' exp { $$ = $1 + $3; }

+ | exp '-' exp { $$ = $1 - $3; }

+ | exp '*' exp { $$ = $1 * $3; }

+ | exp '/' exp { $$ = $1 / $3; }

+ | '-' exp %prec NEG { $$ = -$2; }

+ | exp '^' exp { $$ = pow ($1, $3); }

+ | '(' exp ')' { $$ = $2; }

+ ;

+ %%

+The functions `yylex', `yyerror' and `main' can be the same as before.

+ There are two important new features shown in this code.

+ In the second section (Bison declarations), `%left' declares token

+types and says they are left-associative operators. The declarations

+`%left' and `%right' (right associativity) take the place of `%token'

+which is used to declare a token type name without associativity.

+(These tokens are single-character literals, which ordinarily don't

+need to be declared. We declare them here to specify the

+associativity.)

+ Operator precedence is determined by the line ordering of the

+declarations; the higher the line number of the declaration (lower on

+the page or screen), the higher the precedence. Hence, exponentiation

+has the highest precedence, unary minus (`NEG') is next, followed by

+`*' and `/', and so on. *Note Operator Precedence: Precedence.

+ The other important new feature is the `%prec' in the grammar

+section for the unary minus operator. The `%prec' simply instructs

+Bison that the rule `| '-' exp' has the same precedence as `NEG'--in

+this case the next-to-highest. *Note Context-Dependent Precedence:

+Contextual Precedence.

+ Here is a sample run of `calc.y':

+ $ calc

+ 4 + 4.5 - (34/(8*3+-3))

+ 6.880952381

+ -56 + 2

+ -54

+ 3 ^ 2

+ 9
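+ To see what the precedence declarations ask for, it can help to

+write the same table by hand. The following is a sketch of a

+precedence-climbing evaluator, which is a different technique from the

+parser that Bison generates. All names are hypothetical, error checking

+is omitted, and `^' accepts only integer exponents so that the sketch

+does not need the math library.

```c
#include <stdlib.h>

static char const *cur;   /* Hypothetical cursor into the input.  */

static double parse_expr (int min_prec);

static void
skip_blanks (void)
{
  while (*cur == ' ' || *cur == '\t')
    cur++;
}

/* Precedence of binary operator C, or 0 if C is not one.
   Level 3 is reserved for NEG, the unary minus.  */
static int
binary_prec (int c)
{
  switch (c)
    {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '^': return 4;
    default: return 0;
    }
}

/* Integer exponents only (an assumption of this sketch).  */
static double
int_power (double base, double exponent)
{
  int n = (int) exponent;
  int count = n < 0 ? -n : n;
  double r = 1.0;
  int i;
  for (i = 0; i < count; i++)
    r *= base;
  return n < 0 ? 1.0 / r : r;
}

/* NUM, '(' exp ')', or '-' exp %prec NEG.  */
static double
parse_primary (void)
{
  double v;
  char *end;
  skip_blanks ();
  if (*cur == '-')
    {
      cur++;
      return -parse_expr (3);   /* Only '^' binds tighter than NEG.  */
    }
  if (*cur == '(')
    {
      cur++;
      v = parse_expr (1);
      skip_blanks ();
      if (*cur == ')')
        cur++;
      return v;
    }
  v = strtod (cur, &end);
  cur = end;
  return v;
}

static double
parse_expr (int min_prec)
{
  double lhs = parse_primary ();
  for (;;)
    {
      int op, prec;
      double rhs;
      skip_blanks ();
      op = *cur;
      prec = binary_prec (op);
      if (prec == 0 || prec < min_prec)
        return lhs;
      cur++;
      /* Right-associative '^' recurses at its own level;
         left-associative operators recurse one level higher.  */
      rhs = parse_expr (op == '^' ? prec : prec + 1);
      switch (op)
        {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        default: lhs = int_power (lhs, rhs); break;
        }
    }
}

double
calc_eval (char const *text)
{
  cur = text;
  return parse_expr (1);
}
```

+ With this table, `calc_eval ("-2 ^ 2")' groups as `-(2 ^ 2)' and

+`calc_eval ("2 ^ 3 ^ 2")' groups as `2 ^ (3 ^ 2)', which is the

+grouping that the declarations in `calc.y' request.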

+File: bison.info, Node: Simple Error Recovery, Next: Location Tracking Calc, Prev: Infix Calc, Up: Examples

+2.3 Simple Error Recovery

+=========================

+Up to this point, this manual has not addressed the issue of "error

+recovery"--how to continue parsing after the parser detects a syntax

+error. All we have handled is error reporting with `yyerror'. Recall

+that by default `yyparse' returns after calling `yyerror'. This means

+that an erroneous input line causes the calculator program to exit.

+Now we show how to rectify this deficiency.

+ The Bison language itself includes the reserved word `error', which

+may be included in the grammar rules. In the example below it has been

+added to one of the alternatives for `line':

+ line: '\n'

+ | exp '\n' { printf ("\t%.10g\n", $1); }

+ | error '\n' { yyerrok; }

+ ;

+ This addition to the grammar allows for simple error recovery in the

+event of a syntax error. If an expression that cannot be evaluated is

+read, the error will be recognized by the third rule for `line', and

+parsing will continue. (The `yyerror' function is still called upon to

+print its message as well.) The action executes the statement

+`yyerrok', a macro defined automatically by Bison; its meaning is that

+error recovery is complete (*note Error Recovery::). Note the

+difference between `yyerrok' and `yyerror'; neither one is a misprint.

+ This form of error recovery deals with syntax errors. There are

+other kinds of errors; for example, division by zero, which raises an

+exception signal that is normally fatal. A real calculator program

+must handle this signal and use `longjmp' to return to `main' and

+resume parsing input lines; it would also have to discard the rest of

+the current line of input. We won't discuss this issue further because

+it is not specific to Bison programs.
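+ As a rough illustration of the `longjmp' approach, the sketch below

+abandons a computation and returns to a recovery point. It assumes an

+explicit test for a zero divisor instead of a signal handler, and the

+names `recover_point', `checked_divide' and `evaluate_line' are

+hypothetical.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf recover_point;   /* Hypothetical name.  */

/* Divide A by B, or abandon the current computation when B is 0.  */
static int
checked_divide (int a, int b)
{
  if (b == 0)
    longjmp (recover_point, 1);
  return a / b;
}

/* Evaluate A / B.  Return 0 and store the quotient on success;
   return -1, leaving *QUOTIENT untouched, if evaluation was
   abandoned.  */
int
evaluate_line (int a, int b, int *quotient)
{
  if (setjmp (recover_point) != 0)
    {
      /* Control arrives here from longjmp.  */
      fprintf (stderr, "division by zero; line discarded\n");
      return -1;
    }
  *quotient = checked_divide (a, b);
  return 0;
}
```

+ A real calculator would place the `setjmp' call in the loop that reads

+input lines, so that each failure resumes with the next line.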

+File: bison.info, Node: Location Tracking Calc, Next: Multi-function Calc, Prev: Simple Error Recovery, Up: Examples

+2.4 Location Tracking Calculator: `ltcalc'

+==========================================

+This example extends the infix notation calculator with location

+tracking. This feature will be used to improve the error messages. For

+the sake of clarity, this example is a simple integer calculator, since

+most of the work needed to use locations will be done in the lexical

+analyzer.

+* Menu:

+* Ltcalc Declarations:: Bison and C declarations for ltcalc.

+* Ltcalc Rules:: Grammar rules for ltcalc, with explanations.

+* Ltcalc Lexer:: The lexical analyzer.

+File: bison.info, Node: Ltcalc Declarations, Next: Ltcalc Rules, Up: Location Tracking Calc

+2.4.1 Declarations for `ltcalc'

+-------------------------------

+The C and Bison declarations for the location tracking calculator are

+the same as those for the infix notation calculator, except that
+`YYSTYPE' is now `int'.

+ /* Location tracking calculator. */

+ %{

+ #define YYSTYPE int

+ #include <math.h>

+ int yylex (void);

+ void yyerror (char const *);

+ %}

+ /* Bison declarations. */

+ %token NUM

+ %left '-' '+'

+ %left '*' '/'

+ %left NEG

+ %right '^'

+ %% /* The grammar follows. */

+Note there are no declarations specific to locations. Defining a data

+type for storing locations is not needed: we will use the type provided

+by default (*note Data Types of Locations: Location Type.), which is a

+four member structure with the following integer fields: `first_line',

+`first_column', `last_line' and `last_column'. By convention, and in

+accordance with the GNU Coding Standards and common practice, the line

+and column count both start at 1.

+File: bison.info, Node: Ltcalc Rules, Next: Ltcalc Lexer, Prev: Ltcalc Declarations, Up: Location Tracking Calc

+2.4.2 Grammar Rules for `ltcalc'

+--------------------------------

+Whether or not you handle locations has no effect on the syntax of your

+language. Therefore, grammar rules for this example will be very close

+to those of the previous example: we will only modify them to benefit

+from the new information.

+ Here, we will use locations to report divisions by zero, and locate

+the wrong expressions or subexpressions.

+ input : /* empty */

+ | input line

+ ;

+ line : '\n'

+ | exp '\n' { printf ("%d\n", $1); }

+ ;

+ exp : NUM { $$ = $1; }

+ | exp '+' exp { $$ = $1 + $3; }

+ | exp '-' exp { $$ = $1 - $3; }

+ | exp '*' exp { $$ = $1 * $3; }

+ | exp '/' exp

+ {

+ if ($3)

+ $$ = $1 / $3;

+ else

+ {

+ $$ = 1;

+ fprintf (stderr, "%d.%d-%d.%d: division by zero",

+ @3.first_line, @3.first_column,

+ @3.last_line, @3.last_column);

+ }

+ }

+ | '-' exp %prec NEG { $$ = -$2; }

+ | exp '^' exp { $$ = pow ($1, $3); }

+ | '(' exp ')' { $$ = $2; }

+ ;

+ This code shows how to reach locations inside of semantic actions, by

+using the pseudo-variables `@N' for rule components, and the

+pseudo-variable `@$' for groupings.

+ We don't need to assign a value to `@$': the output parser does it

+automatically. By default, before executing the C code of each action,

+`@$' is set to range from the beginning of `@1' to the end of `@N', for

+a rule with N components. This behavior can be redefined (*note

+Default Action for Locations: Location Default Action.), and for very

+specific rules, `@$' can be computed by hand.
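+ The default computation of `@$' described above can be written out as

+a plain C function. This is a sketch only: `loc_t' is a hypothetical

+stand-in with the four fields named earlier, and `loc_span' is not a

+function that Bison provides.

```c
/* Hypothetical stand-in for the default location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc_t;

/* Location of a grouping whose first component is at FIRST and
   whose last component is at LAST: it begins where FIRST begins
   and ends where LAST ends.  */
loc_t
loc_span (loc_t first, loc_t last)
{
  loc_t result;
  result.first_line = first.first_line;
  result.first_column = first.first_column;
  result.last_line = last.last_line;
  result.last_column = last.last_column;
  return result;
}
```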

+File: bison.info, Node: Ltcalc Lexer, Prev: Ltcalc Rules, Up: Location Tracking Calc

+2.4.3 The `ltcalc' Lexical Analyzer.

+------------------------------------

+Until now, we relied on Bison's defaults to enable location tracking.

+The next step is to rewrite the lexical analyzer, and make it able to

+feed the parser with the token locations, as it already does for

+semantic values.

+ To this end, we must take into account every single character of the

+input text, to keep the computed locations from being fuzzy or wrong:

+ int

+ yylex (void)

+ {

+ int c;

+ /* Skip white space. */

+ while ((c = getchar ()) == ' ' || c == '\t')

+ ++yylloc.last_column;

+ /* Step. */

+ yylloc.first_line = yylloc.last_line;

+ yylloc.first_column = yylloc.last_column;

+ /* Process numbers. */

+ if (isdigit (c))

+ {

+ yylval = c - '0';

+ ++yylloc.last_column;

+ while (isdigit (c = getchar ()))

+ {

+ ++yylloc.last_column;

+ yylval = yylval * 10 + c - '0';

+ }

+ ungetc (c, stdin);

+ return NUM;

+ }

+ /* Return end-of-input. */

+ if (c == EOF)

+ return 0;

+ /* Return a single char, and update location. */

+ if (c == '\n')

+ {

+ ++yylloc.last_line;

+ yylloc.last_column = 0;

+ }

+ else

+ ++yylloc.last_column;

+ return c;

+ }

+ Basically, the lexical analyzer performs the same processing as

+before: it skips blanks and tabs, and reads numbers or single-character

+tokens. In addition, it updates `yylloc', the global variable (of type

+`YYLTYPE') containing the token's location.

+ Now, each time this function returns a token, the parser has its

+number as well as its semantic value, and its location in the text.

+The last needed change is to initialize `yylloc', for example in the

+controlling function:

+ int

+ main (void)

+ {

+ yylloc.first_line = yylloc.last_line = 1;

+ yylloc.first_column = yylloc.last_column = 0;

+ return yyparse ();

+ }

+ Remember that computing locations is not a matter of syntax. Every

+character must be associated with a location update, whether it is in

+valid input, in comments, in literal strings, and so on.
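+ One way to make sure that no character is missed is to route every

+consumed character through a single helper. The sketch below assumes

+hypothetical names (`span_t', `span_step', `span_advance') and follows

+the same counting rules as the `yylex' shown above.

```c
/* Hypothetical stand-in for the location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} span_t;

/* Begin a new token where the previous one ended.  */
void
span_step (span_t *loc)
{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
}

/* Account for one consumed character C.  */
void
span_advance (span_t *loc, int c)
{
  if (c == '\n')
    {
      loc->last_line++;
      loc->last_column = 0;
    }
  else
    loc->last_column++;
}
```

+ A lexer written this way calls `span_advance' once per character read,

+including characters in white space or comments, and calls `span_step'

+just before it starts a new token.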

+File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Location Tracking Calc, Up: Examples

+2.5 Multi-Function Calculator: `mfcalc'

+=======================================

+Now that the basics of Bison have been discussed, it is time to move on

+to a more advanced problem. The above calculators provided only five

+functions, `+', `-', `*', `/' and `^'. It would be nice to have a

+calculator that provides other mathematical functions such as `sin',

+`cos', etc.

+ It is easy to add new operators to the infix calculator as long as

+they are only single-character literals. The lexical analyzer `yylex'

+passes back all nonnumeric characters as tokens, so new grammar rules

+suffice for adding a new operator. But we want something more

+flexible: built-in functions whose syntax has this form:

+ FUNCTION_NAME (ARGUMENT)

+At the same time, we will add memory to the calculator, by allowing you

+to create named variables, store values in them, and use them later.

+Here is a sample session with the multi-function calculator:

+ $ mfcalc

+ pi = 3.141592653589

+ 3.1415926536

+ sin(pi)

+ 0.0000000000

+ alpha = beta1 = 2.3

+ 2.3000000000

+ alpha

+ 2.3000000000

+ ln(alpha)

+ 0.8329091229

+ exp(ln(beta1))

+ 2.3000000000

+ $

+ Note that multiple assignment and nested function calls are

+permitted.

+* Menu:

+* Mfcalc Declarations:: Bison declarations for multi-function calculator.

+* Mfcalc Rules:: Grammar rules for the calculator.

+* Mfcalc Symbol Table:: Symbol table management subroutines.

+File: bison.info, Node: Mfcalc Declarations, Next: Mfcalc Rules, Up: Multi-function Calc

+2.5.1 Declarations for `mfcalc'

+-------------------------------

+Here are the C and Bison declarations for the multi-function calculator.

+ %{

+ #include <math.h> /* For math functions, cos(), sin(), etc. */

+ #include "calc.h" /* Contains definition of `symrec'. */

+ int yylex (void);

+ void yyerror (char const *);

+ %}

+ %union {

+ double val; /* For returning numbers. */

+ symrec *tptr; /* For returning symbol-table pointers. */

+ }

+ %token <val> NUM /* Simple double precision number. */

+ %token <tptr> VAR FNCT /* Variable and Function. */

+ %type <val> exp

+ %right '='

+ %left '-' '+'

+ %left '*' '/'

+ %left NEG /* negation--unary minus */

+ %right '^' /* exponentiation */

+ %% /* The grammar follows. */

+ The above grammar introduces only two new features of the Bison

+language. These features allow semantic values to have various data

+types (*note More Than One Value Type: Multiple Types.).

+ The `%union' declaration specifies the entire list of possible types;

+it takes the place of a `YYSTYPE' definition. The allowable types are now

+double-floats (for `exp' and `NUM') and pointers to entries in the

+symbol table. *Note The Collection of Value Types: Union Decl.

+ Since values can now have various types, it is necessary to

+associate a type with each grammar symbol whose semantic value is used.

+These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations

+are augmented with information about their data type (placed between

+angle brackets).

+ The Bison construct `%type' is used for declaring nonterminal

+symbols, just as `%token' is used for declaring token types. We have

+not used `%type' before because nonterminal symbols are normally

+declared implicitly by the rules that define them. But `exp' must be

+declared explicitly so we can specify its value type. *Note

+Nonterminal Symbols: Type Decl.
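+ In C terms, the declarations above describe a union with one member

+per value type, and the type given in angle brackets selects the member

+used for a symbol. The following sketch shows that idea with

+hypothetical names (`entry', `value_t', `value_of_var'); it is not the

+code that Bison emits.

```c
/* Hypothetical record type standing in for `symrec'.  */
typedef struct entry
{
  char const *name;
  double var;
} entry;

/* Roughly the shape of the type that `%union' describes.  */
typedef union
{
  double val;    /* Used by NUM and exp.  */
  entry *tptr;   /* Used by VAR and FNCT.  */
} value_t;

/* What reading the value of a VAR amounts to: the semantic value
   is accessed through the `tptr' member, then dereferenced.  */
double
value_of_var (value_t v)
{
  return v.tptr->var;
}
```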

+File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symbol Table, Prev: Mfcalc Declarations, Up: Multi-function Calc

+2.5.2 Grammar Rules for `mfcalc'

+--------------------------------

+Here are the grammar rules for the multi-function calculator. Most of

+them are copied directly from `calc'; three rules, those which mention

+`VAR' or `FNCT', are new.

+ input: /* empty */

+ | input line

+ ;

+ line:

+ '\n'

+ | exp '\n' { printf ("\t%.10g\n", $1); }

+ | error '\n' { yyerrok; }

+ ;

+ exp: NUM { $$ = $1; }

+ | VAR { $$ = $1->value.var; }

+ | VAR '=' exp { $$ = $3; $1->value.var = $3; }

+ | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }

+ | exp '+' exp { $$ = $1 + $3; }

+ | exp '-' exp { $$ = $1 - $3; }

+ | exp '*' exp { $$ = $1 * $3; }

+ | exp '/' exp { $$ = $1 / $3; }

+ | '-' exp %prec NEG { $$ = -$2; }

+ | exp '^' exp { $$ = pow ($1, $3); }

+ | '(' exp ')' { $$ = $2; }

+ ;

+ /* End of grammar. */

+ %%

+File: bison.info, Node: Mfcalc Symbol Table, Prev: Mfcalc Rules, Up: Multi-function Calc

+2.5.3 The `mfcalc' Symbol Table

+-------------------------------

+The multi-function calculator requires a symbol table to keep track of

+the names and meanings of variables and functions. This doesn't affect

+the grammar rules (except for the actions) or the Bison declarations,

+but it requires some additional C functions for support.

+ The symbol table itself consists of a linked list of records. Its

+definition, which is kept in the header `calc.h', is as follows. It

+provides for either functions or variables to be placed in the table.

+ /* Function type. */

+ typedef double (*func_t) (double);

+ /* Data type for links in the chain of symbols. */

+ struct symrec

+ {

+ char *name; /* name of symbol */

+ int type; /* type of symbol: either VAR or FNCT */

+ union

+ {

+ double var; /* value of a VAR */

+ func_t fnctptr; /* value of a FNCT */

+ } value;

+ struct symrec *next; /* link field */

+ };

+ typedef struct symrec symrec;

+ /* The symbol table: a chain of `struct symrec'. */

+ extern symrec *sym_table;

+ symrec *putsym (char const *, int);

+ symrec *getsym (char const *);

+ The new version of `main' includes a call to `init_table', a

+function that initializes the symbol table. Here it is, and

+`init_table' as well:

+ #include <stdio.h>

+ /* Called by yyparse on error. */

+ void

+ yyerror (char const *s)

+ {

+ printf ("%s\n", s);

+ }

+ struct init

+ {

+ char const *fname;

+ double (*fnct) (double);

+ };

+ struct init const arith_fncts[] =

+ {

+ "sin", sin,

+ "cos", cos,

+ "atan", atan,

+ "ln", log,

+ "exp", exp,

+ "sqrt", sqrt,

+ 0, 0

+ };

+ /* The symbol table: a chain of `struct symrec'. */

+ symrec *sym_table;

+ /* Put arithmetic functions in table. */

+ void

+ init_table (void)

+ {

+ int i;

+ symrec *ptr;

+ for (i = 0; arith_fncts[i].fname != 0; i++)

+ {

+ ptr = putsym (arith_fncts[i].fname, FNCT);

+ ptr->value.fnctptr = arith_fncts[i].fnct;

+ }

+ }

+ int

+ main (void)

+ {

+ init_table ();

+ return yyparse ();

+ }

+ By simply editing the initialization list and adding the necessary

+include files, you can add additional functions to the calculator.

+ Two important functions allow look-up and installation of symbols in

+the symbol table. The function `putsym' is passed a name and the type

+(`VAR' or `FNCT') of the object to be installed. The object is linked

+to the front of the list, and a pointer to the object is returned. The

+function `getsym' is passed the name of the symbol to look up. If

+found, a pointer to that symbol is returned; otherwise zero is returned.

+ #include <stdlib.h> /* malloc. */

+ #include <string.h> /* strlen, strcpy, strcmp. */

+ symrec *

+ putsym (char const *sym_name, int sym_type)

+ {

+ symrec *ptr;

+ ptr = (symrec *) malloc (sizeof (symrec));

+ ptr->name = (char *) malloc (strlen (sym_name) + 1);

+ strcpy (ptr->name,sym_name);

+ ptr->type = sym_type;

+ ptr->value.var = 0; /* Set value to 0 even if fctn. */

+ ptr->next = (struct symrec *)sym_table;

+ sym_table = ptr;

+ return ptr;

+ }

+ symrec *

+ getsym (char const *sym_name)

+ {

+ symrec *ptr;

+ for (ptr = sym_table; ptr != (symrec *) 0;

+ ptr = (symrec *)ptr->next)

+ if (strcmp (ptr->name,sym_name) == 0)

+ return ptr;

+ return 0;

+ }
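+ The look-up-then-install pattern can be exercised on its own. The

+sketch below is a simplified variant, not the `mfcalc' code: the names

+`node', `lookup', `install' and `intern' are hypothetical, only

+variables are stored, and the results of `malloc' are checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified record: variables only.  */
typedef struct node
{
  char *name;
  double value;
  struct node *next;
} node;

static node *table;   /* Head of the chain.  */

/* Return the entry named NAME, or a null pointer.  */
node *
lookup (char const *name)
{
  node *p;
  for (p = table; p != NULL; p = p->next)
    if (strcmp (p->name, name) == 0)
      return p;
  return NULL;
}

/* Link a new entry for NAME at the front of the chain.  Return a
   null pointer if memory runs out.  */
node *
install (char const *name)
{
  node *p = malloc (sizeof *p);
  if (p == NULL)
    return NULL;
  p->name = malloc (strlen (name) + 1);
  if (p->name == NULL)
    {
      free (p);
      return NULL;
    }
  strcpy (p->name, name);
  p->value = 0;
  p->next = table;
  table = p;
  return p;
}

/* Look NAME up, installing it first if it is not yet present.  */
node *
intern (char const *name)
{
  node *p = lookup (name);
  return p != NULL ? p : install (name);
}
```

+ Calling `intern' twice with the same name returns the same pointer, so

+a value stored through the first result is visible through the second.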

+ The function `yylex' must now recognize variables, numeric values,

+and the single-character arithmetic operators. Strings of alphanumeric

+characters with a leading letter are recognized as either variables or

+functions depending on what the symbol table says about them.

+ The string is passed to `getsym' for look up in the symbol table. If

+the name appears in the table, a pointer to its location and its type

+(`VAR' or `FNCT') is returned to `yyparse'. If it is not already in

+the table, then it is installed as a `VAR' using `putsym'. Again, a

+pointer and its type (which must be `VAR') is returned to `yyparse'.

+ No change is needed in the handling of numeric values and arithmetic

+operators in `yylex'.

+ #include <ctype.h>

+ int

+ yylex (void)

+ {

+ int c;

+ /* Ignore white space, get first nonwhite character. */

+ while ((c = getchar ()) == ' ' || c == '\t');

+ if (c == EOF)

+ return 0;

+ /* Char starts a number => parse the number. */

+ if (c == '.' || isdigit (c))

+ {

+ ungetc (c, stdin);

+ scanf ("%lf", &yylval.val);

+ return NUM;

+ }

+ /* Char starts an identifier => read the name. */

+ if (isalpha (c))

+ {

+ symrec *s;

+ static char *symbuf = 0;

+ static int length = 0;

+ int i;

+ /* Initially make the buffer long enough

+ for a 40-character symbol name. */

+ if (length == 0)

+ length = 40, symbuf = (char *)malloc (length + 1);

+ i = 0;

+ do

+ {

+ /* If buffer is full, make it bigger. */

+ if (i == length)

+ {

+ length *= 2;

+ symbuf = (char *) realloc (symbuf, length + 1);

+ }

+ /* Add this character to the buffer. */

+ symbuf[i++] = c;

+ /* Get another character. */

+ c = getchar ();

+ }

+ while (isalnum (c));

+ ungetc (c, stdin);

+ symbuf[i] = '\0';

+ s = getsym (symbuf);

+ if (s == 0)

+ s = putsym (symbuf, VAR);

+ yylval.tptr = s;

+ return s->type;

+ }

+ /* Any other character is a token by itself. */

+ return c;

+ }

+ This program is both powerful and flexible. You may easily add new

+functions, and it is a simple job to modify this code to install

+predefined variables such as `pi' or `e' as well.
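+ The buffer-doubling step in `yylex' can be isolated for testing. The

+sketch below assumes a hypothetical helper, `read_identifier', that

+reads from a string instead of `stdin' and checks the result of

+`realloc'. Its initial size is kept small on purpose so that the

+growth path is exercised.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: copy the identifier at the start of S into
   freshly allocated storage, doubling the buffer as needed.
   Return a null pointer if S does not start with a letter or if
   memory runs out.  The caller frees the result.  */
char *
read_identifier (char const *s)
{
  size_t length = 4;
  size_t i = 0;
  char *buf;

  if (!isalpha ((unsigned char) *s))
    return NULL;
  buf = malloc (length + 1);
  if (buf == NULL)
    return NULL;
  do
    {
      /* If buffer is full, make it bigger.  */
      if (i == length)
        {
          char *bigger;
          length *= 2;
          bigger = realloc (buf, length + 1);
          if (bigger == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
        }
      /* Add this character to the buffer.  */
      buf[i++] = *s++;
    }
  while (isalnum ((unsigned char) *s));
  buf[i] = '\0';
  return buf;
}
```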

+File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples

+2.6 Exercises

+=============

+ 1. Add some new functions from `math.h' to the initialization list.

+ 2. Add another array that contains constants and their values. Then

+ modify `init_table' to add these constants to the symbol table.

+ It will be easiest to give the constants type `VAR'.

+ 3. Make the program report an error if the user refers to an

+ uninitialized variable in any way except to store a value in it.

+File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top

+3 Bison Grammar Files

+*********************

+Bison takes as input a context-free grammar specification and produces a

+C-language function that recognizes correct instances of the grammar.

+ The Bison grammar input file conventionally has a name ending in

+`.y'. *Note Invoking Bison: Invocation.

+* Menu:

+* Grammar Outline:: Overall layout of the grammar file.

+* Symbols:: Terminal and nonterminal symbols.

+* Rules:: How to write grammar rules.

+* Recursion:: Writing recursive rules.

+* Semantics:: Semantic values and actions.

+* Locations:: Locations and actions.

+* Declarations:: All kinds of Bison declarations are described here.

+* Multiple Parsers:: Putting more than one Bison parser in one program.

+File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File

+3.1 Outline of a Bison Grammar

+==============================

+A Bison grammar file has four main sections, shown here with the

+appropriate delimiters:

+ %{

+ PROLOGUE

+ %}

+ BISON DECLARATIONS

+ %%

+ GRAMMAR RULES

+ %%

+ EPILOGUE

+ Comments enclosed in `/* ... */' may appear in any of the sections.

+As a GNU extension, `//' introduces a comment that continues until end

+of line.

+* Menu:

+* Prologue:: Syntax and usage of the prologue.

+* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.

+* Bison Declarations:: Syntax and usage of the Bison declarations section.

+* Grammar Rules:: Syntax and usage of the grammar rules section.

+* Epilogue:: Syntax and usage of the epilogue.

+File: bison.info, Node: Prologue, Next: Prologue Alternatives, Up: Grammar Outline

+3.1.1 The prologue

+------------------

+The PROLOGUE section contains macro definitions and declarations of

+functions and variables that are used in the actions in the grammar

+rules. These are copied to the beginning of the parser file so that

+they precede the definition of `yyparse'. You can use `#include' to

+get the declarations from a header file. If you don't need any C

+declarations, you may omit the `%{' and `%}' delimiters that bracket

+this section.

+ The PROLOGUE section is terminated by the first occurrence of `%}'

+that is outside a comment, a string literal, or a character constant.

+ You may have more than one PROLOGUE section, intermixed with the

+BISON DECLARATIONS. This allows you to have C and Bison declarations

+that refer to each other. For example, the `%union' declaration may

+use types defined in a header file, and you may wish to prototype

+functions that take arguments of type `YYSTYPE'. This can be done with

+two PROLOGUE blocks, one before and one after the `%union' declaration.

+ %{

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ #include "ptypes.h"

+ %}

+ %union {

+ long int n;

+ tree t; /* `tree' is defined in `ptypes.h'. */

+ }

+ %{

+ static void print_token_value (FILE *, int, YYSTYPE);

+ #define YYPRINT(F, N, L) print_token_value (F, N, L)

+ %}

+ ...

+ When in doubt, it is usually safer to put prologue code before all

+Bison declarations, rather than after. For example, any definitions of

+feature test macros like `_GNU_SOURCE' or `_POSIX_C_SOURCE' should

+appear before all Bison declarations, as feature test macros can affect

+the behavior of Bison-generated `#include' directives.

+File: bison.info, Node: Prologue Alternatives, Next: Bison Declarations, Prev: Prologue, Up: Grammar Outline

+3.1.2 Prologue Alternatives

+---------------------------

+(The prologue alternatives described here are experimental. More user

+feedback will help to determine whether they should become permanent

+features.)

+ The functionality of PROLOGUE sections can often be subtle and

+inflexible. As an alternative, Bison provides a %code directive with

+an explicit qualifier field, which identifies the purpose of the code

+and thus the location(s) where Bison should generate it. For C/C++,

+the qualifier can be omitted for the default location, or it can be one

+of `requires', `provides', `top'. *Note %code: Decl Summary.

+ Look again at the example of the previous section:

+ %{

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ #include "ptypes.h"

+ %}

+ %union {

+ long int n;

+ tree t; /* `tree' is defined in `ptypes.h'. */

+ }

+ %{

+ static void print_token_value (FILE *, int, YYSTYPE);

+ #define YYPRINT(F, N, L) print_token_value (F, N, L)

+ %}

+ ...

+Notice that there are two PROLOGUE sections here, but there's a subtle

+distinction between their functionality. For example, if you decide to

+override Bison's default definition for `YYLTYPE', in which PROLOGUE

+section should you write your new definition? You should write it in

+the first since Bison will insert that code into the parser source code

+file _before_ the default `YYLTYPE' definition. In which PROLOGUE

+section should you prototype an internal function, `trace_token', that

+accepts `YYLTYPE' and `yytokentype' as arguments? You should prototype

+it in the second since Bison will insert that code _after_ the

+`YYLTYPE' and `yytokentype' definitions.

+ This distinction in functionality between the two PROLOGUE sections

+is established by the appearance of the `%union' between them. This

+behavior raises a few questions. First, why should the position of a

+`%union' affect definitions related to `YYLTYPE' and `yytokentype'?

+Second, what if there is no `%union'? In that case, the second kind of

+PROLOGUE section is not available. This behavior is not intuitive.

+ To avoid this subtle `%union' dependency, rewrite the example using a

+`%code top' and an unqualified `%code'. Let's go ahead and add the new

+`YYLTYPE' definition and the `trace_token' prototype at the same time:

+ %code top {

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ /* WARNING: The following code really belongs

+ * in a `%code requires'; see below. */

+ #include "ptypes.h"

+ #define YYLTYPE YYLTYPE

+ typedef struct YYLTYPE

+ {

+ int first_line;

+ int first_column;

+ int last_line;

+ int last_column;

+ char *filename;

+ } YYLTYPE;

+ }

+ %union {

+ long int n;

+ tree t; /* `tree' is defined in `ptypes.h'. */

+ }

+ %code {

+ static void print_token_value (FILE *, int, YYSTYPE);

+ #define YYPRINT(F, N, L) print_token_value (F, N, L)

+ static void trace_token (enum yytokentype token, YYLTYPE loc);

+ }

+ ...

+In this way, `%code top' and the unqualified `%code' achieve the same

+functionality as the two kinds of PROLOGUE sections, but it's always

+explicit which kind you intend. Moreover, both kinds are always

+available even in the absence of `%union'.

+ The `%code top' block above logically contains two parts. The first

+two lines before the warning need to appear near the top of the parser

+source code file. The first line after the warning is required by

+`YYSTYPE' and thus also needs to appear in the parser source code file.

+However, if you've instructed Bison to generate a parser header file

+(*note %defines: Decl Summary.), you probably want that line to appear

+before the `YYSTYPE' definition in that header file as well. The

+`YYLTYPE' definition should also appear in the parser header file to

+override the default `YYLTYPE' definition there.

+ In other words, in the `%code top' block above, all but the first two

+lines are dependency code required by the `YYSTYPE' and `YYLTYPE'

+definitions. Thus, they belong in one or more `%code requires':

+ %code top {

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ }

+ %code requires {

+ #include "ptypes.h"

+ }

+ %union {

+ long int n;

+ tree t; /* `tree' is defined in `ptypes.h'. */

+ }

+ %code requires {

+ #define YYLTYPE YYLTYPE

+ typedef struct YYLTYPE

+ {

+ int first_line;

+ int first_column;

+ int last_line;

+ int last_column;

+ char *filename;

+ } YYLTYPE;

+ }

+ %code {

+ static void print_token_value (FILE *, int, YYSTYPE);

+ #define YYPRINT(F, N, L) print_token_value (F, N, L)

+ static void trace_token (enum yytokentype token, YYLTYPE loc);

+ }

+ ...

+Now Bison will insert `#include "ptypes.h"' and the new `YYLTYPE'

+definition before the Bison-generated `YYSTYPE' and `YYLTYPE'

+definitions in both the parser source code file and the parser header

+file. (By the same reasoning, `%code requires' would also be the

+appropriate place to write your own definition for `YYSTYPE'.)

+ When you are writing dependency code for `YYSTYPE' and `YYLTYPE', you

+should prefer `%code requires' over `%code top' regardless of whether

+you instruct Bison to generate a parser header file. When you are

+writing code that you need Bison to insert only into the parser source

+code file and that has no special need to appear at the top of that

+file, you should prefer the unqualified `%code' over `%code top'.

+These practices will make the purpose of each block of your code

+explicit to Bison and to other developers reading your grammar file.

+Following these practices, we expect the unqualified `%code' and `%code

+requires' to be the most important of the four PROLOGUE alternatives.

+ At some point while developing your parser, you might decide to

+provide `trace_token' to modules that are external to your parser.

+Thus, you might wish for Bison to insert the prototype into both the

+parser header file and the parser source code file. Since this

+function is not a dependency required by `YYSTYPE' or `YYLTYPE', it

+doesn't make sense to move its prototype to a `%code requires'. More

+importantly, since it depends upon `YYLTYPE' and `yytokentype', `%code

+requires' is not sufficient. Instead, move its prototype from the

+unqualified `%code' to a `%code provides':

+ %code top {

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ }

+ %code requires {

+ #include "ptypes.h"

+ }

+ %union {

+ long int n;

+ tree t; /* `tree' is defined in `ptypes.h'. */

+ }

+ %code requires {

+ #define YYLTYPE YYLTYPE

+ typedef struct YYLTYPE

+ {

+ int first_line;

+ int first_column;

+ int last_line;

+ int last_column;

+ char *filename;

+ } YYLTYPE;

+ }

+ %code provides {

+ void trace_token (enum yytokentype token, YYLTYPE loc);

+ }

+ %code {

+ static void print_token_value (FILE *, int, YYSTYPE);

+ #define YYPRINT(F, N, L) print_token_value (F, N, L)

+ }

+ ...

+Bison will insert the `trace_token' prototype into both the parser

+header file and the parser source code file after the definitions for

+`yytokentype', `YYLTYPE', and `YYSTYPE'.

+ The above examples are careful to write directives in an order that

+reflects the layout of the generated parser source code and header

+files: `%code top', `%code requires', `%code provides', and then

+`%code'. While your grammar files may generally be easier to read if

+you also follow this order, Bison does not require it. Instead, Bison

+lets you choose an organization that makes sense to you.

+ You may declare any of these directives multiple times in the

+grammar file. In that case, Bison concatenates the contained code in

+declaration order. This is the only way in which the position of one

+of these directives within the grammar file affects its functionality.

+ The result of the previous two properties is greater flexibility in

+how you may organize your grammar file. For example, you may organize

+semantic-type-related directives by semantic type:

+ %code requires { #include "type1.h" }

+ %union { type1 field1; }

+ %destructor { type1_free ($$); } <field1>

+ %printer { type1_print ($$); } <field1>

+ %code requires { #include "type2.h" }

+ %union { type2 field2; }

+ %destructor { type2_free ($$); } <field2>

+ %printer { type2_print ($$); } <field2>

+You could even place each of the above directive groups in the rules

+section of the grammar file next to the set of rules that uses the

+associated semantic type. (In the rules section, you must terminate

+each of those directives with a semicolon.) And you don't have to

+worry that some directive (like a `%union') in the definitions section

+is going to adversely affect their functionality in some

+counter-intuitive manner just because it comes first. Such an

+organization is not possible using PROLOGUE sections.

+ This section has been concerned with explaining the advantages of

+the four PROLOGUE alternatives over the original Yacc PROLOGUE.

+However, in most cases when using these directives, you shouldn't need

+to think about all the low-level ordering issues discussed here.

+Instead, you should simply use these directives to label each block of

+your code according to its purpose and let Bison handle the ordering.

+`%code' is the most generic label. Move code to `%code requires',

+`%code provides', or `%code top' as needed.

+File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: Prologue Alternatives, Up: Grammar Outline

+3.1.3 The Bison Declarations Section

+------------------------------------

+The BISON DECLARATIONS section contains declarations that define

+terminal and nonterminal symbols, specify precedence, and so on. In

+some simple grammars you may not need any declarations. *Note Bison

+Declarations: Declarations.

+File: bison.info, Node: Grammar Rules, Next: Epilogue, Prev: Bison Declarations, Up: Grammar Outline

+3.1.4 The Grammar Rules Section

+-------------------------------

+The "grammar rules" section contains one or more Bison grammar rules,

+and nothing else. *Note Syntax of Grammar Rules: Rules.

+ There must always be at least one grammar rule, and the first `%%'

+(which precedes the grammar rules) may never be omitted even if it is

+the first thing in the file.

+File: bison.info, Node: Epilogue, Prev: Grammar Rules, Up: Grammar Outline

+3.1.5 The Epilogue

+------------------

+The EPILOGUE is copied verbatim to the end of the parser file, just as

+the PROLOGUE is copied to the beginning. This is the most convenient

+place to put anything that you want to have in the parser file but

+which need not come before the definition of `yyparse'. For example,

+the definitions of `yylex' and `yyerror' often go here. Because C

+requires functions to be declared before being used, you often need to

+declare functions like `yylex' and `yyerror' in the Prologue, even if

+you define them in the Epilogue. *Note Parser C-Language Interface:

+Interface.
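
The declare-before-use arrangement described above can be sketched in plain C. This is only an illustration under stated assumptions: `parse_stub' is a hypothetical stand-in for the generated `yyparse', and `error_count' is an invented counter; only `yylex' and `yyerror' are names taken from the manual.

```c
#include <assert.h>
#include <stdio.h>

/* Prologue part: declarations only, so that code placed earlier in the
   file (standing in for the generated parser) can call these functions.  */
int yylex (void);
void yyerror (char const *msg);

static int error_count = 0;   /* hypothetical counter, for illustration */

/* Hypothetical stand-in for the generated `yyparse': it needs only the
   prototypes above, not the definitions.  */
int
parse_stub (void)
{
  int token = yylex ();
  if (token != 0)
    {
      yyerror ("unexpected token");
      return 1;
    }
  return 0;
}

/* Epilogue part: the definitions can come last.  */
int
yylex (void)
{
  return 0;   /* zero signifies end-of-input */
}

void
yyerror (char const *msg)
{
  assert (msg != NULL);
  fprintf (stderr, "%s\n", msg);
  error_count++;
}
```

With this layout, the C compiler sees each prototype before the first call, even though the function bodies appear at the end of the file.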

+ If the last section is empty, you may omit the `%%' that separates it

+from the grammar rules.

+ The Bison parser itself contains many macros and identifiers whose

+names start with `yy' or `YY', so it is a good idea to avoid using any

+such names (except those documented in this manual) in the epilogue of

+the grammar file.

+File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File

+3.2 Symbols, Terminal and Nonterminal

+=====================================

+"Symbols" in Bison grammars represent the grammatical classifications

+of the language.

+ A "terminal symbol" (also known as a "token type") represents a

+class of syntactically equivalent tokens. You use the symbol in grammar

+rules to mean that a token in that class is allowed. The symbol is

+represented in the Bison parser by a numeric code, and the `yylex'

+function returns a token type code to indicate what kind of token has

+been read. You don't need to know what the code value is; you can use

+the symbol to stand for it.

+ A "nonterminal symbol" stands for a class of syntactically

+equivalent groupings. The symbol name is used in writing grammar rules.

+By convention, it should be all lower case.

+ Symbol names can contain letters, digits (not at the beginning),

+underscores and periods. Periods make sense only in nonterminals.

+ There are three ways of writing terminal symbols in the grammar:

+ * A "named token type" is written with an identifier, like an

+ identifier in C. By convention, it should be all upper case. Each

+ such name must be defined with a Bison declaration such as

+ `%token'. *Note Token Type Names: Token Decl.

+ * A "character token type" (or "literal character token") is written

+ in the grammar using the same syntax used in C for character

+ constants; for example, `'+'' is a character token type. A

+ character token type doesn't need to be declared unless you need to

+ specify its semantic value data type (*note Data Types of Semantic

+ Values: Value Type.), associativity, or precedence (*note Operator

+ Precedence: Precedence.).

+ By convention, a character token type is used only to represent a

+ token that consists of that particular character. Thus, the token

+ type `'+'' is used to represent the character `+' as a token.

+ Nothing enforces this convention, but if you depart from it, your

+ program will confuse other readers.

+ All the usual escape sequences used in character literals in C can

+ be used in Bison as well, but you must not use the null character

+ as a character literal because its numeric code, zero, signifies

+ end-of-input (*note Calling Convention for `yylex': Calling

+ Convention.). Also, unlike standard C, trigraphs have no special

+ meaning in Bison character literals, nor is backslash-newline

+ allowed.

+ * A "literal string token" is written like a C string constant; for

+ example, `"<="' is a literal string token. A literal string token

+ doesn't need to be declared unless you need to specify its semantic

+ value data type (*note Value Type::), associativity, or precedence

+ (*note Precedence::).

+ You can associate the literal string token with a symbolic name as

+ an alias, using the `%token' declaration (*note Token

+ Declarations: Token Decl.). If you don't do that, the lexical

+ analyzer has to retrieve the token number for the literal string

+ token from the `yytname' table (*note Calling Convention::).

+ *Warning*: literal string tokens do not work in Yacc.

+ By convention, a literal string token is used only to represent a

+ token that consists of that particular string. Thus, you should

+ use the token type `"<="' to represent the string `<=' as a token.

+ Bison does not enforce this convention, but if you depart from

+ it, people who read your program will be confused.

+ All the escape sequences used in string literals in C can be used

+ in Bison as well, except that you must not use a null character

+ within a string literal. Also, unlike Standard C, trigraphs have

+ no special meaning in Bison string literals, nor is

+ backslash-newline allowed. A literal string token must contain

+ two or more characters; for a token containing just one character,

+ use a character token (see above).
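
For a literal string token that has no symbolic alias, the lookup in the name table mentioned above might look roughly like the following. This is a sketch under assumptions: `mock_yytname' is an invented stand-in for the real `yytname' table, and `find_string_token' is a hypothetical helper. The index it returns is the entry's position in the mock table, not necessarily the code that `yylex' must return; see the Calling Convention node for the authoritative procedure.

```c
#include <assert.h>
#include <string.h>

/* Mock of the `yytname' table (an assumption for illustration).  Literal
   string tokens are stored with their surrounding double quotes.  */
static const char *const mock_yytname[] = {
  "$end", "error", "$undefined", "NUM", "\"<=\"", "\">=\"", "'+'", 0
};

/* Return the index of the literal string token whose text is TEXT,
   or -1 if the table has no such entry.  */
int
find_string_token (const char *text)
{
  assert (text != NULL);
  size_t len = strlen (text);
  for (int i = 0; mock_yytname[i] != 0; i++)
    {
      const char *name = mock_yytname[i];
      if (name[0] == '"'
          && strncmp (name + 1, text, len) == 0
          && name[len + 1] == '"'
          && name[len + 2] == '\0')
        return i;
    }
  return -1;
}
```

Giving the token a symbolic alias with `%token' avoids this search entirely, which is usually the simpler choice.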

+ How you choose to write a terminal symbol has no effect on its

+grammatical meaning. That depends only on where it appears in rules and

+on when the lexical analyzer function `yylex' returns that symbol.

+ The value returned by `yylex' is always one of the terminal symbols,

+except that a zero or negative value signifies end-of-input. Whichever

+way you write the token type in the grammar rules, you write it the

+same way in the definition of `yylex'. The numeric code for a

+character token type is simply the positive numeric code of the

+character, so `yylex' can use the identical value to generate the

+requisite code, though you may need to convert it to `unsigned char' to

+avoid sign-extension on hosts where `char' is signed. Each named token

+type becomes a C macro in the parser file, so `yylex' can use the name

+to stand for the code. (This is why periods don't make sense in

+terminal symbols.) *Note Calling Convention for `yylex': Calling

+Convention.
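
The token-code conventions above can be sketched as a small scanner. This is a minimal illustration, not a real `yylex': the names `next_token', `set_input', and `token_value' are hypothetical, and the value 258 for `NUM' is only an assumed code for a named token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical named token code; 258 is only an assumed value.  */
enum { NUM = 258 };

static const char *input = "";   /* hypothetical input buffer */
static long int token_value;     /* stands in for `yylval' */

void
set_input (const char *text)
{
  assert (text != NULL);
  input = text;
}

int
next_token (void)
{
  while (*input == ' ' || *input == '\t')
    input++;

  if (*input == '\0')
    return 0;   /* zero signifies end-of-input */

  if (isdigit ((unsigned char) *input))
    {
      token_value = 0;
      while (isdigit ((unsigned char) *input))
        token_value = token_value * 10 + (*input++ - '0');
      return NUM;   /* named token type */
    }

  /* Character token: convert through `unsigned char' so that the code
     is positive even on hosts where `char' is signed.  */
  return (unsigned char) *input++;
}
```

The final conversion is the point of the sketch: without it, a character such as `\xe9' could yield a negative code on a signed-`char' host, which the parser would treat as end-of-input.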

+ If `yylex' is defined in a separate file, you need to arrange for the

+token-type macro definitions to be available there. Use the `-d'

+option when you run Bison, so that it will write these macro definitions

+into a separate header file `NAME.tab.h' which you can include in the

+other source files that need it. *Note Invoking Bison: Invocation.

+ If you want to write a grammar that is portable to any Standard C

+host, you must use only nonnull character tokens taken from the basic

+execution character set of Standard C. This set consists of the ten

+digits, the 52 lower- and upper-case English letters, and the

+characters in the following C-language string:

+ "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~"

+ The `yylex' function and Bison must use a consistent character set

+and encoding for character tokens. For example, if you run Bison in an

+ASCII environment, but then compile and run the resulting program in an

+environment that uses an incompatible character set like EBCDIC, the

+resulting program may not work because the tables generated by Bison

+will assume ASCII numeric values for character tokens. It is standard

+practice for software distributions to contain C source files that were

+generated by Bison in an ASCII environment, so installers on platforms

+that are incompatible with ASCII must rebuild those files before

+compiling them.

+ The symbol `error' is a terminal symbol reserved for error recovery

+(*note Error Recovery::); you shouldn't use it for any other purpose.

+In particular, `yylex' should never return this value. The default

+value of the error token is 256, unless you explicitly assigned 256 to

+one of your tokens with a `%token' declaration.

+File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File

+3.3 Syntax of Grammar Rules

+===========================

+A Bison grammar rule has the following general form:

+ RESULT: COMPONENTS...

+ ;

+where RESULT is the nonterminal symbol that this rule describes, and

+COMPONENTS are various terminal and nonterminal symbols that are put

+together by this rule (*note Symbols::).

+ For example,

+ exp: exp '+' exp

+ ;

+says that two groupings of type `exp', with a `+' token in between, can

+be combined into a larger grouping of type `exp'.

+ White space in rules is significant only to separate symbols. You

+can add extra white space as you wish.

+ Scattered among the components can be ACTIONS that determine the

+semantics of the rule. An action looks like this:

+ {C STATEMENTS}

+This is an example of "braced code", that is, C code surrounded by

+braces, much like a compound statement in C. Braced code can contain

+any sequence of C tokens, so long as its braces are balanced. Bison

+does not check the braced code for correctness directly; it merely

+copies the code to the output file, where the C compiler can check it.

+ Within braced code, the balanced-brace count is not affected by

+braces within comments, string literals, or character constants, but it

+is affected by the C digraphs `<%' and `%>' that represent braces. At

+the top level braced code must be terminated by `}' and not by a

+digraph. Bison does not look for trigraphs, so if braced code uses

+trigraphs you should ensure that they do not affect the nesting of

+braces or the boundaries of comments, string literals, or character

+constants.

+ Usually there is only one action and it follows the components.

+*Note Actions::.

+ Multiple rules for the same RESULT can be written separately or can

+be joined with the vertical-bar character `|' as follows:

+ RESULT: RULE1-COMPONENTS...

+ | RULE2-COMPONENTS...

+ ...

+ ;

+They are still considered distinct rules even when joined in this way.

+ If COMPONENTS in a rule is empty, it means that RESULT can match the

+empty string. For example, here is how to define a comma-separated

+sequence of zero or more `exp' groupings:

+ expseq: /* empty */

+ | expseq1

+ ;

+ expseq1: exp

+ | expseq1 ',' exp

+ ;

+It is customary to write a comment `/* empty */' in each rule with no

+components.

+File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File

+3.4 Recursive Rules

+===================

+A rule is called "recursive" when its RESULT nonterminal also appears

+on its right hand side. Nearly all Bison grammars need to use

+recursion, because that is the only way to define a sequence of any

+number of a particular thing. Consider this recursive definition of a

+comma-separated sequence of one or more expressions:

+ expseq1: exp

+ | expseq1 ',' exp

+ ;

+Since the recursive use of `expseq1' is the leftmost symbol in the

+right hand side, we call this "left recursion". By contrast, here the

+same construct is defined using "right recursion":

+ expseq1: exp

+ | exp ',' expseq1

+ ;

+Any kind of sequence can be defined using either left recursion or right

+recursion, but you should always use left recursion, because it can

+parse a sequence of any number of elements with bounded stack space.

+Right recursion uses up space on the Bison stack in proportion to the

+number of elements in the sequence, because all the elements must be

+shifted onto the stack before the rule can be applied even once. *Note

+The Bison Parser Algorithm: Algorithm, for further explanation of this.
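
The difference in stack use can be made concrete with a small simulation. This is a simplified model, not Bison's actual parser: each `exp' is treated as a single stack entry, and the function names are hypothetical. It counts the deepest stack reached while parsing a sequence of N elements under each pair of rules.

```c
#include <assert.h>

/* Maximum stack depth with the left-recursive rules
     expseq1: exp | expseq1 ',' exp ;  */
int
left_recursion_max_depth (int n)
{
  int depth = 0, max = 0;
  assert (n >= 1);
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;          /* shift ',' */
      depth++;            /* shift exp */
      if (depth > max)
        max = depth;
      /* Reduce at once: `exp' or `expseq1 , exp' becomes one expseq1.  */
      depth = 1;
    }
  return max;
}

/* Maximum stack depth with the right-recursive rules
     expseq1: exp | exp ',' expseq1 ;  */
int
right_recursion_max_depth (int n)
{
  int depth = 0, max = 0;
  assert (n >= 1);
  /* Nothing can be reduced until the last element has been seen, so
     every exp and every ',' is shifted first.  */
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;          /* shift ',' */
      depth++;            /* shift exp */
      if (depth > max)
        max = depth;
    }
  /* The last exp becomes expseq1 (depth unchanged); then each
     `exp , expseq1' pops three entries and pushes one.  */
  while (depth > 1)
    depth -= 2;
  assert (depth == 1);
  return max;
}
```

In this model the left-recursive form never needs more than three entries, while the right-recursive form needs 2N - 1 entries for N elements.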

+ "Indirect" or "mutual" recursion occurs when the result of the rule

+does not appear directly on its right hand side, but does appear in

+rules for other nonterminals which do appear on its right hand side.

+ For example:

+ expr: primary

+ | primary '+' primary

+ ;

+ primary: constant

+ | '(' expr ')'

+ ;

+defines two mutually-recursive nonterminals, since each refers to the

+other.

+File: bison.info, Node: Semantics, Next: Locations, Prev: Recursion, Up: Grammar File

+3.5 Defining Language Semantics

+===============================

+The grammar rules for a language determine only the syntax. The

+semantics are determined by the semantic values associated with various

+tokens and groupings, and by the actions taken when various groupings

+are recognized.

+ For example, the calculator calculates properly because the value

+associated with each expression is the proper number; it adds properly

+because the action for the grouping `X + Y' is to add the numbers

+associated with X and Y.

+* Menu:

+* Value Type:: Specifying one data type for all semantic values.

+* Multiple Types:: Specifying several alternative data types.

+* Actions:: An action is the semantic definition of a grammar rule.

+* Action Types:: Specifying data types for actions to operate on.

+* Mid-Rule Actions:: Most actions go at the end of a rule.

+ This says when, why and how to use the exceptional

+ action in the middle of a rule.

+File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics

+3.5.1 Data Types of Semantic Values

+-----------------------------------

+In a simple program it may be sufficient to use the same data type for

+the semantic values of all language constructs. This was true in the

+RPN and infix calculator examples (*note Reverse Polish Notation

+Calculator: RPN Calc.).

+ Bison normally uses the type `int' for semantic values if your

+program uses the same data type for all language constructs. To

+specify some other type, define `YYSTYPE' as a macro, like this:

+ #define YYSTYPE double

+`YYSTYPE''s replacement list should be a type name that does not

+contain parentheses or square brackets. This macro definition must go

+in the prologue of the grammar file (*note Outline of a Bison Grammar:

+Grammar Outline.).

+File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics

+3.5.2 More Than One Value Type

+------------------------------

+In most programs, you will need different data types for different kinds

+of tokens and groupings. For example, a numeric constant may need type

+`int' or `long int', while a string constant needs type `char *', and

+an identifier might need a pointer to an entry in the symbol table.

+ To use more than one data type for semantic values in one parser,

+Bison requires you to do two things:

+ * Specify the entire collection of possible data types, either by

+ using the `%union' Bison declaration (*note The Collection of

+ Value Types: Union Decl.), or by using a `typedef' or a `#define'

+ to define `YYSTYPE' to be a union type whose member names are the

+ type tags.

+ * Choose one of those types for each symbol (terminal or

+ nonterminal) for which semantic values are used. This is done for

+ tokens with the `%token' Bison declaration (*note Token Type

+ Names: Token Decl.) and for groupings with the `%type' Bison

+ declaration (*note Nonterminal Symbols: Type Decl.).

+File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics

+3.5.3 Actions

+-------------

+An action accompanies a syntactic rule and contains C code to be

+executed each time an instance of that rule is recognized. The task of

+most actions is to compute a semantic value for the grouping built by

+the rule from the semantic values associated with tokens or smaller

+groupings.

+ An action consists of braced code containing C statements, and can be

+placed at any position in the rule; it is executed at that position.

+Most rules have just one action at the end of the rule, following all

+the components. Actions in the middle of a rule are tricky and used

+only for special purposes (*note Actions in Mid-Rule: Mid-Rule

+Actions.).

+ The C code in an action can refer to the semantic values of the

+components matched by the rule with the construct `$N', which stands for

+the value of the Nth component. The semantic value for the grouping

+being constructed is `$$'. Bison translates both of these constructs

+into expressions of the appropriate type when it copies the actions

+into the parser file. `$$' is translated to a modifiable lvalue, so it

+can be assigned to.

+ Here is a typical example:

+ exp: ...

+ | exp '+' exp

+ { $$ = $1 + $3; }

+This rule constructs an `exp' from two smaller `exp' groupings

+connected by a plus-sign token. In the action, `$1' and `$3' refer to

+the semantic values of the two component `exp' groupings, which are the

+first and third symbols on the right hand side of the rule. The sum is

+stored into `$$' so that it becomes the semantic value of the

+addition-expression just recognized by the rule. If there were a

+useful semantic value associated with the `+' token, it could be

+referred to as `$2'.
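
One way to picture `$$', `$1', and `$3' is as positions on a stack of semantic values. The following is a sketch of that idea only; it is not the code Bison generates, and `value_stack', `push_value', `reduce_add', and `top_value' are hypothetical names.

```c
#include <assert.h>

#define STACK_MAX 64

static int value_stack[STACK_MAX];   /* hypothetical semantic value stack */
static int depth = 0;                /* number of values on the stack */

void
push_value (int v)
{
  assert (depth < STACK_MAX);
  value_stack[depth++] = v;
}

/* Reduce by the rule  exp: exp '+' exp  { $$ = $1 + $3; }  */
void
reduce_add (void)
{
  assert (depth >= 3);
  int *rhs = &value_stack[depth - 3];   /* rhs[0] is $1, rhs[2] is $3 */
  int result = rhs[0] + rhs[2];         /* $$ = $1 + $3 */
  depth -= 3;                           /* pop the three components */
  push_value (result);                  /* push $$ for the new `exp' */
}

int
top_value (void)
{
  assert (depth > 0);
  return value_stack[depth - 1];
}
```

Here the slot for the `+' token (`rhs[1]', that is `$2') is present on the stack but unused, matching the remark that the token carries no useful semantic value in this rule.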

+ Note that the vertical-bar character `|' is really a rule separator,

+and actions are attached to a single rule. This differs from tools

+like Flex, in which `|' stands for either "or", or "the same

+action as that of the next rule". In the following example, the action

+is triggered only when `b' is found:

+ a_or_b: 'a'|'b' { a_or_b_found = 1; };

+ If you don't specify an action for a rule, Bison supplies a default:

+`$$ = $1'. Thus, the value of the first symbol in the rule becomes the

+value of the whole rule. Of course, the default action is valid only

+if the two data types match. There is no meaningful default action for

+an empty rule; every empty rule must have an explicit action unless the

+rule's value does not matter.

+ `$N' with N zero or negative is allowed for reference to tokens and

+groupings on the stack _before_ those that match the current rule.

+This is a very risky practice, and to use it reliably you must be

+certain of the context in which the rule is applied. Here is a case in

+which you can use this reliably:

+ foo: expr bar '+' expr { ... }

+ | expr bar '-' expr { ... }

+ ;

+ bar: /* empty */

+ { previous_expr = $0; }

+ ;

+ As long as `bar' is used only in the fashion shown here, `$0' always

+refers to the `expr' which precedes `bar' in the definition of `foo'.

+ It is also possible to access the semantic value of the lookahead

+token, if any, from a semantic action. This semantic value is stored

+in `yylval'. *Note Special Features for Use in Actions: Action

+Features.

+File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics

+3.5.4 Data Types of Values in Actions

+-------------------------------------

+If you have chosen a single data type for semantic values, the `$$' and

+`$N' constructs always have that data type.

+ If you have used `%union' to specify a variety of data types, then

+you must declare a choice among these types for each terminal or

+nonterminal symbol that can have a semantic value. Then each time you

+use `$$' or `$N', its data type is determined by which symbol it refers

+to in the rule. In this example,

+ exp: ...

+ | exp '+' exp

+ { $$ = $1 + $3; }

+`$1' and `$3' refer to instances of `exp', so they both have the data

+type declared for the nonterminal symbol `exp'. If `$2' were used, it

+would have the data type declared for the terminal symbol `'+'',

+whatever that might be.

+ Alternatively, you can specify the data type when you refer to the

+value, by inserting `<TYPE>' after the `$' at the beginning of the

+reference. For example, if you have defined types as shown here:

+ %union {

+ int itype;

+ double dtype;

+ }

+then you can write `$<itype>1' to refer to the first subunit of the

+rule as an integer, or `$<dtype>1' to refer to it as a double.
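
In C terms, the `<TYPE>' tag selects a member of the union. The sketch below mirrors the `%union' above; the type name `value_type' and the two accessor functions are hypothetical stand-ins for what `$<itype>1' and `$<dtype>1' select, not Bison output.

```c
#include <assert.h>

/* Mirrors the `%union' above.  Bison would name this type YYSTYPE;
   `value_type' is a hypothetical name for the sketch.  */
typedef union
{
  int itype;
  double dtype;
} value_type;

/* Roughly what `$<itype>1' selects from a semantic value.  */
int
as_itype (const value_type *v)
{
  assert (v != 0);
  return v->itype;
}

/* Roughly what `$<dtype>1' selects from a semantic value.  */
double
as_dtype (const value_type *v)
{
  assert (v != 0);
  return v->dtype;
}
```

As with any C union, reading a member other than the one most recently stored does not convert the value, so the tag must name the member that was actually assigned.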

+File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics

+3.5.5 Actions in Mid-Rule

+-------------------------

+Occasionally it is useful to put an action in the middle of a rule.

+These actions are written just like usual end-of-rule actions, but they

+are executed before the parser even recognizes the following components.

+ A mid-rule action may refer to the components preceding it using

+`$N', but it may not refer to subsequent components because it is run

+before they are parsed.

+ The mid-rule action itself counts as one of the components of the

+rule. This makes a difference when there is another action later in

+the same rule (and usually there is another at the end): you have to

+count the actions along with the symbols when working out which number

+N to use in `$N'.

+ The mid-rule action can also have a semantic value. The action can

+set its value with an assignment to `$$', and actions later in the rule

+can refer to the value using `$N'. Since there is no symbol to name

+the action, there is no way to declare a data type for the value in

+advance, so you must use the `$<...>N' construct to specify a data type

+each time you refer to this value.

+ There is no way to set the value of the entire rule with a mid-rule

+action, because assignments to `$$' do not have that effect. The only

+way to set the value for the entire rule is with an ordinary action at

+the end of the rule.

+ Here is an example from a hypothetical compiler, handling a `let'

+statement that looks like `let (VARIABLE) STATEMENT' and serves to

+create a variable named VARIABLE temporarily for the duration of

+STATEMENT. To parse this construct, we must put VARIABLE into the

+symbol table while STATEMENT is parsed, then remove it afterward. Here

+is how it is done:

+ stmt: LET '(' var ')'

+ { $<context>$ = push_context ();

+ declare_variable ($3); }

+ stmt { $$ = $6;

+ pop_context ($<context>5); }

+As soon as `let (VARIABLE)' has been recognized, the first action is

+run. It saves a copy of the current semantic context (the list of

+accessible variables) as its semantic value, using alternative

+`context' in the data-type union. Then it calls `declare_variable' to

+add the new variable to that list. Once the first action is finished,

+the embedded statement `stmt' can be parsed. Note that the mid-rule

+action is component number 5, so the `stmt' is component number 6.

+ After the embedded statement is parsed, its semantic value becomes

+the value of the entire `let'-statement. Then the semantic value from

+the earlier action is used to restore the prior list of variables. This

+removes the temporary `let'-variable from the list so that it won't

+appear to exist while the rest of the program is parsed.
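
The manual uses `push_context', `declare_variable' and `pop_context' without defining them. The following is a minimal sketch of how such helpers might look, assuming the semantic context is nothing more than the head of a linked list of visible variables. The `variable' structure, the `visible' list and the `is_visible' helper are assumptions made for illustration; a real compiler would likely use a richer symbol table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol-table entry: one declared variable. */
typedef struct variable {
  const char *name;
  struct variable *next;
} variable;

/* Assumption: a "context" is just the head of the list of visible variables. */
typedef variable *context;

variable *visible = NULL;

/* Return the current context so that it can be restored later. */
context push_context(void)
{
  return visible;
}

/* Make NAME visible by prepending it to the list. */
void declare_variable(const char *name)
{
  variable *v = malloc(sizeof *v);
  assert(v != NULL);
  v->name = name;
  v->next = visible;
  visible = v;
}

/* Drop every variable declared since SAVED was captured. */
void pop_context(context saved)
{
  while (visible != saved) {
    variable *dead = visible;
    assert(dead != NULL); /* SAVED must be reachable from the current list */
    visible = dead->next;
    free(dead);
  }
}

/* Return 1 if NAME is currently visible, 0 otherwise. */
int is_visible(const char *name)
{
  for (variable *v = visible; v != NULL; v = v->next)
    if (strcmp(v->name, name) == 0)
      return 1;
  return 0;
}
```

With helpers of this shape, the mid-rule action saves the list head before the variable is declared, and the end-of-rule action rewinds the list to that saved head once the embedded statement has been parsed.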

+ In the above example, if the parser initiates error recovery (*note

+Error Recovery::) while parsing the tokens in the embedded statement

+`stmt', it might discard the previous semantic context `$<context>5'

+without restoring it. Thus, `$<context>5' needs a destructor (*note

+Freeing Discarded Symbols: Destructor Decl.). However, Bison currently

+provides no means to declare a destructor specific to a particular

+mid-rule action's semantic value.

+ One solution is to bury the mid-rule action inside a nonterminal

+symbol and to declare a destructor for that symbol:

+ %type <context> let

+ %destructor { pop_context ($$); } let

+ %%

+ stmt: let stmt

+ { $$ = $2;

+ pop_context ($1); }

+ ;

+ let: LET '(' var ')'

+ { $$ = push_context ();

+ declare_variable ($3); }

+ ;

+Note that the action is now at the end of its rule. Any mid-rule

+action can be converted to an end-of-rule action in this way, and this

+is what Bison actually does to implement mid-rule actions.

+ Taking action before a rule is completely recognized often leads to

+conflicts since the parser must commit to a parse in order to execute

+the action. For example, the following two rules, without mid-rule

+actions, can coexist in a working parser because the parser can shift

+the open-brace token and look at what follows before deciding whether

+there is a declaration or not:

+ compound: '{' declarations statements '}'

+ | '{' statements '}'

+ ;

+But when we add a mid-rule action as follows, the rules become

+nonfunctional:

+ compound: { prepare_for_local_variables (); }

+ '{' declarations statements '}'

+ | '{' statements '}'

+ ;

+Now the parser is forced to decide whether to run the mid-rule action

+when it has read no farther than the open-brace. In other words, it

+must commit to using one rule or the other, without sufficient

+information to do it correctly. (The open-brace token is what is called

+the "lookahead" token at this time, since the parser is still deciding

+what to do about it. *Note Lookahead Tokens: Lookahead.)

+ You might think that you could correct the problem by putting

+identical actions into the two rules, like this:

+ compound: { prepare_for_local_variables (); }

+ '{' declarations statements '}'

+ | { prepare_for_local_variables (); }

+ '{' statements '}'

+ ;

+But this does not help, because Bison does not realize that the two

+actions are identical. (Bison never tries to understand the C code in

+an action.)

+ If the grammar is such that a declaration can be distinguished from a

+statement by the first token (which is true in C), then one solution

+which does work is to put the action after the open-brace, like this:

+ compound: '{' { prepare_for_local_variables (); }

+ declarations statements '}'

+ | '{' statements '}'

+ ;

+Now the first token of the following declaration or statement, which

+would in any case tell Bison which rule to use, can still do so.

+ Another solution is to bury the action inside a nonterminal symbol

+which serves as a subroutine:

+ subroutine: /* empty */

+ { prepare_for_local_variables (); }

+ ;

+ compound: subroutine

+ '{' declarations statements '}'

+ | subroutine

+ '{' statements '}'

+ ;

+Now Bison can execute the action in the rule for `subroutine' without

+deciding which rule for `compound' it will eventually use.

+File: bison.info, Node: Locations, Next: Declarations, Prev: Semantics, Up: Grammar File

+3.6 Tracking Locations

+======================

+Though grammar rules and semantic actions are enough to write a fully

+functional parser, it can be useful to process some additional

+information, especially symbol locations.

+ The way locations are handled is defined by providing a data type,

+and actions to take when rules are matched.

+* Menu:

+* Location Type:: Specifying a data type for locations.

+* Actions and Locations:: Using locations in actions.

+* Location Default Action:: Defining a general way to compute locations.

+File: bison.info, Node: Location Type, Next: Actions and Locations, Up: Locations

+3.6.1 Data Type of Locations

+----------------------------

+Defining a data type for locations is much simpler than for semantic

+values, since all tokens and groupings always use the same type.

+ You can specify the type of locations by defining a macro called

+`YYLTYPE', just as you can specify the semantic value type by defining

+a `YYSTYPE' macro (*note Value Type::). When `YYLTYPE' is not defined,

+Bison uses a default structure type with four members:

+ typedef struct YYLTYPE

+ {

+ int first_line;

+ int first_column;

+ int last_line;

+ int last_column;

+ } YYLTYPE;

+ At the beginning of the parsing, Bison initializes all these fields

+to 1 for `yylloc'.
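
As an illustration, here is a sketch of a user-defined location type that keeps the four default members and adds a file name. The names `my_location', `file_name' and `location_init' are assumptions made for this example and are not part of Bison. With a custom type you would probably have to perform the initialization yourself, for instance from `%initial-action'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location type: the four default members plus a file name. */
typedef struct my_location {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
  const char *file_name; /* extra member, not part of the default type */
} my_location;

/* In a grammar file this definition would go in the prologue. */
#define YYLTYPE my_location

/* Start every line and column at 1, mirroring what is described above
   for the default type, and record the file name. */
void location_init(YYLTYPE *loc, const char *file_name)
{
  assert(loc != NULL);
  loc->first_line = loc->first_column = 1;
  loc->last_line = loc->last_column = 1;
  loc->file_name = file_name;
}
```

Note that a default location action written for the four standard members will not copy an extra member such as `file_name'; a grammar using a type like this would also need its own `YYLLOC_DEFAULT'.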

+File: bison.info, Node: Actions and Locations, Next: Location Default Action, Prev: Location Type, Up: Locations

+3.6.2 Actions and Locations

+---------------------------

+Actions are not only useful for defining language semantics, but also

+for describing the behavior of the output parser with locations.

+ The most obvious way for building locations of syntactic groupings

+is very similar to the way semantic values are computed. In a given

+rule, several constructs can be used to access the locations of the

+elements being matched. The location of the Nth component of the right

+hand side is `@N', while the location of the left hand side grouping is

+`@$'.

+ Here is a basic example using the default data type for locations:

+ exp: ...

+ | exp '/' exp

+ {

+ @$.first_column = @1.first_column;

+ @$.first_line = @1.first_line;

+ @$.last_column = @3.last_column;

+ @$.last_line = @3.last_line;

+ if ($3)

+ $$ = $1 / $3;

+ else

+ {

+ $$ = 1;

+ fprintf (stderr,

+ "Division by zero, l%d,c%d-l%d,c%d",

+ @3.first_line, @3.first_column,

+ @3.last_line, @3.last_column);

+ }

+ }

+ As for semantic values, there is a default action for locations that

+is run each time a rule is matched. It sets the beginning of `@$' to

+the beginning of the first symbol, and the end of `@$' to the end of the

+last symbol.

+ With this default action, the location tracking can be fully

+automatic. The example above can then be rewritten simply as:

+ exp: ...

+ | exp '/' exp

+ {

+ if ($3)

+ $$ = $1 / $3;

+ else

+ {

+ $$ = 1;

+ fprintf (stderr,

+ "Division by zero, l%d,c%d-l%d,c%d",

+ @3.first_line, @3.first_column,

+ @3.last_line, @3.last_column);

+ }

+ }

+ It is also possible to access the location of the lookahead token,

+if any, from a semantic action. This location is stored in `yylloc'.

+*Note Special Features for Use in Actions: Action Features.

+File: bison.info, Node: Location Default Action, Prev: Actions and Locations, Up: Locations

+3.6.3 Default Action for Locations

+----------------------------------

+Actually, actions are not the best place to compute locations. Since

+locations are much more general than semantic values, there is room in

+the output parser to redefine the default action to take for each rule.

+The `YYLLOC_DEFAULT' macro is invoked each time a rule is matched,

+before the associated action is run. It is also invoked while

+processing a syntax error, to compute the error's location. Before

+reporting an unresolvable syntactic ambiguity, a GLR parser invokes

+`YYLLOC_DEFAULT' recursively to compute the location of that ambiguity.

+ Most of the time, this macro is general enough to eliminate

+location-specific code from semantic actions.

+ The `YYLLOC_DEFAULT' macro takes three parameters. The first one is

+the location of the grouping (the result of the computation). When a

+rule is matched, the second parameter identifies locations of all right

+hand side elements of the rule being matched, and the third parameter

+is the size of the rule's right hand side. When a GLR parser reports

+an ambiguity, which of multiple candidate right hand sides it passes to

+`YYLLOC_DEFAULT' is undefined. When processing a syntax error, the

+second parameter identifies locations of the symbols that were

+discarded during error processing, and the third parameter is the

+number of discarded symbols.

+ By default, `YYLLOC_DEFAULT' is defined this way:

+ # define YYLLOC_DEFAULT(Current, Rhs, N) \

+ do \

+ if (N) \

+ { \

+ (Current).first_line = YYRHSLOC(Rhs, 1).first_line; \

+ (Current).first_column = YYRHSLOC(Rhs, 1).first_column; \

+ (Current).last_line = YYRHSLOC(Rhs, N).last_line; \

+ (Current).last_column = YYRHSLOC(Rhs, N).last_column; \

+ } \

+ else \

+ { \

+ (Current).first_line = (Current).last_line = \

+ YYRHSLOC(Rhs, 0).last_line; \

+ (Current).first_column = (Current).last_column = \

+ YYRHSLOC(Rhs, 0).last_column; \

+ } \

+ while (0)

+ where `YYRHSLOC (rhs, k)' is the location of the Kth symbol in RHS

+when K is positive, and the location of the symbol just before the

+reduction when K and N are both zero.

+ When defining `YYLLOC_DEFAULT', you should consider that:

+ * All arguments are free of side-effects. However, only the first

+ one (the result) should be modified by `YYLLOC_DEFAULT'.

+ * For consistency with semantic actions, valid indexes within the

+ right hand side range from 1 to N. When N is zero, only 0 is a

+ valid index, and it refers to the symbol just before the reduction.

+ During error processing N is always positive.

+ * Your macro should parenthesize its arguments, if need be, since the

+ actual arguments may not be surrounded by parentheses. Also, your

+ macro should expand to something that can be used as a single

+ statement when it is followed by a semicolon.

+File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Locations, Up: Grammar File

+3.7 Bison Declarations

+======================

+The "Bison declarations" section of a Bison grammar defines the symbols

+used in formulating the grammar and the data types of semantic values.

+*Note Symbols::.

+ All token type names (but not single-character literal tokens such as

+`'+'' and `'*'') must be declared. Nonterminal symbols must be

+declared if you need to specify which data type to use for the semantic

+value (*note More Than One Value Type: Multiple Types.).

+ The first rule in the file also specifies the start symbol, by

+default. If you want some other symbol to be the start symbol, you

+must declare it explicitly (*note Languages and Context-Free Grammars:

+Language and Grammar.).

+* Menu:

+* Require Decl:: Requiring a Bison version.

+* Token Decl:: Declaring terminal symbols.

+* Precedence Decl:: Declaring terminals with precedence and associativity.

+* Union Decl:: Declaring the set of all semantic value types.

+* Type Decl:: Declaring the choice of type for a nonterminal symbol.

+* Initial Action Decl:: Code run before parsing starts.

+* Destructor Decl:: Declaring how symbols are freed.

+* Expect Decl:: Suppressing warnings about parsing conflicts.

+* Start Decl:: Specifying the start symbol.

+* Pure Decl:: Requesting a reentrant parser.

+* Push Decl:: Requesting a push parser.

+* Decl Summary:: Table of all Bison declarations.

+File: bison.info, Node: Require Decl, Next: Token Decl, Up: Declarations

+3.7.1 Require a Version of Bison

+--------------------------------

+You may require the minimum version of Bison to process the grammar. If

+the requirement is not met, `bison' exits with an error (exit status

+63).

+ %require "VERSION"

+File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Require Decl, Up: Declarations

+3.7.2 Token Type Names

+----------------------

+The basic way to declare a token type name (terminal symbol) is as

+follows:

+ %token NAME

+ Bison will convert this into a `#define' directive in the parser, so

+that the function `yylex' (if it is in this file) can use the name NAME

+to stand for this token type's code.

+ Alternatively, you can use `%left', `%right', or `%nonassoc' instead

+of `%token', if you wish to specify associativity and precedence.

+*Note Operator Precedence: Precedence Decl.

+ You can explicitly specify the numeric code for a token type by

+appending a nonnegative decimal or hexadecimal integer value in the

+field immediately following the token name:

+ %token NUM 300

+ %token XNUM 0x12d // a GNU extension

+It is generally best, however, to let Bison choose the numeric codes for

+all token types. Bison will automatically select codes that don't

+conflict with each other or with normal characters.

+ In the event that the stack type is a union, you must augment the

+`%token' or other token declaration to include the data type

+alternative delimited by angle-brackets (*note More Than One Value

+Type: Multiple Types.).

+ For example:

+ %union { /* define stack type */

+ double val;

+ symrec *tptr;

+ }

+ %token <val> NUM /* define token NUM and its type */

+ You can associate a literal string token with a token type name by

+writing the literal string at the end of a `%token' declaration which

+declares the name. For example:

+ %token arrow "=>"

+Similarly, a grammar for the C language might specify these names with

+equivalent literal string tokens:

+ %token <operator> OR "||"

+ %token <operator> LE 134 "<="

+ %left OR "<="

+Once you equate the literal string and the token name, you can use them

+interchangeably in further declarations or the grammar rules. The

+`yylex' function can use the token name or the literal string to obtain

+the token type code number (*note Calling Convention::). Syntax error

+messages passed to `yyerror' from the parser will reference the literal

+string instead of the token name.

+ The token numbered as 0 corresponds to end of file; the following

+line allows for nicer error messages referring to "end of file" instead

+of "$end":

+ %token END 0 "end of file"

+File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations

+3.7.3 Operator Precedence

+-------------------------

+Use the `%left', `%right' or `%nonassoc' declaration to declare a token

+and specify its precedence and associativity, all at once. These are

+called "precedence declarations". *Note Operator Precedence:

+Precedence, for general information on operator precedence.

+ The syntax of a precedence declaration is nearly the same as that of

+`%token': either

+ %left SYMBOLS...

+or

+ %left <TYPE> SYMBOLS...

+ And indeed any of these declarations serves the purposes of `%token'.

+But in addition, they specify the associativity and relative precedence

+for all the SYMBOLS:

+ * The associativity of an operator OP determines how repeated uses

+ of the operator nest: whether `X OP Y OP Z' is parsed by grouping

+ X with Y first or by grouping Y with Z first. `%left' specifies

+ left-associativity (grouping X with Y first) and `%right'

+ specifies right-associativity (grouping Y with Z first).

+ `%nonassoc' specifies no associativity, which means that `X OP Y

+ OP Z' is considered a syntax error.

+ * The precedence of an operator determines how it nests with other

+ operators. All the tokens declared in a single precedence

+ declaration have equal precedence and nest together according to

+ their associativity. When two tokens declared in different

+ precedence declarations associate, the one declared later has the

+ higher precedence and is grouped first.

+ For backward compatibility, there is a confusing difference between

+the argument lists of `%token' and precedence declarations. Only a

+`%token' can associate a literal string with a token type name. A

+precedence declaration always interprets a literal string as a

+reference to a separate token. For example:

+ %left OR "<=" // Does not declare an alias.

+ %left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".

+File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations

+3.7.4 The Collection of Value Types

+-----------------------------------

+The `%union' declaration specifies the entire collection of possible

+data types for semantic values. The keyword `%union' is followed by

+braced code containing the same thing that goes inside a `union' in C.

+ For example:

+ %union {

+ double val;

+ symrec *tptr;

+ }

+This says that the two alternative types are `double' and `symrec *'.

+They are given names `val' and `tptr'; these names are used in the

+`%token' and `%type' declarations to pick one of the types for a

+terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.).

+ As an extension to POSIX, a tag is allowed after the `union'. For

+example:

+ %union value {

+ double val;

+ symrec *tptr;

+ }

+specifies the union tag `value', so the corresponding C type is `union

+value'. If you do not specify a tag, it defaults to `YYSTYPE'.

+ As another extension to POSIX, you may specify multiple `%union'

+declarations; their contents are concatenated. However, only the first

+`%union' declaration can specify a tag.

+ Note that, unlike making a `union' declaration in C, you need not

+write a semicolon after the closing brace.

+ Instead of `%union', you can define and use your own union type

+`YYSTYPE' if your grammar contains at least one `<TYPE>' tag. For

+example, you can put the following into a header file `parser.h':

+ union YYSTYPE {

+ double val;

+ symrec *tptr;

+ };

+ typedef union YYSTYPE YYSTYPE;

+and then your grammar can use the following instead of `%union':

+ %{

+ #include "parser.h"

+ %}

+ %type <val> expr

+ %token <tptr> ID
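
The following sketch shows, in plain C, what selecting a union alternative amounts to. The layout of `symrec' is an assumption (the text above only names the type), and `value_of_symbol' is a hypothetical helper standing in for an action that reads a `<tptr>' value and produces a `<val>' value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified symbol record. */
typedef struct symrec {
  const char *name;
  double value;
} symrec;

union YYSTYPE {
  double val;
  symrec *tptr;
};
typedef union YYSTYPE YYSTYPE;

/* Read the tptr alternative of ID and store the symbol's value in the
   val alternative of the result. */
YYSTYPE value_of_symbol(YYSTYPE id)
{
  YYSTYPE result;
  assert(id.tptr != NULL);
  result.val = id.tptr->value;
  return result;
}
```

Since all alternatives share the same storage, reading an alternative other than the one last stored is a programming error; the `<TYPE>' tags are what let Bison pick the right member for each symbol.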

+File: bison.info, Node: Type Decl, Next: Initial Action Decl, Prev: Union Decl, Up: Declarations

+3.7.5 Nonterminal Symbols

+-------------------------

+When you use `%union' to specify multiple value types, you must declare

+the value type of each nonterminal symbol for which values are used.

+This is done with a `%type' declaration, like this:

+ %type <TYPE> NONTERMINAL...

+Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the

+name given in the `%union' to the alternative that you want (*note The

+Collection of Value Types: Union Decl.). You can give any number of

+nonterminal symbols in the same `%type' declaration, if they have the

+same value type. Use spaces to separate the symbol names.

+ You can also declare the value type of a terminal symbol. To do

+this, use the same `<TYPE>' construction in a declaration for the

+terminal symbol. All kinds of token declarations allow `<TYPE>'.

+File: bison.info, Node: Initial Action Decl, Next: Destructor Decl, Prev: Type Decl, Up: Declarations

+3.7.6 Performing Actions before Parsing

+---------------------------------------

+Sometimes your parser needs to perform some initializations before

+parsing. The `%initial-action' directive allows for such arbitrary

+code.

+ -- Directive: %initial-action { CODE }

+ Declare that the braced CODE must be invoked before parsing each

+ time `yyparse' is called. The CODE may use `$$' and `@$' --

+ initial value and location of the lookahead -- and the

+ `%parse-param'.

+ For instance, if your locations use a file name, you may use

+ %parse-param { char const *file_name };

+ %initial-action

+ {

+ @$.initialize (file_name);

+ };

+File: bison.info, Node: Destructor Decl, Next: Expect Decl, Prev: Initial Action Decl, Up: Declarations

+3.7.7 Freeing Discarded Symbols

+-------------------------------

+During error recovery (*note Error Recovery::), symbols already pushed

+on the stack and tokens coming from the rest of the file are discarded

+until the parser gets back on its feet. If the parser runs out of memory,

+or if it returns via `YYABORT' or `YYACCEPT', all the symbols on the

+stack must be discarded. Even if the parser succeeds, it must discard

+the start symbol.

+ When discarded symbols carry heap-based information, this memory is

+lost. While this behavior can be tolerable for batch parsers, such as

+in traditional compilers, it is unacceptable for programs like shells or

+protocol implementations that may parse and execute indefinitely.

+ The `%destructor' directive defines code that is called when a

+symbol is automatically discarded.

+ -- Directive: %destructor { CODE } SYMBOLS

+ Invoke the braced CODE whenever the parser discards one of the

+ SYMBOLS. Within CODE, `$$' designates the semantic value

+ associated with the discarded symbol, and `@$' designates its

+ location. The additional parser parameters are also available

+ (*note The Parser Function `yyparse': Parser Function.).

+ When a symbol is listed among SYMBOLS, its `%destructor' is called

+ a per-symbol `%destructor'. You may also define a per-type

+ `%destructor' by listing a semantic type tag among SYMBOLS. In

+ that case, the parser will invoke this CODE whenever it discards

+ any grammar symbol that has that semantic type tag unless that

+ symbol has its own per-symbol `%destructor'.

+ Finally, you can define two different kinds of default

+ `%destructor's. (These default forms are experimental. More user

+ feedback will help to determine whether they should become

+ permanent features.) You can place each of `<*>' and `<>' in the

+ SYMBOLS list of exactly one `%destructor' declaration in your

+ grammar file. The parser will invoke the CODE associated with one

+ of these whenever it discards any user-defined grammar symbol that

+ has no per-symbol and no per-type `%destructor'. The parser uses

+ the CODE for `<*>' in the case of such a grammar symbol for which

+ you have formally declared a semantic type tag (`%type' counts as

+ such a declaration, but `$<tag>$' does not). The parser uses the

+ CODE for `<>' in the case of such a grammar symbol that has no

+ declared semantic type tag.

+For example:

+ %union { char *string; }

+ %token <string> STRING1

+ %token <string> STRING2

+ %type <string> string1

+ %type <string> string2

+ %union { char character; }

+ %token <character> CHR

+ %type <character> chr

+ %token TAGLESS

+ %destructor { } <character>

+ %destructor { free ($$); } <*>

+ %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1

+ %destructor { printf ("Discarding tagless symbol.\n"); } <>

+guarantees that, when the parser discards any user-defined symbol that

+has a semantic type tag other than `<character>', it passes its

+semantic value to `free' by default. However, when the parser discards

+a `STRING1' or a `string1', it also prints its line number to `stdout'.

+It performs only the second `%destructor' in this case, so it invokes

+`free' only once. Finally, the parser merely prints a message whenever

+it discards any symbol, such as `TAGLESS', that has no semantic type

+tag.

+ A Bison-generated parser invokes the default `%destructor's only for

+user-defined as opposed to Bison-defined symbols. For example, the

+parser will not invoke either kind of default `%destructor' for the

+special Bison-defined symbols `$accept', `$undefined', or `$end' (*note

+Bison Symbols: Table of Symbols.), none of which you can reference in

+your grammar. It also will not invoke either for the `error' token

+(*note error: Table of Symbols.), which is always defined by Bison

+regardless of whether you reference it in your grammar. However, it

+may invoke one of them for the end token (token 0) if you redefine it

+from `$end' to, for example, `END':

+ %token END 0

+ Finally, Bison will never invoke a `%destructor' for an unreferenced

+mid-rule semantic value (*note Actions in Mid-Rule: Mid-Rule Actions.).

+That is, Bison does not consider a mid-rule to have a semantic value if

+you do not reference `$$' in the mid-rule's action or `$N' (where N is

+the RHS symbol position of the mid-rule) in any later action in that

+rule. However, if you do reference either, the Bison-generated parser

+will invoke the `<>' `%destructor' whenever it discards the mid-rule

+symbol.

+ "Discarded symbols" are the following:

+ * stacked symbols popped during the first phase of error recovery,

+ * incoming terminals during the second phase of error recovery,

+ * the current lookahead and the entire stack (except the current

+ right-hand side symbols) when the parser returns immediately, and

+ * the start symbol, when the parser succeeds.

+ The parser can "return immediately" because of an explicit call to

+`YYABORT' or `YYACCEPT', or failed error recovery, or memory exhaustion.

+ Right-hand side symbols of a rule that explicitly triggers a syntax

+error via `YYERROR' are not discarded automatically. As a rule of

+thumb, destructors are invoked only when user actions cannot manage the

+memory.
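
As a rough model of the first phase of error recovery, the sketch below pops symbols off a hand-made stack and runs a destructor on each one. Everything here is an assumption made for illustration: the `stacked_symbol' layout, the tag names and the helper functions do not correspond to the data structures of a Bison-generated parser. The `destroy_symbol' function plays the role of a per-type destructor that frees string values and does nothing for the other tags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical semantic type tags. */
typedef enum { TAG_NONE, TAG_STRING, TAG_CHARACTER } tag;

/* Hypothetical stack entry: a tag plus the tagged value. */
typedef struct stacked_symbol {
  tag kind;
  union { char *string; char character; } value;
} stacked_symbol;

int freed_count = 0; /* number of heap values released so far */

/* Release whatever heap memory SYM owns. */
void destroy_symbol(stacked_symbol *sym)
{
  assert(sym != NULL);
  if (sym->kind == TAG_STRING) {
    free(sym->value.string);
    sym->value.string = NULL;
    freed_count++;
  }
  /* TAG_CHARACTER and TAG_NONE own no heap memory. */
}

/* Pop and destroy the top COUNT symbols; return the new stack height. */
int discard_symbols(stacked_symbol *stack, int height, int count)
{
  assert(stack != NULL && count >= 0 && count <= height);
  while (count-- > 0)
    destroy_symbol(&stack[--height]);
  return height;
}

/* Duplicate TEXT on the heap so that the stack entry owns its value. */
char *copy_string(const char *text)
{
  assert(text != NULL);
  char *copy = malloc(strlen(text) + 1);
  assert(copy != NULL);
  strcpy(copy, text);
  return copy;
}
```

Without the call to `destroy_symbol', each discarded string would be leaked, which is the situation `%destructor' is meant to prevent.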

+File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Destructor Decl, Up: Declarations

+3.7.8 Suppressing Conflict Warnings

+-----------------------------------

+Bison normally warns if there are any conflicts in the grammar (*note

+Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars have

+harmless shift/reduce conflicts which are resolved in a predictable way

+and would be difficult to eliminate. It is desirable to suppress the

+warning about these conflicts unless the number of conflicts changes.

+You can do this with the `%expect' declaration.

+ The declaration looks like this:

+ %expect N

+ Here N is a decimal integer. The declaration says there should be N

+shift/reduce conflicts and no reduce/reduce conflicts. Bison reports

+an error if the number of shift/reduce conflicts differs from N, or if

+there are any reduce/reduce conflicts.

+ For normal LALR(1) parsers, reduce/reduce conflicts are more

+serious, and should be eliminated entirely. Bison will always report

+reduce/reduce conflicts for these parsers. With GLR parsers, however,

+both kinds of conflicts are routine; otherwise, there would be no need

+to use GLR parsing. Therefore, it is also possible to specify an

+expected number of reduce/reduce conflicts in GLR parsers, using the

+declaration:

+ %expect-rr N

+ In general, using `%expect' involves these steps:

+ * Compile your grammar without `%expect'. Use the `-v' option to

+ get a verbose list of where the conflicts occur. Bison will also

+ print the number of conflicts.

+ * Check each of the conflicts to make sure that Bison's default

+ resolution is what you really want. If not, rewrite the grammar

+ and go back to the beginning.

+ * Add an `%expect' declaration, copying the number N from the number

+ which Bison printed. With GLR parsers, add an `%expect-rr'

+ declaration as well.

+ Now Bison will warn you if you introduce an unexpected conflict, but

+will keep silent otherwise.

+File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations

+3.7.9 The Start-Symbol

+----------------------

+Bison assumes by default that the start symbol for the grammar is the

+first nonterminal specified in the grammar specification section. The

+programmer may override this default with the `%start' declaration

+as follows:

+ %start SYMBOL

+File: bison.info, Node: Pure Decl, Next: Push Decl, Prev: Start Decl, Up: Declarations

+3.7.10 A Pure (Reentrant) Parser

+--------------------------------

+A "reentrant" program is one which does not alter in the course of

+execution; in other words, it consists entirely of "pure" (read-only)

+code. Reentrancy is important whenever asynchronous execution is

+possible; for example, a nonreentrant program may not be safe to call

+from a signal handler. In systems with multiple threads of control, a

+nonreentrant program must be called only within interlocks.

+ Normally, Bison generates a parser which is not reentrant. This is

+suitable for most uses, and it permits compatibility with Yacc. (The

+standard Yacc interfaces are inherently nonreentrant, because they use

+statically allocated variables for communication with `yylex',

+including `yylval' and `yylloc'.)

+ Alternatively, you can generate a pure, reentrant parser. The Bison

+declaration `%define api.pure' says that you want the parser to be

+reentrant. It looks like this:

+ %define api.pure

+ The result is that the communication variables `yylval' and `yylloc'

+become local variables in `yyparse', and a different calling convention

+is used for the lexical analyzer function `yylex'. *Note Calling

+Conventions for Pure Parsers: Pure Calling, for the details of this.

+The variable `yynerrs' becomes local in `yyparse' in pull mode, but it

+becomes a member of `yypstate' in push mode (*note The Error Reporting

+Function `yyerror': Error Reporting.). The convention for calling

+`yyparse' itself is unchanged.

+ Whether the parser is pure has nothing to do with the grammar rules.

+You can generate either a pure parser or a nonreentrant parser from any

+valid grammar.

+File: bison.info, Node: Push Decl, Next: Decl Summary, Prev: Pure Decl, Up: Declarations

+3.7.11 A Push Parser

+--------------------

+(The current push parsing interface is experimental and may evolve.

+More user feedback will help to stabilize it.)

+ A pull parser is called once and it takes control until all its input

+is completely parsed. A push parser, on the other hand, is called each

+time a new token is made available.

+ A push parser is typically useful when the parser is part of a main

+event loop in the client's application. This is typically a

+requirement of a GUI, when the main event loop needs to be triggered

+within a certain time period.

+ Normally, Bison generates a pull parser. The following Bison

+declaration says that you want the parser to be a push parser (*note

+%define api.push_pull: Decl Summary.):

+ %define api.push_pull "push"

+ In almost all cases, you want to ensure that your push parser is also

+a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.). The only

+time you should create an impure push parser is to have backwards

+compatibility with the impure Yacc pull mode interface. Unless you know

+what you are doing, your declarations should look like this:

+ %define api.pure

+ %define api.push_pull "push"

+ There is one notable functional difference between the pure push

+parser and the impure push parser. It is acceptable for a pure push

+parser to have many parser instances, of the same type of parser, in

+memory at the same time. An impure push parser should only use one

+parser at a time.

+ When a push parser is selected, Bison will generate some new symbols

+in the generated parser. `yypstate' is a structure that the generated

+parser uses to store the parser's state. `yypstate_new' is the

+function that will create a new parser instance. `yypstate_delete'

+will free the resources associated with the corresponding parser

+instance. Finally, `yypush_parse' is the function that should be

+called whenever a token is available to provide to the parser. A trivial

+example of using a pure push parser would look like this:

+ int status;

+ yypstate *ps = yypstate_new ();

+ do {

+ status = yypush_parse (ps, yylex (), NULL);

+ } while (status == YYPUSH_MORE);

+ yypstate_delete (ps);

+ If the user decides to use an impure push parser, a few things about

+the generated parser will change. The `yychar' variable becomes a

+global variable instead of a variable in the `yypush_parse' function.

+For this reason, the signature of the `yypush_parse' function is

+changed to remove the token as a parameter. A nonreentrant push parser

+example would thus look like this:

+ extern int yychar;

+ int status;

+ yypstate *ps = yypstate_new ();

+ do {

+ yychar = yylex ();

+ status = yypush_parse (ps);

+ } while (status == YYPUSH_MORE);

+ yypstate_delete (ps);

+ That's it. Notice the next token is put into the global variable

+`yychar' for use by the next invocation of the `yypush_parse' function.
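
The inversion of control that a push parser provides can be tried without generating a parser at all. The sketch below is a hand-written stand-in, not Bison output: `mock_pstate', `mock_push_parse', the `MOCK_' status codes and `MOCK_NUM' are all hypothetical names chosen to mirror the shape of the interface described above. It recognizes a sum such as `2 + 40', one token per call, and keeps all of its state in the object passed by the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes and token code; the values are arbitrary. */
enum { MOCK_DONE = 0, MOCK_ERROR = 1, MOCK_PUSH_MORE = 2 };
enum { MOCK_NUM = 258 };

/* All parser state lives here, so several instances can coexist. */
typedef struct mock_pstate {
  int expect_number; /* 1 when the next token must be a number */
  int sum;           /* running total of the numbers seen so far */
} mock_pstate;

mock_pstate *mock_pstate_new(void)
{
  mock_pstate *ps = malloc(sizeof *ps);
  if (ps != NULL) {
    ps->expect_number = 1;
    ps->sum = 0;
  }
  return ps;
}

void mock_pstate_delete(mock_pstate *ps)
{
  free(ps);
}

/* Consume one token.  TOKEN is MOCK_NUM (with VALUE), '+', or 0 for
   end of input.  Returns MOCK_PUSH_MORE until the input is complete. */
int mock_push_parse(mock_pstate *ps, int token, int value)
{
  assert(ps != NULL);
  if (ps->expect_number) {
    if (token != MOCK_NUM)
      return MOCK_ERROR;
    ps->sum += value;
    ps->expect_number = 0;
    return MOCK_PUSH_MORE;
  }
  if (token == '+') {
    ps->expect_number = 1;
    return MOCK_PUSH_MORE;
  }
  return token == 0 ? MOCK_DONE : MOCK_ERROR;
}
```

The caller decides when each token is delivered, so the calls can be spread across iterations of an event loop, which is the use case for push parsers mentioned earlier in this node.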

+ Bison also supports the push parser interface along with the

+pull parser interface in the same generated parser. In order to get

+this functionality, you should replace the `%define api.push_pull

+"push"' declaration with the `%define api.push_pull "both"'

+declaration. Doing this will create all of the symbols mentioned

+earlier along with the two extra symbols, `yyparse' and `yypull_parse'.

+`yyparse' can be used exactly as it normally would be used. However,

+the user should note that it is implemented in the generated parser by

+calling `yypull_parse'. This makes the `yyparse' function that is

+generated with the `%define api.push_pull "both"' declaration slower

+than the normal `yyparse' function. If the user calls the

+`yypull_parse' function it will parse the rest of the input stream. It

+is possible to `yypush_parse' tokens to select a subgrammar and then

+`yypull_parse' the rest of the input stream. If you would like to

+switch back and forth between parsing styles, you would have to

+write your own `yypull_parse' function that knows when to quit looking

+for input. An example of using the `yypull_parse' function would look

+like this:

+ yypstate *ps = yypstate_new ();

+ yypull_parse (ps); /* Will call the lexer */

+ yypstate_delete (ps);

+ Adding the `%define api.pure' declaration does exactly the same

+thing to the generated parser with `%define api.push_pull "both"' as it

+did for `%define api.push_pull "push"'.

+File: bison.info, Node: Decl Summary, Prev: Push Decl, Up: Declarations

+3.7.12 Bison Declaration Summary

+--------------------------------

+Here is a summary of the declarations used to define a grammar:

+ -- Directive: %union

+ Declare the collection of data types that semantic values may have

+ (*note The Collection of Value Types: Union Decl.).

+ -- Directive: %token

+ Declare a terminal symbol (token type name) with no precedence or

+ associativity specified (*note Token Type Names: Token Decl.).

+ -- Directive: %right

+ Declare a terminal symbol (token type name) that is

+ right-associative (*note Operator Precedence: Precedence Decl.).

+ -- Directive: %left

+ Declare a terminal symbol (token type name) that is

+ left-associative (*note Operator Precedence: Precedence Decl.).

+ -- Directive: %nonassoc

+ Declare a terminal symbol (token type name) that is nonassociative

+ (*note Operator Precedence: Precedence Decl.). Using it in a way

+ that would be associative is a syntax error.

+ -- Directive: %type

+ Declare the type of semantic values for a nonterminal symbol

+ (*note Nonterminal Symbols: Type Decl.).

+ -- Directive: %start

+ Specify the grammar's start symbol (*note The Start-Symbol: Start

+ Decl.).

+ -- Directive: %expect

+ Declare the expected number of shift-reduce conflicts (*note

+ Suppressing Conflict Warnings: Expect Decl.).

+In order to change the behavior of `bison', use the following

+directives:

+ -- Directive: %code {CODE}

+ This is the unqualified form of the `%code' directive. It inserts

+ CODE verbatim at a language-dependent default location in the

+ output(1).

+ For C/C++, the default location is the parser source code file

+ after the usual contents of the parser header file. Thus, `%code'

+ replaces the traditional Yacc prologue, `%{CODE%}', for most

+ purposes. For a detailed discussion, see *Note Prologue

+ Alternatives::.

+ For Java, the default location is inside the parser class.

+ (Like all the Yacc prologue alternatives, this directive is

+ experimental. More user feedback will help to determine whether

+ it should become a permanent feature.)

+ -- Directive: %code QUALIFIER {CODE}

+ This is the qualified form of the `%code' directive. If you need

+ to specify location-sensitive verbatim CODE that does not belong

+ at the default location selected by the unqualified `%code' form,

+ use this form instead.

+ QUALIFIER identifies the purpose of CODE and thus the location(s)

+ where Bison should generate it. Not all values of QUALIFIER are

+ available for all target languages:

+ * requires

+ * Language(s): C, C++

+ * Purpose: This is the best place to write dependency code

+ required for `YYSTYPE' and `YYLTYPE'. In other words,

+ it's the best place to define types referenced in

+ `%union' directives, and it's the best place to override

+ Bison's default `YYSTYPE' and `YYLTYPE' definitions.

+ * Location(s): The parser header file and the parser

+ source code file before the Bison-generated `YYSTYPE'

+ and `YYLTYPE' definitions.

+ * provides

+ * Language(s): C, C++

+ * Purpose: This is the best place to write additional

+ definitions and declarations that should be provided to

+ other modules.

+ * Location(s): The parser header file and the parser

+ source code file after the Bison-generated `YYSTYPE',

+ `YYLTYPE', and token definitions.

+ * top

+ * Language(s): C, C++

+ * Purpose: The unqualified `%code' or `%code requires'

+ should usually be more appropriate than `%code top'.

+ However, occasionally it is necessary to insert code

+ much nearer the top of the parser source code file. For

+ example:

+ %code top {

+ #define _GNU_SOURCE

+ #include <stdio.h>

+ }

+ * Location(s): Near the top of the parser source code file.

+ * imports

+ * Language(s): Java

+ * Purpose: This is the best place to write Java import

+ directives.

+ * Location(s): The parser Java file after any Java package

+ directive and before any class definitions.

+ (Like all the Yacc prologue alternatives, this directive is

+ experimental. More user feedback will help to determine whether

+ it should become a permanent feature.)

+ For a detailed discussion of how to use `%code' in place of the

+ traditional Yacc prologue for C/C++, see *Note Prologue

+ Alternatives::.

+ -- Directive: %debug

+ In the parser file, define the macro `YYDEBUG' to 1 if it is not

+ already defined, so that the debugging facilities are compiled.

+ *Note Tracing Your Parser: Tracing.

+ -- Directive: %define VARIABLE

+ -- Directive: %define VARIABLE "VALUE"

+ Define a variable to adjust Bison's behavior. The possible

+ choices for VARIABLE, as well as their meanings, depend on the

+ selected target language and/or the parser skeleton (*note

+ %language: Decl Summary, *note %skeleton: Decl Summary.).

+ Bison will warn if a VARIABLE is defined multiple times.

+ Omitting `"VALUE"' is always equivalent to specifying it as `""'.

+ Some VARIABLEs may be used as Booleans. In this case, Bison will

+ complain if the variable definition does not meet one of the

+ following four conditions:

+ 1. `"VALUE"' is `"true"'.

+ 2. `"VALUE"' is omitted (or is `""'). This is equivalent to

+ `"true"'.

+ 3. `"VALUE"' is `"false"'.

+ 4. VARIABLE is never defined. In this case, Bison selects a

+ default value, which may depend on the selected target

+ language and/or parser skeleton.

+ Some of the accepted VARIABLEs are:

+ * api.pure

+ * Language(s): C

+ * Purpose: Request a pure (reentrant) parser program.

+ *Note A Pure (Reentrant) Parser: Pure Decl.

+ * Accepted Values: Boolean

+ * Default Value: `"false"'

+ * api.push_pull

+ * Language(s): C (LALR(1) only)

+ * Purpose: Requests a pull parser, a push parser, or both.

+ *Note A Push Parser: Push Decl. (The current push

+ parsing interface is experimental and may evolve. More

+ user feedback will help to stabilize it.)

+ * Accepted Values: `"pull"', `"push"', `"both"'

+ * Default Value: `"pull"'

+ * lr.keep_unreachable_states

+ * Language(s): all

+ * Purpose: Requests that Bison allow unreachable parser

+ states to remain in the parser tables. Bison considers

+ a state to be unreachable if there exists no sequence of

+ transitions from the start state to that state. A state

+ can become unreachable during conflict resolution if

+ Bison disables a shift action leading to it from a

+ predecessor state. Keeping unreachable states is

+ sometimes useful for analysis purposes, but they are

+ useless in the generated parser.

+ * Accepted Values: Boolean

+ * Default Value: `"false"'

+ * Caveats:

+ * Unreachable states may contain conflicts and may

+ use rules not used in any other state. Thus,

+ keeping unreachable states may induce warnings that

+ are irrelevant to your parser's behavior, and it

+ may eliminate warnings that are relevant. Of

+ course, the change in warnings may actually be

+ relevant to a parser table analysis that wants to

+ keep unreachable states, so this behavior will

+ likely remain in future Bison releases.

+ * While Bison is able to remove unreachable states,

+ it is not guaranteed to remove other kinds of

+ useless states. Specifically, when Bison disables

+ reduce actions during conflict resolution, some

+ goto actions may become useless, and thus some

+ additional states may become useless. If Bison

+ were to compute which goto actions were useless and

+ then disable those actions, it could identify such

+ states as unreachable and then remove those states.

+ However, Bison does not compute which goto actions

+ are useless.

+ * namespace

+ * Language(s): C++

+ * Purpose: Specifies the namespace for the parser class.

+ For example, if you specify:

+ %define namespace "foo::bar"

+ Bison uses `foo::bar' verbatim in references such as:

+ foo::bar::parser::semantic_type

+ However, to open a namespace, Bison removes any leading

+ `::' and then splits on any remaining occurrences:

+ namespace foo { namespace bar {

+ class position;

+ class location;

+ } }

+ * Accepted Values: Any absolute or relative C++ namespace

+ reference without a trailing `"::"'. For example,

+ `"foo"' or `"::foo::bar"'.

+ * Default Value: The value specified by `%name-prefix',

+ which defaults to `yy'. This usage of `%name-prefix' is

+ for backward compatibility and can be confusing since

+ `%name-prefix' also specifies the textual prefix for the

+ lexical analyzer function. Thus, if you specify

+ `%name-prefix', it is best to also specify `%define

+ namespace' so that `%name-prefix' _only_ affects the

+ lexical analyzer function. For example, if you specify:

+ %define namespace "foo"

+ %name-prefix "bar::"

+ The parser namespace is `foo' and `yylex' is referenced

+ as `bar::lex'.

+ -- Directive: %defines

+ Write a header file containing macro definitions for the token type

+ names defined in the grammar as well as a few other declarations.

+ If the parser output file is named `NAME.c' then this file is

+ named `NAME.h'.

+ For C parsers, the output header declares `YYSTYPE' unless

+ `YYSTYPE' is already defined as a macro or you have used a

+ `<TYPE>' tag without using `%union'. Therefore, if you are using

+ a `%union' (*note More Than One Value Type: Multiple Types.) with

+ components that require other definitions, or if you have defined

+ a `YYSTYPE' macro or type definition (*note Data Types of Semantic

+ Values: Value Type.), you need to arrange for these definitions to

+ be propagated to all modules, e.g., by putting them in a

+ prerequisite header that is included both by your parser and by

+ any other module that needs `YYSTYPE'.

+ Unless your parser is pure, the output header declares `yylval' as

+ an external variable. *Note A Pure (Reentrant) Parser: Pure Decl.

+ If you have also used locations, the output header declares

+ `YYLTYPE' and `yylloc' using a protocol similar to that of the

+ `YYSTYPE' macro and `yylval'. *Note Tracking Locations: Locations.

+ This output file is normally essential if you wish to put the

+ definition of `yylex' in a separate source file, because `yylex'

+ typically needs to be able to refer to the above-mentioned

+ declarations and to the token type codes. *Note Semantic Values

+ of Tokens: Token Values.

+ If you have declared `%code requires' or `%code provides', the

+ output header also contains their code. *Note %code: Decl Summary.

+ -- Directive: %defines DEFINES-FILE

+ Same as above, but save in the file DEFINES-FILE.

+ -- Directive: %destructor

+ Specify how the parser should reclaim the memory associated to

+ discarded symbols. *Note Freeing Discarded Symbols: Destructor

+ Decl.

+ -- Directive: %file-prefix "PREFIX"

+ Specify a prefix to use for all Bison output file names. The

+ names are chosen as if the input file were named `PREFIX.y'.

+ -- Directive: %language "LANGUAGE"

+ Specify the programming language for the generated parser.

+ Currently supported languages include C, C++, and Java. LANGUAGE

+ is case-insensitive.

+ This directive is experimental and its effect may be modified in

+ future releases.

+ -- Directive: %locations

+ Generate the code processing the locations (*note Special Features

+ for Use in Actions: Action Features.). This mode is enabled as

+ soon as the grammar uses the special `@N' tokens, but if your

+ grammar does not use it, using `%locations' allows for more

+ accurate syntax error messages.

+ -- Directive: %name-prefix "PREFIX"

+ Rename the external symbols used in the parser so that they start

+ with PREFIX instead of `yy'. The precise list of symbols renamed

+ in C parsers is `yyparse', `yylex', `yyerror', `yynerrs',

+ `yylval', `yychar', `yydebug', and (if locations are used)

+ `yylloc'. If you use a push parser, `yypush_parse',

+ `yypull_parse', `yypstate', `yypstate_new' and `yypstate_delete'

+ will also be renamed. For example, if you use `%name-prefix

+ "c_"', the names become `c_parse', `c_lex', and so on. For C++

+ parsers, see the `%define namespace' documentation in this section.

+ *Note Multiple Parsers in the Same Program: Multiple Parsers.

+ -- Directive: %no-lines

+ Don't generate any `#line' preprocessor commands in the parser

+ file. Ordinarily Bison writes these commands in the parser file

+ so that the C compiler and debuggers will associate errors and

+ object code with your source file (the grammar file). This

+ directive causes them to associate errors with the parser file,

+ treating it as an independent source file in its own right.

+ -- Directive: %output "FILE"

+ Specify FILE for the parser file.

+ -- Directive: %pure-parser

+ Deprecated version of `%define api.pure' (*note %define: Decl

+ Summary.), for which Bison is more careful to warn about

+ unreasonable usage.

+ -- Directive: %require "VERSION"

+ Require version VERSION or higher of Bison. *Note Require a

+ Version of Bison: Require Decl.

+ -- Directive: %skeleton "FILE"

+ Specify the skeleton to use.

+ If FILE does not contain a `/', FILE is the name of a skeleton

+ file in the Bison installation directory. If it does, FILE is an

+ absolute file name or a file name relative to the directory of the

+ grammar file. This is similar to how most shells resolve commands.

+ -- Directive: %token-table

+ Generate an array of token names in the parser file. The name of

+ the array is `yytname'; `yytname[I]' is the name of the token

+ whose internal Bison token code number is I. The first three

+ elements of `yytname' correspond to the predefined tokens `"$end"',

+ `"error"', and `"$undefined"'; after these come the symbols

+ defined in the grammar file.

+ The name in the table includes all the characters needed to

+ represent the token in Bison. For single-character literals and

+ literal strings, this includes the surrounding quoting characters

+ and any escape sequences. For example, the Bison single-character

+ literal `'+'' corresponds to a three-character name, represented

+ in C as `"'+'"'; and the Bison two-character literal string `"\\/"'

+ corresponds to a five-character name, represented in C as

+ `"\"\\\\/\""'.

+ When you specify `%token-table', Bison also generates macro

+ definitions for macros `YYNTOKENS', `YYNNTS', `YYNRULES', and

+ `YYNSTATES':

+ `YYNTOKENS'

+ The highest token number, plus one.

+ `YYNNTS'

+ The number of nonterminal symbols.

+ `YYNRULES'

+ The number of grammar rules.

+ `YYNSTATES'

+ The number of parser states (*note Parser States::).

+ -- Directive: %verbose

+ Write an extra output file containing verbose descriptions of the

+ parser states and what is done for each type of lookahead token in

+ that state. *Note Understanding Your Parser: Understanding, for

+ more information.

+ -- Directive: %yacc

+ Pretend the option `--yacc' was given, i.e., imitate Yacc,

+ including its naming conventions. *Note Bison Options::, for more.

+ ---------- Footnotes ----------

+ (1) The default location is actually skeleton-dependent; writers

+of non-standard skeletons however should choose the default location

+consistently with the behavior of the standard Bison skeletons.

+File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File

+3.8 Multiple Parsers in the Same Program

+========================================

+Most programs that use Bison parse only one language and therefore

+contain only one Bison parser. But what if you want to parse more than

+one language with the same program? Then you need to avoid a name

+conflict between different definitions of `yyparse', `yylval', and so

+on.

+ The easy way to do this is to use the option `-p PREFIX' (*note

+Invoking Bison: Invocation.). This renames the interface functions and

+variables of the Bison parser to start with PREFIX instead of `yy'.

+You can use this to give each parser distinct names that do not

+conflict.

+ The precise list of symbols renamed is `yyparse', `yylex',

+`yyerror', `yynerrs', `yylval', `yylloc', `yychar' and `yydebug'. If

+you use a push parser, `yypush_parse', `yypull_parse', `yypstate',

+`yypstate_new' and `yypstate_delete' will also be renamed. For

+example, if you use `-p c', the names become `cparse', `clex', and so

+on.

+ *All the other variables and macros associated with Bison are not

+renamed.* These others are not global; there is no conflict if the same

+name is used in different parsers. For example, `YYSTYPE' is not

+renamed, but defining this in different ways in different parsers causes

+no trouble (*note Data Types of Semantic Values: Value Type.).

+ The `-p' option works by adding macro definitions to the beginning

+of the parser source file, defining `yyparse' as `PREFIXparse', and so

+on. This effectively substitutes one name for the other in the entire

+parser file.
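
+ The effect of those macro definitions can be reproduced by hand in a
+small sketch. It assumes the prefix `c' from the example above; the
+variable and function bodies are stand-ins written for illustration,
+not Bison output.

```c
/* Sketch of the renaming done by `-p c'.  The macros mimic what
   Bison places at the beginning of the parser source file; the
   definitions below are stand-ins, not generated code.  */
#define yyparse cparse
#define yynerrs cnerrs

int yynerrs = 0;        /* Actually defines the variable `cnerrs'.  */

int
yyparse (void)          /* Actually defines the function `cparse'.  */
{
  return yynerrs != 0;  /* 0 on success, 1 if errors were counted.  */
}
```

+ After preprocessing, the object file exports `cparse' and `cnerrs',
+so a second parser built with a different prefix does not conflict.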

+File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top

+4 Parser C-Language Interface

+*****************************

+The Bison parser is actually a C function named `yyparse'. Here we

+describe the interface conventions of `yyparse' and the other functions

+that it needs to use.

+ Keep in mind that the parser uses many C identifiers starting with

+`yy' and `YY' for internal purposes. If you use such an identifier

+(aside from those in this manual) in an action or in the epilogue of the

+grammar file, you are likely to run into trouble.

+* Menu:

+* Parser Function:: How to call `yyparse' and what it returns.

+* Push Parser Function:: How to call `yypush_parse' and what it returns.

+* Pull Parser Function:: How to call `yypull_parse' and what it returns.

+* Parser Create Function:: How to call `yypstate_new' and what it returns.

+* Parser Delete Function:: How to call `yypstate_delete' and what it returns.

+* Lexical:: You must supply a function `yylex'

+ which reads tokens.

+* Error Reporting:: You must supply a function `yyerror'.

+* Action Features:: Special features for use in actions.

+* Internationalization:: How to let the parser speak in the user's

+ native language.

+File: bison.info, Node: Parser Function, Next: Push Parser Function, Up: Interface

+4.1 The Parser Function `yyparse'

+=================================

+You call the function `yyparse' to cause parsing to occur. This

+function reads tokens, executes actions, and ultimately returns when it

+encounters end-of-input or an unrecoverable syntax error. You can also

+write an action which directs `yyparse' to return immediately without

+reading further.

+ -- Function: int yyparse (void)

+ The value returned by `yyparse' is 0 if parsing was successful

+ (return is due to end-of-input).

+ The value is 1 if parsing failed because of invalid input, i.e.,

+ input that contains a syntax error or that causes `YYABORT' to be

+ invoked.

+ The value is 2 if parsing failed due to memory exhaustion.

+ In an action, you can cause immediate return from `yyparse' by using

+these macros:

+ -- Macro: YYACCEPT

+ Return immediately with value 0 (to report success).

+ -- Macro: YYABORT

+ Return immediately with value 1 (to report failure).
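
+ A caller can turn the return values described above into
+diagnostics. The following is only a sketch: the helper name
+`parse_status_message' is hypothetical and is not generated by Bison;
+only the values 0, 1 and 2 come from the description of `yyparse'.

```c
/* Hypothetical helper, not part of Bison: map the documented return
   values of yyparse to a human-readable message.  */
const char *
parse_status_message (int status)
{
  switch (status)
    {
    case 0:
      return "parse succeeded";
    case 1:
      return "invalid input (syntax error or YYABORT)";
    case 2:
      return "memory exhausted";
    default:
      return "unexpected status";   /* Not a documented value.  */
    }
}
```

+ A typical use would be `puts (parse_status_message (yyparse ()));'.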

+ If you use a reentrant parser, you can optionally pass additional

+parameter information to it in a reentrant way. To do so, use the

+declaration `%parse-param':

+ -- Directive: %parse-param {ARGUMENT-DECLARATION}

+ Declare that an argument declared by the braced-code

+ ARGUMENT-DECLARATION is an additional `yyparse' argument. The

+ ARGUMENT-DECLARATION is used when declaring functions or

+ prototypes. The last identifier in ARGUMENT-DECLARATION must be

+ the argument name.

+ Here's an example. Write this in the parser:

+ %parse-param {int *nastiness}

+ %parse-param {int *randomness}

+Then call the parser like this:

+ {

+ int nastiness, randomness;

+ ... /* Store proper data in `nastiness' and `randomness'. */

+ value = yyparse (&nastiness, &randomness);

+ ...

+ }

+In the grammar actions, use expressions like this to refer to the data:

+ exp: ... { ...; *randomness += 1; ... }

+File: bison.info, Node: Push Parser Function, Next: Pull Parser Function, Prev: Parser Function, Up: Interface

+4.2 The Push Parser Function `yypush_parse'

+===========================================

+(The current push parsing interface is experimental and may evolve.

+More user feedback will help to stabilize it.)

+ You call the function `yypush_parse' to parse a single token. This

+function is available if either the `%define api.push_pull "push"' or

+`%define api.push_pull "both"' declaration is used. *Note A Push

+Parser: Push Decl.

+ -- Function: int yypush_parse (yypstate *yyps)

+ The value returned by `yypush_parse' is the same as for `yyparse'

+ with the following exception: `yypush_parse' will return

+ `YYPUSH_MORE' if more input is required to finish parsing the

+ grammar.

+File: bison.info, Node: Pull Parser Function, Next: Parser Create Function, Prev: Push Parser Function, Up: Interface

+4.3 The Pull Parser Function `yypull_parse'

+===========================================

+(The current push parsing interface is experimental and may evolve.

+More user feedback will help to stabilize it.)

+ You call the function `yypull_parse' to parse the rest of the input

+stream. This function is available if the `%define api.push_pull

+"both"' declaration is used. *Note A Push Parser: Push Decl.

+ -- Function: int yypull_parse (yypstate *yyps)

+ The value returned by `yypull_parse' is the same as for `yyparse'.

+File: bison.info, Node: Parser Create Function, Next: Parser Delete Function, Prev: Pull Parser Function, Up: Interface

+4.4 The Parser Create Function `yypstate_new'

+=============================================

+(The current push parsing interface is experimental and may evolve.

+More user feedback will help to stabilize it.)

+ You call the function `yypstate_new' to create a new parser instance.

+This function is available if either the `%define api.push_pull "push"'

+or `%define api.push_pull "both"' declaration is used. *Note A Push

+Parser: Push Decl.

+ -- Function: yypstate *yypstate_new (void)

+ The function will return a valid parser instance if there was

+ memory available or 0 if no memory was available. In impure mode,

+ it will also return 0 if a parser instance is currently allocated.

+File: bison.info, Node: Parser Delete Function, Next: Lexical, Prev: Parser Create Function, Up: Interface

+4.5 The Parser Delete Function `yypstate_delete'

+================================================

+(The current push parsing interface is experimental and may evolve.

+More user feedback will help to stabilize it.)

+ You call the function `yypstate_delete' to delete a parser instance.

+This function is available if either the `%define api.push_pull "push"' or

+`%define api.push_pull "both"' declaration is used. *Note A Push

+Parser: Push Decl.

+ -- Function: void yypstate_delete (yypstate *yyps)

+ This function will reclaim the memory associated with a parser

+ instance. After this call, you should no longer attempt to use

+ the parser instance.

+File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Delete Function, Up: Interface

+4.6 The Lexical Analyzer Function `yylex'

+=========================================

+The "lexical analyzer" function, `yylex', recognizes tokens from the

+input stream and returns them to the parser. Bison does not create

+this function automatically; you must write it so that `yyparse' can

+call it. The function is sometimes referred to as a lexical scanner.

+ In simple programs, `yylex' is often defined at the end of the Bison

+grammar file. If `yylex' is defined in a separate source file, you

+need to arrange for the token-type macro definitions to be available

+there. To do this, use the `-d' option when you run Bison, so that it

+will write these macro definitions into a separate header file

+`NAME.tab.h' which you can include in the other source files that need

+it. *Note Invoking Bison: Invocation.

+* Menu:

+* Calling Convention:: How `yyparse' calls `yylex'.

+* Token Values:: How `yylex' must return the semantic value

+ of the token it has read.

+* Token Locations:: How `yylex' must return the text location

+ (line number, etc.) of the token, if the

+ actions want that.

+* Pure Calling:: How the calling convention differs in a pure parser

+ (*note A Pure (Reentrant) Parser: Pure Decl.).

+File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical

+4.6.1 Calling Convention for `yylex'

+------------------------------------

+The value that `yylex' returns must be the positive numeric code for

+the type of token it has just found; a zero or negative value signifies

+end-of-input.

+ When a token is referred to in the grammar rules by a name, that name

+in the parser file becomes a C macro whose definition is the proper

+numeric code for that token type. So `yylex' can use the name to

+indicate that type. *Note Symbols::.

+ When a token is referred to in the grammar rules by a character

+literal, the numeric code for that character is also the code for the

+token type. So `yylex' can simply return that character code, possibly

+converted to `unsigned char' to avoid sign-extension. The null

+character must not be used this way, because its code is zero and that

+signifies end-of-input.

+ Here is an example showing these things:

+ int

+ yylex (void)

+ {

+ ...

+ if (c == EOF) /* Detect end-of-input. */

+ return 0;

+ ...

+ if (c == '+' || c == '-')

+ return c; /* Assume token type for `+' is '+'. */

+ ...

+ return INT; /* Return the type of the token. */

+ ...

+ }

+This interface has been designed so that the output from the `lex'

+utility can be used without change as the definition of `yylex'.
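
+ For readers who want something they can compile, here is a complete
+variant of the skeleton above. It is a sketch under stated
+assumptions: the token code `INT' is defined by hand (Bison normally
+defines it), and the lexer reads from a hypothetical string pointer
+named `input' rather than from a stream.

```c
#include <ctype.h>

#define INT 258              /* Assumption: normally defined by Bison.  */

const char *input = "";      /* Hypothetical input buffer.  */

int
yylex (void)
{
  int c;

  /* Skip blanks.  */
  while (*input == ' ' || *input == '\t')
    input++;

  /* Convert to unsigned char to avoid sign-extension.  */
  c = (unsigned char) *input;
  if (c == '\0')             /* End of the string: end-of-input.  */
    return 0;

  if (isdigit (c))
    {
      while (isdigit ((unsigned char) *input))
        input++;
      return INT;            /* A token type referred to by name.  */
    }

  input++;
  return c;                  /* A character literal token such as '+'.  */
}
```

+ Once the string is exhausted, every further call returns zero, which
+the parser treats as end-of-input.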

+ If the grammar uses literal string tokens, there are two ways that

+`yylex' can determine the token type codes for them:

+ * If the grammar defines symbolic token names as aliases for the

+ literal string tokens, `yylex' can use these symbolic names like

+ all others. In this case, the use of the literal string tokens in

+ the grammar file has no effect on `yylex'.

+ * `yylex' can find the multicharacter token in the `yytname' table.

+ The index of the token in the table is the token type's code. The

+ name of a multicharacter token is recorded in `yytname' with a

+ double-quote, the token's characters, and another double-quote.

+ The token's characters are escaped as necessary to be suitable as

+ input to Bison.

+ Here's code for looking up a multicharacter token in `yytname',

+ assuming that the characters of the token are stored in

+ `token_buffer', and assuming that the token does not contain any

+ characters like `"' that require escaping.

+ for (i = 0; i < YYNTOKENS; i++)

+ {

+ if (yytname[i] != 0

+ && yytname[i][0] == '"'

+ && ! strncmp (yytname[i] + 1, token_buffer,

+ strlen (token_buffer))

+ && yytname[i][strlen (token_buffer) + 1] == '"'

+ && yytname[i][strlen (token_buffer) + 2] == 0)

+ break;

+ }

+ The `yytname' table is generated only if you use the

+ `%token-table' declaration. *Note Decl Summary::.
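
+ The lookup loop above can be wrapped in a function so that it is
+easier to test outside a generated parser. This is a sketch, not
+Bison code: `find_string_token' and the table `demo_tname' are
+hypothetical names, and the table only imitates the layout of
+`yytname'.

```c
#include <string.h>

/* Hypothetical stand-in for the generated `yytname' table.  */
const char *const demo_tname[] = {
  "$end", "error", "$undefined", "INT", "\"<=\"", "\">=\"", 0
};

/* Return the index of the literal string token TEXT in NAMES, which
   has N entries, or -1 if it is absent.  TEXT must not contain
   characters such as `"' that require escaping.  */
int
find_string_token (const char *const *names, int n, const char *text)
{
  size_t len = strlen (text);
  int i;

  for (i = 0; i < n; i++)
    if (names[i] != 0
        && names[i][0] == '"'
        && strncmp (names[i] + 1, text, len) == 0
        && names[i][len + 1] == '"'
        && names[i][len + 2] == '\0')
      return i;
  return -1;
}
```

+ In a real lexer the call would be
+`find_string_token (yytname, YYNTOKENS, token_buffer)'. Computing the
+length once avoids the repeated `strlen' calls of the inline loop.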

+File: bison.info, Node: Token Values, Next: Token Locations, Prev: Calling Convention, Up: Lexical

+4.6.2 Semantic Values of Tokens

+-------------------------------

+In an ordinary (nonreentrant) parser, the semantic value of the token

+must be stored into the global variable `yylval'. When you are using

+just one data type for semantic values, `yylval' has that type. Thus,

+if the type is `int' (the default), you might write this in `yylex':

+ ...

+ yylval = value; /* Put value onto Bison stack. */

+ return INT; /* Return the type of the token. */

+ ...

+ When you are using multiple data types, `yylval''s type is a union

+made from the `%union' declaration (*note The Collection of Value

+Types: Union Decl.). So when you store a token's value, you must use

+the proper member of the union. If the `%union' declaration looks like

+this:

+ %union {

+ int intval;

+ double val;

+ symrec *tptr;

+ }

+then the code in `yylex' might look like this:

+ ...

+ yylval.intval = value; /* Put value onto Bison stack. */

+ return INT; /* Return the type of the token. */

+ ...
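
+ The same idea, written out so that it compiles on its own. This is
+a sketch with several assumptions: the token codes `INT' and `NUM',
+the union type and `yylval' are declared by hand here, although Bison
+normally generates them; `make_number_token' is a hypothetical helper;
+and the `symrec *tptr' member is left out to keep the example
+self-contained.

```c
#define INT 258   /* Assumptions: token codes normally defined by Bison.  */
#define NUM 259

/* Stand-in for the type Bison derives from a `%union' declaration.  */
typedef union
{
  int intval;
  double val;
} YYSTYPE;

YYSTYPE yylval;   /* Normally defined by the generated parser.  */

/* Hypothetical helper called by yylex once it has recognized a number:
   store the value in the member that matches the token type.  */
int
make_number_token (double value, int is_integer)
{
  if (is_integer)
    {
      yylval.intval = (int) value;
      return INT;
    }
  yylval.val = value;
  return NUM;
}
```

+ The member written by the lexer must be the one that the grammar
+declares for that token type.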

+File: bison.info, Node: Token Locations, Next: Pure Calling, Prev: Token Values, Up: Lexical

+4.6.3 Textual Locations of Tokens

+---------------------------------

+If you are using the `@N'-feature (*note Tracking Locations:

+Locations.) in actions to keep track of the textual locations of tokens

+and groupings, then you must provide this information in `yylex'. The

+function `yyparse' expects to find the textual location of a token just

+parsed in the global variable `yylloc'. So `yylex' must store the

+proper data in that variable.

+ By default, the value of `yylloc' is a structure and you need only

+initialize the members that are going to be used by the actions. The

+four members are called `first_line', `first_column', `last_line' and

+`last_column'. Note that the use of this feature makes the parser

+noticeably slower.

+ The data type of `yylloc' has the name `YYLTYPE'.
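
+ One way for `yylex' to maintain these members is to advance the
+location over the text of each token. The sketch below assumes that
+`last_column' points just past the token and that lines and columns
+start at 1; both are conventions chosen for this example. The
+structure and `yylloc' are declared by hand, and `update_location' is
+a hypothetical helper.

```c
/* Stand-in for Bison's default location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

YYLTYPE yylloc = { 1, 1, 1, 1 };   /* Normally defined by the parser.  */

/* Hypothetical helper: make yylloc span TEXT, a token that starts
   where the previous token ended.  */
void
update_location (const char *text)
{
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;
  for (; *text != '\0'; text++)
    if (*text == '\n')
      {
        yylloc.last_line++;
        yylloc.last_column = 1;
      }
    else
      yylloc.last_column++;
}
```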

+File: bison.info, Node: Pure Calling, Prev: Token Locations, Up: Lexical

+4.6.4 Calling Conventions for Pure Parsers

+------------------------------------------

+When you use the Bison declaration `%define api.pure' to request a

+pure, reentrant parser, the global communication variables `yylval' and

+`yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.)

+In such parsers the two global variables are replaced by pointers

+passed as arguments to `yylex'. You must declare them as shown here,

+and pass the information back by storing it through those pointers.

+ int

+ yylex (YYSTYPE *lvalp, YYLTYPE *llocp)

+ {

+ ...

+ *lvalp = value; /* Put value onto Bison stack. */

+ return INT; /* Return the type of the token. */

+ ...

+ }

+ If the grammar file does not use the `@' constructs to refer to

+textual locations, then the type `YYLTYPE' will not be defined. In

+this case, omit the second argument; `yylex' will be called with only

+one argument.

+ If you wish to pass the additional parameter data to `yylex', use

+`%lex-param' just like `%parse-param' (*note Parser Function::).

+ -- Directive: %lex-param {ARGUMENT-DECLARATION}

+ Declare that the braced-code ARGUMENT-DECLARATION is an additional

+ `yylex' argument declaration.

+ For instance:

+ %parse-param {int *nastiness}

+ %lex-param {int *nastiness}

+ %parse-param {int *randomness}

+results in the following signature:

+ int yylex (int *nastiness);

+ int yyparse (int *nastiness, int *randomness);

+ If `%define api.pure' is added:

+ int yylex (YYSTYPE *lvalp, int *nastiness);

+ int yyparse (int *nastiness, int *randomness);

+and finally, if both `%define api.pure' and `%locations' are used:

+ int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);

+ int yyparse (int *nastiness, int *randomness);
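
+ The last signature can be sketched as a small hand-written lexer.

+This is only an illustration under stated assumptions: `YYSTYPE',

+`YYLTYPE' and the token type `INT' are defined locally as stand-ins for

+what the Bison-generated code would provide, and the extra argument

+`cursor' is a hypothetical parameter of the kind that `%lex-param {char

+const **cursor}', together with a matching `%parse-param', would declare.

```c
#include <assert.h>
#include <ctype.h>

/* Stand-ins for definitions a Bison-generated parser would supply.  */
typedef int YYSTYPE;
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;
enum { INT = 258 };   /* Illustrative token number.  */

/* Pure `yylex': every piece of state arrives through an argument.
   *CURSOR points at the unread input (hypothetical extra argument).
   Returns 0 at the end of the input.  */
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp, char const **cursor)
{
  char const *p;

  assert (lvalp && llocp && cursor && *cursor);
  p = *cursor;

  /* Skip blanks, keeping the column up to date.  */
  while (*p == ' ' || *p == '\t')
    {
      p++;
      llocp->last_column++;
    }

  /* The token starts where the skipped text ends.  */
  llocp->first_line = llocp->last_line;
  llocp->first_column = llocp->last_column;

  if (*p == '\0')
    {
      *cursor = p;
      return 0;
    }

  if (isdigit ((unsigned char) *p))
    {
      int value = 0;
      while (isdigit ((unsigned char) *p))
        {
          value = value * 10 + (*p - '0');
          p++;
          llocp->last_column++;
        }
      *lvalp = value;   /* Store the semantic value through the pointer.  */
      *cursor = p;
      return INT;       /* Return the type of the token.  */
    }

  /* Any other character is its own token type.  */
  if (*p == '\n')
    {
      llocp->last_line++;
      llocp->last_column = 1;
    }
  else
    llocp->last_column++;
  *cursor = p + 1;
  return (unsigned char) *p;
}
```

+ Because nothing is stored in global variables, two parsers using such

+a `yylex' can run at the same time, each with its own `cursor'.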

+File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface

+4.7 The Error Reporting Function `yyerror'

+==========================================

+The Bison parser detects a "syntax error" or "parse error" whenever it

+reads a token which cannot satisfy any syntax rule. An action in the

+grammar can also explicitly proclaim an error, using the macro

+`YYERROR' (*note Special Features for Use in Actions: Action Features.).

+ The Bison parser expects to report the error by calling an error

+reporting function named `yyerror', which you must supply. It is

+called by `yyparse' whenever a syntax error is found, and it receives

+one argument, the message string. For a syntax error, the string is normally

+`"syntax error"'.

+ If you invoke the directive `%error-verbose' in the Bison

+declarations section (*note The Bison Declarations Section: Bison

+Declarations.), then Bison provides a more verbose and specific error

+message string instead of just plain `"syntax error"'.

+ The parser can detect one other kind of error: memory exhaustion.

+This can happen when the input contains constructions that are very

+deeply nested. It isn't likely you will encounter this, since the Bison

+parser normally extends its stack automatically up to a very large

+limit. But if memory is exhausted, `yyparse' calls `yyerror' in the

+usual fashion, except that the argument string is `"memory exhausted"'.

+ In some cases diagnostics like `"syntax error"' are translated

+automatically from English to some other language before they are

+passed to `yyerror'. *Note Internationalization::.

+ The following definition suffices in simple programs:

+ void

+ yyerror (char const *s)

+ {

+ fprintf (stderr, "%s\n", s);

+ }

+ After `yyerror' returns to `yyparse', the latter will attempt error

+recovery if you have written suitable error recovery grammar rules

+(*note Error Recovery::). If recovery is impossible, `yyparse' will

+immediately return 1.

+ In pure parsers that track locations, `yyerror' should have access

+to the current location. This is indeed the case for the GLR parsers,

+but not for the Yacc parser, for historical reasons. That is, if both

+`%locations' and `%define api.pure' are specified, then the prototypes

+for `yyerror' are:

+ void yyerror (char const *msg); /* Yacc parsers. */

+ void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */

+ If `%parse-param {int *nastiness}' is used, then:

+ void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */

+ void yyerror (int *nastiness, char const *msg); /* GLR parsers. */

+ Finally, GLR and Yacc parsers share the same `yyerror' calling

+convention for absolutely pure parsers, i.e., when both `yylex' and

+`yyparse' use pure calling conventions. For example:

+ /* Location tracking. */

+ %locations

+ /* Pure yylex. */

+ %define api.pure

+ %lex-param {int *nastiness}

+ /* Pure yyparse. */

+ %parse-param {int *nastiness}

+ %parse-param {int *randomness}

+results in the following signatures for all the parser kinds:

+ int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);

+ int yyparse (int *nastiness, int *randomness);

+ void yyerror (YYLTYPE *locp,

+               int *nastiness, int *randomness,

+               char const *msg);

+The prototypes are only indications of how the code produced by Bison

+uses `yyerror'. Bison-generated code always ignores the returned

+value, so `yyerror' can return any type, including `void'. Also,

+`yyerror' can be a variadic function; that is why the message is always

+passed last.

+ Traditionally `yyerror' returns an `int' that is always ignored, but

+this is purely for historical reasons, and `void' is preferable since

+it more accurately describes the return type for `yyerror'.

+ The variable `yynerrs' contains the number of syntax errors reported

+so far. Normally this variable is global; but if you request a pure

+parser (*note A Pure (Reentrant) Parser: Pure Decl.) then it is a

+local variable which only the actions can access.
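
+ Since the message is passed last, `yyerror' can accept

+`printf'-style arguments. The following sketch assumes the prototype

+listed above for GLR parsers with `%locations' and `%define api.pure';

+`YYLTYPE' is defined locally as a stand-in, and the buffer `last_error'

+is a hypothetical addition that keeps a copy of the most recent

+diagnostic.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the location type of a Bison-generated parser.  */
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;

/* Copy of the most recent diagnostic (hypothetical, for inspection).  */
char last_error[256];

/* Variadic error reporter: FORMAT and the arguments after it are
   interpreted as by `printf'.  */
void
yyerror (YYLTYPE *locp, char const *format, ...)
{
  va_list args;
  size_t used;

  assert (locp != NULL && format != NULL);

  /* Start the message with "LINE.COLUMN: ".  */
  snprintf (last_error, sizeof last_error, "%d.%d: ",
            locp->first_line, locp->first_column);
  used = strlen (last_error);

  /* Append the formatted message proper.  */
  va_start (args, format);
  vsnprintf (last_error + used, sizeof last_error - used, format, args);
  va_end (args);

  fprintf (stderr, "%s\n", last_error);
}
```

+ An action could then report `yyerror (&@1, "unknown name %s", name)',

+where `name' is a hypothetical variable. Note that calls made by the

+generated parser pass only the message, which this sketch treats as

+the format string, so it assumes those messages contain no `%'

+characters.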

+File: bison.info, Node: Action Features, Next: Internationalization, Prev: Error Reporting, Up: Interface

+4.8 Special Features for Use in Actions

+=======================================

+Here is a table of Bison constructs, variables and macros that are

+useful in actions.

+ -- Variable: $$

+ Acts like a variable that contains the semantic value for the

+ grouping made by the current rule. *Note Actions::.

+ -- Variable: $N

+ Acts like a variable that contains the semantic value for the Nth

+ component of the current rule. *Note Actions::.

+ -- Variable: $<TYPEALT>$

+ Like `$$' but specifies alternative TYPEALT in the union specified

+ by the `%union' declaration. *Note Data Types of Values in

+ Actions: Action Types.

+ -- Variable: $<TYPEALT>N

+ Like `$N' but specifies alternative TYPEALT in the union specified

+ by the `%union' declaration. *Note Data Types of Values in

+ Actions: Action Types.

+ -- Macro: YYABORT;

+ Return immediately from `yyparse', indicating failure. *Note The

+ Parser Function `yyparse': Parser Function.

+ -- Macro: YYACCEPT;

+ Return immediately from `yyparse', indicating success. *Note The

+ Parser Function `yyparse': Parser Function.

+ -- Macro: YYBACKUP (TOKEN, VALUE);

+ Unshift a token. This macro is allowed only for rules that reduce

+ a single value, and only when there is no lookahead token. It is

+ also disallowed in GLR parsers. It installs a lookahead token

+ with token type TOKEN and semantic value VALUE; then it discards

+ the value that was going to be reduced by this rule.

+ If the macro is used when it is not valid, such as when there is a

+ lookahead token already, then it reports a syntax error with a

+ message `cannot back up' and performs ordinary error recovery.

+ In either case, the rest of the action is not executed.

+ -- Macro: YYEMPTY

+ Value stored in `yychar' when there is no lookahead token.

+ -- Macro: YYEOF

+ Value stored in `yychar' when the lookahead is the end of the input

+ stream.

+ -- Macro: YYERROR;

+ Cause an immediate syntax error. This statement initiates error

+ recovery just as if the parser itself had detected an error;

+ however, it does not call `yyerror', and does not print any

+ message. If you want to print an error message, call `yyerror'

+ explicitly before the `YYERROR;' statement. *Note Error

+ Recovery::.

+ -- Macro: YYRECOVERING

+ The expression `YYRECOVERING ()' yields 1 when the parser is

+ recovering from a syntax error, and 0 otherwise. *Note Error

+ Recovery::.

+ -- Variable: yychar

+ Variable containing either the lookahead token, or `YYEOF' when the

+ lookahead is the end of the input stream, or `YYEMPTY' when no

+ lookahead has been performed so the next token is not yet known.

+ Do not modify `yychar' in a deferred semantic action (*note GLR

+ Semantic Actions::). *Note Lookahead Tokens: Lookahead.

+ -- Macro: yyclearin;

+ Discard the current lookahead token. This is useful primarily in

+ error rules. Do not invoke `yyclearin' in a deferred semantic

+ action (*note GLR Semantic Actions::). *Note Error Recovery::.

+ -- Macro: yyerrok;

+ Resume generating error messages immediately for subsequent syntax

+ errors. This is useful primarily in error rules. *Note Error

+ Recovery::.

+ -- Variable: yylloc

+ Variable containing the lookahead token location when `yychar' is

+ not set to `YYEMPTY' or `YYEOF'. Do not modify `yylloc' in a

+ deferred semantic action (*note GLR Semantic Actions::). *Note

+ Actions and Locations: Actions and Locations.

+ -- Variable: yylval

+ Variable containing the lookahead token semantic value when

+ `yychar' is not set to `YYEMPTY' or `YYEOF'. Do not modify

+ `yylval' in a deferred semantic action (*note GLR Semantic

+ Actions::). *Note Actions: Actions.

+ -- Value: @$

+ Acts like a structure variable containing information on the

+ textual location of the grouping made by the current rule. *Note

+ Tracking Locations: Locations.

+ -- Value: @N

+ Acts like a structure variable containing information on the

+ textual location of the Nth component of the current rule. *Note

+ Tracking Locations: Locations.

+File: bison.info, Node: Internationalization, Prev: Action Features, Up: Interface

+4.9 Parser Internationalization

+===============================

+A Bison-generated parser can print diagnostics, including error and

+tracing messages. By default, they appear in English. However, Bison

+also supports outputting diagnostics in the user's native language. To

+make this work, the user should set the usual environment variables.

+*Note The User's View: (gettext)Users. For example, the shell command

+`export LC_ALL=fr_CA.UTF-8' might set the user's locale to French

+Canadian using the UTF-8 encoding. The exact set of available locales

+depends on the user's installation.

+ The maintainer of a package that uses a Bison-generated parser

+enables the internationalization of the parser's output through the

+following steps. Here we assume a package that uses GNU Autoconf and

+GNU Automake.

+ 1. Into the directory containing the GNU Autoconf macros used by the

+ package--often called `m4'--copy the `bison-i18n.m4' file

+ installed by Bison under `share/aclocal/bison-i18n.m4' in Bison's

+ installation directory. For example:

+ cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4

+ 2. In the top-level `configure.ac', after the `AM_GNU_GETTEXT'

+ invocation, add an invocation of `BISON_I18N'. This macro is

+ defined in the file `bison-i18n.m4' that you copied earlier. It

+ causes `configure' to find the value of the `BISON_LOCALEDIR'

+ variable, and it defines the source-language symbol `YYENABLE_NLS'

+ to enable translations in the Bison-generated parser.

+ 3. In the `main' function of your program, designate the directory

+ containing Bison's runtime message catalog, through a call to

+ `bindtextdomain' with domain name `bison-runtime'. For example:

+ bindtextdomain ("bison-runtime", BISON_LOCALEDIR);

+ Typically this appears after any other call `bindtextdomain

+ (PACKAGE, LOCALEDIR)' that your package already has. This relies

+ on `BISON_LOCALEDIR' being defined as a string through the

+ `Makefile'.

+ 4. In the `Makefile.am' that controls the compilation of the `main'

+ function, make `BISON_LOCALEDIR' available as a C preprocessor

+ macro, either in `DEFS' or in `AM_CPPFLAGS'. For example:

+ DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'

+ or:

+ AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'

+ 5. Finally, invoke the command `autoreconf' to generate the build

+ infrastructure.

+File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top

+5 The Bison Parser Algorithm

+****************************

+As Bison reads tokens, it pushes them onto a stack along with their

+semantic values. The stack is called the "parser stack". Pushing a

+token is traditionally called "shifting".

+ For example, suppose the infix calculator has read `1 + 5 *', with a

+`3' to come. The stack will have four elements, one for each token

+that was shifted.

+ But the stack does not always have an element for each token read.

+When the last N tokens and groupings shifted match the components of a

+grammar rule, they can be combined according to that rule. This is

+called "reduction". Those tokens and groupings are replaced on the

+stack by a single grouping whose symbol is the result (left hand side)

+of that rule. Running the rule's action is part of the process of

+reduction, because this is what computes the semantic value of the

+resulting grouping.

+ For example, if the infix calculator's parser stack contains this:

+ 1 + 5 * 3

+and the next input token is a newline character, then the last three

+elements can be reduced to 15 via the rule:

+ expr: expr '*' expr;

+Then the stack contains just these three elements:

+ 1 + 15

+At this point, another reduction can be made, resulting in the single

+value 16. Then the newline token can be shifted.

+ The parser tries, by shifts and reductions, to reduce the entire

+input down to a single grouping whose symbol is the grammar's

+start-symbol (*note Languages and Context-Free Grammars: Language and

+Grammar.).

+ This kind of parser is known in the literature as a bottom-up parser.