bison/man/cat1p/yacc.1p.txt - Issue 10807020: Add native Windows binary for bison.

Unified Diff: bison/man/cat1p/yacc.1p.txt

Issue 10807020: Add native Windows binary for bison. (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/

Patch Set: Created 8 years, 5 months ago


Index: bison/man/cat1p/yacc.1p.txt

===================================================================

--- bison/man/cat1p/yacc.1p.txt (revision 0)

+++ bison/man/cat1p/yacc.1p.txt (revision 0)

@@ -0,0 +1,1269 @@

+YACC(1P) POSIX Programmer's Manual YACC(1P)

+PROLOG

+ This manual page is part of the POSIX Programmer's Man-

+ ual. The Linux implementation of this interface may

+ differ (consult the corresponding Linux manual page for

+ details of Linux behavior), or the interface may not be

+ implemented on Linux.

+NAME

+ yacc - yet another compiler compiler (DEVELOPMENT)

+SYNOPSIS

+ yacc [-dltv][-b file_prefix][-p sym_prefix] grammar

+DESCRIPTION

+ The yacc utility shall read a description of a context-

+ free grammar in grammar and write C source code, con-

+ forming to the ISO C standard, to a code file, and

+ optionally header information into a header file, in the

+ current directory. The C code shall define a function

+ and related routines and macros for an automaton that

+ executes a parsing algorithm meeting the requirements in

+ Algorithms.

+ The form and meaning of the grammar are described in the

+ EXTENDED DESCRIPTION section.

+ The C source code and header file shall be produced in a

+ form suitable as input for the C compiler (see c99).

+OPTIONS

+ The yacc utility shall conform to the Base Definitions

+ volume of IEEE Std 1003.1-2001, Section 12.2, Utility

+ Syntax Guidelines.

+ The following options shall be supported:

+ -b file_prefix

+ Use file_prefix instead of y as the prefix for

+ all output filenames. The code file y.tab.c, the

+ header file y.tab.h (created when -d is speci-

+ fied), and the description file y.output (created

+ when -v is specified), shall be changed to

+ file_prefix.tab.c, file_prefix.tab.h, and

+ file_prefix.output, respectively.

+ -d Write the header file; by default only the code

+ file is written. The #define statements associate

+ the token codes assigned by yacc with the user-

+ declared token names. This allows source files

+ other than y.tab.c to access the token codes.

+ -l Produce a code file that does not contain any

+ #line constructs. If this option is not present,

+ it is unspecified whether the code file or header

+ file contains #line directives. This should only

+ be used after the grammar and the associated

+ actions are fully debugged.

+ -p sym_prefix

+ Use sym_prefix instead of yy as the prefix for

+ all external names produced by yacc. The names

+ affected shall include the functions yyparse(),

+ yylex(), and yyerror(), and the variables yylval,

+ yychar, and yydebug. (In the remainder of this

+ section, the six symbols cited are referenced

+ using their default names only as a notational

+ convenience.) Local names may also be affected by

+ the -p option; however, the -p option shall not

+ affect #define symbols generated by yacc.

+ -t Modify conditional compilation directives to per-

+ mit compilation of debugging code in the code

+ file. Runtime debugging statements shall always

+ be contained in the code file, but by default

+ conditional compilation directives prevent their

+ compilation.

+ -v Write a file containing a description of the

+ parser and a report of conflicts generated by

+ ambiguities in the grammar.
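
The effect of -t can be pictured with a small C sketch. It assumes the generated code guards its tracing with a macro named YYDEBUG (a name not given in the text above) together with the run-time flag yydebug; trace_shift() is a hypothetical helper, not part of any yacc output.

```c
#include <stdio.h>

/* Assumption: tracing code is guarded by a macro called YYDEBUG, and -t
 * is taken to change its default from 0 to 1. */
#ifndef YYDEBUG
#define YYDEBUG 0
#endif

int yydebug = 0;   /* run-time switch; only matters when YYDEBUG is non-zero */

/* Hypothetical helper: report a shift when tracing is both compiled in
 * and switched on. Returns 1 if a trace line was written, otherwise 0. */
int trace_shift(int token)
{
#if YYDEBUG
    if (yydebug) {
        fprintf(stderr, "shifting token %d\n", token);
        return 1;
    }
#endif
    (void)token;
    return 0;
}
```

Under this assumption, compiling with -DYYDEBUG=1 would have the same effect on the sketch as -t is described as having on the code file.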

+OPERANDS

+ The following operand is required:

+ grammar

+ A pathname of a file containing instructions,

+ hereafter called grammar, for which a parser is

+ to be created. The format for the grammar is

+ described in the EXTENDED DESCRIPTION section.

+STDIN

+ Not used.

+INPUT FILES

+ The file grammar shall be a text file formatted as spec-

+ ified in the EXTENDED DESCRIPTION section.

+ENVIRONMENT VARIABLES

+ The following environment variables shall affect the

+ execution of yacc:

+ LANG Provide a default value for the internationaliza-

+ tion variables that are unset or null. (See the

+ Base Definitions volume of IEEE Std 1003.1-2001,

+ Section 8.2, Internationalization Variables for

+ the precedence of internationalization variables

+ used to determine the values of locale cate-

+ gories.)

+ LC_ALL If set to a non-empty string value, override the

+ values of all the other internationalization

+ variables.

+ LC_CTYPE

+ Determine the locale for the interpretation of

+ sequences of bytes of text data as characters

+ (for example, single-byte as opposed to multi-

+ byte characters in arguments and input files).

+ LC_MESSAGES

+ Determine the locale that should be used to

+ affect the format and contents of diagnostic mes-

+ sages written to standard error.

+ NLSPATH

+ Determine the location of message catalogs for

+ the processing of LC_MESSAGES.

+ The LANG and LC_* variables affect the execution of the

+ yacc utility as stated. The main() function defined in

+ Yacc Library shall call:

+ setlocale(LC_ALL, "")

+ and thus the program generated by yacc shall also be

+ affected by the contents of these variables at runtime.
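
A minimal sketch of what such a library main() might look like follows. The name library_style_main and the stub yyparse() are placeholders so that the fragment compiles on its own; a real program would link against the yyparse() in the code file.

```c
#include <locale.h>

/* Stand-in for the generated parser so the sketch links on its own. */
int yyparse(void)
{
    return 0;   /* pretend the input was accepted */
}

/* Hypothetical name; the library routine itself is main(). */
int library_style_main(void)
{
    setlocale(LC_ALL, "");   /* adopt LANG and LC_* from the environment */
    return yyparse();        /* assumption: pass the parser's result on as the exit value */
}
```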

+ASYNCHRONOUS EVENTS

+ Default.

+STDOUT

+ Not used.

+STDERR

+ If shift/reduce or reduce/reduce conflicts are detected

+ in grammar, yacc shall write a report of those conflicts

+ to the standard error in an unspecified format.

+ Standard error shall also be used for diagnostic mes-

+ sages.

+OUTPUT FILES

+ The code file, the header file, and the description file

+ shall be text files. All are described in the following

+ sections.

+ Code File

+ This file shall contain the C source code for the

+ yyparse() function. It shall contain code for the vari-

+ ous semantic actions with macro substitution performed

+ on them as described in the EXTENDED DESCRIPTION sec-

+ tion. It also shall contain a copy of the #define state-

+ ments in the header file. If a %union declaration is

+ used, the declaration for YYSTYPE shall also be included

+ in this file.

+ Header File

+ The header file shall contain #define statements that

+ associate the token numbers with the token names. This

+ allows source files other than the code file to access

+ the token codes. If a %union declaration is used, the

+ declaration for YYSTYPE and an extern YYSTYPE yylval

+ declaration shall also be included in this file.
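
As an illustration only, the contents of a header file for a grammar with a %union and two tagged tokens might resemble the following. The token values 257 and 258 and the helper token_name() are assumptions made for the sketch; yacc chooses the actual numbers.

```c
/* What a header written with -d might contain for
 *     %union { int ival; char *sval; }
 *     %token <ival> NUMBER
 *     %token <sval> NAME
 * The numeric values are assumed. */
#define NUMBER 257
#define NAME   258

typedef union {
    int   ival;
    char *sval;
} YYSTYPE;

/* In the real header this would read "extern YYSTYPE yylval;".
 * It is defined here so the sketch links on its own. */
YYSTYPE yylval;

/* Hypothetical consumer in another source file: map a token code
 * back to its name. */
const char *token_name(int token)
{
    switch (token) {
    case NUMBER: return "NUMBER";
    case NAME:   return "NAME";
    default:     return "?";
    }
}
```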

+ Description File

+ The description file shall be a text file containing a

+ description of the state machine corresponding to the

+ parser, using an unspecified format. Limits for internal

+ tables (see Limits) shall also be reported, in an

+ implementation-defined manner. (Some implementations may

+ use dynamic allocation techniques and have no specific

+ limit values to report.)

+EXTENDED DESCRIPTION

+ The yacc command accepts a language that is used to

+ define a grammar for a target language to be parsed by

+ the tables and code generated by yacc. The language

+ accepted by yacc as a grammar for the target language is

+ described below using the yacc input language itself.

+ The input grammar includes rules describing the input

+ structure of the target language and code to be invoked

+ when these rules are recognized to provide the associ-

+ ated semantic action. The code to be executed shall

+ appear as bodies of text that are intended to be C-lan-

+ guage code. The C-language inclusions are presumed to

+ form a correct function when processed by yacc into its

+ output files. The code included in this way shall be

+ executed during the recognition of the target language.

+ Given a grammar, the yacc utility generates the files

+ described in the OUTPUT FILES section. The code file can

+ be compiled and linked using c99. If the declaration and

+ programs sections of the grammar file did not include

+ definitions of main(), yylex(), and yyerror(), the com-

+ piled output requires linking with externally supplied

+ versions of those functions. Default versions of main()

+ and yyerror() are supplied in the yacc library and can

+ be linked in by using the -l y operand to c99. The yacc

+ library interfaces need not support interfaces with

+ other than the default yy symbol prefix. The application

+ provides the lexical analyzer function, yylex(); the lex

+ utility is specifically designed to generate such a rou-

+ tine.

+ Input Language

+ The application shall ensure that every specification

+ file consists of three sections in order: declarations,

+ grammar rules, and programs, separated by double percent

+ signs ( "%%" ). The declarations and programs sections

+ can be empty. If the latter is empty, the preceding "%%"

+ mark separating it from the rules section can be omit-

+ ted.

+ The input is free form text following the structure of

+ the grammar defined below.

+ Lexical Structure of the Grammar

+ The <blank>s, <newline>s, and <form-feed>s shall be

+ ignored, except that the application shall ensure that

+ they do not appear in names or multi-character reserved

+ symbols. Comments shall be enclosed in "/* ... */", and

+ can appear wherever a name is valid.

+ Names are of arbitrary length, made up of letters, peri-

+ ods ( '.' ), underscores ( '_' ), and non-initial dig-

+ its. Uppercase and lowercase letters are distinct. Con-

+ forming applications shall not use names beginning in yy

+ or YY since the yacc parser uses such names. Many of the

+ names appear in the final output of yacc, and thus they

+ should be chosen to conform with any additional rules

+ created by the C compiler to be used. In particular they

+ appear in #define statements.

+ A literal shall consist of a single character enclosed

+ in single-quotes ( ' ). All of the escape sequences

+ supported for character constants by the ISO C standard

+ shall be supported by yacc.

+ The relationship with the lexical analyzer is discussed

+ in detail below.

+ The application shall ensure that the NUL character is

+ not used in grammar rules or literals.

+ Declarations Section

+ The declarations section is used to define the symbols

+ used to define the target language and their relation-

+ ship with each other. In particular, much of the addi-

+ tional information required to resolve ambiguities in

+ the context-free grammar for the target language is pro-

+ vided here.

+ Usually yacc assigns the relationship between the sym-

+ bolic names it generates and their underlying numeric

+ value. The declarations section makes it possible to

+ control the assignment of these values.

+ It is also possible to keep semantic information associ-

+ ated with the tokens currently on the parse stack in a

+ user-defined C-language union, if the members of the

+ union are associated with the various names in the gram-

+ mar. The declarations section provides for this as well.

+ The first group of declarators below all take a list of

+ names as arguments. That list can optionally be pre-

+ ceded by the name of a C union member (called a tag

+ below) appearing within '<' and '>'. (As an exception

+ to the typographical conventions of the rest of this

+ volume of IEEE Std 1003.1-2001, in this case <tag> does

+ not represent a metavariable, but the literal angle

+ bracket characters surrounding a symbol.) The use of tag

+ specifies that the tokens named on this line shall be of

+ the same C type as the union member referenced by tag.

+ This is discussed in more detail below.

+ For lists used to define tokens, the first appearance of

+ a given token can be followed by a positive integer (as

+ a string of decimal digits). If this is done, the under-

+ lying value assigned to it for lexical purposes shall be

+ taken to be that number.

+ The following declares name to be a token:

+ %token [<tag>] name [number][name [number]]...

+ If tag is present, the C type for all tokens on this

+ line shall be declared to be the type referenced by tag.

+ If a positive integer, number, follows a name, that

+ value shall be assigned to the token.

+ The following declares name to be a token, and assigns

+ precedence to it:

+ %left [<tag>] name [number][name [number]]...

+ %right [<tag>] name [number][name [number]]...

+ One or more lines, each beginning with one of these sym-

+ bols, can appear in this section. All tokens on the same

+ line have the same precedence level and associativity;

+ the lines are in order of increasing precedence or bind-

+ ing strength. %left denotes that the operators on that

+ line are left associative, and %right similarly denotes

+ right associative operators. If tag is present, it shall

+ declare a C type for names as described for %token.

+ The following declares name to be a token, and indicates

+ that this cannot be used associatively:

+ %nonassoc [<tag>] name [number][name [number]]...

+ If the parser encounters associative use of this token

+ it reports an error. If tag is present, it shall declare

+ a C type for names as described for %token.

+ The following declares that union member names are non-

+ terminals, and thus it is required to have a tag field

+ at its beginning:

+ %type <tag> name...

+ Because it deals with non-terminals only, assigning a

+ token number or using a literal is also prohibited. If

+ this construct is present, yacc shall perform type

+ checking; if this construct is not present, the parse

+ stack shall hold only the int type.

+ Every name used in grammar not defined by a %token,

+ %left, %right, or %nonassoc declaration is assumed to

+ represent a non-terminal symbol. The yacc utility shall

+ report an error for any non-terminal symbol that does

+ not appear on the left side of at least one grammar

+ rule.

+ Once the type, precedence, or token number of a name is

+ specified, it shall not be changed. If the first decla-

+ ration of a token does not assign a token number, yacc

+ shall assign a token number. Once this assignment is

+ made, the token number shall not be changed by explicit

+ assignment.

+ The following declarators do not follow the previous

+ pattern.

+ The following declares the non-terminal name to be the

+ start symbol, which represents the largest, most general

+ structure described by the grammar rules:

+ %start name

+ By default, it is the left-hand side of the first gram-

+ mar rule; this default can be overridden with this dec-

+ laration.

+ The following declares the yacc value stack to be a

+ union of the various types of values desired:

+ %union { body of union (in C) }

+ By default, the values returned by actions (see below)

+ and the lexical analyzer shall be of type int. The yacc

+ utility keeps track of types, and it shall insert corre-

+ sponding union member names in order to perform strict

+ type checking of the resulting parser.

+ Alternatively, given that at least one <tag> construct

+ is used, the union can be declared in a header file

+ (which shall be included in the declarations section by

+ using a #include construct within %{ and %}), and a

+ typedef used to define the symbol YYSTYPE to represent

+ this union. The effect of %union is to provide the dec-

+ laration of YYSTYPE directly from the yacc input.
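
The header-file alternative could look roughly like this. The file name values.h, the type value_t, and the two accessor functions are hypothetical; the accessors only show the member selection that a tag such as <ratio> or <count> stands for.

```c
/* Contents of a hypothetical user header, "values.h", that the
 * declarations section would pull in with #include inside %{ ... %}. */
typedef union {
    long   count;
    double ratio;
} value_t;

typedef value_t YYSTYPE;   /* the symbol the parser expects */

/* A tag names one member of the union. Written by hand, the access
 * implied by <ratio> or <count> is plain member selection. */
double read_ratio(const YYSTYPE *slot) { return slot->ratio; }
long   read_count(const YYSTYPE *slot) { return slot->count; }
```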

+ C-language declarations and definitions can appear in

+ the declarations section, enclosed by the following

+ marks:

+ %{ ... %}

+ These statements shall be copied into the code file, and

+ have global scope within it so that they can be used in

+ the rules and program sections.

+ The application shall ensure that the declarations sec-

+ tion is terminated by the token %%.

+ Grammar Rules in yacc

+ The rules section defines the context-free grammar to be

+ accepted by the function yacc generates, and associates

+ with those rules C-language actions and additional

+ precedence information. The grammar is described below,

+ and a formal definition follows.

+ The rules section is comprised of one or more grammar

+ rules. A grammar rule has the form:

+ A : BODY ;

+ The symbol A represents a non-terminal name, and BODY

+ represents a sequence of zero or more names, literals,

+ and semantic actions that can then be followed by

+ optional precedence rules. Only the names and literals

+ participate in the formation of the grammar; the seman-

+ tic actions and precedence rules are used in other ways.

+ The colon and the semicolon are yacc punctuation. If

+ there are several successive grammar rules with the same

+ left-hand side, the vertical bar '|' can be used to

+ avoid rewriting the left-hand side; in this case the

+ semicolon appears only after the last rule. The BODY

+ part can be empty (or empty of names and literals) to

+ indicate that the non-terminal symbol matches the empty

+ string.

+ The yacc utility assigns a unique number to each rule.

+ Rules using the vertical bar notation are distinct

+ rules. The number assigned to the rule appears in the

+ description file.

+ The elements comprising a BODY are:

+ name, literal

+ These form the rules of the grammar: name is

+ either a token or a non-terminal; literal stands

+ for itself (less the lexically required quotation

+ marks).

+ semantic action

+ With each grammar rule, the user can associate

+ actions to be performed each time the rule is

+ recognized in the input process. (Note that the

+ word "action" can also refer to the actions of

+ the parser: shift, reduce, and so on.)

+ These actions can return values and can obtain the val-

+ ues returned by previous actions. These values are kept

+ in objects of type YYSTYPE (see %union). The result

+ value of the action shall be kept on the parse stack

+ with the left-hand side of the rule, to be accessed by

+ other reductions as part of their right-hand side. By

+ using the <tag> information provided in the declarations

+ section, the code generated by yacc can be strictly type

+ checked and contain arbitrary information. In addition,

+ the lexical analyzer can provide the same kinds of val-

+ ues for tokens, if desired.

+ An action is an arbitrary C statement and as such can do

+ input or output, call subprograms, and alter external

+ variables. An action is one or more C statements

+ enclosed in curly braces '{' and '}'.

+ Certain pseudo-variables can be used in the action.

+ These are macros for access to data structures known

+ internally to yacc.

+ $$

+ The value of the action can be set by assigning

+ it to $$. If type checking is enabled and the

+ type of the value to be assigned cannot be deter-

+ mined, a diagnostic message may be generated.

+ $number

+ This refers to the value returned by the compo-

+ nent specified by the token number in the right

+ side of a rule, reading from left to right; num-

+ ber can be zero or negative. If number is zero or

+ negative, it refers to the data associated with

+ the name on the parser's stack preceding the

+ leftmost symbol of the current rule. (That is,

+ "$0" refers to the name immediately preceding the

+ leftmost name in the current rule to be found on

+ the parser's stack and "$-1" refers to the symbol

+ to its left.) If number refers to an element past

+ the current point in the rule, or beyond the bot-

+ tom of the stack, the result is undefined. If

+ type checking is enabled and the type of the

+ value to be assigned cannot be determined, a

+ diagnostic message may be generated.

+ $<tag>number

+ These correspond exactly to the corresponding

+ symbols without the tag inclusion, but allow for

+ strict type checking (and preclude unwanted type

+ conversions). The effect is that the macro is

+ expanded to use tag to select an element from the

+ YYSTYPE union (using dataname.tag). This is par-

+ ticularly useful if number is not positive.

+ $<tag>$

+ This imposes on the reference the type of the

+ union member referenced by tag. This construction

+ is applicable when a reference to a left context

+ value occurs in the grammar, and provides yacc

+ with a means for selecting a type.

+ Actions can occur anywhere in a rule (not just at the

+ end); an action can access values returned by actions to

+ its left, and in turn the value it returns can be

+ accessed by actions to its right. An action appearing

+ in the middle of a rule shall be equivalent to replacing

+ the action with a new non-terminal symbol and adding an

+ empty rule with that non-terminal symbol on the left-

+ hand side. The semantic action associated with the new

+ rule shall be equivalent to the original action. The use

+ of actions within rules might introduce conflicts that

+ would not otherwise exist.

+ By default, the value of a rule shall be the value of

+ the first element in it. If the first element does not

+ have a type (particularly in the case of a literal) and

+ type checking is turned on by %type, an error message

+ shall result.

+ precedence

+ The keyword %prec can be used to change the

+ precedence level associated with a particular

+ grammar rule. Examples of this are in cases where

+ a unary and binary operator have the same sym-

+ bolic representation, but need to be given dif-

+ ferent precedences, or where the handling of an

+ ambiguous if-else construction is necessary. The

+ reserved symbol %prec can appear immediately

+ after the body of the grammar rule and can be

+ followed by a token name or a literal. It shall

+ cause the precedence of the grammar rule to

+ become that of the following token name or lit-

+ eral. The action for the rule as a whole can fol-

+ low %prec.
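
The pseudo-variables described above can be pictured as offsets into a value stack. The sketch below assumes an int-valued stack and hypothetical names (value_stack, top, dollar, reduce_add); a generated parser keeps something similar but its layout is not specified here.

```c
enum { STACK_MAX = 64 };

int value_stack[STACK_MAX];   /* hypothetical value stack */
int top = -1;                 /* index of the topmost value */

void push_value(int v) { value_stack[++top] = v; }

/* Value of $n for a rule whose body holds rule_len symbols.
 * $rule_len is the top of the stack, $1 lies rule_len - 1 slots
 * below it, and $0 is the slot just below $1 (the left context). */
int dollar(int n, int rule_len)
{
    return value_stack[top - rule_len + n];
}

/* Reduce by  expr : expr '+' expr  with the action { $$ = $1 + $3; }. */
int reduce_add(void)
{
    const int rule_len = 3;
    int result = dollar(1, rule_len) + dollar(3, rule_len);   /* $$ */

    top -= rule_len;       /* pop the three body symbols */
    push_value(result);    /* $$ is kept with the left-hand side */
    return result;
}
```

With 99, 10, '+', and 32 pushed in that order, dollar(0, 3) yields 99 (the left context) and reduce_add() leaves 42 on top of the stack.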

+ If a program section follows, the application shall

+ ensure that the grammar rules are terminated by %%.

+ Programs Section

+ The programs section can include the definition of the

+ lexical analyzer yylex(), and any other functions; for

+ example, those used in the actions specified in the

+ grammar rules. It is unspecified whether the programs

+ section precedes or follows the semantic actions in the

+ output file; therefore, if the application contains any

+ macro definitions and declarations intended to apply to

+ the code in the semantic actions, it shall place them

+ within "%{ ... %}" in the declarations section.

+ Input Grammar

+ The following input to yacc yields a parser for the

+ input to yacc. This formal syntax takes precedence over

+ the preceding text syntax description.

+ The lexical structure is defined less precisely; Lexical

+ Structure of the Grammar defines most terms. The corre-

+ spondence between the previous terms and the tokens

+ below is as follows.

+ IDENTIFIER

+ This corresponds to the concept of name, given

+ previously. It also includes literals as defined

+ previously.

+ C_IDENTIFIER

+ This is a name, and additionally it is known to

+ be followed by a colon. A literal cannot yield

+ this token.

+ NUMBER A string of digits (a non-negative decimal inte-

+ ger).

+ TYPE, LEFT, MARK, LCURL, RCURL

+ These correspond directly to %type, %left, %%,

+ %{, and %}.

+ { ... }

+ This indicates C-language source code, with the

+ possible inclusion of '$' macros as discussed

+ previously.

+ /* Grammar for the input to yacc. */

+ /* Basic entries. */

+ /* The following are recognized by the lexical analyzer. */

+ %token IDENTIFIER /* Includes identifiers and literals */

+ %token C_IDENTIFIER /* identifier (but not literal)

+ followed by a :. */

+ %token NUMBER /* [0-9][0-9]* */

+ /* Reserved words : %type=>TYPE %left=>LEFT, and so on */

+ %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

+ %token MARK /* The %% mark. */

+ %token LCURL /* The %{ mark. */

+ %token RCURL /* The %} mark. */

+ /* 8-bit character literals stand for themselves; */

+ /* tokens have to be defined for multi-byte characters. */

+ %start spec

+ %%

+ spec : defs MARK rules tail

+ ;

+ tail : MARK

+ {

+ /* In this action, set up the rest of the file. */

+ }

+ | /* Empty; the second MARK is optional. */

+ ;

+ defs : /* Empty. */

+ | defs def

+ ;

+ def : START IDENTIFIER

+ | UNION

+ {

+ /* Copy union definition to output. */

+ }

+ | LCURL

+ {

+ /* Copy C code to output file. */

+ }

+ RCURL

+ | rword tag nlist

+ ;

+ rword : TOKEN

+ | LEFT

+ | RIGHT

+ | NONASSOC

+ | TYPE

+ ;

+ tag : /* Empty: union tag ID optional. */

+ | '<' IDENTIFIER '>'

+ ;

+ nlist : nmno

+ | nlist nmno

+ ;

+ nmno : IDENTIFIER /* Note: literal invalid with %type. */

+ | IDENTIFIER NUMBER /* Note: invalid with %type. */

+ ;

+ /* Rule section */

+ rules : C_IDENTIFIER rbody prec

+ | rules rule

+ ;

+ rule : C_IDENTIFIER rbody prec

+ | '|' rbody prec

+ ;

+ rbody : /* empty */

+ | rbody IDENTIFIER

+ | rbody act

+ ;

+ act : '{'

+ {

+ /* Copy action, translate $$, and so on. */

+ }

+ '}'

+ ;

+ prec : /* Empty */

+ | PREC IDENTIFIER

+ | PREC IDENTIFIER act

+ | prec ';'

+ ;

+ Conflicts

+ The parser produced for an input grammar may contain

+ states in which conflicts occur. The conflicts occur

+ because the grammar is not LALR(1). An ambiguous grammar

+ always contains at least one LALR(1) conflict. The yacc

+ utility shall resolve all conflicts, using either

+ default rules or user-specified precedence rules.

+ Conflicts are either shift/reduce conflicts or

+ reduce/reduce conflicts. A shift/reduce conflict is

+ where, for a given state and lookahead symbol, both a

+ shift action and a reduce action are possible. A

+ reduce/reduce conflict is where, for a given state and

+ lookahead symbol, reductions by two different rules are

+ possible.

+ The rules below describe how to specify what actions to

+ take when a conflict occurs. Not all shift/reduce con-

+ flicts can be successfully resolved this way because the

+ conflict may be due to something other than ambiguity,

+ so incautious use of these facilities can cause the lan-

+ guage accepted by the parser to be much different from

+ that which was intended. The description file shall con-

+ tain sufficient information to understand the cause of

+ the conflict. Where ambiguity is the reason either the

+ default or explicit rules should be adequate to produce

+ a working parser.

+ The declared precedences and associativities (see

+ Declarations Section) are used to resolve parsing conflicts

+ as follows:

+ 1. A precedence and associativity is associated with

+ each grammar rule; it is the precedence and associa-

+ tivity of the last token or literal in the body of

+ the rule. If the %prec keyword is used, it overrides

+ this default. Some grammar rules might not have both

+ precedence and associativity.

+ 2. If there is a shift/reduce conflict, and both the

+ grammar rule and the input symbol have precedence

+ and associativity associated with them, then the

+ conflict is resolved in favor of the action (shift

+ or reduce) associated with the higher precedence. If

+ the precedences are the same, then the associativity

+ is used; left associative implies reduce, right

+ associative implies shift, and non-associative

+ implies an error in the string being parsed.

+ 3. When there is a shift/reduce conflict that cannot be

+ resolved by rule 2, the shift is done. Conflicts

+ resolved this way are counted in the diagnostic

+ output described in Error Handling.

+ 4. When there is a reduce/reduce conflict, a reduction

+ is done by the grammar rule that occurs earlier in

+ the input sequence. Conflicts resolved this way are

+ counted in the diagnostic output described in Error

+ Handling.

+ Conflicts resolved by precedence or associativity shall

+ not be counted in the shift/reduce and reduce/reduce

+ conflicts reported by yacc on either standard error or

+ in the description file.
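
Rules 2 and 3 above can be sketched as a single C function. All type and function names are hypothetical, and level 0 is used here to mean "no precedence declared"; this is an illustration of the stated rules, not the code of any particular yacc.

```c
typedef enum { ASSOC_UNSET, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } assoc_t;
typedef enum { DO_SHIFT, DO_REDUCE, DO_ERROR } action_t;

/* level 0 means no precedence was declared; higher levels bind tighter. */
typedef struct {
    int     level;
    assoc_t assoc;
} prec_t;

/* rule:  precedence of the grammar rule (its last token, or %prec).
 * token: precedence of the lookahead symbol.
 * *counted is set to 1 when the conflict would be reported. */
action_t resolve_shift_reduce(prec_t rule, prec_t token, int *counted)
{
    *counted = 0;
    if (rule.level == 0 || token.level == 0) {
        *counted = 1;              /* rule 3: shift by default and count it */
        return DO_SHIFT;
    }
    if (rule.level > token.level)  /* rule 2: higher precedence wins */
        return DO_REDUCE;
    if (rule.level < token.level)
        return DO_SHIFT;
    switch (token.assoc) {         /* equal levels share one associativity */
    case ASSOC_LEFT:  return DO_REDUCE;
    case ASSOC_RIGHT: return DO_SHIFT;
    default:          return DO_ERROR;   /* non-associative use */
    }
}
```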

+ Error Handling

+ The token error shall be reserved for error handling.

+ The name error can be used in grammar rules. It indi-

+ cates places where the parser can recover from a syntax

+ error. The default value of error shall be 256. Its

+ value can be changed using a %token declaration. The

+ lexical analyzer should not return the value of error.

+ The parser shall detect a syntax error when it is in a

+ state where the action associated with the lookahead

+ symbol is error. A semantic action can cause the parser

+ to initiate error handling by executing the macro YYER-

+ ROR. When YYERROR is executed, the semantic action

+ passes control back to the parser. YYERROR cannot be

+ used outside of semantic actions.

+ When the parser detects a syntax error, it normally

+ calls yyerror() with the character string "syntax error"

+ as its argument. The call shall not be made if the

+ parser is still recovering from a previous error when

+ the error is detected. The parser is considered to be

+ recovering from a previous error until the parser has

+ shifted over at least three normal input symbols since

+ the last error was detected or a semantic action has

+ executed the macro yyerrok. The parser shall not call

+ yyerror() when YYERROR is executed.

+ The macro function YYRECOVERING shall return 1 if a syn-

+ tax error has been detected and the parser has not yet

+ fully recovered from it. Otherwise, zero shall be

+ returned.

+ When a syntax error is detected by the parser, the

+ parser shall check if a previous syntax error has been

+ detected. If a previous error was detected, and if no

+ normal input symbols have been shifted since the

+ preceding error was detected, the parser checks if the

+ lookahead symbol is an endmarker (see Interface to the

+ Lexical Analyzer). If it is, the parser shall return with a

+ non-zero value. Otherwise, the lookahead symbol shall be

+ discarded and normal parsing shall resume.

+ When YYERROR is executed or when the parser detects a

+ syntax error and no previous error has been detected, or

+ at least one normal input symbol has been shifted since

+ the previous error was detected, the parser shall pop

+ back one state at a time until the parse stack is empty

+ or the current state allows a shift over error. If the

+ parser empties the parse stack, it shall return with a

+ non-zero value. Otherwise, it shall shift over error and

+ then resume normal parsing. If the parser reads a looka-

+ head symbol before the error was detected, that symbol

+ shall still be the lookahead symbol when parsing is

+ resumed.

+ The macro yyerrok in a semantic action shall cause the

+ parser to act as if it has fully recovered from any pre-

+ vious errors. The macro yyclearin shall cause the parser

+ to discard the current lookahead token. If the current

+ lookahead token has not yet been read, yyclearin shall

+ have no effect.

+ The macro YYACCEPT shall cause the parser to return with

+ the value zero. The macro YYABORT shall cause the parser

+ to return with a non-zero value.
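
The "recovering" state described in this section can be modelled with a small counter. The sketch below uses hypothetical names throughout (recovering, reports, on_syntax_error, and so on) and only tracks when yyerror() would be called; it does not pop states or shift the error token.

```c
int recovering = 0;   /* normal symbols still to shift before recovery ends */
int reports = 0;      /* how many times yyerror() would have been called */

/* Plays the role of YYRECOVERING: evaluates to 1 while recovering, else 0. */
#define RECOVERING() (recovering != 0)

/* The parser has detected a syntax error. */
void on_syntax_error(void)
{
    if (!RECOVERING())
        reports++;        /* yyerror("syntax error") would be called here */
    recovering = 3;       /* three normal symbols must now be shifted */
}

/* A normal input symbol has been shifted. */
void on_shift(void)
{
    if (recovering > 0)
        recovering--;
}

/* Plays the role of yyerrok: act as if fully recovered. */
void on_yyerrok(void)
{
    recovering = 0;
}
```

A second error arriving after only two shifts is not reported in this model, which matches the requirement that yyerror() not be called while the parser is still recovering.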

+ Interface to the Lexical Analyzer

+ The yylex() function is an integer-valued function that

+ returns a token number representing the kind of token

+ read. If there is a value associated with the token

+ returned by yylex() (see the discussion of tag above),

+ it shall be assigned to the external variable yylval.

+ If the parser and yylex() do not agree on these token

+ numbers, reliable communication between them cannot

+ occur. For (single-byte character) literals, the token

+ is simply the numeric value of the character in the cur-

+ rent character set. The numbers for other tokens can

+ either be chosen by yacc, or chosen by the user. In

+ either case, the #define construct of C is used to allow

+ yylex() to return these numbers symbolically. The

+ #define statements are put into the code file, and the

+ header file if that file is requested. The set of char-

+ acters permitted by yacc in an identifier is larger than

+ that permitted by C. Token names found to contain such

+ characters shall not be included in the #define declara-

+ tions.
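
A hand-written yylex() along these lines might look like the following sketch. The token name NUMBER, its value 257, the int type of yylval, and the input cursor are all assumptions made for illustration; in a real program the #define would come from the code file or header file that yacc writes.

```c
#include <ctype.h>

#define NUMBER 257               /* assumed token number for illustration */

int yylval;                      /* assumes the default int value type */
static const char *input = "";   /* hypothetical input cursor */

int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                         /* skip blanks */
    if (*input == '\0')
        return 0;                        /* end of input */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;                   /* the value travels in yylval */
    }
    return (unsigned char)*input++;      /* literal: its character code */
}
```

Returning the symbolic name NUMBER rather than a bare integer is what keeps the lexical analyzer and the parser in agreement about token numbers.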

+ If the token numbers are chosen by yacc, the tokens

+ other than literals shall be assigned numbers greater

+ than 256, although no order is implied. A token can be

+ explicitly assigned a number by following its first

+ appearance in the declarations section with a number.

+ Names and literals not defined this way retain their

+ default definition. All token numbers assigned by yacc

+ shall be unique and distinct from the token numbers used

+ for literals and user-assigned tokens. If duplicate

+ token numbers cause conflicts in parser generation, yacc

+ shall report an error; otherwise, it is unspecified

+ whether the token assignment is accepted or an error is

+ reported.

+ The end of the input is marked by a special token called

+ the endmarker, which has a token number that is zero or

+ negative. (These values are invalid for any other

+ token.) All lexical analyzers shall return zero or nega-

+ tive as a token number upon reaching the end of their

+ input. If the tokens up to, but excluding, the endmarker

+ form a structure that matches the start symbol, the

+ parser shall accept the input. If the endmarker is seen

+ in any other context, it shall be considered an error.
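
The endmarker convention can be illustrated from the consumer's side. In this sketch the token array and next_token() are hypothetical stand-ins for a real yylex(); the only rule taken from the text is that a zero or negative token number marks the end of the input.

```c
/* Hypothetical token stream standing in for a real yylex(). */
static const int tokens[] = { 257, '+', 257, 0 };
static int pos;

static int next_token(void) { return tokens[pos++]; }

/* Count the tokens that precede the endmarker. */
static int count_until_endmarker(void)
{
    int n = 0;
    while (next_token() > 0)    /* zero or negative marks end of input */
        n++;
    return n;
}
```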

+ Completing the Program

+ In addition to yyparse() and yylex(), the functions

+ yyerror() and main() are required to make a complete

+ program. The application can supply main() and yyer-

+ ror(), or those routines can be obtained from the yacc

+ library.

+ Yacc Library

+ The following functions shall appear only in the yacc

+ library accessible through the -l y operand to c99; they

+ can therefore be redefined by a conforming application:

+ int main(void)

+     This function shall call yyparse() and exit with

+     an unspecified value. Other actions within this

+     function are unspecified.

+ int yyerror(const char *s)

+     This function shall write the NUL-terminated

+     argument to standard error, followed by a

+     <newline>.

+ The order of the -l y and -l l operands given to c99 is

+ significant; the application shall either provide its

+ own main() function or ensure that -l y precedes -l l.

+ Debugging the Parser

+ The parser generated by yacc shall have diagnostic

+ facilities in it that can be optionally enabled at

+ either compile time or at runtime (if enabled at compile

+ time). The compilation of the runtime debugging code is

+ under the control of YYDEBUG, a preprocessor symbol. If

+ YYDEBUG has a non-zero value, the debugging code shall

+ be included. If its value is zero, the code shall not be

+ included.

+ In parsers where the debugging code has been included,

+ the external int yydebug can be used to turn debugging

+ on (with a non-zero value) and off (zero value) at run-

+ time. The initial value of yydebug shall be zero.

+ When -t is specified, the code file shall be built such

+ that, if YYDEBUG is not already defined at compilation

+ time (using the c99 -D YYDEBUG option, for example),

+ YYDEBUG shall be set explicitly to 1. When -t is not

+ specified, the code file shall be built such that, if

+ YYDEBUG is not already defined, it shall be set explic-

+ itly to zero.
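
The compile-time and runtime switches described above might be arranged roughly as follows. This is a sketch, not the layout of any particular generated code file; the TRACE macro is a hypothetical helper, and the fallback value of zero models the case where -t was not specified.

```c
#include <stdio.h>

#ifndef YYDEBUG
#define YYDEBUG 0       /* fallback when -t was not given and no -D was used */
#endif

#if YYDEBUG
int yydebug = 0;        /* runtime switch; initially off */
#define TRACE(msg) \
    do { if (yydebug) fprintf(stderr, "%s\n", (msg)); } while (0)
#else
#define TRACE(msg) ((void)0)    /* debugging code is compiled out */
#endif
```

Compiling with a non-zero YYDEBUG includes the tracing code, and setting yydebug to a non-zero value then turns the output on at runtime.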

+ The format of the debugging output is unspecified but

+ includes at least enough information to determine the

+ shift and reduce actions, and the input symbols. It also

+ provides information about error recovery.

+ Algorithms

+ The parser constructed by yacc implements an LALR(1)

+ parsing algorithm as documented in the literature. It is

+ unspecified whether the parser is table-driven or

+ direct-coded.

+ A parser generated by yacc shall never request an input

+ symbol from yylex() while in a state where the only

+ actions other than the error action are reductions by a

+ single rule.
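
The requirement above amounts to reading the lookahead lazily. A minimal sketch, assuming a hypothetical default_rule table and a stand-in lexer named fake_yylex (neither comes from a real implementation):

```c
#define NO_TOKEN (-2)   /* hypothetical marker: lookahead not yet read */

static int lex_calls;   /* counts lexer calls, for illustration only */
static int fake_yylex(void) { lex_calls++; return 0; }

/* Hypothetical table: a positive entry is the single rule by which
   that state reduces; zero means the state needs a lookahead. */
static const int default_rule[] = { 0, 3, 0 };

/* Returns the rule to reduce by, or 0 after making sure a
   lookahead token has been read into *lookahead. */
static int next_action(int state, int *lookahead)
{
    if (default_rule[state] > 0)
        return default_rule[state];     /* reduce; the lexer is not called */
    if (*lookahead == NO_TOKEN)
        *lookahead = fake_yylex();      /* read a token only when needed */
    return 0;
}
```

This matters for interactive input: a state that can only reduce by one rule completes its action without waiting for the next token to arrive.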

+ The literature of parsing theory defines these concepts.

+ Limits

+ The yacc utility may have several internal tables. The

+ minimum maximums for these tables are shown in the fol-

+ lowing table. The exact meaning of these values is

+ implementation-defined. The implementation shall define

+ the relationship between these values and between them

+ and any error messages that the implementation may gen-

+ erate should it run out of space for any internal struc-

+ ture. An implementation may combine groups of these

+ resources into a single pool as long as the total avail-

+ able to the user does not fall below the sum of the

+ sizes specified by this section.

+ Table: Internal Limits in yacc

+              Minimum
+ Limit        Maximum  Description
+ {NTERMS}     126      Number of tokens.
+ {NNONTERM}   200      Number of non-terminals.
+ {NPROD}      300      Number of rules.
+ {NSTATES}    600      Number of states.
+ {MEMSIZE}    5200     Length of rules. The total length, in
+                       names (tokens and non-terminals), of all
+                       the rules of the grammar. The left-hand
+                       side is counted for each rule, even if
+                       it is not explicitly repeated, as
+                       specified in Grammar Rules in yacc.
+ {ACTSIZE}    4000     Number of actions. "Actions" here (and
+                       in the description file) refer to parser
+                       actions (shift, reduce, and so on) not
+                       to semantic actions defined in Grammar
+                       Rules in yacc.

+EXIT STATUS

+ The following exit values shall be returned:

+ 0 Successful completion.

+ >0 An error occurred.

+CONSEQUENCES OF ERRORS

+ If any errors are encountered, the run is aborted and

+ yacc exits with a non-zero status. Partial code files

+ and header files may be produced. The summary informa-

+ tion in the description file shall always be produced if

+ the -v flag is present.

+ The following sections are informative.

+APPLICATION USAGE

+ Historical implementations experience name conflicts on

+ the names yacc.tmp, yacc.acts, yacc.debug, y.tab.c,

+ y.tab.h, and y.output if more than one copy of yacc is

+ running in a single directory at one time. The -b option

+ was added to overcome this problem. The related problem

+ of allowing multiple yacc parsers to be placed in the

+ same file was addressed by adding a -p option to over-

+ ride the previously hard-coded yy variable prefix.

+ The description of the -p option specifies the minimal

+ set of function and variable names that cause conflict

+ when multiple parsers are linked together. YYSTYPE does

+ not need to be changed. Instead, the programmer can use

+ -b to give the header files for different parsers dif-

+ ferent names, and then the file with the yylex() for a

+ given parser can include the header for that parser.

+ Names such as yyclearerr do not need to be changed

+ because they are used only in the actions; they do not

+ have linkage. It is possible that an implementation has

+ other names, either internal ones for implementing

+ things such as yyclearerr, or providing non-standard

+ features that it wants to change with -p.

+ Unary operators that are the same token as a binary

+ operator in general need their precedence adjusted. This

+ is handled by the %prec advisory symbol associated with

+ the particular grammar rule defining that unary operator.

+ (See Grammar Rules in yacc.) Applications are not

+ required to use this operator for unary operators, but

+ the grammars that do not require it are rare.

+EXAMPLES

+ Access to the yacc library is obtained with library

+ search operands to c99. To use the yacc library main():

+ c99 y.tab.c -l y

+ Both the lex library and the yacc library contain

+ main(). To access the yacc main():

+ c99 y.tab.c lex.yy.c -l y -l l

+ This ensures that the yacc library is searched first, so

+ that its main() is used.

+ The historical yacc libraries have contained two simple

+ functions that are normally coded by the application

+ programmer. These functions are similar to the follow-

+ ing code:

+ #include <locale.h>
+
+ int main(void)
+ {
+     extern int yyparse();
+
+     setlocale(LC_ALL, "");
+
+     /* If the following parser is one created by lex, the
+        application must be careful to ensure that LC_CTYPE
+        and LC_COLLATE are set to the POSIX locale. */
+     (void) yyparse();
+     return (0);
+ }
+
+ #include <stdio.h>
+
+ int yyerror(const char *msg)
+ {
+     (void) fprintf(stderr, "%s\n", msg);
+     return (0);
+ }

+RATIONALE

+ The references in Referenced Documents may be helpful in constructing the

+ parser generator. The referenced DeRemer and Pennello

+ article (along with the works it references) describes a

+ technique to generate parsers that conform to this vol-

+ ume of IEEE Std 1003.1-2001. Work in this area contin-

+ ues to be done, so implementors should consult current

+ literature before doing any new implementations. The

+ original Knuth article is the theoretical basis for this

+ kind of parser, but the tables it generates are imprac-

+ tically large for reasonable grammars and should not be

+ used. The "equivalent to" wording is intentional to

+ assure that the best tables that are LALR(1) can be gen-

+ erated.

+ There has been confusion between the class of grammars,

+ the algorithms needed to generate parsers, and the algo-

+ rithms needed to parse the languages. They are all rea-

+ sonably orthogonal. In particular, a parser generator

+ that accepts the full range of LR(1) grammars need not

+ generate a table any more complex than one that accepts

+ SLR(1) (a relatively weak class of LR grammars) for a

+ grammar that happens to be SLR(1). Such an implementa-

+ tion need not recognize the case, either; table compres-

+ sion can yield the SLR(1) table (or one even smaller

+ than that) without recognizing that the grammar is

+ SLR(1). The speed of an LR(1) parser for any class is

+ dependent more upon the table representation and com-

+ pression (or the code generation if a direct parser is

+ generated) than upon the class of grammar that the table

+ generator handles.

+ The speed of the parser generator is somewhat dependent

+ upon the class of grammar it handles. However, the

+ algorithms in the original Knuth article for constructing

+ LR parsers were judged by its author to be impractically

+ slow at that time. Although full LR is more complex than

+ LALR(1), as computer speeds and algorithms improve, the

+ difference (in terms of acceptable wall-clock execution

+ time) is becoming less significant.

+ Potential authors are cautioned that the referenced

+ DeRemer and Pennello article previously cited identifies

+ a bug (an over-simplification of the computation of

+ LALR(1) lookahead sets) in some of the LALR(1) algorithm

+ statements that preceded it to publication. They should

+ take the time to seek out that paper, as well as current

+ relevant work, particularly Aho's.

+ The -b option was added to provide a portable method for

+ permitting yacc to work on multiple separate parsers in

+ the same directory. If a directory contains more than

+ one yacc grammar, and both grammars are constructed at

+ the same time (by, for example, a parallel make pro-

+ gram), conflict results. While the solution is not his-

+ torical practice, it corrects a known deficiency in his-

+ torical implementations. Corresponding changes were made

+ to all sections that referenced the filenames y.tab.c

+ (now "the code file"), y.tab.h (now "the header file"),

+ and y.output (now "the description file").

+ The grammar for yacc input is based on System V

+ documentation. The textual description there shows that the

+ ';' is required at the end of the rule. The grammar and

+ the implementation do not require this. (The use of

+ C_IDENTIFIER causes a reduce to occur in the right

+ place.)

+ Also, in that implementation, the constructs such as

+ %token can be terminated by a semicolon, but this is not

+ permitted by the grammar. The keywords such as %token

+ can also appear in uppercase, which is again not dis-

+ cussed. In most places where '%' is used, '\' can be

+ substituted, and there are alternate spellings for some

+ of the symbols (for example, %LEFT can be "%<" or even

+ "\<" ).

+ Historically, <tag> can contain any characters except

+ '>', including white space, in the implementation. How-

+ ever, since the tag must reference an ISO C standard

+ union member, in practice conforming implementations

+ need to support only the set of characters for ISO C

+ standard identifiers in this context.

+ Some historical implementations are known to accept

+ actions that are terminated by a period. Historical

+ implementations often allow '$' in names. A conforming

+ implementation does not need to support either of these

+ behaviors.

+ Deciding when to use %prec illustrates the difficulty in

+ specifying the behavior of yacc. There may be situations

+ in which the grammar is not, strictly speaking, in

+ error, and yet yacc cannot interpret it unambiguously.

+ Ambiguities in the grammar can in many instances be

+ resolved by providing additional information, such as

+ using %type or %union declarations. It is

+ often easier and it usually yields a smaller parser to

+ take this alternative when it is appropriate.

+ A program produced without the runtime debugging code

+ is usually smaller and slightly faster in historical

+ implementations.

+ Statistics messages from several historical implementa-

+ tions include the following types of information:

+ n/512 terminals, n/300 non-terminals

+ n/600 grammar rules, n/1500 states

+ n shift/reduce, n reduce/reduce conflicts reported

+ n/350 working sets used

+ Memory: states, etc. n/15000, parser n/15000

+ n/600 distinct lookahead sets

+ n extra closures

+ n shift entries, n exceptions

+ n goto entries

+ n entries saved by goto default

+ Optimizer space used: input n/15000, output n/15000

+ n table entries, n zero

+ Maximum spread: n, Maximum offset: n

+ The report of internal tables in the description file is

+ left implementation-defined because all aspects of these

+ limits are also implementation-defined. Some implementa-

+ tions may use dynamic allocation techniques and have no

+ specific limit values to report.

+ The format of the y.output file is not given because

+ specification of the format was not seen to enhance

+ applications portability. The listing is primarily

+ intended to help human users understand and debug the

+ parser; use of y.output by a conforming application

+ script would be unusual. Furthermore, implementations

+ have not produced consistent output and no popular for-

+ mat was apparent. The format selected by the implementa-

+ tion should be human-readable, in addition to the

+ requirement that it be a text file.

+ Standard error reports are not specifically described

+ because they are seldom of use to conforming applica-

+ tions and there was no reason to restrict implementa-

+ tions.

+ Some implementations recognize "={" as equivalent to '{'

+ because it appears in historical documentation. This

+ construction was recognized and documented as obsolete

+ as long ago as 1978, in the referenced Yacc: Yet Another

+ Compiler-Compiler. This volume of IEEE Std 1003.1-2001

+ chose to leave it as obsolete and omit it.

+ Multi-byte characters should be recognized by the lexi-

+ cal analyzer and returned as tokens. They should not be

+ returned as multi-byte character literals. The token

+ error that is used for error recovery is normally

+ assigned the value 256 in the historical implementation.

+ Thus, the token value 256, which is used in many multi-

+ byte character sets, is not available for use as the

+ value of a user-defined token.

+FUTURE DIRECTIONS

+ None.

+SEE ALSO

+ c99, lex

+COPYRIGHT

+ Portions of this text are reprinted and reproduced in

+ electronic form from IEEE Std 1003.1, 2003 Edition,

+ Standard for Information Technology -- Portable Operating

+ System Interface (POSIX), The Open Group Base Specifications

+ Issue 6, Copyright (C) 2001-2003 by the Institute of

+ Electrical and Electronics Engineers, Inc and

+ The Open Group. In the event of any discrepancy between

+ this version and the original IEEE and The Open Group

+ Standard, the original IEEE and The Open Group Standard

+ is the referee document. The original Standard can be

+ obtained online at http://www.opengroup.org/unix/online.html .

+IEEE/The Open Group 2003 YACC(1P)

Property changes on: bison\man\cat1p\yacc.1p.txt

___________________________________________________________________

Added: svn:eol-style

+ LF
