Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(218)

Side by Side Diff: bison/man/cat1p/yacc.1p.txt

Issue 10807020: Add native Windows binary for bison. (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/
Patch Set: Created 8 years, 5 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch | Annotate | Revision Log
« no previous file with comments | « bison/man/cat1/yacc.1.txt ('k') | bison/manifest/bison-2.4.1-bin.mft » ('j') | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Property Changes:
Added: svn:eol-style
+ LF
OLDNEW
(Empty)
1 YACC(1P) POSIX Programmer's Manual YACC(1P)
2
3
4
5 PROLOG
6 This manual page is part of the POSIX Programmer's Man-
7 ual. The Linux implementation of this interface may
8 differ (consult the corresponding Linux manual page for
9 details of Linux behavior), or the interface may not be
10 implemented on Linux.
11
12 NAME
13 yacc - yet another compiler compiler (DEVELOPMENT)
14
15 SYNOPSIS
16 yacc [-dltv][-b file_prefix][-p sym_prefix] grammar
17
18 DESCRIPTION
19 The yacc utility shall read a description of a context-
20 free grammar in grammar and write C source code, con-
21 forming to the ISO C standard, to a code file, and
22 optionally header information into a header file, in the
23 current directory. The C code shall define a function
24 and related routines and macros for an automaton that
25 executes a parsing algorithm meeting the requirements in
26 Algorithms .
27
28 The form and meaning of the grammar are described in the
29 EXTENDED DESCRIPTION section.
30
31 The C source code and header file shall be produced in a
32 form suitable as input for the C compiler (see c99 ).
33
34 OPTIONS
35 The yacc utility shall conform to the Base Definitions
36 volume of IEEE Std 1003.1-2001, Section 12.2, Utility
37 Syntax Guidelines.
38
39 The following options shall be supported:
40
41 -b file_prefix
42 Use file_prefix instead of y as the prefix for
43 all output filenames. The code file y.tab.c, the
44 header file y.tab.h (created when -d is speci-
45 fied), and the description file y.output (created
46 when -v is specified), shall be changed to
47 file_prefix .tab.c, file_prefix .tab.h, and
48 file_prefix .output, respectively.
49
50 -d Write the header file; by default only the code
51 file is written. The #define statements associate
52 the token codes assigned by yacc with the user-
53 declared token names. This allows source files
54 other than y.tab.c to access the token codes.
55
56 -l Produce a code file that does not contain any
57 #line constructs. If this option is not present,
58 it is unspecified whether the code file or header
59 file contains #line directives. This should only
60 be used after the grammar and the associated
61 actions are fully debugged.
62
63 -p sym_prefix
64
65 Use sym_prefix instead of yy as the prefix for
66 all external names produced by yacc. The names
67 affected shall include the functions yyparse(),
68 yylex(), and yyerror(), and the variables yylval,
69 yychar, and yydebug. (In the remainder of this
70 section, the six symbols cited are referenced
71 using their default names only as a notational
72 convenience.) Local names may also be affected by
73 the -p option; however, the -p option shall not
74 affect #define symbols generated by yacc.
75
76 -t Modify conditional compilation directives to per-
77 mit compilation of debugging code in the code
78 file. Runtime debugging statements shall always
79 be contained in the code file, but by default
80 conditional compilation directives prevent their
81 compilation.
82
83 -v Write a file containing a description of the
84 parser and a report of conflicts generated by
85 ambiguities in the grammar.
86
87
88 OPERANDS
89 The following operand is required:
90
91 grammar
92 A pathname of a file containing instructions,
93 hereafter called grammar, for which a parser is
94 to be created. The format for the grammar is
95 described in the EXTENDED DESCRIPTION section.
96
97
98 STDIN
99 Not used.
100
101 INPUT FILES
102 The file grammar shall be a text file formatted as spec-
103 ified in the EXTENDED DESCRIPTION section.
104
105 ENVIRONMENT VARIABLES
106 The following environment variables shall affect the
107 execution of yacc:
108
109 LANG Provide a default value for the internationaliza-
110 tion variables that are unset or null. (See the
111 Base Definitions volume of IEEE Std 1003.1-2001,
112 Section 8.2, Internationalization Variables for
113 the precedence of internationalization variables
114 used to determine the values of locale cate-
115 gories.)
116
117 LC_ALL If set to a non-empty string value, override the
118 values of all the other internationalization
119 variables.
120
121 LC_CTYPE
122 Determine the locale for the interpretation of
123 sequences of bytes of text data as characters
124 (for example, single-byte as opposed to multi-
125 byte characters in arguments and input files).
126
127 LC_MESSAGES
128 Determine the locale that should be used to
129 affect the format and contents of diagnostic mes-
130 sages written to standard error.
131
132 NLSPATH
133 Determine the location of message catalogs for
134 the processing of LC_MESSAGES .
135
136
137 The LANG and LC_* variables affect the execution of the
138 yacc utility as stated. The main() function defined in
139 Yacc Library shall call:
140
141
142 setlocale(LC_ALL, "")
143
144 and thus the program generated by yacc shall also be
145 affected by the contents of these variables at runtime.
146
147 ASYNCHRONOUS EVENTS
148 Default.
149
150 STDOUT
151 Not used.
152
153 STDERR
154 If shift/reduce or reduce/reduce conflicts are detected
155 in grammar, yacc shall write a report of those conflicts
156 to the standard error in an unspecified format.
157
158 Standard error shall also be used for diagnostic mes-
159 sages.
160
161 OUTPUT FILES
162 The code file, the header file, and the description file
163 shall be text files. All are described in the following
164 sections.
165
166 Code File
167 This file shall contain the C source code for the
168 yyparse() function. It shall contain code for the vari-
169 ous semantic actions with macro substitution performed
170 on them as described in the EXTENDED DESCRIPTION sec-
171 tion. It also shall contain a copy of the #define state-
172 ments in the header file. If a %union declaration is
173 used, the declaration for YYSTYPE shall also be included
174 in this file.
175
176 Header File
177 The header file shall contain #define statements that
178 associate the token numbers with the token names. This
179 allows source files other than the code file to access
180 the token codes. If a %union declaration is used, the
181 declaration for YYSTYPE and an extern YYSTYPE yylval
182 declaration shall also be included in this file.
183
184 Description File
185 The description file shall be a text file containing a
186 description of the state machine corresponding to the
187 parser, using an unspecified format. Limits for internal
188 tables (see Limits ) shall also be reported, in an
189 implementation-defined manner. (Some implementations may
190 use dynamic allocation techniques and have no specific
191 limit values to report.)
192
193 EXTENDED DESCRIPTION
194 The yacc command accepts a language that is used to
195 define a grammar for a target language to be parsed by
196 the tables and code generated by yacc. The language
197 accepted by yacc as a grammar for the target language is
198 described below using the yacc input language itself.
199
200 The input grammar includes rules describing the input
201 structure of the target language and code to be invoked
202 when these rules are recognized to provide the associ-
203 ated semantic action. The code to be executed shall
204 appear as bodies of text that are intended to be C-lan-
205 guage code. The C-language inclusions are presumed to
206 form a correct function when processed by yacc into its
207 output files. The code included in this way shall be
208 executed during the recognition of the target language.
209
210 Given a grammar, the yacc utility generates the files
211 described in the OUTPUT FILES section. The code file can
212 be compiled and linked using c99. If the declaration and
213 programs sections of the grammar file did not include
214 definitions of main(), yylex(), and yyerror(), the com-
215 piled output requires linking with externally supplied
216 versions of those functions. Default versions of main()
217 and yyerror() are supplied in the yacc library and can
218 be linked in by using the -l y operand to c99. The yacc
219 library interfaces need not support interfaces with
220 other than the default yy symbol prefix. The application
221 provides the lexical analyzer function, yylex(); the lex
222 utility is specifically designed to generate such a rou-
223 tine.
224
225 Input Language
226 The application shall ensure that every specification
227 file consists of three sections in order: declarations,
228 grammar rules, and programs, separated by double percent
229 signs ( "%%" ). The declarations and programs sections
230 can be empty. If the latter is empty, the preceding "%%"
231 mark separating it from the rules section can be omit-
232 ted.
233
234 The input is free form text following the structure of
235 the grammar defined below.
236
237 Lexical Structure of the Grammar
238 The <blank>s, <newline>s, and <form-feed>s shall be
239 ignored, except that the application shall ensure that
240 they do not appear in names or multi-character reserved
241 symbols. Comments shall be enclosed in "/* ... */", and
242 can appear wherever a name is valid.
243
244 Names are of arbitrary length, made up of letters, peri-
245 ods ( '.' ), underscores ( '_' ), and non-initial dig-
246 its. Uppercase and lowercase letters are distinct. Con-
247 forming applications shall not use names beginning in yy
248 or YY since the yacc parser uses such names. Many of the
249 names appear in the final output of yacc, and thus they
250 should be chosen to conform with any additional rules
251 created by the C compiler to be used. In particular they
252 appear in #define statements.
253
254 A literal shall consist of a single character enclosed
255 in single-quotes ( '" ). All of the escape sequences
256 supported for character constants by the ISO C standard
257 shall be supported by yacc.
258
259 The relationship with the lexical analyzer is discussed
260 in detail below.
261
262 The application shall ensure that the NUL character is
263 not used in grammar rules or literals.
264
265 Declarations Section
266 The declarations section is used to define the symbols
267 used to define the target language and their relation-
268 ship with each other. In particular, much of the addi-
269 tional information required to resolve ambiguities in
270 the context-free grammar for the target language is pro-
271 vided here.
272
273 Usually yacc assigns the relationship between the sym-
274 bolic names it generates and their underlying numeric
275 value. The declarations section makes it possible to
276 control the assignment of these values.
277
278 It is also possible to keep semantic information associ-
279 ated with the tokens currently on the parse stack in a
280 user-defined C-language union, if the members of the
281 union are associated with the various names in the gram-
282 mar. The declarations section provides for this as well.
283
284 The first group of declarators below all take a list of
285 names as arguments. That list can optionally be pre-
286 ceded by the name of a C union member (called a tag
287 below) appearing within '<' and '>' . (As an exception
288 to the typographical conventions of the rest of this
289 volume of IEEE Std 1003.1-2001, in this case <tag> does
290 not represent a metavariable, but the literal angle
291 bracket characters surrounding a symbol.) The use of tag
292 specifies that the tokens named on this line shall be of
293 the same C type as the union member referenced by tag.
294 This is discussed in more detail below.
295
296 For lists used to define tokens, the first appearance of
297 a given token can be followed by a positive integer (as
298 a string of decimal digits). If this is done, the under-
299 lying value assigned to it for lexical purposes shall be
300 taken to be that number.
301
302 The following declares name to be a token:
303
304
305 %token [<tag>] name [number][name [number]]...
306
307 If tag is present, the C type for all tokens on this
308 line shall be declared to be the type referenced by tag.
309 If a positive integer, number, follows a name, that
310 value shall be assigned to the token.
311
312 The following declares name to be a token, and assigns
313 precedence to it:
314
315
316 %left [<tag>] name [number][name [number]]...
317 %right [<tag>] name [number][name [number]]...
318
319 One or more lines, each beginning with one of these sym-
320 bols, can appear in this section. All tokens on the same
321 line have the same precedence level and associativity;
322 the lines are in order of increasing precedence or bind-
323 ing strength. %left denotes that the operators on that
324 line are left associative, and %right similarly denotes
325 right associative operators. If tag is present, it shall
326 declare a C type for names as described for %token.
327
328 The following declares name to be a token, and indicates
329 that this cannot be used associatively:
330
331
332 %nonassoc [<tag>] name [number][name [number]]...
333
334 If the parser encounters associative use of this token
335 it reports an error. If tag is present, it shall declare
336 a C type for names as described for %token.
337
338 The following declares that union member names are non-
339 terminals, and thus it is required to have a tag field
340 at its beginning:
341
342
343 %type <tag> name...
344
345 Because it deals with non-terminals only, assigning a
346 token number or using a literal is also prohibited. If
347 this construct is present, yacc shall perform type
348 checking; if this construct is not present, the parse
349 stack shall hold only the int type.
350
351 Every name used in grammar not defined by a %token,
352 %left, %right, or %nonassoc declaration is assumed to
353 represent a non-terminal symbol. The yacc utility shall
354 report an error for any non-terminal symbol that does
355 not appear on the left side of at least one grammar
356 rule.
357
358 Once the type, precedence, or token number of a name is
359 specified, it shall not be changed. If the first decla-
360 ration of a token does not assign a token number, yacc
361 shall assign a token number. Once this assignment is
362 made, the token number shall not be changed by explicit
363 assignment.
364
365 The following declarators do not follow the previous
366 pattern.
367
368 The following declares the non-terminal name to be the
369 start symbol, which represents the largest, most general
370 structure described by the grammar rules:
371
372
373 %start name
374
375 By default, it is the left-hand side of the first gram-
376 mar rule; this default can be overridden with this dec-
377 laration.
378
379 The following declares the yacc value stack to be a
380 union of the various types of values desired:
381
382
383 %union { body of union (in C) }
384
385 By default, the values returned by actions (see below)
386 and the lexical analyzer shall be of type int. The yacc
387 utility keeps track of types, and it shall insert corre-
388 sponding union member names in order to perform strict
389 type checking of the resulting parser.
390
391 Alternatively, given that at least one <tag> construct
392 is used, the union can be declared in a header file
393 (which shall be included in the declarations section by
394 using a #include construct within %{ and %}), and a
395 typedef used to define the symbol YYSTYPE to represent
396 this union. The effect of %union is to provide the dec-
397 laration of YYSTYPE directly from the yacc input.
398
399 C-language declarations and definitions can appear in
400 the declarations section, enclosed by the following
401 marks:
402
403
404 %{ ... %}
405
406 These statements shall be copied into the code file, and
407 have global scope within it so that they can be used in
408 the rules and program sections.
409
410 The application shall ensure that the declarations sec-
411 tion is terminated by the token %%.
412
413 Grammar Rules in yacc
414 The rules section defines the context-free grammar to be
415 accepted by the function yacc generates, and associates
416 with those rules C-language actions and additional
417 precedence information. The grammar is described below,
418 and a formal definition follows.
419
420 The rules section is comprised of one or more grammar
421 rules. A grammar rule has the form:
422
423
424 A : BODY ;
425
426 The symbol A represents a non-terminal name, and BODY
427 represents a sequence of zero or more names, literals,
428 and semantic actions that can then be followed by
429 optional precedence rules. Only the names and literals
430 participate in the formation of the grammar; the seman-
431 tic actions and precedence rules are used in other ways.
432 The colon and the semicolon are yacc punctuation. If
433 there are several successive grammar rules with the same
434 left-hand side, the vertical bar '|' can be used to
435 avoid rewriting the left-hand side; in this case the
436 semicolon appears only after the last rule. The BODY
437 part can be empty (or empty of names and literals) to
438 indicate that the non-terminal symbol matches the empty
439 string.
440
441 The yacc utility assigns a unique number to each rule.
442 Rules using the vertical bar notation are distinct
443 rules. The number assigned to the rule appears in the
444 description file.
445
446 The elements comprising a BODY are:
447
448 name, literal
449 These form the rules of the grammar: name is
450 either a token or a non-terminal; literal stands
451 for itself (less the lexically required quotation
452 marks).
453
454 semantic action
455
456 With each grammar rule, the user can associate
457 actions to be performed each time the rule is
458 recognized in the input process. (Note that the
459 word "action" can also refer to the actions of
460 the parser-shift, reduce, and so on.)
461
462 These actions can return values and can obtain the val-
463 ues returned by previous actions. These values are kept
464 in objects of type YYSTYPE (see %union). The result
465 value of the action shall be kept on the parse stack
466 with the left-hand side of the rule, to be accessed by
467 other reductions as part of their right-hand side. By
468 using the <tag> information provided in the declarations
469 section, the code generated by yacc can be strictly type
470 checked and contain arbitrary information. In addition,
471 the lexical analyzer can provide the same kinds of val-
472 ues for tokens, if desired.
473
474 An action is an arbitrary C statement and as such can do
475 input or output, call subprograms, and alter external
476 variables. An action is one or more C statements
477 enclosed in curly braces '{' and '}' .
478
479 Certain pseudo-variables can be used in the action.
480 These are macros for access to data structures known
481 internally to yacc.
482
483 $$
484 The value of the action can be set by assigning
485 it to $$. If type checking is enabled and the
486 type of the value to be assigned cannot be deter-
487 mined, a diagnostic message may be generated.
488
489 $number
490 This refers to the value returned by the compo-
491 nent specified by the token number in the right
492 side of a rule, reading from left to right; num-
493 ber can be zero or negative. If number is zero or
494 negative, it refers to the data associated with
495 the name on the parser's stack preceding the
496 leftmost symbol of the current rule. (That is,
497 "$0" refers to the name immediately preceding the
498 leftmost name in the current rule to be found on
499 the parser's stack and "$-1" refers to the symbol
500 to its left.) If number refers to an element past
501 the current point in the rule, or beyond the bot-
502 tom of the stack, the result is undefined. If
503 type checking is enabled and the type of the
504 value to be assigned cannot be determined, a
505 diagnostic message may be generated.
506
507 $<tag>number
508
509 These correspond exactly to the corresponding
510 symbols without the tag inclusion, but allow for
511 strict type checking (and preclude unwanted type
512 conversions). The effect is that the macro is
513 expanded to use tag to select an element from the
514 YYSTYPE union (using dataname.tag). This is par-
515 ticularly useful if number is not positive.
516
517 $<tag>$
518 This imposes on the reference the type of the
519 union member referenced by tag. This construction
520 is applicable when a reference to a left context
521 value occurs in the grammar, and provides yacc
522 with a means for selecting a type.
523
524
525 Actions can occur anywhere in a rule (not just at the
526 end); an action can access values returned by actions to
527 its left, and in turn the value it returns can be
528 accessed by actions to its right. An action appearing
529 in the middle of a rule shall be equivalent to replacing
530 the action with a new non-terminal symbol and adding an
531 empty rule with that non-terminal symbol on the left-
532 hand side. The semantic action associated with the new
533 rule shall be equivalent to the original action. The use
534 of actions within rules might introduce conflicts that
535 would not otherwise exist.
536
537 By default, the value of a rule shall be the value of
538 the first element in it. If the first element does not
539 have a type (particularly in the case of a literal) and
540 type checking is turned on by %type, an error message
541 shall result.
542
543 precedence
544 The keyword %prec can be used to change the
545 precedence level associated with a particular
546 grammar rule. Examples of this are in cases where
547 a unary and binary operator have the same sym-
548 bolic representation, but need to be given dif-
549 ferent precedences, or where the handling of an
550 ambiguous if-else construction is necessary. The
551 reserved symbol %prec can appear immediately
552 after the body of the grammar rule and can be
553 followed by a token name or a literal. It shall
554 cause the precedence of the grammar rule to
555 become that of the following token name or lit-
556 eral. The action for the rule as a whole can fol-
557 low %prec.
558
559
560 If a program section follows, the application shall
561 ensure that the grammar rules are terminated by %%.
562
563 Programs Section
564 The programs section can include the definition of the
565 lexical analyzer yylex(), and any other functions; for
566 example, those used in the actions specified in the
567 grammar rules. It is unspecified whether the programs
568 section precedes or follows the semantic actions in the
569 output file; therefore, if the application contains any
570 macro definitions and declarations intended to apply to
571 the code in the semantic actions, it shall place them
572 within "%{ ... %}" in the declarations section.
573
574 Input Grammar
575 The following input to yacc yields a parser for the
576 input to yacc. This formal syntax takes precedence over
577 the preceding text syntax description.
578
579 The lexical structure is defined less precisely; Lexical
580 Structure of the Grammar defines most terms. The corre-
581 spondence between the previous terms and the tokens
582 below is as follows.
583
584 IDENTIFIER
585 This corresponds to the concept of name, given
586 previously. It also includes literals as defined
587 previously.
588
589 C_IDENTIFIER
590 This is a name, and additionally it is known to
591 be followed by a colon. A literal cannot yield
592 this token.
593
594 NUMBER A string of digits (a non-negative decimal inte-
595 ger).
596
597 TYPE, LEFT, MARK, LCURL, RCURL
598
599 These correspond directly to %type, %left, %%,
600 %{, and %}.
601
602 { ... }
603 This indicates C-language source code, with the
604 possible inclusion of '$' macros as discussed
605 previously.
606
607
608
609 /* Grammar for the input to yacc. */
610 /* Basic entries. */
611 /* The following are recognized by the lexical analyzer. */
612
613
614 %token IDENTIFIER /* Includes identifiers and literals */
615 %token C_IDENTIFIER /* identifier (but not literal)
616 followed by a :. */
617 %token NUMBER /* [0-9][0-9]* */
618
619
620 /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
621
622
623 %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
624
625
626 %token MARK /* The %% mark. */
627 %token LCURL /* The %{ mark. */
628 %token RCURL /* The %} mark. */
629
630
631 /* 8-bit character literals stand for themselves; */
632 /* tokens have to be defined for multi-byte characters. */
633
634
635 %start spec
636
637
638 %%
639
640
641 spec : defs MARK rules tail
642 ;
643 tail : MARK
644 {
645 /* In this action, set up the rest of the file. */
646 }
647 | /* Empty; the second MARK is optional. */
648 ;
649 defs : /* Empty. */
650 | defs def
651 ;
652 def : START IDENTIFIER
653 | UNION
654 {
655 /* Copy union definition to output. */
656 }
657 | LCURL
658 {
659 /* Copy C code to output file. */
660 }
661 RCURL
662 | rword tag nlist
663 ;
664 rword : TOKEN
665 | LEFT
666 | RIGHT
667 | NONASSOC
668 | TYPE
669 ;
670 tag : /* Empty: union tag ID optional. */
671 | '<' IDENTIFIER '>'
672 ;
673 nlist : nmno
674 | nlist nmno
675 ;
676 nmno : IDENTIFIER /* Note: literal invalid with % type. * /
677 | IDENTIFIER NUMBER /* Note: invalid with % type. */
678 ;
679
680
681 /* Rule section */
682
683
684 rules : C_IDENTIFIER rbody prec
685 | rules rule
686 ;
687 rule : C_IDENTIFIER rbody prec
688 | '|' rbody prec
689 ;
690 rbody : /* empty */
691 | rbody IDENTIFIER
692 | rbody act
693 ;
694 act : '{'
695 {
696 /* Copy action, translate $$, and so on. */
697 }
698 '}'
699 ;
700 prec : /* Empty */
701 | PREC IDENTIFIER
702 | PREC IDENTIFIER act
703 | prec ';'
704 ;
705
706 Conflicts
707 The parser produced for an input grammar may contain
708 states in which conflicts occur. The conflicts occur
709 because the grammar is not LALR(1). An ambiguous grammar
710 always contains at least one LALR(1) conflict. The yacc
711 utility shall resolve all conflicts, using either
712 default rules or user-specified precedence rules.
713
714 Conflicts are either shift/reduce conflicts or
715 reduce/reduce conflicts. A shift/reduce conflict is
716 where, for a given state and lookahead symbol, both a
717 shift action and a reduce action are possible. A
718 reduce/reduce conflict is where, for a given state and
719 lookahead symbol, reductions by two different rules are
720 possible.
721
722 The rules below describe how to specify what actions to
723 take when a conflict occurs. Not all shift/reduce con-
724 flicts can be successfully resolved this way because the
725 conflict may be due to something other than ambiguity,
726 so incautious use of these facilities can cause the lan-
727 guage accepted by the parser to be much different from
728 that which was intended. The description file shall con-
729 tain sufficient information to understand the cause of
730 the conflict. Where ambiguity is the reason either the
731 default or explicit rules should be adequate to produce
732 a working parser.
733
734 The declared precedences and associativities (see Decla-
735 rations Section ) are used to resolve parsing conflicts
736 as follows:
737
738 1. A precedence and associativity is associated with
739 each grammar rule; it is the precedence and associa-
740 tivity of the last token or literal in the body of
741 the rule. If the %prec keyword is used, it overrides
742 this default. Some grammar rules might not have both
743 precedence and associativity.
744
745
746 2. If there is a shift/reduce conflict, and both the
747 grammar rule and the input symbol have precedence
748 and associativity associated with them, then the
749 conflict is resolved in favor of the action (shift
750 or reduce) associated with the higher precedence. If
751 the precedences are the same, then the associativity
752 is used; left associative implies reduce, right
753 associative implies shift, and non-associative
754 implies an error in the string being parsed.
755
756
757 3. When there is a shift/reduce conflict that cannot be
758 resolved by rule 2, the shift is done. Conflicts
759 resolved this way are counted in the diagnostic out-
760 put described in Error Handling .
761
762
763 4. When there is a reduce/reduce conflict, a reduction
764 is done by the grammar rule that occurs earlier in
765 the input sequence. Conflicts resolved this way are
766 counted in the diagnostic output described in Error
767 Handling .
768
769
770 Conflicts resolved by precedence or associativity shall
771 not be counted in the shift/reduce and reduce/reduce
772 conflicts reported by yacc on either standard error or
773 in the description file.
774
775 Error Handling
776 The token error shall be reserved for error handling.
777 The name error can be used in grammar rules. It indi-
778 cates places where the parser can recover from a syntax
779 error. The default value of error shall be 256. Its
780 value can be changed using a %token declaration. The
781 lexical analyzer should not return the value of error.
782
783 The parser shall detect a syntax error when it is in a
784 state where the action associated with the lookahead
785 symbol is error. A semantic action can cause the parser
786 to initiate error handling by executing the macro YYER-
787 ROR. When YYERROR is executed, the semantic action
788 passes control back to the parser. YYERROR cannot be
789 used outside of semantic actions.
790
791 When the parser detects a syntax error, it normally
792 calls yyerror() with the character string "syntax error"
793 as its argument. The call shall not be made if the
794 parser is still recovering from a previous error when
795 the error is detected. The parser is considered to be
796 recovering from a previous error until the parser has
797 shifted over at least three normal input symbols since
798 the last error was detected or a semantic action has
799 executed the macro yyerrok. The parser shall not call
800 yyerror() when YYERROR is executed.
801
802 The macro function YYRECOVERING shall return 1 if a syn-
803 tax error has been detected and the parser has not yet
804 fully recovered from it. Otherwise, zero shall be
805 returned.
806
807 When a syntax error is detected by the parser, the
808 parser shall check if a previous syntax error has been
809 detected. If a previous error was detected, and if no
810 normal input symbols have been shifted since the preced-
811 ing error was detected, the parser checks if the looka-
812 head symbol is an endmarker (see Interface to the Lexi-
813 cal Analyzer ). If it is, the parser shall return with a
814 non-zero value. Otherwise, the lookahead symbol shall be
815 discarded and normal parsing shall resume.
816
817 When YYERROR is executed or when the parser detects a
818 syntax error and no previous error has been detected, or
819 at least one normal input symbol has been shifted since
820 the previous error was detected, the parser shall pop
821 back one state at a time until the parse stack is empty
822 or the current state allows a shift over error. If the
823 parser empties the parse stack, it shall return with a
824 non-zero value. Otherwise, it shall shift over error and
825 then resume normal parsing. If the parser reads a looka-
826 head symbol before the error was detected, that symbol
827 shall still be the lookahead symbol when parsing is
828 resumed.
829
830 The macro yyerrok in a semantic action shall cause the
831 parser to act as if it has fully recovered from any pre-
832 vious errors. The macro yyclearin shall cause the parser
833 to discard the current lookahead token. If the current
834 lookahead token has not yet been read, yyclearin shall
835 have no effect.
836
837 The macro YYACCEPT shall cause the parser to return with
838 the value zero. The macro YYABORT shall cause the parser
839 to return with a non-zero value.
840
841 Interface to the Lexical Analyzer
842 The yylex() function is an integer-valued function that
843 returns a token number representing the kind of token
844 read. If there is a value associated with the token
845 returned by yylex() (see the discussion of tag above),
846 it shall be assigned to the external variable yylval.
847
848 If the parser and yylex() do not agree on these token
849 numbers, reliable communication between them cannot
850 occur. For (single-byte character) literals, the token
851 is simply the numeric value of the character in the cur-
852 rent character set. The numbers for other tokens can
853 either be chosen by yacc, or chosen by the user. In
854 either case, the #define construct of C is used to allow
855 yylex() to return these numbers symbolically. The
856 #define statements are put into the code file, and the
857 header file if that file is requested. The set of char-
858 acters permitted by yacc in an identifier is larger than
859 that permitted by C. Token names found to contain such
860 characters shall not be included in the #define declara-
861 tions.
862
863 If the token numbers are chosen by yacc, the tokens
864 other than literals shall be assigned numbers greater
865 than 256, although no order is implied. A token can be
866 explicitly assigned a number by following its first
867 appearance in the declarations section with a number.
868 Names and literals not defined this way retain their
869 default definition. All token numbers assigned by yacc
870 shall be unique and distinct from the token numbers used
871 for literals and user-assigned tokens. If duplicate
872 token numbers cause conflicts in parser generation, yacc
873 shall report an error; otherwise, it is unspecified
874 whether the token assignment is accepted or an error is
875 reported.
876
877 The end of the input is marked by a special token called
878 the endmarker, which has a token number that is zero or
879 negative. (These values are invalid for any other
880 token.) All lexical analyzers shall return zero or nega-
881 tive as a token number upon reaching the end of their
882 input. If the tokens up to, but excluding, the endmarker
883 form a structure that matches the start symbol, the
884 parser shall accept the input. If the endmarker is seen
885 in any other context, it shall be considered an error.
886
887 Completing the Program
888 In addition to yyparse() and yylex(), the functions
889 yyerror() and main() are required to make a complete
890 program. The application can supply main() and yyer-
891 ror(), or those routines can be obtained from the yacc
892 library.
893
894 Yacc Library
895 The following functions shall appear only in the yacc
896 library accessible through the -l y operand to c99; they
897 can therefore be redefined by a conforming application:
898
899 int main(void)
900
901 This function shall call yyparse() and exit with
902 an unspecified value. Other actions within this
903 function are unspecified.
904
905 int yyerror(const char *s)
906
907 This function shall write the NUL-terminated
908 argument to standard error, followed by a <new-
909 line>.
910
911
912 The order of the -l y and -l l operands given to c99 is
913 significant; the application shall either provide its
914 own main() function or ensure that -l y precedes -l l.
915
916 Debugging the Parser
917 The parser generated by yacc shall have diagnostic
918 facilities in it that can be optionally enabled at
919 either compile time or at runtime (if enabled at compile
920 time). The compilation of the runtime debugging code is
921 under the control of YYDEBUG, a preprocessor symbol. If
922 YYDEBUG has a non-zero value, the debugging code shall
923 be included. If its value is zero, the code shall not be
924 included.
925
926 In parsers where the debugging code has been included,
927 the external int yydebug can be used to turn debugging
928 on (with a non-zero value) and off (zero value) at run-
929 time. The initial value of yydebug shall be zero.
930
931 When -t is specified, the code file shall be built such
932 that, if YYDEBUG is not already defined at compilation
933 time (using the c99 -D YYDEBUG option, for example),
934 YYDEBUG shall be set explicitly to 1. When -t is not
935 specified, the code file shall be built such that, if
936 YYDEBUG is not already defined, it shall be set explic-
937 itly to zero.
938
939 The format of the debugging output is unspecified but
940 includes at least enough information to determine the
941 shift and reduce actions, and the input symbols. It also
942 provides information about error recovery.
943
944 Algorithms
945 The parser constructed by yacc implements an LALR(1)
946 parsing algorithm as documented in the literature. It is
947 unspecified whether the parser is table-driven or
948 direct-coded.
949
950 A parser generated by yacc shall never request an input
951 symbol from yylex() while in a state where the only
952 actions other than the error action are reductions by a
953 single rule.
954
955 The literature of parsing theory defines these concepts.
956
957 Limits
958 The yacc utility may have several internal tables. The
959 minimum maximums for these tables are shown in the fol-
960 lowing table. The exact meaning of these values is
961 implementation-defined. The implementation shall define
962 the relationship between these values and between them
963 and any error messages that the implementation may gen-
964 erate should it run out of space for any internal struc-
965 ture. An implementation may combine groups of these
966 resources into a single pool as long as the total avail-
967 able to the user does not fall below the sum of the
968 sizes specified by this section.
969
970 Table: Internal Limits in yacc
971
972 Minimum
973 Limit Maximum Description
974 {NTERMS} 126 Number of tokens.
975 {NNONTERM} 200 Number of non-terminals.
976 {NPROD} 300 Number of rules.
977 {NSTATES} 600 Number of states.
978 {MEMSIZE} 5200 Length of rules. The total length, in
979 names (tokens and non-terminals), of all
980 the rules of the grammar. The left-hand
981 side is counted for each rule, even if
982 it is not explicitly repeated, as speci-
983 fied in Grammar Rules in yacc .
984 {ACTSIZE} 4000 Number of actions. "Actions" here (and
985 in the description file) refer to parser
986 actions (shift, reduce, and so on) not
987 to semantic actions defined in Grammar
988 Rules in yacc .
989
990 EXIT STATUS
991 The following exit values shall be returned:
992
993 0 Successful completion.
994
995 >0 An error occurred.
996
997
998 CONSEQUENCES OF ERRORS
999 If any errors are encountered, the run is aborted and
1000 yacc exits with a non-zero status. Partial code files
1001 and header files may be produced. The summary informa-
1002 tion in the description file shall always be produced if
1003 the -v flag is present.
1004
1005 The following sections are informative.
1006
1007 APPLICATION USAGE
1008 Historical implementations experience name conflicts on
1009 the names yacc.tmp, yacc.acts, yacc.debug, y.tab.c,
1010 y.tab.h, and y.output if more than one copy of yacc is
1011 running in a single directory at one time. The -b option
1012 was added to overcome this problem. The related problem
1013 of allowing multiple yacc parsers to be placed in the
1014 same file was addressed by adding a -p option to over-
1015 ride the previously hard-coded yy variable prefix.
1016
1017 The description of the -p option specifies the minimal
1018 set of function and variable names that cause conflict
1019 when multiple parsers are linked together. YYSTYPE does
1020 not need to be changed. Instead, the programmer can use
1021 -b to give the header files for different parsers dif-
1022 ferent names, and then the file with the yylex() for a
1023 given parser can include the header for that parser.
1024 Names such as yyclearerr do not need to be changed
1025 because they are used only in the actions; they do not
1026 have linkage. It is possible that an implementation has
1027 other names, either internal ones for implementing
1028 things such as yyclearerr, or providing non-standard
1029 features that it wants to change with -p.
1030
1031 Unary operators that are the same token as a binary
1032 operator in general need their precedence adjusted. This
1033 is handled by the %prec advisory symbol associated with
1034 the particular grammar rule defining that unary opera-
1035 tor. (See Grammar Rules in yacc .) Applications are not
1036 required to use this operator for unary operators, but
1037 the grammars that do not require it are rare.
1038
1039 EXAMPLES
1040 Access to the yacc library is obtained with library
1041 search operands to c99. To use the yacc library main():
1042
1043
1044 c99 y.tab.c -l y
1045
1046 Both the lex library and the yacc library contain
1047 main(). To access the yacc main():
1048
1049
1050 c99 y.tab.c lex.yy.c -l y -l l
1051
1052 This ensures that the yacc library is searched first, so
1053 that its main() is used.
1054
1055 The historical yacc libraries have contained two simple
1056 functions that are normally coded by the application
1057 programmer. These functions are similar to the follow-
1058 ing code:
1059
1060
1061 #include <locale.h>
1062 int main(void)
1063 {
1064 extern int yyparse();
1065
1066
1067 setlocale(LC_ALL, "");
1068
1069
1070 /* If the following parser is one created by lex, the
1071 application must be careful to ensure that LC_CTYPE
1072 and LC_COLLATE are set to the POSIX locale. */
1073 (void) yyparse();
1074 return (0);
1075 }
1076
1077
1078 #include <stdio.h>
1079
1080
1081 int yyerror(const char *msg)
1082 {
1083 (void) fprintf(stderr, "%s\n", msg);
1084 return (0);
1085 }
1086
1087 RATIONALE
1088 The references in may be helpful in constructing the
1089 parser generator. The referenced DeRemer and Pennello
1090 article (along with the works it references) describes a
1091 technique to generate parsers that conform to this vol-
1092 ume of IEEE Std 1003.1-2001. Work in this area contin-
1093 ues to be done, so implementors should consult current
1094 literature before doing any new implementations. The
1095 original Knuth article is the theoretical basis for this
1096 kind of parser, but the tables it generates are imprac-
1097 tically large for reasonable grammars and should not be
1098 used. The "equivalent to" wording is intentional to
1099 assure that the best tables that are LALR(1) can be gen-
1100 erated.
1101
1102 There has been confusion between the class of grammars,
1103 the algorithms needed to generate parsers, and the algo-
1104 rithms needed to parse the languages. They are all rea-
1105 sonably orthogonal. In particular, a parser generator
1106 that accepts the full range of LR(1) grammars need not
1107 generate a table any more complex than one that accepts
1108 SLR(1) (a relatively weak class of LR grammars) for a
1109 grammar that happens to be SLR(1). Such an implementa-
1110 tion need not recognize the case, either; table compres-
1111 sion can yield the SLR(1) table (or one even smaller
1112 than that) without recognizing that the grammar is
1113 SLR(1). The speed of an LR(1) parser for any class is
1114 dependent more upon the table representation and com-
1115 pression (or the code generation if a direct parser is
1116 generated) than upon the class of grammar that the table
1117 generator handles.
1118
1119 The speed of the parser generator is somewhat dependent
1120 upon the class of grammar it handles. However, the orig-
1121 inal Knuth article algorithms for constructing LR
1122 parsers were judged by its author to be impractically
1123 slow at that time. Although full LR is more complex than
1124 LALR(1), as computer speeds and algorithms improve, the
1125 difference (in terms of acceptable wall-clock execution
1126 time) is becoming less significant.
1127
1128 Potential authors are cautioned that the referenced
1129 DeRemer and Pennello article previously cited identifies
1130 a bug (an over-simplification of the computation of
1131 LALR(1) lookahead sets) in some of the LALR(1) algorithm
1132 statements that preceded it to publication. They should
1133 take the time to seek out that paper, as well as current
1134 relevant work, particularly Aho's.
1135
1136 The -b option was added to provide a portable method for
1137 permitting yacc to work on multiple separate parsers in
1138 the same directory. If a directory contains more than
1139 one yacc grammar, and both grammars are constructed at
1140 the same time (by, for example, a parallel make pro-
1141 gram), conflict results. While the solution is not his-
1142 torical practice, it corrects a known deficiency in his-
1143 torical implementations. Corresponding changes were made
1144 to all sections that referenced the filenames y.tab.c
1145 (now "the code file"), y.tab.h (now "the header file"),
1146 and y.output (now "the description file").
1147
1148 The grammar for yacc input is based on System V documen-
1149 tation. The textual description shows there that the
1150 ';' is required at the end of the rule. The grammar and
1151 the implementation do not require this. (The use of
1152 C_IDENTIFIER causes a reduce to occur in the right
1153 place.)
1154
1155 Also, in that implementation, the constructs such as
1156 %token can be terminated by a semicolon, but this is not
1157 permitted by the grammar. The keywords such as %token
1158 can also appear in uppercase, which is again not dis-
1159 cussed. In most places where '%' is used, '\' can be
1160 substituted, and there are alternate spellings for some
1161 of the symbols (for example, %LEFT can be "%<" or even
1162 "\<" ).
1163
1164 Historically, <tag> can contain any characters except
1165 '>', including white space, in the implementation. How-
1166 ever, since the tag must reference an ISO C standard
1167 union member, in practice conforming implementations
1168 need to support only the set of characters for ISO C
1169 standard identifiers in this context.
1170
1171 Some historical implementations are known to accept
1172 actions that are terminated by a period. Historical
1173 implementations often allow '$' in names. A conforming
1174 implementation does not need to support either of these
1175 behaviors.
1176
1177 Deciding when to use %prec illustrates the difficulty in
1178 specifying the behavior of yacc. There may be situations
1179 in which the grammar is not, strictly speaking, in
1180 error, and yet yacc cannot interpret it unambiguously.
1181 The resolution of ambiguities in the grammar can in many
1182 instances be resolved by providing additional informa-
1183 tion, such as using %type or %union declarations. It is
1184 often easier and it usually yields a smaller parser to
1185 take this alternative when it is appropriate.
1186
1187 The size and execution time of a program produced with-
1188 out the runtime debugging code is usually smaller and
1189 slightly faster in historical implementations.
1190
1191 Statistics messages from several historical implementa-
1192 tions include the following types of information:
1193
1194
1195 n/512 terminals, n/300 non-terminals
1196 n/600 grammar rules, n/1500 states
1197 n shift/reduce, n reduce/reduce conflicts reported
1198 n/350 working sets used
1199 Memory: states, etc. n/15000, parser n/15000
1200 n/600 distinct lookahead sets
1201 n extra closures
1202 n shift entries, n exceptions
1203 n goto entries
1204 n entries saved by goto default
1205 Optimizer space used: input n/15000, output n/15000
1206 n table entries, n zero
1207 Maximum spread: n, Maximum offset: n
1208
1209 The report of internal tables in the description file is
1210 left implementation-defined because all aspects of these
1211 limits are also implementation-defined. Some implementa-
1212 tions may use dynamic allocation techniques and have no
1213 specific limit values to report.
1214
1215 The format of the y.output file is not given because
1216 specification of the format was not seen to enhance
1217 applications portability. The listing is primarily
1218 intended to help human users understand and debug the
1219 parser; use of y.output by a conforming application
1220 script would be unusual. Furthermore, implementations
1221 have not produced consistent output and no popular for-
1222 mat was apparent. The format selected by the implementa-
1223 tion should be human-readable, in addition to the
1224 requirement that it be a text file.
1225
1226 Standard error reports are not specifically described
1227 because they are seldom of use to conforming applica-
1228 tions and there was no reason to restrict implementa-
1229 tions.
1230
1231 Some implementations recognize "={" as equivalent to '{'
1232 because it appears in historical documentation. This
1233 construction was recognized and documented as obsolete
1234 as long ago as 1978, in the referenced Yacc: Yet Another
1235 Compiler-Compiler. This volume of IEEE Std 1003.1-2001
1236 chose to leave it as obsolete and omit it.
1237
1238 Multi-byte characters should be recognized by the lexi-
1239 cal analyzer and returned as tokens. They should not be
1240 returned as multi-byte character literals. The token
1241 error that is used for error recovery is normally
1242 assigned the value 256 in the historical implementation.
1243 Thus, the token value 256, which is used in many multi-
1244 byte character sets, is not available for use as the
1245 value of a user-defined token.
1246
1247 FUTURE DIRECTIONS
1248 None.
1249
1250 SEE ALSO
1251 c99, lex
1252
1253 COPYRIGHT
1254 Portions of this text are reprinted and reproduced in
1255 electronic form from IEEE Std 1003.1, 2003 Edition,
1256 Standard for Information Technology -- Portable Operat-
1257 ing System Interface (POSIX), The Open Group Base Speci-
1258 fications Issue 6, Copyright (C) 2001-2003 by the Insti-
1259 tute of Electrical and Electronics Engineers, Inc and
1260 The Open Group. In the event of any discrepancy between
1261 this version and the original IEEE and The Open Group
1262 Standard, the original IEEE and The Open Group Standard
1263 is the referee document. The original Standard can be
1264 obtained online at http://www.open-
1265 group.org/unix/online.html .
1266
1267
1268
1269 IEEE/The Open Group 2003 YACC(1P)
OLDNEW
« no previous file with comments | « bison/man/cat1/yacc.1.txt ('k') | bison/manifest/bison-2.4.1-bin.mft » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698