Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(117)

Side by Side Diff: gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.texi

Issue 10804012: Add native Windows binary for gperf. (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/
Patch Set: Created 8 years, 5 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch | Annotate | Revision Log
OLDNEW
(Empty)
1 \input texinfo @c -*- texinfo -*-
2 @c %**start of header
3 @setfilename gperf.info
4 @settitle Perfect Hash Function Generator
5 @c @setchapternewpage odd
6 @c %**end of header
7
8 @c some day we should @include version.texi instead of defining
9 @c these values at hand.
10 @set UPDATED 12 June 2003
11 @set EDITION 3.0.1
12 @set VERSION 3.0.1
13 @c ---------------------
14
15 @c remove the black boxes generated in the GPL appendix.
16 @finalout
17
18 @c Merge functions into the concept index
19 @syncodeindex fn cp
20 @c @synindex pg cp
21
22 @dircategory Programming Tools
23 @direntry
24 * Gperf: (gperf). Perfect Hash Function Generator.
25 @end direntry
26
27 @ifinfo
28 This file documents the features of the GNU Perfect Hash Function
29 Generator @value{VERSION}.
30
31 Copyright @copyright{} 1989-2003 Free Software Foundation, Inc.
32
33 Permission is granted to make and distribute verbatim copies of this
34 manual provided the copyright notice and this permission notice are
35 preserved on all copies.
36
37 @ignore
38 Permission is granted to process this file through TeX and print the
39 results, provided the printed document carries a copying permission
40 notice identical to this one except for the removal of this paragraph
41 (this paragraph not being relevant to the printed manual).
42
43 @end ignore
44
45 Permission is granted to copy and distribute modified versions of this
46 manual under the conditions for verbatim copying, provided also that the
47 section entitled ``GNU General Public License'' is included exactly as
48 in the original, and provided that the entire resulting derived work is
49 distributed under the terms of a permission notice identical to this
50 one.
51
52 Permission is granted to copy and distribute translations of this manual
53 into another language, under the above conditions for modified versions,
54 except that the section entitled ``GNU General Public License'' and this
55 permission notice may be included in translations approved by the Free
56 Software Foundation instead of in the original English.
57
58 @end ifinfo
59
60 @titlepage
61 @title User's Guide to @code{gperf} @value{VERSION}
62 @subtitle The GNU Perfect Hash Function Generator
63 @subtitle Edition @value{EDITION}, @value{UPDATED}
64 @author Douglas C. Schmidt
65 @author Bruno Haible
66
67 @page
68 @vskip 0pt plus 1filll
69 Copyright @copyright{} 1989-2003 Free Software Foundation, Inc.
70
71
72 Permission is granted to make and distribute verbatim copies of
73 this manual provided the copyright notice and this permission notice
74 are preserved on all copies.
75
76 Permission is granted to copy and distribute modified versions of this
77 manual under the conditions for verbatim copying, provided also that the
78 section entitled ``GNU General Public License'' is included
79 exactly as in the original, and provided that the entire resulting
80 derived work is distributed under the terms of a permission notice
81 identical to this one.
82
83 Permission is granted to copy and distribute translations of this manual
84 into another language, under the above conditions for modified versions,
85 except that the section entitled ``GNU General Public License'' may be
86 included in a translation approved by the author instead of in the
87 original English.
88 @end titlepage
89
90 @ifinfo
91 @node Top, Copying, (dir), (dir)
92 @top Introduction
93
94 This manual documents the GNU @code{gperf} perfect hash function generator
95 utility, focusing on its features and how to use them, and how to report
96 bugs.
97
98 @menu
99 * Copying:: GNU @code{gperf} General Public License says
100 how you can copy and share @code{gperf}.
101 * Contributors:: People who have contributed to @code{gperf}.
102 * Motivation:: The purpose of @code{gperf}.
103 * Search Structures:: Static search structures and GNU @code{gperf}
104 * Description:: High-level discussion of how GPERF functions.
105 * Options:: A description of options to the program.
106 * Bugs:: Known bugs and limitations with GPERF.
107 * Projects:: Things still left to do.
108 * Bibliography:: Material Referenced in this Report.
109
110 * Concept Index::
111
112 @detailmenu --- The Detailed Node Listing ---
113
114 High-Level Description of GNU @code{gperf}
115
116 * Input Format:: Input Format to @code{gperf}
117 * Output Format:: Output Format for Generated C Code with @code{gp erf}
118 * Binary Strings:: Use of NUL bytes
119
120 Input Format to @code{gperf}
121
122 * Declarations:: Declarations.
123 * Keywords:: Format for Keyword Entries.
124 * Functions:: Including Additional C Functions.
125 * Controls for GNU indent:: Where to place directives for GNU @code{indent}.
126
127 Declarations
128
129 * User-supplied Struct:: Specifying keywords with attributes.
130 * Gperf Declarations:: Embedding command line options in the input.
131 * C Code Inclusion:: Including C declarations and definitions.
132
133 Invoking @code{gperf}
134
135 * Input Details:: Options that affect Interpretation of the Input File
136 * Output Language:: Specifying the Language for the Output Code
137 * Output Details:: Fine tuning Details in the Output Code
138 * Algorithmic Details:: Changing the Algorithms employed by @code{gperf}
139 * Verbosity:: Informative Output
140
141 @end detailmenu
142 @end menu
143
144 @end ifinfo
145
146 @node Copying, Contributors, Top, Top
147 @unnumbered GNU GENERAL PUBLIC LICENSE
148 @include gpl.texinfo
149
150 @node Contributors, Motivation, Copying, Top
151 @unnumbered Contributors to GNU @code{gperf} Utility
152
153 @itemize @bullet
154 @item
155 @cindex Bugs
156 The GNU @code{gperf} perfect hash function generator utility was
157 written in GNU C++ by Douglas C. Schmidt. The general
158 idea for the perfect hash function generator was inspired by Keith
159 Bostic's algorithm written in C, and distributed to net.sources around
160 1984. The current program is a heavily modified, enhanced, and extended
161 implementation of Keith's basic idea, created at the University of
162 California, Irvine. Bugs, patches, and suggestions should be reported
163 to @code{<bug-gnu-gperf@@gnu.org>}.
164
165 @item
166 Special thanks is extended to Michael Tiemann and Doug Lea, for
167 providing a useful compiler, and for giving me a forum to exhibit my
168 creation.
169
170 In addition, Adam de Boor and Nels Olson provided many tips and insights
171 that greatly helped improve the quality and functionality of @code{gperf}.
172
173 @item
174 Bruno Haible enhanced and optimized the search algorithm. He also rewrote
175 the input routines and the output routines for better reliability, and
176 added a testsuite.
177 @end itemize
178
179 @node Motivation, Search Structures, Contributors, Top
180 @chapter Introduction
181
182 @code{gperf} is a perfect hash function generator written in C++. It
183 transforms an @var{n} element user-specified keyword set @var{W} into a
184 perfect hash function @var{F}. @var{F} uniquely maps keywords in
185 @var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}. If @var{k}
186 = @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.
187 @code{gperf} generates a 0..@var{k} element static lookup table and a
188 pair of C functions. These functions determine whether a given
189 character string @var{s} occurs in @var{W}, using at most one probe into
190 the lookup table.
191
192 @code{gperf} currently generates the reserved keyword recognizer for
193 lexical analyzers in several production and research compilers and
194 language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
195 GNU Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is
196 available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.
197 A paper describing @code{gperf}'s design and implementation in greater
198 detail is available in the Second USENIX C++ Conference proceedings
199 or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
200
201 @node Search Structures, Description, Motivation, Top
202 @chapter Static search structures and GNU @code{gperf}
203 @cindex Static search structure
204
205 A @dfn{static search structure} is an Abstract Data Type with certain
206 fundamental operations, e.g., @emph{initialize}, @emph{insert},
207 and @emph{retrieve}. Conceptually, all insertions occur before any
208 retrievals. In practice, @code{gperf} generates a @emph{static} array
209 containing search set keywords and any associated attributes specified
210 by the user. Thus, there is essentially no execution-time cost for the
211 insertions. It is a useful data structure for representing @emph{static
212 search sets}. Static search sets occur frequently in software system
213 applications. Typical static search sets include compiler reserved
214 words, assembler instruction opcodes, and built-in shell interpreter
215 commands. Search set members, called @dfn{keywords}, are inserted into
216 the structure only once, usually during program initialization, and are
217 not generally modified at run-time.
218
219 Numerous static search structure implementations exist, e.g.,
220 arrays, linked lists, binary search trees, digital search tries, and
221 hash tables. Different approaches offer trade-offs between space
222 utilization and search time efficiency. For example, an @var{n} element
223 sorted array is space efficient, though the average-case time
224 complexity for retrieval operations using binary search is
225 proportional to log @var{n}. Conversely, hash table implementations
226 often locate a table entry in constant time, but typically impose
227 additional memory overhead and exhibit poor worst case performance.
228
229 @cindex Minimal perfect hash functions
230 @emph{Minimal perfect hash functions} provide an optimal solution for a
231 particular class of static search sets. A minimal perfect hash
232 function is defined by two properties:
233
234 @itemize @bullet
235 @item
236 It allows keyword recognition in a static search set using at most
237 @emph{one} probe into the hash table. This represents the ``perfect''
238 property.
239 @item
240 The actual memory allocated to store the keywords is precisely large
241 enough for the keyword set, and @emph{no larger}. This is the
242 ``minimal'' property.
243 @end itemize
244
245 For most applications it is far easier to generate @emph{perfect} hash
246 functions than @emph{minimal perfect} hash functions. Moreover,
247 non-minimal perfect hash functions frequently execute faster than
248 minimal ones in practice. This phenomena occurs since searching a
249 sparse keyword table increases the probability of locating a ``null''
250 entry, thereby reducing string comparisons. @code{gperf}'s default
251 behavior generates @emph{near-minimal} perfect hash functions for
252 keyword sets. However, @code{gperf} provides many options that permit
253 user control over the degree of minimality and perfection.
254
255 Static search sets often exhibit relative stability over time. For
256 example, Ada's 63 reserved words have remained constant for nearly a
257 decade. It is therefore frequently worthwhile to expend concerted
258 effort building an optimal search structure @emph{once}, if it
259 subsequently receives heavy use multiple times. @code{gperf} removes
260 the drudgery associated with constructing time- and space-efficient
261 search structures by hand. It has proven a useful and practical tool
262 for serious programming projects. Output from @code{gperf} is currently
263 used in several production and research compilers, including GNU C, GNU
264 C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are
265 not yet part of the official GNU distribution. Each compiler utilizes
266 @code{gperf} to automatically generate static search structures that
267 efficiently identify their respective reserved keywords.
268
269 @node Description, Options, Search Structures, Top
270 @chapter High-Level Description of GNU @code{gperf}
271
272 @menu
273 * Input Format:: Input Format to @code{gperf}
274 * Output Format:: Output Format for Generated C Code with @code{gp erf}
275 * Binary Strings:: Use of NUL bytes
276 @end menu
277
278 The perfect hash function generator @code{gperf} reads a set of
279 ``keywords'' from an input file (or from the standard input by
280 default). It attempts to derive a perfect hashing function that
281 recognizes a member of the @dfn{static keyword set} with at most a
282 single probe into the lookup table. If @code{gperf} succeeds in
283 generating such a function it produces a pair of C source code routines
284 that perform hashing and table lookup recognition. All generated C code
285 is directed to the standard output. Command-line options described
286 below allow you to modify the input and output format to @code{gperf}.
287
288 By default, @code{gperf} attempts to produce time-efficient code, with
289 less emphasis on efficient space utilization. However, several options
290 exist that permit trading-off execution time for storage space and vice
291 versa. In particular, expanding the generated table size produces a
292 sparse search structure, generally yielding faster searches.
293 Conversely, you can direct @code{gperf} to utilize a C @code{switch}
294 statement scheme that minimizes data space storage size. Furthermore,
295 using a C @code{switch} may actually speed up the keyword retrieval time
296 somewhat. Actual results depend on your C compiler, of course.
297
298 In general, @code{gperf} assigns values to the bytes it is using
299 for hashing until some set of values gives each keyword a unique value.
300 A helpful heuristic is that the larger the hash value range, the easier
301 it is for @code{gperf} to find and generate a perfect hash function.
302 Experimentation is the key to getting the most from @code{gperf}.
303
304 @node Input Format, Output Format, Description, Description
305 @section Input Format to @code{gperf}
306 @cindex Format
307 @cindex Declaration section
308 @cindex Keywords section
309 @cindex Functions section
310 You can control the input file format by varying certain command-line
311 arguments, in particular the @samp{-t} option. The input's appearance
312 is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
313 utilities @code{lex} and @code{yacc}). Here's an outline of the general
314 format:
315
316 @example
317 @group
318 declarations
319 %%
320 keywords
321 %%
322 functions
323 @end group
324 @end example
325
326 @emph{Unlike} @code{flex} or @code{bison}, the declarations section and
327 the functions section are optional. The following sections describe the
328 input format for each section.
329
330 @menu
331 * Declarations:: Declarations.
332 * Keywords:: Format for Keyword Entries.
333 * Functions:: Including Additional C Functions.
334 * Controls for GNU indent:: Where to place directives for GNU @code{indent}.
335 @end menu
336
337 It is possible to omit the declaration section entirely, if the @samp{-t}
338 option is not given. In this case the input file begins directly with the
339 first keyword line, e.g.:
340
341 @example
342 @group
343 january
344 february
345 march
346 april
347 ...
348 @end group
349 @end example
350
351 @node Declarations, Keywords, Input Format, Input Format
352 @subsection Declarations
353
354 The keyword input file optionally contains a section for including
355 arbitrary C declarations and definitions, @code{gperf} declarations that
356 act like command-line options, as well as for providing a user-supplied
357 @code{struct}.
358
359 @menu
360 * User-supplied Struct:: Specifying keywords with attributes.
361 * Gperf Declarations:: Embedding command line options in the input.
362 * C Code Inclusion:: Including C declarations and definitions.
363 @end menu
364
365 @node User-supplied Struct, Gperf Declarations, Declarations, Declarations
366 @subsubsection User-supplied @code{struct}
367
368 If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration)
369 @emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
370 component in the declaration section from the input file. The first
371 field in this struct must be of type @code{char *} or @code{const char *}
372 if the @samp{-P} option is not given, or of type @code{int} if the option
373 @samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled.
374 This first field must be called @samp{name}, although it is possible to modify
375 its name with the @samp{-K} option (or, equivalently, the
376 @samp{%define slot-name} declaration) described below.
377
378 Here is a simple example, using months of the year and their attributes as
379 input:
380
381 @example
382 @group
383 struct month @{ char *name; int number; int days; int leap_days; @};
384 %%
385 january, 1, 31, 31
386 february, 2, 28, 29
387 march, 3, 31, 31
388 april, 4, 30, 30
389 may, 5, 31, 31
390 june, 6, 30, 30
391 july, 7, 31, 31
392 august, 8, 31, 31
393 september, 9, 30, 30
394 october, 10, 31, 31
395 november, 11, 30, 30
396 december, 12, 31, 31
397 @end group
398 @end example
399
400 @cindex @samp{%%}
401 Separating the @code{struct} declaration from the list of keywords and
402 other fields are a pair of consecutive percent signs, @samp{%%},
403 appearing left justified in the first column, as in the UNIX utility
404 @code{lex}.
405
406 If the @code{struct} has already been declared in an include file, it can
407 be mentioned in an abbreviated form, like this:
408
409 @example
410 @group
411 struct month;
412 %%
413 january, 1, 31, 31
414 ...
415 @end group
416 @end example
417
418 @node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations
419 @subsubsection Gperf Declarations
420
421 The declaration section can contain @code{gperf} declarations. They
422 influence the way @code{gperf} works, like command line options do.
423 In fact, every such declaration is equivalent to a command line option.
424 There are three forms of declarations:
425
426 @enumerate
427 @item
428 Declarations without argument, like @samp{%compare-lengths}.
429
430 @item
431 Declarations with an argument, like @samp{%switch=@var{count}}.
432
433 @item
434 Declarations of names of entities in the output file, like
435 @samp{%define lookup-function-name @var{name}}.
436 @end enumerate
437
438 When a declaration is given both in the input file and as a command line
439 option, the command-line option's value prevails.
440
441 The following @code{gperf} declarations are available.
442
443 @table @samp
444 @item %delimiters=@var{delimiter-list}
445 @cindex @samp{%delimiters}
446 Allows you to provide a string containing delimiters used to
447 separate keywords from their attributes. The default is ",". This
448 option is essential if you want to use keywords that have embedded
449 commas or newlines.
450
451 @item %struct-type
452 @cindex @samp{%struct-type}
453 Allows you to include a @code{struct} type declaration for generated
454 code; see above for an example.
455
456 @item %ignore-case
457 @cindex @samp{%ignore-case}
458 Consider upper and lower case ASCII characters as equivalent. The string
459 comparison will use a case insignificant character comparison. Note that
460 locale dependent case mappings are ignored.
461
462 @item %language=@var{language-name}
463 @cindex @samp{%language}
464 Instructs @code{gperf} to generate code in the language specified by the
465 option's argument. Languages handled are currently:
466
467 @table @samp
468 @item KR-C
469 Old-style K&R C. This language is understood by old-style C compilers and
470 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
471 because of lacking @samp{const}.
472
473 @item C
474 Common C. This language is understood by ANSI C compilers, and also by
475 old-style C compilers, provided that you @code{#define const} to empty
476 for compilers which don't know about this keyword.
477
478 @item ANSI-C
479 ANSI C. This language is understood by ANSI C compilers and C++ compilers.
480
481 @item C++
482 C++. This language is understood by C++ compilers.
483 @end table
484
485 The default is C.
486
487 @item %define slot-name @var{name}
488 @cindex @samp{%define slot-name}
489 This declaration is only useful when option @samp{-t} (or, equivalently, the
490 @samp{%struct-type} declaration) has been given.
491 By default, the program assumes the structure component identifier for
492 the keyword is @samp{name}. This option allows an arbitrary choice of
493 identifier for this component, although it still must occur as the first
494 field in your supplied @code{struct}.
495
496 @item %define initializer-suffix @var{initializers}
497 @cindex @samp{%define initializer-suffix}
498 This declaration is only useful when option @samp{-t} (or, equivalently, the
499 @samp{%struct-type} declaration) has been given.
500 It permits to specify initializers for the structure members following
501 @var{slot-name} in empty hash table entries. The list of initializers
502 should start with a comma. By default, the emitted code will
503 zero-initialize structure members following @var{slot-name}.
504
505 @item %define hash-function-name @var{name}
506 @cindex @samp{%define hash-function-name}
507 Allows you to specify the name for the generated hash function. Default
508 name is @samp{hash}. This option permits the use of two hash tables in
509 the same file.
510
511 @item %define lookup-function-name @var{name}
512 @cindex @samp{%define lookup-function-name}
513 Allows you to specify the name for the generated lookup function.
514 Default name is @samp{in_word_set}. This option permits multiple
515 generated hash functions to be used in the same application.
516
517 @item %define class-name @var{name}
518 @cindex @samp{%define class-name}
519 This option is only useful when option @samp{-L C++} (or, equivalently,
520 the @samp{%language=C++} declaration) has been given. It
521 allows you to specify the name of generated C++ class. Default name is
522 @code{Perfect_Hash}.
523
524 @item %7bit
525 @cindex @samp{%7bit}
526 This option specifies that all strings that will be passed as arguments
527 to the generated hash function and the generated lookup function will
528 solely consist of 7-bit ASCII characters (bytes in the range 0..127).
529 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
530 @emph{not} guarantee that a byte is in this range. Only an explicit
531 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.)
532
533 @item %compare-lengths
534 @cindex @samp{%compare-lengths}
535 Compare keyword lengths before trying a string comparison. This option
536 is mandatory for binary comparisons (@pxref{Binary Strings}). It also might
537 cut down on the number of string comparisons made during the lookup, since
538 keywords with different lengths are never compared via @code{strcmp}.
539 However, using @samp{%compare-lengths} might greatly increase the size of the
540 generated C code if the lookup table range is large (which implies that
541 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
542 table contains as many elements as there are entries in the lookup table.
543
544 @item %compare-strncmp
545 @cindex @samp{%compare-strncmp}
546 Generates C code that uses the @code{strncmp} function to perform
547 string comparisons. The default action is to use @code{strcmp}.
548
549 @item %readonly-tables
550 @cindex @samp{%readonly-tables}
551 Makes the contents of all generated lookup tables constant, i.e.,
552 ``readonly''. Many compilers can generate more efficient code for this
553 by putting the tables in readonly memory.
554
555 @item %enum
556 @cindex @samp{%enum}
557 Define constant values using an enum local to the lookup function rather
558 than with #defines. This also means that different lookup functions can
559 reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
560
561 @item %includes
562 @cindex @samp{%includes}
563 Include the necessary system include file, @code{<string.h>}, at the
564 beginning of the code. By default, this is not done; the user must
565 include this header file himself to allow compilation of the code.
566
567 @item %global-table
568 @cindex @samp{%global-table}
569 Generate the static table of keywords as a static global variable,
570 rather than hiding it inside of the lookup function (which is the
571 default behavior).
572
573 @item %pic
574 @cindex @samp{%pic}
575 Optimize the generated table for inclusion in shared libraries. This
576 reduces the startup time of programs using a shared library containing
577 the generated code. If the @samp{%struct-type} declaration (or,
578 equivalently, the option @samp{-t}) is also given, the first field of the
579 user-defined struct must be of type @samp{int}, not @samp{char *}, because
580 it will contain offsets into the string pool instead of actual strings.
581 To convert such an offset to a string, you can use the expression
582 @samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool
583 name can be changed through the @samp{%define string-pool-name} declaration.
584
585 @item %define string-pool-name @var{name}
586 @cindex @samp{%define string-pool-name}
587 Allows you to specify the name of the generated string pool created by
588 the declaration @samp{%pic} (or, equivalently, the option @samp{-P}).
589 The default name is @samp{stringpool}. This declaration permits the use of
590 two hash tables in the same file, with @samp{%pic} and even when the
591 @samp{%global-table} declaration (or, equivalently, the option @samp{-G})
592 is given.
593
594 @item %null-strings
595 @cindex @samp{%null-strings}
596 Use NULL strings instead of empty strings for empty keyword table entries.
597 This reduces the startup time of programs using a shared library containing
598 the generated code (but not as much as the declaration @samp{%pic}), at the
599 expense of one more test-and-branch instruction at run time.
600
601 @item %define word-array-name @var{name}
602 @cindex @samp{%define word-array-name}
603 Allows you to specify the name for the generated array containing the
604 hash table. Default name is @samp{wordlist}. This option permits the
605 use of two hash tables in the same file, even when the option @samp{-G}
606 (or, equivalently, the @samp{%global-table} declaration) is given.
607
608 @item %switch=@var{count}
609 @cindex @samp{%switch}
610 Causes the generated C code to use a @code{switch} statement scheme,
611 rather than an array lookup table. This can lead to a reduction in both
612 time and space requirements for some input files. The argument to this
613 option determines how many @code{switch} statements are generated. A
614 value of 1 generates 1 @code{switch} containing all the elements, a
615 value of 2 generates 2 tables with 1/2 the elements in each
616 @code{switch}, etc. This is useful since many C compilers cannot
617 correctly generate code for large @code{switch} statements. This option
618 was inspired in part by Keith Bostic's original C program.
619
620 @item %omit-struct-type
621 @cindex @samp{%omit-struct-type}
622 Prevents the transfer of the type declaration to the output file. Use
623 this option if the type is already defined elsewhere.
624 @end table
625
626 @node C Code Inclusion, , Gperf Declarations, Declarations
627 @subsubsection C Code Inclusion
628
629 @cindex @samp{%@{}
630 @cindex @samp{%@}}
631 Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it
632 is possible to directly include C source text and comments verbatim into
633 the generated output file. This is accomplished by enclosing the region
634 inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs. Here is
635 an input fragment based on the previous example that illustrates this
636 feature:
637
638 @example
639 @group
640 %@{
641 #include <assert.h>
642 /* This section of code is inserted directly into the output. */
643 int return_month_days (struct month *months, int is_leap_year);
644 %@}
645 struct month @{ char *name; int number; int days; int leap_days; @};
646 %%
647 january, 1, 31, 31
648 february, 2, 28, 29
649 march, 3, 31, 31
650 ...
651 @end group
652 @end example
653
654 @node Keywords, Functions, Declarations, Input Format
655 @subsection Format for Keyword Entries
656
657 The second input file format section contains lines of keywords and any
658 associated attributes you might supply. A line beginning with @samp{#}
659 in the first column is considered a comment. Everything following the
660 @samp{#} is ignored, up to and including the following newline. A line
661 beginning with @samp{%} in the first column is an option declaration and
662 must not occur within the keywords section.
663
664 The first field of each non-comment line is always the keyword itself. It
665 can be given in two ways: as a simple name, i.e., without surrounding
666 string quotation marks, or as a string enclosed in double-quotes, in
667 C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
668 or @code{\xa8}. In either case, it must start right at the beginning
669 of the line, without leading whitespace.
670 In this context, a ``field'' is considered to extend up to, but
671 not include, the first blank, comma, or newline. Here is a simple
672 example taken from a partial list of C reserved words:
673
674 @example
675 @group
676 # These are a few C reserved words, see the c.gperf file
677 # for a complete list of ANSI C reserved words.
678 unsigned
679 sizeof
680 switch
681 signed
682 if
683 default
684 for
685 while
686 return
687 @end group
688 @end example
689
690 Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker
691 may be elided if the declaration section is empty.
692
693 Additional fields may optionally follow the leading keyword. Fields
694 should be separated by commas, and terminate at the end of line. What
695 these fields mean is entirely up to you; they are used to initialize the
696 elements of the user-defined @code{struct} provided by you in the
697 declaration section. If the @samp{-t} option (or, equivalently, the
698 @samp{%struct-type} declaration) is @emph{not} enabled
699 these fields are simply ignored. All previous examples except the last
700 one contain keyword attributes.
701
702 @node Functions, Controls for GNU indent, Keywords, Input Format
703 @subsection Including Additional C Functions
704
705 The optional third section also corresponds closely with conventions
706 found in @code{flex} and @code{bison}. All text in this section,
707 starting at the final @samp{%%} and extending to the end of the input
708 file, is included verbatim into the generated output file. Naturally,
709 it is your responsibility to ensure that the code contained in this
710 section is valid C.
711
712 @node Controls for GNU indent, , Functions, Input Format
713 @subsection Where to place directives for GNU @code{indent}.
714
715 If you want to invoke GNU @code{indent} on a @code{gperf} input file,
716 you will see that GNU @code{indent} doesn't understand the @samp{%%},
717 @samp{%@{} and @samp{%@}} directives that control @code{gperf}'s
718 interpretation of the input file. Therefore you have to insert some
719 directives for GNU @code{indent}. More precisely, assuming the most
720 general input file structure
721
722 @example
723 @group
724 declarations part 1
725 %@{
726 verbatim code
727 %@}
728 declarations part 2
729 %%
730 keywords
731 %%
732 functions
733 @end group
734 @end example
735
736 @noindent
737 you would insert @samp{*INDENT-OFF*} and @samp{*INDENT-ON*} comments
738 as follows:
739
740 @example
741 @group
742 /* *INDENT-OFF* */
743 declarations part 1
744 %@{
745 /* *INDENT-ON* */
746 verbatim code
747 /* *INDENT-OFF* */
748 %@}
749 declarations part 2
750 %%
751 keywords
752 %%
753 /* *INDENT-ON* */
754 functions
755 @end group
756 @end example
757
758 @node Output Format, Binary Strings, Input Format, Description
759 @section Output Format for Generated C Code with @code{gperf}
760 @cindex hash table
761
762 Several options control how the generated C code appears on the standard
763 output. Two C function are generated. They are called @code{hash} and
764 @code{in_word_set}, although you may modify their names with a command-line
765 option. Both functions require two arguments, a string, @code{char *}
766 @var{str}, and a length parameter, @code{int} @var{len}. Their default
767 function prototypes are as follows:
768
769 @deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
770 By default, the generated @code{hash} function returns an integer value
771 created by adding @var{len} to several user-specified @var{str} byte
772 positions indexed into an @dfn{associated values} table stored in a
773 local static array. The associated values table is constructed
774 internally by @code{gperf} and later output as a static local C array
775 called @samp{hash_table}. The relevant selected positions (i.e. indices
776 into @var{str}) are specified via the @samp{-k} option when running
777 @code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}).
778 @end deftypefun
779
780 @deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
781 If @var{str} is in the keyword set, returns a pointer to that
782 keyword. More exactly, if the option @samp{-t} (or, equivalently, the
783 @samp{%struct-type} declaration) was given, it returns
784 a pointer to the matching keyword's structure. Otherwise it returns
785 @code{NULL}.
786 @end deftypefun
787
788 If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
789 declaration) is not used, @var{str} must be a NUL terminated
790 string of exactly length @var{len}. If @samp{-c} (or, equivalently, the
791 @samp{%compare-strncmp} declaration) is used, @var{str} must
792 simply be an array of @var{len} bytes and does not need to be NUL
793 terminated.
794
795 The code generated for these two functions is affected by the following
796 options:
797
798 @table @samp
799 @item -t
800 @itemx --struct-type
801 Make use of the user-defined @code{struct}.
802
803 @item -S @var{total-switch-statements}
804 @itemx --switch=@var{total-switch-statements}
805 @cindex @code{switch}
806 Generate 1 or more C @code{switch} statement rather than use a large,
807 (and potentially sparse) static array. Although the exact time and
808 space savings of this approach vary according to your C compiler's
809 degree of optimization, this method often results in smaller and faster
810 code.
811 @end table
812
813 If the @samp{-t} and @samp{-S} options (or, equivalently, the
814 @samp{%struct-type} and @samp{%switch} declarations) are omitted, the default
815 action
816 is to generate a @code{char *} array containing the keywords, together with
817 additional empty strings used for padding the array. By experimenting
818 with the various input and output options, and timing the resulting C
819 code, you can determine the best option choices for different keyword
820 set characteristics.
821
822 @node Binary Strings, , Output Format, Description
823 @section Use of NUL bytes
824 @cindex NUL
825
826 By default, the code generated by @code{gperf} operates on zero
827 terminated strings, the usual representation of strings in C. This means
828 that the keywords in the input file must not contain NUL bytes,
829 and the @var{str} argument passed to @code{hash} or @code{in_word_set}
830 must be NUL terminated and have exactly length @var{len}.
831
832 If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
833 declaration) is used, then the @var{str} argument does not need
834 to be NUL terminated. The code generated by @code{gperf} will only
835 access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
836 However, the keywords in the input file still must not contain NUL
837 bytes.
838
839 If option @samp{-l} (or, equivalently, the @samp{%compare-lengths}
840 declaration) is used, then the hash table performs binary
841 comparison. The keywords in the input file may contain NUL bytes,
842 written in string syntax as @code{\000} or @code{\x00}, and the code
843 generated by @code{gperf} will treat NUL like any other byte.
844 Also, in this case the @samp{-c} option (or, equivalently, the
845 @samp{%compare-strncmp} declaration) is ignored.
846
847 @node Options, Bugs, Description, Top
848 @chapter Invoking @code{gperf}
849
850 There are @emph{many} options to @code{gperf}. They were added to make
851 the program more convenient for use with real applications. ``On-line''
852 help is readily available via the @samp{--help} option. Here is the
853 complete list of options.
854
855 @menu
856 * Output File:: Specifying the Location of the Output File
857 * Input Details:: Options that affect Interpretation of the Input File
858 * Output Language:: Specifying the Language for the Output Code
859 * Output Details:: Fine tuning Details in the Output Code
860 * Algorithmic Details:: Changing the Algorithms employed by @code{gperf}
861 * Verbosity:: Informative Output
862 @end menu
863
864 @node Output File, Input Details, Options, Options
865 @section Specifying the Location of the Output File
866
867 @table @samp
868 @item --output-file=@var{file}
869 Allows you to specify the name of the file to which the output is written to.
870 @end table
871
872 The results are written to standard output if no output file is specified
873 or if it is @samp{-}.
874
875 @node Input Details, Output Language, Output File, Options
876 @section Options that affect Interpretation of the Input File
877
878 These options are also available as declarations in the input file
879 (@pxref{Gperf Declarations}).
880
881 @table @samp
882 @item -e @var{keyword-delimiter-list}
883 @itemx --delimiters=@var{keyword-delimiter-list}
884 @cindex Delimiters
885 Allows you to provide a string containing delimiters used to
886 separate keywords from their attributes. The default is ",". This
887 option is essential if you want to use keywords that have embedded
888 commas or newlines. One useful trick is to use -e'TAB', where TAB is
889 the literal tab character.
890
891 @item -t
892 @itemx --struct-type
893 Allows you to include a @code{struct} type declaration for generated
894 code. Any text before a pair of consecutive @samp{%%} is considered
895 part of the type declaration. Keywords and additional fields may follow
896 this, one group of fields per line. A set of examples for generating
897 perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
898 Modula 3 and JavaScript reserved words are distributed with this release.
899
900 @item --ignore-case
901 Consider upper and lower case ASCII characters as equivalent. The string
902 comparison will use a case insignificant character comparison. Note that
903 locale dependent case mappings are ignored. This option is therefore not
904 suitable if a properly internationalized or locale aware case mapping
905 should be used. (For example, in a Turkish locale, the upper case equivalent
906 of the lowercase ASCII letter @samp{i} is the non-ASCII character
907 @samp{capital i with dot above}.) For this case, it is better to apply
908 an uppercase or lowercase conversion on the string before passing it to
909 the @code{gperf} generated function.
910 @end table
911
912 @node Output Language, Output Details, Input Details, Options
913 @section Options to specify the Language for the Output Code
914
915 These options are also available as declarations in the input file
916 (@pxref{Gperf Declarations}).
917
918 @table @samp
919 @item -L @var{generated-language-name}
920 @itemx --language=@var{generated-language-name}
921 Instructs @code{gperf} to generate code in the language specified by the
922 option's argument. Languages handled are currently:
923
924 @table @samp
925 @item KR-C
926 Old-style K&R C. This language is understood by old-style C compilers and
927 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
928 because of lacking @samp{const}.
929
930 @item C
931 Common C. This language is understood by ANSI C compilers, and also by
932 old-style C compilers, provided that you @code{#define const} to empty
933 for compilers which don't know about this keyword.
934
935 @item ANSI-C
936 ANSI C. This language is understood by ANSI C compilers and C++ compilers.
937
938 @item C++
939 C++. This language is understood by C++ compilers.
940 @end table
941
942 The default is C.
943
944 @item -a
945 This option is supported for compatibility with previous releases of
946 @code{gperf}. It does not do anything.
947
948 @item -g
949 This option is supported for compatibility with previous releases of
950 @code{gperf}. It does not do anything.
951 @end table
952
953 @node Output Details, Algorithmic Details, Output Language, Options
954 @section Options for fine tuning Details in the Output Code
955
956 Most of these options are also available as declarations in the input file
957 (@pxref{Gperf Declarations}).
958
959 @table @samp
960 @item -K @var{slot-name}
961 @itemx --slot-name=@var{slot-name}
962 @cindex Slot name
963 This option is only useful when option @samp{-t} (or, equivalently, the
964 @samp{%struct-type} declaration) has been given.
965 By default, the program assumes the structure component identifier for
966 the keyword is @samp{name}. This option allows an arbitrary choice of
967 identifier for this component, although it still must occur as the first
968 field in your supplied @code{struct}.
969
970 @item -F @var{initializers}
971 @itemx --initializer-suffix=@var{initializers}
972 @cindex Initializers
973 This option is only useful when option @samp{-t} (or, equivalently, the
974 @samp{%struct-type} declaration) has been given.
975 It permits to specify initializers for the structure members following
976 @var{slot-name} in empty hash table entries. The list of initializers
977 should start with a comma. By default, the emitted code will
978 zero-initialize structure members following @var{slot-name}.
979
980 @item -H @var{hash-function-name}
981 @itemx --hash-function-name=@var{hash-function-name}
982 Allows you to specify the name for the generated hash function. Default
983 name is @samp{hash}. This option permits the use of two hash tables in
984 the same file.
985
986 @item -N @var{lookup-function-name}
987 @itemx --lookup-function-name=@var{lookup-function-name}
988 Allows you to specify the name for the generated lookup function.
989 Default name is @samp{in_word_set}. This option permits multiple
990 generated hash functions to be used in the same application.
991
992 @item -Z @var{class-name}
993 @itemx --class-name=@var{class-name}
994 @cindex Class name
995 This option is only useful when option @samp{-L C++} (or, equivalently,
996 the @samp{%language=C++} declaration) has been given. It
997 allows you to specify the name of generated C++ class. Default name is
998 @code{Perfect_Hash}.
999
1000 @item -7
1001 @itemx --seven-bit
1002 This option specifies that all strings that will be passed as arguments
1003 to the generated hash function and the generated lookup function will
1004 solely consist of 7-bit ASCII characters (bytes in the range 0..127).
1005 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
1006 @emph{not} guarantee that a byte is in this range. Only an explicit
1007 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
1008 default in versions of @code{gperf} earlier than 2.7; now the default is
1009 to support 8-bit and multibyte characters.
1010
1011 @item -l
1012 @itemx --compare-lengths
1013 Compare keyword lengths before trying a string comparison. This option
1014 is mandatory for binary comparisons (@pxref{Binary Strings}). It also might
1015 cut down on the number of string comparisons made during the lookup, since
1016 keywords with different lengths are never compared via @code{strcmp}.
1017 However, using @samp{-l} might greatly increase the size of the
1018 generated C code if the lookup table range is large (which implies that
1019 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
1020 table contains as many elements as there are entries in the lookup table.
1021
1022 @item -c
1023 @itemx --compare-strncmp
1024 Generates C code that uses the @code{strncmp} function to perform
1025 string comparisons. The default action is to use @code{strcmp}.
1026
1027 @item -C
1028 @itemx --readonly-tables
1029 Makes the contents of all generated lookup tables constant, i.e.,
1030 ``readonly''. Many compilers can generate more efficient code for this
1031 by putting the tables in readonly memory.
1032
1033 @item -E
1034 @itemx --enum
1035 Define constant values using an enum local to the lookup function rather
1036 than with #defines. This also means that different lookup functions can
1037 reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
1038
1039 @item -I
1040 @itemx --includes
1041 Include the necessary system include file, @code{<string.h>}, at the
1042 beginning of the code. By default, this is not done; the user must
1043 include this header file himself to allow compilation of the code.
1044
1045 @item -G
1046 @itemx --global-table
1047 Generate the static table of keywords as a static global variable,
1048 rather than hiding it inside of the lookup function (which is the
1049 default behavior).
1050
1051 @item -P
1052 @itemx --pic
1053 Optimize the generated table for inclusion in shared libraries. This
1054 reduces the startup time of programs using a shared library containing
1055 the generated code. If the option @samp{-t} (or, equivalently, the
1056 @samp{%struct-type} declaration) is also given, the first field of the
1057 user-defined struct must be of type @samp{int}, not @samp{char *}, because
1058 it will contain offsets into the string pool instead of actual strings.
1059 To convert such an offset to a string, you can use the expression
1060 @samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool
1061 name can be changed through the option @samp{--string-pool-name}.
1062
1063 @item -Q @var{string-pool-name}
1064 @itemx --string-pool-name=@var{string-pool-name}
1065 Allows you to specify the name of the generated string pool created by
1066 option @samp{-P}. The default name is @samp{stringpool}. This option
1067 permits the use of two hash tables in the same file, with @samp{-P} and
1068 even when the option @samp{-G} (or, equivalently, the @samp{%global-table}
1069 declaration) is given.
1070
1071 @item --null-strings
1072 Use NULL strings instead of empty strings for empty keyword table entries.
1073 This reduces the startup time of programs using a shared library containing
1074 the generated code (but not as much as option @samp{-P}), at the expense
1075 of one more test-and-branch instruction at run time.
1076
1077 @item -W @var{hash-table-array-name}
1078 @itemx --word-array-name=@var{hash-table-array-name}
1079 @cindex Array name
1080 Allows you to specify the name for the generated array containing the
1081 hash table. Default name is @samp{wordlist}. This option permits the
1082 use of two hash tables in the same file, even when the option @samp{-G}
1083 (or, equivalently, the @samp{%global-table} declaration) is given.
1084
1085 @item -S @var{total-switch-statements}
1086 @itemx --switch=@var{total-switch-statements}
1087 @cindex @code{switch}
1088 Causes the generated C code to use a @code{switch} statement scheme,
1089 rather than an array lookup table. This can lead to a reduction in both
1090 time and space requirements for some input files. The argument to this
1091 option determines how many @code{switch} statements are generated. A
1092 value of 1 generates 1 @code{switch} containing all the elements, a
1093 value of 2 generates 2 tables with 1/2 the elements in each
1094 @code{switch}, etc. This is useful since many C compilers cannot
1095 correctly generate code for large @code{switch} statements. This option
1096 was inspired in part by Keith Bostic's original C program.
1097
1098 @item -T
1099 @itemx --omit-struct-type
1100 Prevents the transfer of the type declaration to the output file. Use
1101 this option if the type is already defined elsewhere.
1102
1103 @item -p
1104 This option is supported for compatibility with previous releases of
1105 @code{gperf}. It does not do anything.
1106 @end table
1107
1108 @node Algorithmic Details, Verbosity, Output Details, Options
1109 @section Options for changing the Algorithms employed by @code{gperf}
1110
1111 @table @samp
1112 @item -k @var{selected-byte-positions}
1113 @itemx --key-positions=@var{selected-byte-positions}
1114 Allows selection of the byte positions used in the keywords'
1115 hash function. The allowable choices range between 1-255, inclusive.
1116 The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
1117 ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
1118 in any order. Furthermore, the wildcard '*' causes the generated
1119 hash function to consider @strong{all} byte positions in each keyword,
1120 whereas '$' instructs the hash function to use the ``final byte''
1121 of a keyword (this is the only way to use a byte position greater than
1122 255, incidentally).
1123
1124 For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
1125 function that considers positions 1,2,4,6,7,8,9,10, plus the last
1126 byte in each keyword (which may be at a different position for each
1127 keyword, obviously). Keywords
1128 with length less than the indicated byte positions work properly, since
1129 selected byte positions exceeding the keyword length are simply not
1130 referenced in the hash function.
1131
1132 This option is not normally needed since version 2.8 of @code{gperf};
1133 the default byte positions are computed depending on the keyword set,
1134 through a search that minimizes the number of byte positions.
1135
1136 @item -D
1137 @itemx --duplicates
1138 @cindex Duplicates
1139 Handle keywords whose selected byte sets hash to duplicate values.
1140 Duplicate hash values can occur if a set of keywords has the same names, but
1141 possesses different attributes, or if the selected byte positions are not well
1142 chosen. With the -D option @code{gperf} treats all these keywords as
1143 part of an equivalence class and generates a perfect hash function with
1144 multiple comparisons for duplicate keywords. It is up to you to completely
1145 disambiguate the keywords by modifying the generated C code. However,
1146 @code{gperf} helps you out by organizing the output.
1147
1148 Using this option usually means that the generated hash function is no
1149 longer perfect. On the other hand, it permits @code{gperf} to work on
1150 keyword sets that it otherwise could not handle.
1151
1152 @item -m @var{iterations}
1153 @itemx --multiple-iterations=@var{iterations}
1154 Perform multiple choices of the @samp{-i} and @samp{-j} values, and
1155 choose the best results. This increases the running time by a factor of
1156 @var{iterations} but does a good job minimizing the generated table size.
1157
1158 @item -i @var{initial-value}
1159 @itemx --initial-asso=@var{initial-value}
1160 Provides an initial @var{value} for the associate values array. Default
1161 is 0. Increasing the initial value helps inflate the final table size,
1162 possibly leading to more time efficient keyword lookups. Note that this
1163 option is not particularly useful when @samp{-S} (or, equivalently,
1164 @samp{%switch}) is used. Also,
1165 @samp{-i} is overridden when the @samp{-r} option is used.
1166
1167 @item -j @var{jump-value}
1168 @itemx --jump=@var{jump-value}
1169 @cindex Jump value
1170 Affects the ``jump value'', i.e., how far to advance the associated
1171 byte value upon collisions. @var{Jump-value} is rounded up to an
1172 odd number, the default is 5. If the @var{jump-value} is 0 @code{gperf}
1173 jumps by random amounts.
1174
1175 @item -n
1176 @itemx --no-strlen
1177 Instructs the generator not to include the length of a keyword when
1178 computing its hash value. This may save a few assembly instructions in
1179 the generated lookup table.
1180
1181 @item -r
1182 @itemx --random
1183 Utilizes randomness to initialize the associated values table. This
1184 frequently generates solutions faster than using deterministic
1185 initialization (which starts all associated values at 0). Furthermore,
1186 using the randomization option generally increases the size of the
1187 table.
1188
1189 @item -s @var{size-multiple}
1190 @itemx --size-multiple=@var{size-multiple}
1191 Affects the size of the generated hash table. The numeric argument for
1192 this option indicates ``how many times larger or smaller'' the maximum
1193 associated value range should be, in relationship to the number of keywords.
1194 It can be written as an integer, a floating-point number or a fraction.
1195 For example, a value of 3 means ``allow the maximum associated value to be
1196 about 3 times larger than the number of input keywords''.
1197 Conversely, a value of 1/3 means ``allow the maximum associated value to
1198 be about 3 times smaller than the number of input keywords''. Values
1199 smaller than 1 are useful for limiting the overall size of the generated hash
1200 table, though the option @samp{-m} is better at this purpose.
1201
1202 If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is
1203 @emph{not} enabled, the maximum
1204 associated value influences the static array table size, and a larger
1205 table should decrease the time required for an unsuccessful search, at
1206 the expense of extra table space.
1207
1208 The default value is 1, thus the default maximum associated value about
1209 the same size as the number of keywords (for efficiency, the maximum
1210 associated value is always rounded up to a power of 2). The actual
1211 table size may vary somewhat, since this technique is essentially a
1212 heuristic.
1213 @end table
1214
1215 @node Verbosity, , Algorithmic Details, Options
1216 @section Informative Output
1217
1218 @table @samp
1219 @item -h
1220 @itemx --help
1221 Prints a short summary on the meaning of each program option. Aborts
1222 further program execution.
1223
1224 @item -v
1225 @itemx --version
1226 Prints out the current version number.
1227
1228 @item -d
1229 @itemx --debug
1230 Enables the debugging option. This produces verbose diagnostics to
1231 ``standard error'' when @code{gperf} is executing. It is useful both for
1232 maintaining the program and for determining whether a given set of
1233 options is actually speeding up the search for a solution. Some useful
1234 information is dumped at the end of the program when the @samp{-d}
1235 option is enabled.
1236 @end table
1237
1238 @node Bugs, Projects, Options, Top
1239 @chapter Known Bugs and Limitations with @code{gperf}
1240
1241 The following are some limitations with the current release of
1242 @code{gperf}:
1243
1244 @itemize @bullet
1245 @item
1246 The @code{gperf} utility is tuned to execute quickly, and works quickly
1247 for small to medium size data sets (around 1000 keywords). It is
1248 extremely useful for maintaining perfect hash functions for compiler
1249 keyword sets. Several recent enhancements now enable @code{gperf} to
1250 work efficiently on much larger keyword sets (over 15,000 keywords).
1251 When processing large keyword sets it helps greatly to have over 8 megs
1252 of RAM.
1253
1254 @item
1255 The size of the generate static keyword array can get @emph{extremely}
1256 large if the input keyword file is large or if the keywords are quite
1257 similar. This tends to slow down the compilation of the generated C
1258 code, and @emph{greatly} inflates the object code size. If this
1259 situation occurs, consider using the @samp{-S} option to reduce data
1260 size, potentially increasing keyword recognition time a negligible
1261 amount. Since many C compilers cannot correctly generate code for
1262 large switch statements it is important to qualify the @var{-S} option
1263 with an appropriate numerical argument that controls the number of
1264 switch statements generated.
1265
1266 @item
1267 The maximum number of selected byte positions has an
1268 arbitrary limit of 255. This restriction should be removed, and if
1269 anyone considers this a problem write me and let me know so I can remove
1270 the constraint.
1271 @end itemize
1272
1273 @node Projects, Bibliography, Bugs, Top
1274 @chapter Things Still Left to Do
1275
1276 It should be ``relatively'' easy to replace the current perfect hash
1277 function algorithm with a more exhaustive approach; the perfect hash
1278 module is essential independent from other program modules. Additional
1279 worthwhile improvements include:
1280
1281 @itemize @bullet
1282 @item
1283 Another useful extension involves modifying the program to generate
1284 ``minimal'' perfect hash functions (under certain circumstances, the
1285 current version can be rather extravagant in the generated table size).
1286 This is mostly of theoretical interest, since a sparse table
1287 often produces faster lookups, and use of the @samp{-S} @code{switch}
1288 option can minimize the data size, at the expense of slightly longer
1289 lookups (note that the gcc compiler generally produces good code for
1290 @code{switch} statements, reducing the need for more complex schemes).
1291
1292 @item
1293 In addition to improving the algorithm, it would also be useful to
1294 generate an Ada package as the code output, in addition to the current
1295 C and C++ routines.
1296 @end itemize
1297
1298 @page
1299
1300 @node Bibliography, Concept Index, Projects, Top
1301 @chapter Bibliography
1302
1303 [1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
1304 Hashing Functions} Information Sciences 39(1986), 187-195.
1305
1306 [2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash
1307 Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.
1308
1309 [3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}
1310 Communications of the ACM, 23, 1(January 1980), 17-19.
1311
1312 [4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal
1313 Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.
1314
1315 [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
1316 @i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.
1317
1318 [6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal
1319 Perfect Hashing Functions} Communications of the ACM, 24, 12(December
1320 1981), 829-833.
1321
1322 [7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect
1323 Hash Functions Method} Communications of the ACM, 23, 12(December 1980),
1324 728-729.
1325
1326 [8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect
1327 Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532
1328
1329 [9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1330 Second USENIX C++ Conference Proceedings, April 1990.
1331
1332 [10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1333 C++ Report, SIGS 10 10 (November/December 1998).
1334
1335 [11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions
1336 for Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53.
1337
1338 [12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe
1339 Retrieving Method for Static Sets} Communications of the ACM, 20
1340 11(November 1977), 841-850.
1341
1342 [13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,
1343 1988.
1344
1345 [14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.
1346
1347 [15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software
1348 Foundation, 1989.
1349
1350 @node Concept Index, , Bibliography, Top
1351 @unnumbered Concept Index
1352
1353 @printindex cp
1354
1355 @contents
1356 @bye
OLDNEW
« no previous file with comments | « gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.pdf ('k') | gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.dvi.gz » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698