gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.texi - Issue 10804012: Add native Windows binary for gperf.

Side by Side Diff: gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.texi

Issue 10804012: Add native Windows binary for gperf. (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/

Patch Set: Created 8 years, 5 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch | Annotate | Revision Log

OLD	NEW
(Empty)
	1 \input texinfo @c -- texinfo --

	2 @c %**start of header

	3 @setfilename gperf.info

	4 @settitle Perfect Hash Function Generator

	5 @c @setchapternewpage odd

	6 @c %**end of header

	7

	8 @c some day we should @include version.texi instead of defining

	9 @c these values at hand.

	10 @set UPDATED 12 June 2003

	11 @set EDITION 3.0.1

	12 @set VERSION 3.0.1

	13 @c ---------------------

	14

	15 @c remove the black boxes generated in the GPL appendix.

	16 @finalout

	17

	18 @c Merge functions into the concept index

	19 @syncodeindex fn cp

	20 @c @synindex pg cp

	21

	22 @dircategory Programming Tools

	23 @direntry

	24 * Gperf: (gperf). Perfect Hash Function Generator.

	25 @end direntry

	26

	27 @ifinfo

	28 This file documents the features of the GNU Perfect Hash Function

	29 Generator @value{VERSION}.

	30

	31 Copyright @copyright{} 1989-2003 Free Software Foundation, Inc.

	32

	33 Permission is granted to make and distribute verbatim copies of this

	34 manual provided the copyright notice and this permission notice are

	35 preserved on all copies.

	36

	37 @ignore

	38 Permission is granted to process this file through TeX and print the

	39 results, provided the printed document carries a copying permission

	40 notice identical to this one except for the removal of this paragraph

	41 (this paragraph not being relevant to the printed manual).

	42

	43 @end ignore

	44

	45 Permission is granted to copy and distribute modified versions of this

	46 manual under the conditions for verbatim copying, provided also that the

	47 section entitled ``GNU General Public License'' is included exactly as

	48 in the original, and provided that the entire resulting derived work is

	49 distributed under the terms of a permission notice identical to this

	50 one.

	51

	52 Permission is granted to copy and distribute translations of this manual

	53 into another language, under the above conditions for modified versions,

	54 except that the section entitled ``GNU General Public License'' and this

	55 permission notice may be included in translations approved by the Free

	56 Software Foundation instead of in the original English.

	57

	58 @end ifinfo

	59

	60 @titlepage

	61 @title User's Guide to @code{gperf} @value{VERSION}

	62 @subtitle The GNU Perfect Hash Function Generator

	63 @subtitle Edition @value{EDITION}, @value{UPDATED}

	64 @author Douglas C. Schmidt

	65 @author Bruno Haible

	66

	67 @page

	68 @vskip 0pt plus 1filll

	69 Copyright @copyright{} 1989-2003 Free Software Foundation, Inc.

	70

	71

	72 Permission is granted to make and distribute verbatim copies of

	73 this manual provided the copyright notice and this permission notice

	74 are preserved on all copies.

	75

	76 Permission is granted to copy and distribute modified versions of this

	77 manual under the conditions for verbatim copying, provided also that the

	78 section entitled ``GNU General Public License'' is included

	79 exactly as in the original, and provided that the entire resulting

	80 derived work is distributed under the terms of a permission notice

	81 identical to this one.

	82

	83 Permission is granted to copy and distribute translations of this manual

	84 into another language, under the above conditions for modified versions,

	85 except that the section entitled ``GNU General Public License'' may be

	86 included in a translation approved by the author instead of in the

	87 original English.

	88 @end titlepage

	89

	90 @ifinfo

	91 @node Top, Copying, (dir), (dir)

	92 @top Introduction

	93

	94 This manual documents the GNU @code{gperf} perfect hash function generator

	95 utility, focusing on its features and how to use them, and how to report

	96 bugs.

	97

	98 @menu

	99 * Copying:: GNU @code{gperf} General Public License says

	100 how you can copy and share @code{gperf}.

	101 * Contributors:: People who have contributed to @code{gperf}.

	102 * Motivation:: The purpose of @code{gperf}.

	103 * Search Structures:: Static search structures and GNU @code{gperf}

	104 * Description:: High-level discussion of how GPERF functions.

	105 * Options:: A description of options to the program.

	106 * Bugs:: Known bugs and limitations with GPERF.

	107 * Projects:: Things still left to do.

	108 * Bibliography:: Material Referenced in this Report.

	109

	110 * Concept Index::

	111

	112 @detailmenu --- The Detailed Node Listing ---

	113

	114 High-Level Description of GNU @code{gperf}

	115

	116 * Input Format:: Input Format to @code{gperf}

	117 * Output Format:: Output Format for Generated C Code with @code{gp erf}

	118 * Binary Strings:: Use of NUL bytes

	119

	120 Input Format to @code{gperf}

	121

	122 * Declarations:: Declarations.

	123 * Keywords:: Format for Keyword Entries.

	124 * Functions:: Including Additional C Functions.

	125 * Controls for GNU indent:: Where to place directives for GNU @code{indent}.

	126

	127 Declarations

	128

	129 * User-supplied Struct:: Specifying keywords with attributes.

	130 * Gperf Declarations:: Embedding command line options in the input.

	131 * C Code Inclusion:: Including C declarations and definitions.

	132

	133 Invoking @code{gperf}

	134

	135 * Input Details:: Options that affect Interpretation of the Input File

	136 * Output Language:: Specifying the Language for the Output Code

	137 * Output Details:: Fine tuning Details in the Output Code

	138 * Algorithmic Details:: Changing the Algorithms employed by @code{gperf}

	139 * Verbosity:: Informative Output

	140

	141 @end detailmenu

	142 @end menu

	143

	144 @end ifinfo

	145

	146 @node Copying, Contributors, Top, Top

	147 @unnumbered GNU GENERAL PUBLIC LICENSE

	148 @include gpl.texinfo

	149

	150 @node Contributors, Motivation, Copying, Top

	151 @unnumbered Contributors to GNU @code{gperf} Utility

	152

	153 @itemize @bullet

	154 @item

	155 @cindex Bugs

	156 The GNU @code{gperf} perfect hash function generator utility was

	157 written in GNU C++ by Douglas C. Schmidt. The general

	158 idea for the perfect hash function generator was inspired by Keith

	159 Bostic's algorithm written in C, and distributed to net.sources around

	160 1984. The current program is a heavily modified, enhanced, and extended

	161 implementation of Keith's basic idea, created at the University of

	162 California, Irvine. Bugs, patches, and suggestions should be reported

	163 to @code{<bug-gnu-gperf@@gnu.org>}.

	164

	165 @item

	166 Special thanks is extended to Michael Tiemann and Doug Lea, for

	167 providing a useful compiler, and for giving me a forum to exhibit my

	168 creation.

	169

	170 In addition, Adam de Boor and Nels Olson provided many tips and insights

	171 that greatly helped improve the quality and functionality of @code{gperf}.

	172

	173 @item

	174 Bruno Haible enhanced and optimized the search algorithm. He also rewrote

	175 the input routines and the output routines for better reliability, and

	176 added a testsuite.

	177 @end itemize

	178

	179 @node Motivation, Search Structures, Contributors, Top

	180 @chapter Introduction

	181

	182 @code{gperf} is a perfect hash function generator written in C++. It

	183 transforms an @var{n} element user-specified keyword set @var{W} into a

	184 perfect hash function @var{F}. @var{F} uniquely maps keywords in

	185 @var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}. If @var{k}

	186 = @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.

	187 @code{gperf} generates a 0..@var{k} element static lookup table and a

	188 pair of C functions. These functions determine whether a given

	189 character string @var{s} occurs in @var{W}, using at most one probe into

	190 the lookup table.

	191

	192 @code{gperf} currently generates the reserved keyword recognizer for

	193 lexical analyzers in several production and research compilers and

	194 language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,

	195 GNU Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is

	196 available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.

	197 A paper describing @code{gperf}'s design and implementation in greater

	198 detail is available in the Second USENIX C++ Conference proceedings

	199 or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.

	200

	201 @node Search Structures, Description, Motivation, Top

	202 @chapter Static search structures and GNU @code{gperf}

	203 @cindex Static search structure

	204

	205 A @dfn{static search structure} is an Abstract Data Type with certain

	206 fundamental operations, e.g., @emph{initialize}, @emph{insert},

	207 and @emph{retrieve}. Conceptually, all insertions occur before any

	208 retrievals. In practice, @code{gperf} generates a @emph{static} array

	209 containing search set keywords and any associated attributes specified

	210 by the user. Thus, there is essentially no execution-time cost for the

	211 insertions. It is a useful data structure for representing @emph{static

	212 search sets}. Static search sets occur frequently in software system

	213 applications. Typical static search sets include compiler reserved

	214 words, assembler instruction opcodes, and built-in shell interpreter

	215 commands. Search set members, called @dfn{keywords}, are inserted into

	216 the structure only once, usually during program initialization, and are

	217 not generally modified at run-time.

	218

	219 Numerous static search structure implementations exist, e.g.,

	220 arrays, linked lists, binary search trees, digital search tries, and

	221 hash tables. Different approaches offer trade-offs between space

	222 utilization and search time efficiency. For example, an @var{n} element

	223 sorted array is space efficient, though the average-case time

	224 complexity for retrieval operations using binary search is

	225 proportional to log @var{n}. Conversely, hash table implementations

	226 often locate a table entry in constant time, but typically impose

	227 additional memory overhead and exhibit poor worst case performance.

	228

	229 @cindex Minimal perfect hash functions

	230 @emph{Minimal perfect hash functions} provide an optimal solution for a

	231 particular class of static search sets. A minimal perfect hash

	232 function is defined by two properties:

	233

	234 @itemize @bullet

	235 @item

	236 It allows keyword recognition in a static search set using at most

	237 @emph{one} probe into the hash table. This represents the ``perfect''

	238 property.

	239 @item

	240 The actual memory allocated to store the keywords is precisely large

	241 enough for the keyword set, and @emph{no larger}. This is the

	242 ``minimal'' property.

	243 @end itemize

	244

	245 For most applications it is far easier to generate @emph{perfect} hash

	246 functions than @emph{minimal perfect} hash functions. Moreover,

	247 non-minimal perfect hash functions frequently execute faster than

	248 minimal ones in practice. This phenomena occurs since searching a

	249 sparse keyword table increases the probability of locating a ``null''

	250 entry, thereby reducing string comparisons. @code{gperf}'s default

	251 behavior generates @emph{near-minimal} perfect hash functions for

	252 keyword sets. However, @code{gperf} provides many options that permit

	253 user control over the degree of minimality and perfection.

	254

	255 Static search sets often exhibit relative stability over time. For

	256 example, Ada's 63 reserved words have remained constant for nearly a

	257 decade. It is therefore frequently worthwhile to expend concerted

	258 effort building an optimal search structure @emph{once}, if it

	259 subsequently receives heavy use multiple times. @code{gperf} removes

	260 the drudgery associated with constructing time- and space-efficient

	261 search structures by hand. It has proven a useful and practical tool

	262 for serious programming projects. Output from @code{gperf} is currently

	263 used in several production and research compilers, including GNU C, GNU

	264 C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are

	265 not yet part of the official GNU distribution. Each compiler utilizes

	266 @code{gperf} to automatically generate static search structures that

	267 efficiently identify their respective reserved keywords.

	268

	269 @node Description, Options, Search Structures, Top

	270 @chapter High-Level Description of GNU @code{gperf}

	271

	272 @menu

	273 * Input Format:: Input Format to @code{gperf}

	274 * Output Format:: Output Format for Generated C Code with @code{gp erf}

	275 * Binary Strings:: Use of NUL bytes

	276 @end menu

	277

	278 The perfect hash function generator @code{gperf} reads a set of

	279 ``keywords'' from an input file (or from the standard input by

	280 default). It attempts to derive a perfect hashing function that

	281 recognizes a member of the @dfn{static keyword set} with at most a

	282 single probe into the lookup table. If @code{gperf} succeeds in

	283 generating such a function it produces a pair of C source code routines

	284 that perform hashing and table lookup recognition. All generated C code

	285 is directed to the standard output. Command-line options described

	286 below allow you to modify the input and output format to @code{gperf}.

	287

	288 By default, @code{gperf} attempts to produce time-efficient code, with

	289 less emphasis on efficient space utilization. However, several options

	290 exist that permit trading-off execution time for storage space and vice

	291 versa. In particular, expanding the generated table size produces a

	292 sparse search structure, generally yielding faster searches.

	293 Conversely, you can direct @code{gperf} to utilize a C @code{switch}

	294 statement scheme that minimizes data space storage size. Furthermore,

	295 using a C @code{switch} may actually speed up the keyword retrieval time

	296 somewhat. Actual results depend on your C compiler, of course.

	297

	298 In general, @code{gperf} assigns values to the bytes it is using

	299 for hashing until some set of values gives each keyword a unique value.

	300 A helpful heuristic is that the larger the hash value range, the easier

	301 it is for @code{gperf} to find and generate a perfect hash function.

	302 Experimentation is the key to getting the most from @code{gperf}.

	303

	304 @node Input Format, Output Format, Description, Description

	305 @section Input Format to @code{gperf}

	306 @cindex Format

	307 @cindex Declaration section

	308 @cindex Keywords section

	309 @cindex Functions section

	310 You can control the input file format by varying certain command-line

	311 arguments, in particular the @samp{-t} option. The input's appearance

	312 is similar to GNU utilities @code{flex} and @code{bison} (or UNIX

	313 utilities @code{lex} and @code{yacc}). Here's an outline of the general

	314 format:

	315

	316 @example

	317 @group

	318 declarations

	319 %%

	320 keywords

	321 %%

	322 functions

	323 @end group

	324 @end example

	325

	326 @emph{Unlike} @code{flex} or @code{bison}, the declarations section and

	327 the functions section are optional. The following sections describe the

	328 input format for each section.

	329

	330 @menu

	331 * Declarations:: Declarations.

	332 * Keywords:: Format for Keyword Entries.

	333 * Functions:: Including Additional C Functions.

	334 * Controls for GNU indent:: Where to place directives for GNU @code{indent}.

	335 @end menu

	336

	337 It is possible to omit the declaration section entirely, if the @samp{-t}

	338 option is not given. In this case the input file begins directly with the

	339 first keyword line, e.g.:

	340

	341 @example

	342 @group

	343 january

	344 february

	345 march

	346 april

	347 ...

	348 @end group

	349 @end example

	350

	351 @node Declarations, Keywords, Input Format, Input Format

	352 @subsection Declarations

	353

	354 The keyword input file optionally contains a section for including

	355 arbitrary C declarations and definitions, @code{gperf} declarations that

	356 act like command-line options, as well as for providing a user-supplied

	357 @code{struct}.

	358

	359 @menu

	360 * User-supplied Struct:: Specifying keywords with attributes.

	361 * Gperf Declarations:: Embedding command line options in the input.

	362 * C Code Inclusion:: Including C declarations and definitions.

	363 @end menu

	364

	365 @node User-supplied Struct, Gperf Declarations, Declarations, Declarations

	366 @subsubsection User-supplied @code{struct}

	367

	368 If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration)

	369 @emph{is} enabled, you @emph{must} provide a C @code{struct} as the last

	370 component in the declaration section from the input file. The first

	371 field in this struct must be of type @code{char } or @code{const char }

	372 if the @samp{-P} option is not given, or of type @code{int} if the option

	373 @samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled.

	374 This first field must be called @samp{name}, although it is possible to modify

	375 its name with the @samp{-K} option (or, equivalently, the

	376 @samp{%define slot-name} declaration) described below.

	377

	378 Here is a simple example, using months of the year and their attributes as

	379 input:

	380

	381 @example

	382 @group

	383 struct month @{ char *name; int number; int days; int leap_days; @};

	384 %%

	385 january, 1, 31, 31

	386 february, 2, 28, 29

	387 march, 3, 31, 31

	388 april, 4, 30, 30

	389 may, 5, 31, 31

	390 june, 6, 30, 30

	391 july, 7, 31, 31

	392 august, 8, 31, 31

	393 september, 9, 30, 30

	394 october, 10, 31, 31

	395 november, 11, 30, 30

	396 december, 12, 31, 31

	397 @end group

	398 @end example

	399

	400 @cindex @samp{%%}

	401 Separating the @code{struct} declaration from the list of keywords and

	402 other fields are a pair of consecutive percent signs, @samp{%%},

	403 appearing left justified in the first column, as in the UNIX utility

	404 @code{lex}.

	405

	406 If the @code{struct} has already been declared in an include file, it can

	407 be mentioned in an abbreviated form, like this:

	408

	409 @example

	410 @group

	411 struct month;

	412 %%

	413 january, 1, 31, 31

	414 ...

	415 @end group

	416 @end example

	417

	418 @node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations

	419 @subsubsection Gperf Declarations

	420

	421 The declaration section can contain @code{gperf} declarations. They

	422 influence the way @code{gperf} works, like command line options do.

	423 In fact, every such declaration is equivalent to a command line option.

	424 There are three forms of declarations:

	425

	426 @enumerate

	427 @item

	428 Declarations without argument, like @samp{%compare-lengths}.

	429

	430 @item

	431 Declarations with an argument, like @samp{%switch=@var{count}}.

	432

	433 @item

	434 Declarations of names of entities in the output file, like

	435 @samp{%define lookup-function-name @var{name}}.

	436 @end enumerate

	437

	438 When a declaration is given both in the input file and as a command line

	439 option, the command-line option's value prevails.

	440

	441 The following @code{gperf} declarations are available.

	442

	443 @table @samp

	444 @item %delimiters=@var{delimiter-list}

	445 @cindex @samp{%delimiters}

	446 Allows you to provide a string containing delimiters used to

	447 separate keywords from their attributes. The default is ",". This

	448 option is essential if you want to use keywords that have embedded

	449 commas or newlines.

	450

	451 @item %struct-type

	452 @cindex @samp{%struct-type}

	453 Allows you to include a @code{struct} type declaration for generated

	454 code; see above for an example.

	455

	456 @item %ignore-case

	457 @cindex @samp{%ignore-case}

	458 Consider upper and lower case ASCII characters as equivalent. The string

	459 comparison will use a case insignificant character comparison. Note that

	460 locale dependent case mappings are ignored.

	461

	462 @item %language=@var{language-name}

	463 @cindex @samp{%language}

	464 Instructs @code{gperf} to generate code in the language specified by the

	465 option's argument. Languages handled are currently:

	466

	467 @table @samp

	468 @item KR-C

	469 Old-style K&R C. This language is understood by old-style C compilers and

	470 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)

	471 because of lacking @samp{const}.

	472

	473 @item C

	474 Common C. This language is understood by ANSI C compilers, and also by

	475 old-style C compilers, provided that you @code{#define const} to empty

	476 for compilers which don't know about this keyword.

	477

	478 @item ANSI-C

	479 ANSI C. This language is understood by ANSI C compilers and C++ compilers.

	480

	481 @item C++

	482 C++. This language is understood by C++ compilers.

	483 @end table

	484

	485 The default is C.

	486

	487 @item %define slot-name @var{name}

	488 @cindex @samp{%define slot-name}

	489 This declaration is only useful when option @samp{-t} (or, equivalently, the

	490 @samp{%struct-type} declaration) has been given.

	491 By default, the program assumes the structure component identifier for

	492 the keyword is @samp{name}. This option allows an arbitrary choice of

	493 identifier for this component, although it still must occur as the first

	494 field in your supplied @code{struct}.

	495

	496 @item %define initializer-suffix @var{initializers}

	497 @cindex @samp{%define initializer-suffix}

	498 This declaration is only useful when option @samp{-t} (or, equivalently, the

	499 @samp{%struct-type} declaration) has been given.

	500 It permits to specify initializers for the structure members following

	501 @var{slot-name} in empty hash table entries. The list of initializers

	502 should start with a comma. By default, the emitted code will

	503 zero-initialize structure members following @var{slot-name}.

	504

	505 @item %define hash-function-name @var{name}

	506 @cindex @samp{%define hash-function-name}

	507 Allows you to specify the name for the generated hash function. Default

	508 name is @samp{hash}. This option permits the use of two hash tables in

	509 the same file.

	510

	511 @item %define lookup-function-name @var{name}

	512 @cindex @samp{%define lookup-function-name}

	513 Allows you to specify the name for the generated lookup function.

	514 Default name is @samp{in_word_set}. This option permits multiple

	515 generated hash functions to be used in the same application.

	516

	517 @item %define class-name @var{name}

	518 @cindex @samp{%define class-name}

	519 This option is only useful when option @samp{-L C++} (or, equivalently,

	520 the @samp{%language=C++} declaration) has been given. It

	521 allows you to specify the name of generated C++ class. Default name is

	522 @code{Perfect_Hash}.

	523

	524 @item %7bit

	525 @cindex @samp{%7bit}

	526 This option specifies that all strings that will be passed as arguments

	527 to the generated hash function and the generated lookup function will

	528 solely consist of 7-bit ASCII characters (bytes in the range 0..127).

	529 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do

	530 @emph{not} guarantee that a byte is in this range. Only an explicit

	531 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.)

	532

	533 @item %compare-lengths

	534 @cindex @samp{%compare-lengths}

	535 Compare keyword lengths before trying a string comparison. This option

	536 is mandatory for binary comparisons (@pxref{Binary Strings}). It also might

	537 cut down on the number of string comparisons made during the lookup, since

	538 keywords with different lengths are never compared via @code{strcmp}.

	539 However, using @samp{%compare-lengths} might greatly increase the size of the

	540 generated C code if the lookup table range is large (which implies that

	541 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length

	542 table contains as many elements as there are entries in the lookup table.

	543

	544 @item %compare-strncmp

	545 @cindex @samp{%compare-strncmp}

	546 Generates C code that uses the @code{strncmp} function to perform

	547 string comparisons. The default action is to use @code{strcmp}.

	548

	549 @item %readonly-tables

	550 @cindex @samp{%readonly-tables}

	551 Makes the contents of all generated lookup tables constant, i.e.,

	552 ``readonly''. Many compilers can generate more efficient code for this

	553 by putting the tables in readonly memory.

	554

	555 @item %enum

	556 @cindex @samp{%enum}

	557 Define constant values using an enum local to the lookup function rather

	558 than with #defines. This also means that different lookup functions can

	559 reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}.

	560

	561 @item %includes

	562 @cindex @samp{%includes}

	563 Include the necessary system include file, @code{<string.h>}, at the

	564 beginning of the code. By default, this is not done; the user must

	565 include this header file himself to allow compilation of the code.

	566

	567 @item %global-table

	568 @cindex @samp{%global-table}

	569 Generate the static table of keywords as a static global variable,

	570 rather than hiding it inside of the lookup function (which is the

	571 default behavior).

	572

	573 @item %pic

	574 @cindex @samp{%pic}

	575 Optimize the generated table for inclusion in shared libraries. This

	576 reduces the startup time of programs using a shared library containing

	577 the generated code. If the @samp{%struct-type} declaration (or,

	578 equivalently, the option @samp{-t}) is also given, the first field of the

	579 user-defined struct must be of type @samp{int}, not @samp{char *}, because

	580 it will contain offsets into the string pool instead of actual strings.

	581 To convert such an offset to a string, you can use the expression

	582 @samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool

	583 name can be changed through the @samp{%define string-pool-name} declaration.

	584

	585 @item %define string-pool-name @var{name}

	586 @cindex @samp{%define string-pool-name}

	587 Allows you to specify the name of the generated string pool created by

	588 the declaration @samp{%pic} (or, equivalently, the option @samp{-P}).

	589 The default name is @samp{stringpool}. This declaration permits the use of

	590 two hash tables in the same file, with @samp{%pic} and even when the

	591 @samp{%global-table} declaration (or, equivalently, the option @samp{-G})

	592 is given.

	593

	594 @item %null-strings

	595 @cindex @samp{%null-strings}

	596 Use NULL strings instead of empty strings for empty keyword table entries.

	597 This reduces the startup time of programs using a shared library containing

	598 the generated code (but not as much as the declaration @samp{%pic}), at the

	599 expense of one more test-and-branch instruction at run time.

	600

	601 @item %define word-array-name @var{name}

	602 @cindex @samp{%define word-array-name}

	603 Allows you to specify the name for the generated array containing the

	604 hash table. Default name is @samp{wordlist}. This option permits the

	605 use of two hash tables in the same file, even when the option @samp{-G}

	606 (or, equivalently, the @samp{%global-table} declaration) is given.

	607

	608 @item %switch=@var{count}

	609 @cindex @samp{%switch}

	610 Causes the generated C code to use a @code{switch} statement scheme,

	611 rather than an array lookup table. This can lead to a reduction in both

	612 time and space requirements for some input files. The argument to this

	613 option determines how many @code{switch} statements are generated. A

	614 value of 1 generates 1 @code{switch} containing all the elements, a

	615 value of 2 generates 2 tables with 1/2 the elements in each

	616 @code{switch}, etc. This is useful since many C compilers cannot

	617 correctly generate code for large @code{switch} statements. This option

	618 was inspired in part by Keith Bostic's original C program.

	619

	620 @item %omit-struct-type

	621 @cindex @samp{%omit-struct-type}

	622 Prevents the transfer of the type declaration to the output file. Use

	623 this option if the type is already defined elsewhere.

	624 @end table

	625

	626 @node C Code Inclusion, , Gperf Declarations, Declarations

	627 @subsubsection C Code Inclusion

	628

	629 @cindex @samp{%@{}

	630 @cindex @samp{%@}}

	631 Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it

	632 is possible to directly include C source text and comments verbatim into

	633 the generated output file. This is accomplished by enclosing the region

	634 inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs. Here is

	635 an input fragment based on the previous example that illustrates this

	636 feature:

	637

	638 @example

	639 @group

	640 %@{

	641 #include <assert.h>

	642 /* This section of code is inserted directly into the output. */

	643 int return_month_days (struct month *months, int is_leap_year);

	644 %@}

	645 struct month @{ char *name; int number; int days; int leap_days; @};

	646 %%

	647 january, 1, 31, 31

	648 february, 2, 28, 29

	649 march, 3, 31, 31

	650 ...

	651 @end group

	652 @end example

	653

	654 @node Keywords, Functions, Declarations, Input Format

	655 @subsection Format for Keyword Entries

	656

	657 The second input file format section contains lines of keywords and any

	658 associated attributes you might supply. A line beginning with @samp{#}

	659 in the first column is considered a comment. Everything following the

	660 @samp{#} is ignored, up to and including the following newline. A line

	661 beginning with @samp{%} in the first column is an option declaration and

	662 must not occur within the keywords section.

	663

	664 The first field of each non-comment line is always the keyword itself. It

	665 can be given in two ways: as a simple name, i.e., without surrounding

	666 string quotation marks, or as a string enclosed in double-quotes, in

	667 C syntax, possibly with backslash escapes like @code{\"} or @code{\234}

	668 or @code{\xa8}. In either case, it must start right at the beginning

	669 of the line, without leading whitespace.

	670 In this context, a ``field'' is considered to extend up to, but

	671 not include, the first blank, comma, or newline. Here is a simple

	672 example taken from a partial list of C reserved words:

	673

	674 @example

	675 @group

	676 # These are a few C reserved words, see the c.gperf file

	677 # for a complete list of ANSI C reserved words.

	678 unsigned

	679 sizeof

	680 switch

	681 signed

	682 if

	683 default

	684 for

	685 while

	686 return

	687 @end group

	688 @end example

	689

	690 Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker

	691 may be elided if the declaration section is empty.

	692

	693 Additional fields may optionally follow the leading keyword. Fields

	694 should be separated by commas, and terminate at the end of line. What

	695 these fields mean is entirely up to you; they are used to initialize the

	696 elements of the user-defined @code{struct} provided by you in the

	697 declaration section. If the @samp{-t} option (or, equivalently, the

	698 @samp{%struct-type} declaration) is @emph{not} enabled

	699 these fields are simply ignored. All previous examples except the last

	700 one contain keyword attributes.

	701

	702 @node Functions, Controls for GNU indent, Keywords, Input Format

	703 @subsection Including Additional C Functions

	704

	705 The optional third section also corresponds closely with conventions

	706 found in @code{flex} and @code{bison}. All text in this section,

	707 starting at the final @samp{%%} and extending to the end of the input

	708 file, is included verbatim into the generated output file. Naturally,

	709 it is your responsibility to ensure that the code contained in this

	710 section is valid C.

	711

	712 @node Controls for GNU indent, , Functions, Input Format

	713 @subsection Where to place directives for GNU @code{indent}.

	714

	715 If you want to invoke GNU @code{indent} on a @code{gperf} input file,

	716 you will see that GNU @code{indent} doesn't understand the @samp{%%},

	717 @samp{%@{} and @samp{%@}} directives that control @code{gperf}'s

	718 interpretation of the input file. Therefore you have to insert some

	719 directives for GNU @code{indent}. More precisely, assuming the most

	720 general input file structure

	721

	722 @example

	723 @group

	724 declarations part 1

	725 %@{

	726 verbatim code

	727 %@}

	728 declarations part 2

	729 %%

	730 keywords

	731 %%

	732 functions

	733 @end group

	734 @end example

	735

	736 @noindent

	737 you would insert @samp{INDENT-OFF} and @samp{INDENT-ON} comments

	738 as follows:

	739

	740 @example

	741 @group

	742 /* INDENT-OFF */

	743 declarations part 1

	744 %@{

	745 /* INDENT-ON */

	746 verbatim code

	747 /* INDENT-OFF */

	748 %@}

	749 declarations part 2

	750 %%

	751 keywords

	752 %%

	753 /* INDENT-ON */

	754 functions

	755 @end group

	756 @end example

	757

	758 @node Output Format, Binary Strings, Input Format, Description

	759 @section Output Format for Generated C Code with @code{gperf}

	760 @cindex hash table

	761

	762 Several options control how the generated C code appears on the standard

	763 output. Two C function are generated. They are called @code{hash} and

	764 @code{in_word_set}, although you may modify their names with a command-line

	765 option. Both functions require two arguments, a string, @code{char *}

	766 @var{str}, and a length parameter, @code{int} @var{len}. Their default

	767 function prototypes are as follows:

	768

	769 @deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})

	770 By default, the generated @code{hash} function returns an integer value

	771 created by adding @var{len} to several user-specified @var{str} byte

	772 positions indexed into an @dfn{associated values} table stored in a

	773 local static array. The associated values table is constructed

	774 internally by @code{gperf} and later output as a static local C array

	775 called @samp{hash_table}. The relevant selected positions (i.e. indices

	776 into @var{str}) are specified via the @samp{-k} option when running

	777 @code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}).

	778 @end deftypefun

	779

	780 @deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})

	781 If @var{str} is in the keyword set, returns a pointer to that

	782 keyword. More exactly, if the option @samp{-t} (or, equivalently, the

	783 @samp{%struct-type} declaration) was given, it returns

	784 a pointer to the matching keyword's structure. Otherwise it returns

	785 @code{NULL}.

	786 @end deftypefun

	787

	788 If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}

	789 declaration) is not used, @var{str} must be a NUL terminated

	790 string of exactly length @var{len}. If @samp{-c} (or, equivalently, the

	791 @samp{%compare-strncmp} declaration) is used, @var{str} must

	792 simply be an array of @var{len} bytes and does not need to be NUL

	793 terminated.

	794

	795 The code generated for these two functions is affected by the following

	796 options:

	797

	798 @table @samp

	799 @item -t

	800 @itemx --struct-type

	801 Make use of the user-defined @code{struct}.

	802

	803 @item -S @var{total-switch-statements}

	804 @itemx --switch=@var{total-switch-statements}

	805 @cindex @code{switch}

	806 Generate 1 or more C @code{switch} statement rather than use a large,

	807 (and potentially sparse) static array. Although the exact time and

	808 space savings of this approach vary according to your C compiler's

	809 degree of optimization, this method often results in smaller and faster

	810 code.

	811 @end table

	812

	813 If the @samp{-t} and @samp{-S} options (or, equivalently, the

	814 @samp{%struct-type} and @samp{%switch} declarations) are omitted, the default

	815 action

	816 is to generate a @code{char *} array containing the keywords, together with

	817 additional empty strings used for padding the array. By experimenting

	818 with the various input and output options, and timing the resulting C

	819 code, you can determine the best option choices for different keyword

	820 set characteristics.

	821

	822 @node Binary Strings, , Output Format, Description

	823 @section Use of NUL bytes

	824 @cindex NUL

	825

	826 By default, the code generated by @code{gperf} operates on zero

	827 terminated strings, the usual representation of strings in C. This means

	828 that the keywords in the input file must not contain NUL bytes,

	829 and the @var{str} argument passed to @code{hash} or @code{in_word_set}

	830 must be NUL terminated and have exactly length @var{len}.

	831

	832 If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}

	833 declaration) is used, then the @var{str} argument does not need

	834 to be NUL terminated. The code generated by @code{gperf} will only

	835 access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.

	836 However, the keywords in the input file still must not contain NUL

	837 bytes.

	838

	839 If option @samp{-l} (or, equivalently, the @samp{%compare-lengths}

	840 declaration) is used, then the hash table performs binary

	841 comparison. The keywords in the input file may contain NUL bytes,

	842 written in string syntax as @code{\000} or @code{\x00}, and the code

	843 generated by @code{gperf} will treat NUL like any other byte.

	844 Also, in this case the @samp{-c} option (or, equivalently, the

	845 @samp{%compare-strncmp} declaration) is ignored.

	846

	847 @node Options, Bugs, Description, Top

	848 @chapter Invoking @code{gperf}

	849

	850 There are @emph{many} options to @code{gperf}. They were added to make

	851 the program more convenient for use with real applications. ``On-line''

	852 help is readily available via the @samp{--help} option. Here is the

	853 complete list of options.

	854

	855 @menu

	856 * Output File:: Specifying the Location of the Output File

	857 * Input Details:: Options that affect Interpretation of the Input File

	858 * Output Language:: Specifying the Language for the Output Code

	859 * Output Details:: Fine tuning Details in the Output Code

	860 * Algorithmic Details:: Changing the Algorithms employed by @code{gperf}

	861 * Verbosity:: Informative Output

	862 @end menu

	863

	864 @node Output File, Input Details, Options, Options

	865 @section Specifying the Location of the Output File

	866

	867 @table @samp

	868 @item --output-file=@var{file}

	869 Allows you to specify the name of the file to which the output is written to.

	870 @end table

	871

	872 The results are written to standard output if no output file is specified

	873 or if it is @samp{-}.

	874

	875 @node Input Details, Output Language, Output File, Options

	876 @section Options that affect Interpretation of the Input File

	877

	878 These options are also available as declarations in the input file

	879 (@pxref{Gperf Declarations}).

	880

	881 @table @samp

	882 @item -e @var{keyword-delimiter-list}

	883 @itemx --delimiters=@var{keyword-delimiter-list}

	884 @cindex Delimiters

	885 Allows you to provide a string containing delimiters used to

	886 separate keywords from their attributes. The default is ",". This

	887 option is essential if you want to use keywords that have embedded

	888 commas or newlines. One useful trick is to use -e'TAB', where TAB is

	889 the literal tab character.

	890

	891 @item -t

	892 @itemx --struct-type

	893 Allows you to include a @code{struct} type declaration for generated

	894 code. Any text before a pair of consecutive @samp{%%} is considered

	895 part of the type declaration. Keywords and additional fields may follow

	896 this, one group of fields per line. A set of examples for generating

	897 perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,

	898 Modula 3 and JavaScript reserved words are distributed with this release.

	899

	900 @item --ignore-case

	901 Consider upper and lower case ASCII characters as equivalent. The string

	902 comparison will use a case insignificant character comparison. Note that

	903 locale dependent case mappings are ignored. This option is therefore not

	904 suitable if a properly internationalized or locale aware case mapping

	905 should be used. (For example, in a Turkish locale, the upper case equivalent

	906 of the lowercase ASCII letter @samp{i} is the non-ASCII character

	907 @samp{capital i with dot above}.) For this case, it is better to apply

	908 an uppercase or lowercase conversion on the string before passing it to

	909 the @code{gperf} generated function.

	910 @end table

	911

	912 @node Output Language, Output Details, Input Details, Options

	913 @section Options to specify the Language for the Output Code

	914

	915 These options are also available as declarations in the input file

	916 (@pxref{Gperf Declarations}).

	917

	918 @table @samp

	919 @item -L @var{generated-language-name}

	920 @itemx --language=@var{generated-language-name}

	921 Instructs @code{gperf} to generate code in the language specified by the

	922 option's argument. Languages handled are currently:

	923

	924 @table @samp

	925 @item KR-C

	926 Old-style K&R C. This language is understood by old-style C compilers and

	927 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)

	928 because of lacking @samp{const}.

	929

	930 @item C

	931 Common C. This language is understood by ANSI C compilers, and also by

	932 old-style C compilers, provided that you @code{#define const} to empty

	933 for compilers which don't know about this keyword.

	934

	935 @item ANSI-C

	936 ANSI C. This language is understood by ANSI C compilers and C++ compilers.

	937

	938 @item C++

	939 C++. This language is understood by C++ compilers.

	940 @end table

	941

	942 The default is C.

	943

	944 @item -a

	945 This option is supported for compatibility with previous releases of

	946 @code{gperf}. It does not do anything.

	947

	948 @item -g

	949 This option is supported for compatibility with previous releases of

	950 @code{gperf}. It does not do anything.

	951 @end table

	952

	953 @node Output Details, Algorithmic Details, Output Language, Options

	954 @section Options for fine tuning Details in the Output Code

	955

	956 Most of these options are also available as declarations in the input file

	957 (@pxref{Gperf Declarations}).

	958

	959 @table @samp

	960 @item -K @var{slot-name}

	961 @itemx --slot-name=@var{slot-name}

	962 @cindex Slot name

	963 This option is only useful when option @samp{-t} (or, equivalently, the

	964 @samp{%struct-type} declaration) has been given.

	965 By default, the program assumes the structure component identifier for

	966 the keyword is @samp{name}. This option allows an arbitrary choice of

	967 identifier for this component, although it still must occur as the first

	968 field in your supplied @code{struct}.

	969

	970 @item -F @var{initializers}

	971 @itemx --initializer-suffix=@var{initializers}

	972 @cindex Initializers

	973 This option is only useful when option @samp{-t} (or, equivalently, the

	974 @samp{%struct-type} declaration) has been given.

	975 It permits to specify initializers for the structure members following

	976 @var{slot-name} in empty hash table entries. The list of initializers

	977 should start with a comma. By default, the emitted code will

	978 zero-initialize structure members following @var{slot-name}.

	979

	980 @item -H @var{hash-function-name}

	981 @itemx --hash-function-name=@var{hash-function-name}

	982 Allows you to specify the name for the generated hash function. Default

	983 name is @samp{hash}. This option permits the use of two hash tables in

	984 the same file.

	985

	986 @item -N @var{lookup-function-name}

	987 @itemx --lookup-function-name=@var{lookup-function-name}

	988 Allows you to specify the name for the generated lookup function.

	989 Default name is @samp{in_word_set}. This option permits multiple

	990 generated hash functions to be used in the same application.

	991

	992 @item -Z @var{class-name}

	993 @itemx --class-name=@var{class-name}

	994 @cindex Class name

	995 This option is only useful when option @samp{-L C++} (or, equivalently,

	996 the @samp{%language=C++} declaration) has been given. It

	997 allows you to specify the name of generated C++ class. Default name is

	998 @code{Perfect_Hash}.

	999

	1000 @item -7

	1001 @itemx --seven-bit

	1002 This option specifies that all strings that will be passed as arguments

	1003 to the generated hash function and the generated lookup function will

	1004 solely consist of 7-bit ASCII characters (bytes in the range 0..127).

	1005 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do

	1006 @emph{not} guarantee that a byte is in this range. Only an explicit

	1007 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the

	1008 default in versions of @code{gperf} earlier than 2.7; now the default is

	1009 to support 8-bit and multibyte characters.

	1010

	1011 @item -l

	1012 @itemx --compare-lengths

	1013 Compare keyword lengths before trying a string comparison. This option

	1014 is mandatory for binary comparisons (@pxref{Binary Strings}). It also might

	1015 cut down on the number of string comparisons made during the lookup, since

	1016 keywords with different lengths are never compared via @code{strcmp}.

	1017 However, using @samp{-l} might greatly increase the size of the

	1018 generated C code if the lookup table range is large (which implies that

	1019 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length

	1020 table contains as many elements as there are entries in the lookup table.

	1021

	1022 @item -c

	1023 @itemx --compare-strncmp

	1024 Generates C code that uses the @code{strncmp} function to perform

	1025 string comparisons. The default action is to use @code{strcmp}.

	1026

	1027 @item -C

	1028 @itemx --readonly-tables

	1029 Makes the contents of all generated lookup tables constant, i.e.,

	1030 ``readonly''. Many compilers can generate more efficient code for this

	1031 by putting the tables in readonly memory.

	1032

	1033 @item -E

	1034 @itemx --enum

	1035 Define constant values using an enum local to the lookup function rather

	1036 than with #defines. This also means that different lookup functions can

	1037 reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}.

	1038

	1039 @item -I

	1040 @itemx --includes

	1041 Include the necessary system include file, @code{<string.h>}, at the

	1042 beginning of the code. By default, this is not done; the user must

	1043 include this header file himself to allow compilation of the code.

	1044

	1045 @item -G

	1046 @itemx --global-table

	1047 Generate the static table of keywords as a static global variable,

	1048 rather than hiding it inside of the lookup function (which is the

	1049 default behavior).

	1050

	1051 @item -P

	1052 @itemx --pic

	1053 Optimize the generated table for inclusion in shared libraries. This

	1054 reduces the startup time of programs using a shared library containing

	1055 the generated code. If the option @samp{-t} (or, equivalently, the

	1056 @samp{%struct-type} declaration) is also given, the first field of the

	1057 user-defined struct must be of type @samp{int}, not @samp{char *}, because

	1058 it will contain offsets into the string pool instead of actual strings.

	1059 To convert such an offset to a string, you can use the expression

	1060 @samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool

	1061 name can be changed through the option @samp{--string-pool-name}.

	1062

	1063 @item -Q @var{string-pool-name}

	1064 @itemx --string-pool-name=@var{string-pool-name}

	1065 Allows you to specify the name of the generated string pool created by

	1066 option @samp{-P}. The default name is @samp{stringpool}. This option

	1067 permits the use of two hash tables in the same file, with @samp{-P} and

	1068 even when the option @samp{-G} (or, equivalently, the @samp{%global-table}

	1069 declaration) is given.

	1070

	1071 @item --null-strings

	1072 Use NULL strings instead of empty strings for empty keyword table entries.

	1073 This reduces the startup time of programs using a shared library containing

	1074 the generated code (but not as much as option @samp{-P}), at the expense

	1075 of one more test-and-branch instruction at run time.

	1076

	1077 @item -W @var{hash-table-array-name}

	1078 @itemx --word-array-name=@var{hash-table-array-name}

	1079 @cindex Array name

	1080 Allows you to specify the name for the generated array containing the

	1081 hash table. Default name is @samp{wordlist}. This option permits the

	1082 use of two hash tables in the same file, even when the option @samp{-G}

	1083 (or, equivalently, the @samp{%global-table} declaration) is given.

	1084

	1085 @item -S @var{total-switch-statements}

	1086 @itemx --switch=@var{total-switch-statements}

	1087 @cindex @code{switch}

	1088 Causes the generated C code to use a @code{switch} statement scheme,

	1089 rather than an array lookup table. This can lead to a reduction in both

	1090 time and space requirements for some input files. The argument to this

	1091 option determines how many @code{switch} statements are generated. A

	1092 value of 1 generates 1 @code{switch} containing all the elements, a

	1093 value of 2 generates 2 tables with 1/2 the elements in each

	1094 @code{switch}, etc. This is useful since many C compilers cannot

	1095 correctly generate code for large @code{switch} statements. This option

	1096 was inspired in part by Keith Bostic's original C program.

	1097

	1098 @item -T

	1099 @itemx --omit-struct-type

	1100 Prevents the transfer of the type declaration to the output file. Use

	1101 this option if the type is already defined elsewhere.

	1102

	1103 @item -p

	1104 This option is supported for compatibility with previous releases of

	1105 @code{gperf}. It does not do anything.

	1106 @end table

	1107

	1108 @node Algorithmic Details, Verbosity, Output Details, Options

	1109 @section Options for changing the Algorithms employed by @code{gperf}

	1110

	1111 @table @samp

	1112 @item -k @var{selected-byte-positions}

	1113 @itemx --key-positions=@var{selected-byte-positions}

	1114 Allows selection of the byte positions used in the keywords'

	1115 hash function. The allowable choices range between 1-255, inclusive.

	1116 The positions are separated by commas, e.g., @samp{-k 9,4,13,14};

	1117 ranges may be used, e.g., @samp{-k 2-7}; and positions may occur

	1118 in any order. Furthermore, the wildcard '*' causes the generated

	1119 hash function to consider @strong{all} byte positions in each keyword,

	1120 whereas '$' instructs the hash function to use the ``final byte''

	1121 of a keyword (this is the only way to use a byte position greater than

	1122 255, incidentally).

	1123

	1124 For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash

	1125 function that considers positions 1,2,4,6,7,8,9,10, plus the last

	1126 byte in each keyword (which may be at a different position for each

	1127 keyword, obviously). Keywords

	1128 with length less than the indicated byte positions work properly, since

	1129 selected byte positions exceeding the keyword length are simply not

	1130 referenced in the hash function.

	1131

	1132 This option is not normally needed since version 2.8 of @code{gperf};

	1133 the default byte positions are computed depending on the keyword set,

	1134 through a search that minimizes the number of byte positions.

	1135

	1136 @item -D

	1137 @itemx --duplicates

	1138 @cindex Duplicates

	1139 Handle keywords whose selected byte sets hash to duplicate values.

	1140 Duplicate hash values can occur if a set of keywords has the same names, but

	1141 possesses different attributes, or if the selected byte positions are not well

	1142 chosen. With the -D option @code{gperf} treats all these keywords as

	1143 part of an equivalence class and generates a perfect hash function with

	1144 multiple comparisons for duplicate keywords. It is up to you to completely

	1145 disambiguate the keywords by modifying the generated C code. However,

	1146 @code{gperf} helps you out by organizing the output.

	1147

	1148 Using this option usually means that the generated hash function is no

	1149 longer perfect. On the other hand, it permits @code{gperf} to work on

	1150 keyword sets that it otherwise could not handle.

	1151

	1152 @item -m @var{iterations}

	1153 @itemx --multiple-iterations=@var{iterations}

	1154 Perform multiple choices of the @samp{-i} and @samp{-j} values, and

	1155 choose the best results. This increases the running time by a factor of

	1156 @var{iterations} but does a good job minimizing the generated table size.

	1157

	1158 @item -i @var{initial-value}

	1159 @itemx --initial-asso=@var{initial-value}

	1160 Provides an initial @var{value} for the associate values array. Default

	1161 is 0. Increasing the initial value helps inflate the final table size,

	1162 possibly leading to more time efficient keyword lookups. Note that this

	1163 option is not particularly useful when @samp{-S} (or, equivalently,

	1164 @samp{%switch}) is used. Also,

	1165 @samp{-i} is overridden when the @samp{-r} option is used.

	1166

	1167 @item -j @var{jump-value}

	1168 @itemx --jump=@var{jump-value}

	1169 @cindex Jump value

	1170 Affects the ``jump value'', i.e., how far to advance the associated

	1171 byte value upon collisions. @var{Jump-value} is rounded up to an

	1172 odd number, the default is 5. If the @var{jump-value} is 0 @code{gperf}

	1173 jumps by random amounts.

	1174

	1175 @item -n

	1176 @itemx --no-strlen

	1177 Instructs the generator not to include the length of a keyword when

	1178 computing its hash value. This may save a few assembly instructions in

	1179 the generated lookup table.

	1180

	1181 @item -r

	1182 @itemx --random

	1183 Utilizes randomness to initialize the associated values table. This

	1184 frequently generates solutions faster than using deterministic

	1185 initialization (which starts all associated values at 0). Furthermore,

	1186 using the randomization option generally increases the size of the

	1187 table.

	1188

	1189 @item -s @var{size-multiple}

	1190 @itemx --size-multiple=@var{size-multiple}

	1191 Affects the size of the generated hash table. The numeric argument for

	1192 this option indicates ``how many times larger or smaller'' the maximum

	1193 associated value range should be, in relationship to the number of keywords.

	1194 It can be written as an integer, a floating-point number or a fraction.

	1195 For example, a value of 3 means ``allow the maximum associated value to be

	1196 about 3 times larger than the number of input keywords''.

	1197 Conversely, a value of 1/3 means ``allow the maximum associated value to

	1198 be about 3 times smaller than the number of input keywords''. Values

	1199 smaller than 1 are useful for limiting the overall size of the generated hash

	1200 table, though the option @samp{-m} is better at this purpose.

	1201

	1202 If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is

	1203 @emph{not} enabled, the maximum

	1204 associated value influences the static array table size, and a larger

	1205 table should decrease the time required for an unsuccessful search, at

	1206 the expense of extra table space.

	1207

	1208 The default value is 1, thus the default maximum associated value about

	1209 the same size as the number of keywords (for efficiency, the maximum

	1210 associated value is always rounded up to a power of 2). The actual

	1211 table size may vary somewhat, since this technique is essentially a

	1212 heuristic.

	1213 @end table

	1214

	1215 @node Verbosity, , Algorithmic Details, Options

	1216 @section Informative Output

	1217

	1218 @table @samp

	1219 @item -h

	1220 @itemx --help

	1221 Prints a short summary on the meaning of each program option. Aborts

	1222 further program execution.

	1223

	1224 @item -v

	1225 @itemx --version

	1226 Prints out the current version number.

	1227

	1228 @item -d

	1229 @itemx --debug

	1230 Enables the debugging option. This produces verbose diagnostics to

	1231 ``standard error'' when @code{gperf} is executing. It is useful both for

	1232 maintaining the program and for determining whether a given set of

	1233 options is actually speeding up the search for a solution. Some useful

	1234 information is dumped at the end of the program when the @samp{-d}

	1235 option is enabled.

	1236 @end table

	1237

	1238 @node Bugs, Projects, Options, Top

	1239 @chapter Known Bugs and Limitations with @code{gperf}

	1240

	1241 The following are some limitations with the current release of

	1242 @code{gperf}:

	1243

	1244 @itemize @bullet

	1245 @item

	1246 The @code{gperf} utility is tuned to execute quickly, and works quickly

	1247 for small to medium size data sets (around 1000 keywords). It is

	1248 extremely useful for maintaining perfect hash functions for compiler

	1249 keyword sets. Several recent enhancements now enable @code{gperf} to

	1250 work efficiently on much larger keyword sets (over 15,000 keywords).

	1251 When processing large keyword sets it helps greatly to have over 8 megs

	1252 of RAM.

	1253

	1254 @item

	1255 The size of the generate static keyword array can get @emph{extremely}

	1256 large if the input keyword file is large or if the keywords are quite

	1257 similar. This tends to slow down the compilation of the generated C

	1258 code, and @emph{greatly} inflates the object code size. If this

	1259 situation occurs, consider using the @samp{-S} option to reduce data

	1260 size, potentially increasing keyword recognition time a negligible

	1261 amount. Since many C compilers cannot correctly generate code for

	1262 large switch statements it is important to qualify the @var{-S} option

	1263 with an appropriate numerical argument that controls the number of

	1264 switch statements generated.

	1265

	1266 @item

	1267 The maximum number of selected byte positions has an

	1268 arbitrary limit of 255. This restriction should be removed, and if

	1269 anyone considers this a problem write me and let me know so I can remove

	1270 the constraint.

	1271 @end itemize

	1272

	1273 @node Projects, Bibliography, Bugs, Top

	1274 @chapter Things Still Left to Do

	1275

	1276 It should be ``relatively'' easy to replace the current perfect hash

	1277 function algorithm with a more exhaustive approach; the perfect hash

	1278 module is essential independent from other program modules. Additional

	1279 worthwhile improvements include:

	1280

	1281 @itemize @bullet

	1282 @item

	1283 Another useful extension involves modifying the program to generate

	1284 ``minimal'' perfect hash functions (under certain circumstances, the

	1285 current version can be rather extravagant in the generated table size).

	1286 This is mostly of theoretical interest, since a sparse table

	1287 often produces faster lookups, and use of the @samp{-S} @code{switch}

	1288 option can minimize the data size, at the expense of slightly longer

	1289 lookups (note that the gcc compiler generally produces good code for

	1290 @code{switch} statements, reducing the need for more complex schemes).

	1291

	1292 @item

	1293 In addition to improving the algorithm, it would also be useful to

	1294 generate an Ada package as the code output, in addition to the current

	1295 C and C++ routines.

	1296 @end itemize

	1297

	1298 @page

	1299

	1300 @node Bibliography, Concept Index, Projects, Top

	1301 @chapter Bibliography

	1302

	1303 [1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect

	1304 Hashing Functions} Information Sciences 39(1986), 187-195.

	1305

	1306 [2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash

	1307 Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.

	1308

	1309 [3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}

	1310 Communications of the ACM, 23, 1(January 1980), 17-19.

	1311

	1312 [4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal

	1313 Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.

	1314

	1315 [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.

	1316 @i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.

	1317

	1318 [6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal

	1319 Perfect Hashing Functions} Communications of the ACM, 24, 12(December

	1320 1981), 829-833.

	1321

	1322 [7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect

	1323 Hash Functions Method} Communications of the ACM, 23, 12(December 1980),

	1324 728-729.

	1325

	1326 [8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect

	1327 Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532

	1328

	1329 [9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}

	1330 Second USENIX C++ Conference Proceedings, April 1990.

	1331

	1332 [10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}

	1333 C++ Report, SIGS 10 10 (November/December 1998).

	1334

	1335 [11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions

	1336 for Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53.

	1337

	1338 [12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe

	1339 Retrieving Method for Static Sets} Communications of the ACM, 20

	1340 11(November 1977), 841-850.

	1341

	1342 [13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,

	1343 1988.

	1344

	1345 [14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.

	1346

	1347 [15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software

	1348 Foundation, 1989.

	1349

	1350 @node Concept Index, , Bibliography, Top

	1351 @unnumbered Concept Index

	1352

	1353 @printindex cp

	1354

	1355 @contents

	1356 @bye

OLD	NEW

« no previous file with comments | « gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.pdf ('k') | gperf/src/gperf/3.0.1/gperf-3.0.1-src/doc/gperf.dvi.gz » ('j') | no next file with comments »