sid
[ -d FILE | --dump-file FILE ]
[ -f LIMIT | --factor-limit LIMIT ]
[ -i INLINES | --inline INLINES ]
[ -l LANGUAGE | --language LANGUAGE ]
[ -s OPTION | --switch OPTION ]
[ -t NUMBER | --tab-width NUMBER ]
[ -h | --help ] [ -e | --show-errors ] [ -v | --version ]
file ...
The sid command is used to turn descriptions of a language into a program for recognising that language. This manual page details the command line syntax; for more information, consult the sid user documentation. The number of files specified on the command line varies depending upon the output language. The description of the --language option specifies the number of files for each language.
sid accepts both short form and long form command line switches. The long form equivalents are due to be removed in the next release.
Short form switches are single characters, and begin with a '-' or '+' character. They can be concatenated into a single command line word, e.g.:
-vdl dump-file language-name
which contains three different switches (-v, which takes no arguments; -d, which takes one argument: dump-file; and -l, which takes one argument: language-name).
Long form switches are strings, and begin with -- or ++. With long form switches, only the shortest unique prefix need be entered. The long form of the above example would be:
--version --dump-file dump-file --language language-name
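The shortest unique prefix rule can be sketched as follows. This is only an illustration of the rule as described above, not sid's actual implementation; the function name resolve_long_switch is hypothetical, the switch names are taken from the option list in this manual page, and treating an exact match as always unambiguous is an assumption.

```python
# Long form switch names listed in this manual page (without the leading "--").
LONG_SWITCHES = [
    "dump-file", "factor-limit", "help", "inline", "language",
    "show-errors", "switch", "tab-width", "version",
]


def resolve_long_switch(prefix):
    """Map a (possibly abbreviated) long switch name to its full name.

    Raises ValueError if the prefix matches no switch or more than one.
    """
    if prefix in LONG_SWITCHES:
        # Assumption: an exact name is never treated as ambiguous.
        return prefix
    matches = [name for name in LONG_SWITCHES if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown switch: --" + prefix)
    raise ValueError("ambiguous switch: --" + prefix)
```

Under this sketch, "v" resolves to "version", while "s" is rejected because it could mean either "show-errors" or "switch"; "sh" and "sw" are the shortest unique prefixes for those two.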
In most cases the argument to a switch should follow the switch as a separate word. When short form switches are concatenated into a single word, their arguments should follow that word in the order of the switches (as in the first example). For some options, the argument may be part of the same word as the switch (such options are shown without a space between the switch and the argument in the switch summaries below). Among short form switches, such a switch terminates any concatenation of switches: either characters follow it in the same word and are treated as its argument, or it ends the word and its argument follows as normal.
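The ordering rule for concatenated short form switches can be sketched as below. This is a simplified illustration rather than sid's parser: the function name parse_short_word and the table TAKES_ARGUMENT are hypothetical, the table covers only some of the switches from the option list in this manual page, and the case of an argument attached to its switch in the same word is deliberately not handled.

```python
# True if the short form switch takes one argument, False if it takes none.
# Only a subset of the switches documented in this manual page is listed.
TAKES_ARGUMENT = {
    "d": True, "f": True, "i": True, "l": True, "s": True, "t": True,
    "e": False, "v": False,
}


def parse_short_word(words):
    """Parse one word of concatenated switches plus the words after it.

    Returns (switches, remaining_words), where switches is a list of
    (letter, argument) pairs and argument is None for switches without one.
    """
    word, rest = words[0], list(words[1:])
    switches = []
    for letter in word[1:]:          # skip the leading '-' prefix
        if letter not in TAKES_ARGUMENT:
            raise ValueError("unknown switch: -" + letter)
        if TAKES_ARGUMENT[letter]:
            if not rest:
                raise ValueError("missing argument for -" + letter)
            # Arguments are consumed in the order of the switches.
            switches.append((letter, rest.pop(0)))
        else:
            switches.append((letter, None))
    return switches, rest
```

For the words -vdl dump-file language-name, this yields -v with no argument, -d with dump-file and -l with language-name, and leaves any further words (such as the input files) untouched.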
For binary switches, the - or -- switch prefixes set (enable) the switch, and the + or ++ switch prefixes reset (disable) the switch. This is probably back to front, but is in keeping with other programs. The switches -- or ++ by themselves terminate option parsing.
It is possible to change the error messages that sid uses. To do this, set the environment variable SID_ERROR_FILE to the name of a file containing the new error messages.
The error file consists of zero or more sections. Each section begins with a section marker (one of %prefix%, %errors% or %strings%). The prefix section takes a single string (this is to be the prefix for all error messages). The other sections take zero or more pairs of names and strings. A name is a sequence of characters surrounded by single quotes. A string is a sequence of characters surrounded by double quotes. In the case of the prefix and error sections, the strings may contain variables of the form ${variable name}. These variables will be replaced by suitable information when the error occurs. The backslash character can be used to escape characters. The following C style escape sequences are recognized: '\n', '\r', '\t', '\'. Also, the sequence '\xNN' represents the character with code NN in hex. The hash character starts a comment that runs to the end of the line.
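The escape and variable rules for strings in the error file can be sketched roughly as follows. This is an illustration under stated assumptions, not sid's own code: the function names unescape and substitute are hypothetical, only the '\n', '\r', '\t' and '\xNN' sequences are modelled, any other escaped character is assumed to stand for itself, and the variable names used in the usage note are invented for the example.

```python
import re
import string

# C style escapes modelled here; other escaped characters stand for themselves.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def unescape(text):
    """Expand backslash escapes in the body of an error file string."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or not all(d in string.hexdigits for d in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(chr(int(digits, 16)))
            i += 4
        elif i + 1 < len(text):
            nxt = text[i + 1]
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            raise ValueError("dangling backslash at end of string")
    return "".join(out)


def substitute(message, values):
    """Replace ${variable name} with its value; unknown variables are kept."""
    def lookup(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return re.sub(r"\$\{([^}]*)\}", lookup, message)
```

For instance, with hypothetical variables "program name" and "file name", the message "${program name}: cannot open '${file name}'" would expand to "sid: cannot open 'g.sid'" given the values sid and g.sid.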
The --show-errors option may be used to get a copy of the current error messages.
sid accepts the following command line options:
--dump-file FILE
-d FILE
This option causes intermediate dumps of the grammar to be written to the file FILE.
--factor-limit LIMIT
-f LIMIT
This option limits the number of rules that can be created during the factorisation process. It is probably best not to change this.
--help
-?
Write an option summary to the standard error.
--inline INLINES
-i INLINES
This option controls what inlining will be done in the output parser. The INLINES argument should be a comma separated list of the following words:
SINGLES
This causes single alternative rules to be inlined. This inlining is no longer performed as a modification to the grammar (it was in version 1.0).
BASICS
This causes rules that contain only basics (and no exception handlers or empty alternatives) to be inlined. The restriction on exception handlers and empty alternatives is rather arbitrary, and may be changed later.
TAIL
This causes tail recursive calls to be inlined. Without this, tail recursion elimination will not be performed.
OTHER
This causes other calls to be inlined wherever possible. Unless the MULTI inlining is also specified, this will be done only for productions that are called once.
MULTI
This causes calls to be inlined, even if the rule being called is called more than once. Turning this inlining on implies OTHER. Similarly, turning off OTHER inlining will turn off MULTI inlining. For grammars of any size, this is probably best avoided; if used, the generated parser may be huge (e.g. a C grammar has produced a file that was several hundred MB in size).
ALL
This turns on all inlining.
In addition, prefixing a word with “no” turns off that inlining phase. The words may be given in any case. They are evaluated in the order given, so:
--inline noall,singles
would turn on single alternative rule inlining only, whilst:
--inline singles,noall
would turn off all inlining. The default is as if sid were invoked with the option:
--inline noall,basics,tail
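The left-to-right evaluation of the inlines argument can be sketched as below. This is a model of the rules described above rather than sid's implementation; the function name evaluate_inlines is hypothetical, and applying the argument on top of the documented default state is an assumption.

```python
PHASES = ("singles", "basics", "tail", "other", "multi")

# Documented default: as if invoked with "noall,basics,tail".
DEFAULT_INLINES = frozenset({"basics", "tail"})


def evaluate_inlines(spec, start=DEFAULT_INLINES):
    """Return the set of enabled inlining phases after applying spec."""
    enabled = set(start)
    for word in spec.split(","):
        word = word.strip().lower()          # words may be given in any case
        turn_on = not word.startswith("no")  # a "no" prefix turns a phase off
        name = word if turn_on else word[2:]
        if name == "all":
            targets = PHASES
        elif name in PHASES:
            targets = (name,)
        else:
            raise ValueError("unknown inlining word: " + word)
        if turn_on:
            enabled.update(targets)
            if name == "multi":
                enabled.add("other")         # MULTI implies OTHER
        else:
            enabled.difference_update(targets)
            if name == "other":
                enabled.discard("multi")     # turning off OTHER turns off MULTI
    return enabled
```

With this model, "noall,singles" leaves only singles enabled, "singles,noall" leaves nothing enabled, and "noall,multi" enables both multi and other.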
--language LANGUAGE
-l LANGUAGE
This option specifies the output language. Currently this should be either “ansi-c”, “pre-ansi-c”, or “test”. The default is “ansi-c”.
The “ansi-c” and “pre-ansi-c” languages are basically the same. The only difference is that “ansi-c” initially uses function prototypes, and “pre-ansi-c” doesn't. Each language takes two input files, a grammar file and an actions file, and produces two output files, a C source file containing the generated parser and a C header file containing the external declarations for the parser. The C language specific options are:
prototypes
proto
no-prototypes
no-proto
These enable or disable the use of function prototypes.
split
split=NUMBER
no-split
These enable or disable the output file split option. The generated files can be very large even without inlining. This option splits the main output file into a number of components containing about NUMBER lines each (the default being 50000). These components are distinguished by successively substituting 1, 2, 3, ... for the character @ in the output file name.
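The naming of the split components can be illustrated with a short sketch. The function name component_names and the file name used in the usage note are hypothetical, and replacing only the first @ when the name contains several is an assumption; the text above does not say how that case is handled.

```python
def component_names(output_name, count):
    """List the component file names produced for a split output file.

    The character '@' in output_name is replaced by 1, 2, 3, ... in turn.
    """
    if "@" not in output_name:
        raise ValueError("output file name must contain '@' when splitting")
    # Assumption: only the first '@' is substituted.
    return [output_name.replace("@", str(n), 1) for n in range(1, count + 1)]
```

For example, an output file given as parser@.c and split into three components would be written as parser1.c, parser2.c and parser3.c.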
numeric-ids
numeric
no-numeric-ids
no-numeric
These enable or disable the use of numeric identifiers. Numeric identifiers replace the identifier name with a number, which is mainly of use in stopping identifier names getting too long. The disadvantage is that the code becomes less readable, and more difficult to debug. Numeric identifiers are not used by default and are never used for terminal numbers.
casts
no-casts
no-cast
These enable or disable casting of action and assignment operator immutable parameters. If enabled, a parameter is cast to its own type when it is substituted into the action. This will cause some compilers to complain about attempts to modify the parameter (which can help pick out attempts at mutating parameters that should not be mutated). The disadvantage is that not all compilers will reject attempts at mutation, and that ANSI doesn't allow casting to structure and union types, which means that some code may be illegal. Parameter casting is disabled by default.
unreachable-macros
unreachable-macro
unreachable-comments
unreachable-comment
These choose whether unreachable code is marked by a macro or a comment. The default is to mark unreachable code with a comment /*UNREACHED*/, however a macro UNREACHED; may be used instead, if desired.
lines
line
no-lines
no-line
These determine whether #line directives should be output to relate the output file to the actions file. These are generated by default.
The “test” language only takes one input file, and produces no output file. It may be used to check that a grammar is valid. In conjunction with the dump file, it may be used to check the transformations that would be applied to the grammar. There are no language specific options for the “test” language.
--show-errors
-e
Write the current error message list to the standard output.
--switch OPTION
-s OPTION
Pass through OPTION as a language specific option.
--tab-width NUMBER
-t NUMBER
This option specifies the number of spaces that a tab occupies. It defaults to 8. It is only used when indenting output.
--version
-v
This option causes the version number and supported languages to be written to the standard error stream.