The lq-text database configuration file: config.txt
Purpose
The file config.txt is used to tell lq-text how to make
an index. For example, this is where you tell lq-text where to find
the files to put in the index.
Once you have created this file, you can run lqaddfile
to build the index.
After that, you must not alter the config.txt file.
Format
config.txt must be a plain text file; you can create it with
vi or emacs or even Windows Notepad if you like.
The file contains a list of entries, one per line, each of which
controls some aspect or other of the database.
Blank lines are ignored, as are lines starting with a # sign.
You can use # to start a comment:
# this text is ignored
Leading and trailing spaces and tabs are ignored.
# this text is ignored
# so is this one
Note that a # does not introduce a comment elsewhere on a line.
Apart from blank lines and comments, each line contains a keyword
followed by a value.
Example:
MinWordLength 2
The keywords are described below.
They are case insensitive, so that DocPath is the same
as docpath; the values themselves are case sensitive, however.
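The format rules above can be illustrated with a short Python sketch. This is not lq-text's own parser; the function name parse_config and the way surrounding double quotes are stripped are assumptions made for illustration only.

```python
def parse_config(text):
    """Return a dict mapping lower-cased keywords to their raw string values.

    Illustrative only: this is not the parser used by lq-text.
    """
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()          # leading and trailing spaces and tabs are ignored
        if not line or line.startswith("#"):
            continue                     # blank line, or a comment starting with #
        parts = line.split(None, 1)      # the keyword, then the rest of the line
        keyword = parts[0].lower()       # keywords are case insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        # Assumption: a value wrapped in double quotes is used without the quotes.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        entries[keyword] = value         # values keep their case
    return entries


SAMPLE = """\
# this text is ignored
    # so is this one

DocPath "/usr/doc:/home/mieza/Alexander"
MINWORDLENGTH 2
Title "This is the title"
"""

config = parse_config(SAMPLE)
print(config["docpath"])
print(config["minwordlength"])
```

Because a # only starts a comment at the beginning of a line, the sketch never looks for # inside a value.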
Keywords and Value Types
Each keyword has a value type. The value may be a
list of directories, a string, a number, or a truth value (yes or no),
as follows:
- Path
- A Path variable has a value that is a list of directories,
separated by colons. If you need to include a directory with a
space in its name, you can put the entire value in double quotes.
Example:
DocPath "/home/vax5/liam/documents:/usr/pub/text files"
- String
- A String variable has a value that is not interpreted
directly: you can put anything you like there. If you need to include
spaces, use double quotes.
Example:
Title "This is the title"
- Number
- A value that is a Number must be a whole number (an
integer); it can be a positive number (e.g. 12),
a negative number (e.g. -5) or zero (0).
Example:
MaxWordLength 16
- Affirmation
- A value that is an Affirmation must be either yes
or no. You can use True and False, or On and Off, instead if
you prefer.
Example:
ConvertNumbers On
- Flags
- A Flags value is a list of Flags separated by a
vertical bar ( | ); the flag values are combined
using logical or, although the effect in practice is to set all
the flags that are given. You can also subtract one set of flags
from another using -, and you can add them with +.
Most lists of Flags can include All and None, as well as whatever
specific values are appropriate.
Example:
WordFlags All-HasStuffBefore|NextHasPunct
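As a rough illustration of the Path, Number and Affirmation value types, the following Python sketch converts raw value strings into native values. The function names are hypothetical. Accepting on and off, and ignoring the case of truth values, are assumptions inferred from the examples in this document rather than documented behaviour.

```python
def parse_path(value, separator=":"):
    """Split a Path value into its list of directories."""
    return [directory for directory in value.split(separator) if directory]


def parse_number(value):
    """Convert a Number value: a positive, negative or zero whole number."""
    return int(value)


# Assumption: on/off are accepted alongside yes/no and true/false,
# and the spelling is matched without regard to case.
_TRUE_WORDS = {"yes", "true", "on"}
_FALSE_WORDS = {"no", "false", "off"}


def parse_affirmation(value):
    """Convert an Affirmation value to a Python bool."""
    word = value.strip().lower()
    if word in _TRUE_WORDS:
        return True
    if word in _FALSE_WORDS:
        return False
    raise ValueError("not a truth value: %r" % value)


print(parse_path("/home/vax5/liam/documents:/usr/pub/text files"))
print(parse_number("-5"))
print(parse_affirmation("On"))
```

Splitting a Path on colons after the quotes have been removed is what allows a directory name such as "text files" to contain a space.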
Keywords, Their Meanings and Their Default Values
If a default value is given, it refers to the value if the
entry is omitted entirely; if you use a keyword, you must give a
value. If you want to use the default value, you can comment out
the entry with a # at the start of the line.
- Administrator_Email
- A String containing an electronic mail address of a
person to be contacted if there is a technical problem with the
database. This variable is not used by lq-text, but is
provided for use by other systems, and to encourage a uniform
way of specifying the information.
Default value: unset.
- CGI_URL
- A String containing the Uniform Resource Locator of a
World Wide Web interface to the database.
Synonyms: URL, CGI-URL, CGIURL
Default value: unset.
- Common
- See under Stoplist.
- ConvertNumbers
- An Affirmation. If this is set to on,
words encountered in the input that have a leading zero and
contain no digit higher than 7 are treated as octal, and indexed
as their decimal equivalent; words starting with 0x are treated as
hexadecimal and indexed similarly. Leading zeros are removed.
If ConvertNumbers is set to off, such conversions are
not performed, and numbers are indexed as if they were normal
words.
Example: If On, 0277 will be indexed as 191,
0xfF will be indexed as 255, and 00009 will be indexed as 9.
See Also: IndexNumbers.
Default value: off.
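The conversion rules for ConvertNumbers can be sketched in Python as follows. The function name convert_number is hypothetical, and the code only mirrors the rules as described in this entry; it is not taken from the lq-text source.

```python
def convert_number(word):
    """Return the string under which `word` would be indexed with ConvertNumbers on.

    Words that do not look like numbers are returned unchanged.
    """
    lower = word.lower()
    hex_digits = "0123456789abcdef"
    # Words starting with 0x are treated as hexadecimal.
    if lower.startswith("0x") and len(lower) > 2 and all(c in hex_digits for c in lower[2:]):
        return str(int(lower[2:], 16))
    if word and all(c in "0123456789" for c in word):
        # A leading zero and no digit higher than 7: treat as octal.
        if len(word) > 1 and word[0] == "0" and all(c in "01234567" for c in word):
            return str(int(word, 8))
        # Otherwise a decimal number: leading zeros are removed.
        return word.lstrip("0") or "0"
    return word


for sample in ("0277", "0xfF", "00009"):
    print(sample, "->", convert_number(sample))
```

The three sample inputs are the ones used in the example above.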
- Data_Owner_Email
- A String containing an electronic mail address for the
person or organisation who gave permission for this material to be
made available, or who holds editorial control over it.
This variable is not used by lq-text, but is
provided for use by other systems, and to encourage a uniform
way of specifying the information.
Default value: unset.
- DocPath
- This is a Path variable. It consists of a list of
directories, separated by colons (:) (or by semicolons (;) on
Microsoft Windows NT). Whenever any lq-text program tries
to find a document, it searches each directory in DocPath in
turn until it succeeds.
Example: DocPath "/usr/doc:/home/mieza/Alexander"
If you try to index a file called "people/plato", lq-text
will look first for "/usr/doc/people/plato" and then for
"/home/mieza/Alexander/people/plato".
Synonyms: path
There is no default: you must always include this entry.
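The search order described for DocPath can be sketched in Python. The function name find_document and its separator parameter are hypothetical; the sketch shows only the idea of trying each directory in turn, and says nothing about how lq-text handles absolute file names.

```python
import os


def find_document(name, doc_path, separator=":"):
    """Return the first existing file called `name` under the DocPath directories.

    Returns None if no directory in the path contains the file.
    Pass separator=";" to mimic the Windows NT convention.
    """
    for directory in doc_path.split(separator):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# With DocPath "/usr/doc:/home/mieza/Alexander", this tries
# /usr/doc/people/plato first, then /home/mieza/Alexander/people/plato.
print(find_document("people/plato", "/usr/doc:/home/mieza/Alexander"))
```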
- Echo
- This String is intended for debugging; its argument is
printed to standard error whenever the configuration file containing
it is read. Use of this keyword is discouraged.
- End
- This keyword has no value, but must appear on a line by itself.
If it appears, the rest of the configuration file is ignored.
- FileBlockSize
- Word locations are represented in an lq-text index as
a file number (FID), a Block Number, and a word number within
that block. You will get the smallest index if, on average, almost
all file blocks contain fewer than 128 words. Since the average
word length in English is just under 4 letters, and there is always
at least one space between two words, the default value for
FileBlockSize is five times (MinWordLength + 1).
You can override that here: FileBlockSize is the Number of
bytes in each block.
- IndexNumbers
- If this Affirmation is set to off, words that appear
to lq-text to be numbers will not be included in the database.
If it is set to on, they will be indexed as determined by
the value of the ConvertNumbers keyword.
Synonyms: numbers
Default Value: off
- Locale
- A String containing the name of the POSIX Locale to use for
the purposes of string comparison, collation, and in particular for
determining what characters represent upper and lower case letters.
A useful value on most systems is iso_8859_1. It is passed
to the C setlocale function.
Default Value: the environment variable LOCALE is used; if this is
unset, the default locale is system dependent, and is usually C,
which causes lq-text to discard 8-bit characters.
- MaxWordLength
- This is a Number. Words longer than MaxWordLength
characters will be truncated to that length after stemming.
Synonyms: maxlen
Default Value: 18
- MinWordLength
- This is a Number. Words shorter than MinWordLength
characters will not be added to the index. If WordFlags
includes LastHadLetters, that flag will be set on the first indexed
word following a word that was considered too short to index.
Synonyms: minlen
Default Value: 2
- StopList
- A String that names a file that must exist. If the
filename does not begin with a dot (.) or slash (/), it is sought
in the database directory itself; DocPath is not used
to find this file.
The file must contain a list of words, one per line, that are to
be ignored, and not indexed, whenever they occur. You can use
a # to introduce comments anywhere on a line.
It is usually best to use an empty file as the stoplist.
Synonyms: common
Default Value: stoplist
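The file lookup rule and the stoplist file format can be sketched in Python. Both function names are hypothetical, and whether lq-text compares stoplist words with or without regard to case is not stated here, so the sketch keeps each word exactly as written.

```python
import os


def resolve_stoplist(name, database_dir):
    """Return the path at which the stoplist file would be sought."""
    if name.startswith(".") or name.startswith("/"):
        return name                               # used as given
    return os.path.join(database_dir, name)       # sought in the database directory


def parse_stoplist(text):
    """Return the set of words listed one per line, ignoring # comments."""
    words = set()
    for line in text.splitlines():
        # Unlike config.txt, a # starts a comment anywhere on the line.
        word = line.split("#", 1)[0].strip()
        if word:
            words.add(word)
    return words


print(sorted(parse_stoplist("the   # definite article\n# a full-line comment\n\nof\n")))
```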
- Title
- A String containing a short (single line) description
of the database suitable to appear in a list of databases.
This variable is not used by lq-text, but is
provided for use by other systems, and to encourage a uniform
way of specifying the information.
Default value: unset.
- WordFlags
- This is a List of Flags that is used
to determine what aspects of words to store in the database.
The actual values are described under
Word Rules.
You should not set or alter this value without understanding
what you are doing.
The most useful values are All (the default) and All-HasStuffBefore,
which will save approximately 5% to 10% of the index size.
Default Value: All
- WordsInIndex
- This is an Affirmation that you
can set to No if you are very tight on database storage space.
Normally, the actual text of each word is stored twice: once so that
it can be converted to a Word Identifier for looking up in the
database, and once so that it can be matched against by
lqquery and lqwordlist. If WordsInIndex is set
to No, the second copy is not retained, and lqquery
and lqwordlist cannot be used.
Synonyms: WordList
Default Value: yes (on)
See Also