Introduction

This manual describes how lq-text works from a technical standpoint.
After reading this manual, you will be able to extend lq-text, and to
use lq-text in your own applications, either at the shell-script
level or at the C level.

Modules

Technically, lq-text can be thought of as providing a number of services.
You will find that the functions in the C API are each labelled as
belonging to one or more categories, and these categories correspond to
the lq-text services.

The services are as follows:

[1] Lists of Files

lq-text can find a file by searching in a path,
and remembers the name and location of every indexed file.
Every file is given a unique number, the FID; lq-text maintains a
two-way mapping between filenames and FIDs.
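
As an illustration of what a two-way mapping involves, here is a minimal
sketch in C.  Every name in it (FileTable, file_table_add and so on) is
hypothetical and is not part of the lq-text API; it keeps everything in a
small in-memory array, whereas lq-text stores its file list in the index.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only: none of these names belong to lq-text. */
#define MAX_FILES 1024

typedef unsigned long FileID;   /* 0 means "no such file" in this sketch */

typedef struct {
    char *names[MAX_FILES];     /* names[fid - 1] is the name for that FID */
    FileID count;               /* number of files registered so far */
} FileTable;

/* Name to FID: return the FID for name, or 0 if it is not registered. */
FileID file_table_lookup(const FileTable *t, const char *name)
{
    FileID i;

    for (i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) {
            return i + 1;
        }
    }
    return 0;
}

/* Register name if it is new.  Returns its FID, or 0 if the table is
 * full or memory runs out.  Adding the same name twice gives the same FID.
 */
FileID file_table_add(FileTable *t, const char *name)
{
    FileID fid = file_table_lookup(t, name);
    char *copy;

    if (fid != 0) {
        return fid;
    }
    if (t->count >= MAX_FILES) {
        return 0;
    }
    copy = malloc(strlen(name) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, name);
    t->names[t->count] = copy;
    t->count++;
    return t->count;
}

/* FID to name: return the stored name, or NULL for an unknown FID. */
const char *file_table_name(const FileTable *t, FileID fid)
{
    if (fid == 0 || fid > t->count) {
        return NULL;
    }
    return t->names[fid - 1];
}
```

The linear search keeps the sketch short; a real index would want a hashed
or sorted lookup in the name-to-FID direction.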


[2] Lists of Words

lq-text keeps a list of every word that has been seen, and associates
each word with its match-list.
Every word is given a unique number, the WID; lq-text maintains a
two-way mapping between words and WIDs.
Optionally, to save index space, this mapping can be made one-way only,
from word to WID, and not the other way round.

[3] Lists of Matches

For each word (or, more precisely, for each WID), lq-text retains a list
of matches: that is, of all of the places where the word occurred.
Each match contains a FID, along with other numbers described below
under Matches.  The matches are stored in a compressed format.
You could think of this service as nothing more than an efficient
way to store and later retrieve lists of numbers.
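
To make the idea of storing lists of numbers compactly more concrete, the
following sketch shows one common technique: keep a sorted list as the gaps
between successive values, and write each gap in as few bytes as possible.
This is an assumption chosen for illustration, not a description of the
format lq-text actually uses, and all of the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration only: not the lq-text on-disk format. */

/* Write value in 7-bit groups, low-order group first.  The high bit of a
 * byte is set when more bytes follow.  Returns the number of bytes written.
 */
size_t put_varint(unsigned long value, unsigned char *out)
{
    size_t n = 0;

    while (value >= 0x80) {
        out[n++] = (unsigned char)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (unsigned char)value;
    return n;
}

/* Read one number written by put_varint.  Returns the bytes consumed.
 * There is no protection here against truncated or malformed input.
 */
size_t get_varint(const unsigned char *in, unsigned long *value)
{
    size_t n = 0;
    unsigned long result = 0;
    int shift = 0;
    unsigned char byte;

    do {
        byte = in[n++];
        result |= (unsigned long)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    *value = result;
    return n;
}

/* Encode a non-decreasing list as gaps between neighbours.  The caller
 * must supply a large enough buffer (10 bytes per value is always enough
 * for a 64-bit unsigned long).  Returns the number of bytes written.
 */
size_t encode_sorted(const unsigned long *values, size_t count,
                     unsigned char *out)
{
    size_t i, n = 0;
    unsigned long previous = 0;

    for (i = 0; i < count; i++) {
        n += put_varint(values[i] - previous, out + n);
        previous = values[i];
    }
    return n;
}

/* Decode count values written by encode_sorted.  Returns bytes consumed. */
size_t decode_sorted(const unsigned char *in, size_t count,
                     unsigned long *values)
{
    size_t i, n = 0;
    unsigned long gap, previous = 0;

    for (i = 0; i < count; i++) {
        n += get_varint(in + n, &gap);
        previous += gap;
        values[i] = previous;
    }
    return n;
}
```

Small gaps fit in a single byte, so a list of nearby values takes far less
room than the same list stored as full-width integers.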

[4] Morphology and Parsing

lq-text provides a number of input filters to read files and extract all
the words from them.
In addition, lq-text attempts to reduce those words to their singular form.
This allows you to search for `boy' and find `boys', for example.
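
A very rough sketch of this kind of reduction is shown below.  The rules and
the function name strip_plural are assumptions made for illustration; the
rules lq-text applies may well differ, and irregular plurals are not handled
here at all.

```c
#include <string.h>

/* Hypothetical illustration only: not the lq-text stemming code.
 * Reduce an English plural to a singular form, in place.
 * Returns 1 if the word was changed and 0 if it was left alone.
 */
int strip_plural(char *word)
{
    size_t len = strlen(word);

    /* "ponies" becomes "pony" */
    if (len > 4 && strcmp(word + len - 3, "ies") == 0) {
        strcpy(word + len - 3, "y");
        return 1;
    }

    /* "boxes" becomes "box", "churches" becomes "church":
     * drop "es" after s, x, z, ch or sh
     */
    if (len > 3 && strcmp(word + len - 2, "es") == 0) {
        char c = word[len - 3];

        if (c == 's' || c == 'x' || c == 'z' ||
            (c == 'h' && len > 4 &&
             (word[len - 4] == 'c' || word[len - 4] == 's'))) {
            word[len - 2] = '\0';
            return 1;
        }
    }

    /* "boys" becomes "boy", but "glass" is left alone */
    if (len > 2 && word[len - 1] == 's' && word[len - 2] != 's') {
        word[len - 1] = '\0';
        return 1;
    }

    return 0;
}
```

If the same reduction is applied both when indexing and when reading a
query, `boy' and `boys' end up as the same word.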

[5] Configuration and Preferences

lq-text provides a fairly primitive configuration mechanism through the
config.txt file in the database directory, and also some Unix command-line
option handling.

[6] Error Handling

lq-text has a single Error function called by the library for all errors.
Currently, the library does not attempt to recover from fatal errors.
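
The general shape of a single error function that does not recover from
fatal errors might look like the sketch below.  The name sketch_error, the
severity constants and the error counter are all hypothetical; the real
Error function's arguments are not described in this section.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical illustration only: not the lq-text Error function. */
enum { SKETCH_WARN = 1, SKETCH_FATAL = 2 };

static unsigned long sketch_error_count = 0;

/* Report an error on stderr using a printf-style format.
 * A fatal error ends the program; there is no attempt to recover.
 */
void sketch_error(int severity, const char *format, ...)
{
    va_list args;

    sketch_error_count++;
    fprintf(stderr, "%s: ",
            severity == SKETCH_FATAL ? "fatal error" : "warning");
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fputc('\n', stderr);

    if (severity == SKETCH_FATAL) {
        exit(EXIT_FAILURE);
    }
}
```

Routing every report through one function gives a single place to change
how errors are shown or logged.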

[7] Efficient Input and Output

lq-text has a number of internal block buffer caches that are designed
to make input and output very efficient.
