.\" t
.\" use sqtbl % | troff -man
.
.\" $Id: lqunindex.man,v 1.1 96/05/31 01:39:45 lee Exp $
.
.TH lqunindex l "1.14, May 1996" "\(co Liam Quin 1989, 1996" "lq-text programs"
.
.SH NAME
lqunindex \- remove files from an lq-text full-text retrieval inverted index
.SH SYNOPSIS
.
.de oB
. RB [ \\$1 ]
..
.de oI
. RB [ \\$1
. IR "\\$2" ]
..
.B lqunindex
.oB \-vVx
.oI \-d dir
.oI \-t n
.oI \-M n
.oI \-w n
.oI \-W n
.oI \-H n
.oI \-f file
[
.I file
] .\|.\|.
.SH DESCRIPTION
Use
.I lqunindex
to remove files from an existing
.I lq-text
database.
Each file whose name is given to
.I lqunindex
is removed from the index.
If a file given to
.I lqunindex
is not an absolute pathname, it is searched for along each entry
in
.I docpath
in turn.
Note that the actual disk files themselves are not affected in any way;
they are read to determine which words they contain, but they are neither
written to nor removed.
.PP
The options (described below) for setting the cache size can make
a dramatic difference to the speed of
.IR lqunindex .
The general idea is to maximise the use of physical memory; if the system
starts frantic paging or swapping activity, use a smaller cache size
next time.
Unfortunately, there's no good way of determining this number in advance,
partly because it depends on the distribution of word frequencies in the
data, and partly because it's difficult to manage physical memory usage
on a virtual memory system such as Unix.
.
.PP
Note that while
.I lqunindex
is running, the database is not accessible by other users.
.
.PP
.
.SH "OPTIONS"
All of the options documented in
.IR lq-text "\^(l)"
are interpreted by
.IR lqunindex ,
although some of them apply only to matching rather than to indexing,
and are simply ignored.
.PP
The options specific to
.I lqunindex
are given here:
.TP
.BI \-f \^file
Read the names of files to remove from the index from
.IR file ;
the names should be given one per line.
Blank lines are ignored, and a leading # is taken to start a comment.
As with all
.I lq-text
programs, a value of \- for a filename means that the program is
to read standard input; see below under
.SM EXAMPLES
for a typical use of this.
.TP
.BI \-H n
Set the size of the hash table to
.I n
slots;
each slot holds information about every occurrence of a single word.
.
.TP
.BI \-w n
Specifies the maximum total number of occurrences of words that can
be held in the cache's in-memory hash table at one time.
Typically, the overhead in memory is about 12 bytes for each occurrence,
with an additional 20\-30 bytes per slot, depending on the overhead
of
.IR malloc "\^(3)"
and on the length of the particular word.
.
.SH EXAMPLES
.RS
find reports memos \-type f \-print | lqunindex \-w3000000 \-f \-
.RE
This removes from the index all of the files in the
.I reports
or
.I memos
directories.
The cache size is set about ten times higher than the default, but
no extra in-memory hash table slots are allocated; with
the default of 32768 slots, this presumably assumes that each
word occurs roughly 90 times on average.
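.PP
As a rough worked estimate for this example, using the typical overheads
given under
.SM OPTIONS
(illustrative arithmetic only, not a measurement):
.RS
.nf
occurrences:  3000000 \(mu 12 bytes = 36000000 bytes
hash slots:   32768 \(mu 20 to 30 bytes = 655360 to 983040 bytes
total:        about 37 million bytes (roughly 35 megabytes)
.fi
.RE
If that much physical memory is not available, a smaller
.B \-w
value is probably the safer choice.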
.PP
.RS
lqunindex \-t3 \-w\s-1\&1000000\s0 \-H\s-1\&5000000\s0 \-W2 war+peace waverly
.RE
The two enormous text files
.I war+peace
and
.I waverly
are removed from the index.
Memory and hash slots are allocated, and the \-W option indicates that
many words will occur only one or two times in a million (the million
comes from the \-w option).
The
.B \-t3
is a generic
.I lq-text
option that produces tracing.
You can also use the normal
.I lq-text
tracing options; see
.I "lqunindex \-t list"
for a list of these.
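.PP
A list file for the
.B \-f
option might look like the following sketch; the pathnames and the name
.I unindex.list
are hypothetical, chosen only to illustrate the format (one name per line,
blank lines ignored, a leading # starting a comment):
.RS
.nf
# reports that have been withdrawn
reports/1995/q3.txt

memos/old/budget.memo
.fi
.RE
It could then be passed as follows:
.RS
lqunindex \-f unindex.list
.RE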
.SH ENVIRONMENT
All the environment variables described in
.IR lq-text "\^(l)"
are understood; in particular,
.SM LQTEXTDIR
can contain the name of a directory containing the database to be used.
.PP
The
.SM TMPDIR
environment variable is used to determine where to place temporarily
uncompressed or unarchived copies of files in order to read them.
If not set, this defaults to
.IR /tmp .
This is only used for files that could not be read directly.
See also the note under
.SM BUGS
below.
.
.SH BUGS
.I lqunindex
is supposed to recognise compressed, packed, gzip'd or archived files,
which it does; however, it is
.I also
supposed to unpack the files appropriately before reading them,
which it does not.
If you give only the basename of a file, however,
.I lqunindex
is more likely to do the right thing.
For example, if you want to unindex a file called
.I ichabod.Z
which is compressed, pass only the name
.I ichabod
to
.IR lqunindex ,
without the
.I \&.Z
suffix.
In this case,
.I lqunindex
will uncompress
.I ichabod.Z
into a temporary file and then read that.
.SH "SEE ALSO"
.IR lq-text "\^(l),"
.IR lq "\^(l),"
.IR lqkwic "\^(l),"
.IR lqaddfile "\^(l),"
.IR lqword "\^(l),"
.IR lqclean "\^(l),"
.IR lqfile "\^(l),"
.IR find "\^(1)."
.SH AUTHOR
Liam R. E. Quin, 1989, 1993, 1996
