.Overview "Language/Stop_List"
It's common practice in text retrieval to omit words from the database if they
occur very often; the list of omitted words is called a stop list.
For example, `and', `the' and `to' rarely add much information.
However, in phrases such as `The Times' or `Bitwise and',
these words are suddenly of great significance.
.P
There are three approaches to this.
.P
First, you can omit the words anyway, and say that people looking for
`The Times' are out of luck.
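.P
The first approach can be sketched in C as follows.
This is a minimal illustration and not lq-text's own code: the names
stop_words, is_stop_word and filter_words are hypothetical, and a real
stop list would normally be read from a file rather than compiled in.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stop list, for illustration only. */
static const char *const stop_words[] = { "and", "the", "to" };

/* Compare two words, ignoring case; returns 1 if they are equal. */
static int same_word(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Return 1 if word is on the stop list, 0 otherwise. */
int is_stop_word(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof stop_words / sizeof stop_words[0]; i++)
        if (same_word(word, stop_words[i]))
            return 1;
    return 0;
}

/* Copy into kept[] the words that would be indexed; returns how many. */
size_t filter_words(const char *const *words, size_t n, const char **kept)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (!is_stop_word(words[i]))
            kept[count++] = words[i];
    return count;
}
```

Given the words `The', `Times', `and', `Bitwise', only `Times' and `Bitwise'
are kept, so a search for `The Times' can match on `Times' alone.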
.P
Second, you can index all of the words, and accept a larger index.
The penalty is typically between one and thirty percent of the total
index size, which is usually acceptable.
.P
Third, you can specify a list of contexts in which words in the stop list
are to be indexed anyway.
There are three problems with this last approach.
Firstly, a query rarely contains enough context to determine what to
do about those words.
Secondly, you have to think of all the contexts in advance; if you didn't
think of `The Times', the user would still be out of luck.
Finally, lq-text doesn't support this third approach directly, although
you could modify lq-text, perhaps using the routines in this category.
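.P
As a rough sketch of what such a modification might involve, the C fragment
below keeps a stop word when it appears next to a listed neighbour.
It does not use any lq-text routines: the table layout and the names
stop_list, exceptions and should_index are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stop list, for illustration only. */
static const char *const stop_list[] = { "and", "the", "to" };

/* A context in which a stop word is indexed anyway. */
struct context {
    const char *before;  /* word that must precede, or NULL for any */
    const char *stop;    /* the stop word to keep */
    const char *after;   /* word that must follow, or NULL for any */
};

/* Hypothetical exception table; every context must be listed in advance. */
static const struct context exceptions[] = {
    { NULL, "the", "times" },       /* The Times */
    { "bitwise", "and", NULL },     /* Bitwise and */
};

/* Compare two words, ignoring case; returns 1 if they are equal. */
static int words_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* A NULL pattern matches anything; a missing neighbour matches nothing else. */
static int neighbour_matches(const char *pattern, const char *word)
{
    if (pattern == NULL)
        return 1;
    if (word == NULL)
        return 0;
    return words_equal(pattern, word);
}

/* Return 1 if word is on the stop list, 0 otherwise. */
static int on_stop_list(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof stop_list / sizeof stop_list[0]; i++)
        if (words_equal(word, stop_list[i]))
            return 1;
    return 0;
}

/* Decide whether word should be indexed, given its neighbours
 * (prev or next may be NULL at the start or end of the text). */
int should_index(const char *prev, const char *word, const char *next)
{
    size_t i;

    if (!on_stop_list(word))
        return 1;
    for (i = 0; i < sizeof exceptions / sizeof exceptions[0]; i++)
        if (words_equal(word, exceptions[i].stop)
            && neighbour_matches(exceptions[i].before, prev)
            && neighbour_matches(exceptions[i].after, next))
            return 1;
    return 0;
}
```

Here `the' is kept in `read The Times' and `and' is kept in `bitwise and',
but `and' in `salt and pepper' is dropped.
A phrase missing from the exceptions table is treated like any other stop
word, which is the second problem described above.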
./Overview
