KEY:
* -- easy change
** -- harder, needs more understanding
@ -- needs understanding of internals
@@ -- mail me if you need this!

source fixes awaiting attention:
* common.c -- can this use fReadLine and ReadWord?
* docpath.c -- move docpath into db
* OpenDatabase -- should take severity, description, see liblqtext.h!

[1 - ui]
* give lqshow the ability to page a file
  (it can call up $PAGER if you use "v")
  Or, **, rewrite the curses-based front end altogether.
* make the gnome/gtk front end usable
@ make a perl module?
@ make a python module?

[2 - search,index]
@@ special treatment of dates in the index

[3 - index]
** table of pagers for browsing by file/type
** Make the table read from a file at run-time, and include filters
   Probably the filters should call LQT_AddWord(), instead of the
   other way round.

[4 - retrieval]
** Better ranking of queries
   Start with lqrank, which already does some sorting.  The difficult
   thing here is deciding on what basis to do the ranking.
   E.g. docs containing the target phrases the most times come first?
   Or does the length of the document make a difference too?
   Probably has to be configurable.
   See the TREC conference reports; also see src/lqtext/lqsimilar.
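A minimal sketch of one configurable ranking basis. It assumes a hypothetical DocHit record, which is not lqrank's real data structure. Documents are scored by how many times the target phrases occur, optionally divided by document length so that long documents do not win just by being long. The rank_normalise flag stands in for a configuration setting.

```c
#include <stdlib.h>

/* Hypothetical summary of one document's matches; lqrank's real
 * structures are not shown here. */
typedef struct {
    unsigned long doc_id;
    unsigned long matches;     /* times the target phrases occur */
    unsigned long word_count;  /* document length in words */
} DocHit;

/* Would be set from the configuration: 1 = divide by document length. */
static int rank_normalise = 1;

/* Matches per thousand words if normalising, else the raw match count. */
static double rank_score(const DocHit *d, int normalise)
{
    double score = (double) d->matches;
    if (normalise && d->word_count > 0)
        score = score * 1000.0 / (double) d->word_count;
    return score;
}

/* qsort() comparator: highest score first. */
static int cmp_hits(const void *a, const void *b)
{
    double sa = rank_score(a, rank_normalise);
    double sb = rank_score(b, rank_normalise);
    return (sa < sb) - (sa > sb);
}

/* Sort hits in place, best first. */
static void rank_hits(DocHit *hits, size_t n)
{
    qsort(hits, n, sizeof hits[0], cmp_hits);
}
```

With normalisation on, a 100-word file with 3 matches outranks a 10000-word file with 10 matches. With it off, the order reverses. Which behaviour is better is exactly the open question above.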

[6 - doc]
**@@ write a manual
   I have started this in the doc directory.  My goal is to have
   * a user manual
   * an administration manual
   * a programmer's manual, documenting the API.
     The API reference manual is the most advanced,
     but an API guide is needed.

[7 - index]
**@ The entire plural code (Root.c) needs a rethink.
   I have started Plurals.c, but it's not ready yet.  Yell if you have any
   ideas, I need them! [although "this" is OK now]
   In particular, how to handle morphological analysis into lemmas in an
   internationalised application is a difficulty.
@ allow user-defined stemmers (compiled-in) via config (README) file

[8 - index]
** allow dynamic definition of word start/mid/end, in README.
   Must be at least as fast as isupper() etc.
   Perhaps per-file-type rules, though?  That would make Phrase Matching hard.
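A sketch of one way to meet the speed requirement: build a 256-entry flag table at run-time from three character sets, which might be read from the README. Each test is then one array index and one mask, comparable to the table lookup many C libraries use for isupper(). The set_word_chars() function and the macro names are hypothetical.

```c
#include <string.h>

/* Flag bits for each byte value. */
enum { WORD_START = 1, WORD_MID = 2, WORD_END = 4 };

static unsigned char char_class[256];

/* Rebuild the table from three strings of permitted characters. */
static void set_word_chars(const char *start, const char *mid, const char *end)
{
    memset(char_class, 0, sizeof char_class);
    for (; *start; start++)
        char_class[(unsigned char) *start] |= WORD_START;
    for (; *mid; mid++)
        char_class[(unsigned char) *mid] |= WORD_MID;
    for (; *end; end++)
        char_class[(unsigned char) *end] |= WORD_END;
}

/* One index and one mask per test. */
#define IS_WORD_START(c) (char_class[(unsigned char) (c)] & WORD_START)
#define IS_WORD_MID(c)   (char_class[(unsigned char) (c)] & WORD_MID)
#define IS_WORD_END(c)   (char_class[(unsigned char) (c)] & WORD_END)
```

For example, set_word_chars("abcABC", "abcABC'-", "abcABC") lets an apostrophe or hyphen occur inside a word but not at either end.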


[9 - index]
@@ Replace the common words file with three files:
   [1] a list of words not to be indexed
   [2] a list of phrases to index completely, even if some of the words
       in them occur in [1]
   [3] a list of phrases not to index at all, possibly with the ability
       to mark specific words.
   Then you can say
     don't index /the/ except in /the times/
     always index /our/ except in /Our Company Ltd/
   Tim Bray of Open Text says stop lists are a bug, and I think I agree with
   him, except you also have to say that the price of disk is a bug too.
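A sketch of the decision for files [1] and [2] only, covering the "don't index /the/ except in /the times/" case. It assumes hypothetical in-memory word lists and already lower-cased words. File [3], the never-index phrases, is not handled. A stop word is still indexed if it falls inside any occurrence of an always-index phrase.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory forms of files [1] and [2]. */
static const char *const stop_words[] = { "the", "a", "of", NULL };
static const char *const keep_phrases[][3] = {
    { "the", "times", NULL },          /* each phrase is NULL-terminated */
};
#define N_KEEP (sizeof keep_phrases / sizeof keep_phrases[0])

static int is_stop_word(const char *w)
{
    size_t i;
    for (i = 0; stop_words[i] != NULL; i++)
        if (strcmp(stop_words[i], w) == 0)
            return 1;
    return 0;
}

/* Does phrase p occur in words[0..n-1] starting at index start? */
static int phrase_at(const char *const *p, const char *const *words,
                     size_t n, size_t start)
{
    size_t k;
    for (k = 0; p[k] != NULL; k++)
        if (start + k >= n || strcmp(p[k], words[start + k]) != 0)
            return 0;
    return 1;
}

/* Index words[i] unless it is a stop word outside every keep phrase. */
static int should_index(const char *const *words, size_t n, size_t i)
{
    size_t ph, k;
    if (!is_stop_word(words[i]))
        return 1;
    for (ph = 0; ph < N_KEEP; ph++)
        for (k = 0; keep_phrases[ph][k] != NULL; k++)
            /* words[i] could be word k of the phrase starting at i - k */
            if (k <= i && strcmp(keep_phrases[ph][k], words[i]) == 0
                && phrase_at(keep_phrases[ph], words, n, i - k))
                return 1;
    return 0;
}
```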

[10 - ui]
** lqshow could be made a routine (BrowseList() I suppose) that takes
   a list of Phrases with their matches...
   Hard to integrate into X.

[11 - implement.]
**@ should abandon dbm for the list of filenames.  A better approach would
   be to store path components as words in the database!  This would make
   / a common-word, though.  Needs some thought.  Or maybe as blocks in data.
   That would be fairly easy.
   A btree might be a good comprimise.  For now, at least db-1.xx doesn't
   have overflow problems.

(12 - index)
**@@ the ability to delete a file.
   (this has been done, lqunindex)

[13 - implementation]
** Better file locking
   (no file locking or signal handling at all at the moment -- I ripped it
    all out when I discovered that it was broken on many systems, and
    this gave a false sense of security.)
    Also, it was too slow, and gave console error messages on NFS!
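A minimal sketch of POSIX fcntl() record locking on a database file. The function names are hypothetical. F_SETLK fails at once instead of waiting, so a second writer gets an error rather than hanging. Two real caveats bear on the problems above. First, these locks belong to the process and are released when any descriptor for that file is closed. Second, over NFS they depend on the lock daemon.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to take an exclusive lock on the whole file without blocking.
 * Returns 0 on success, -1 if another process holds a conflicting lock
 * (errno EACCES or EAGAIN) or on any other error. */
static int lock_database(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;       /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl) == -1 ? -1 : 0;
}

/* Release the lock taken by lock_database(). */
static int unlock_database(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl) == -1 ? -1 : 0;
}
```

An indexing program would call lock_database() right after opening the data file and exit with a message if it returns -1. Read-only programs could take F_RDLCK instead, so that they can run together.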

(14 - retrieval)
* Phrase Matching would be orders of magnitude faster if it did not involve
    reading the tables of matches until they are needed, as many of them
    won't be!  It should extend the lists of matches for each word in the
    phrase only as necessary.
    Done.  It wasn't orders of magnitude in most cases, only a factor
    of two or so.  Could possibly be improved.
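A sketch of the lazy extension described above. It assumes a hypothetical MatchList whose `source` array stands in for on-disk match blocks, read BLOCK entries at a time. For a two-word phrase the walk stops as soon as either list runs out. A rare first word followed by a common second word therefore reads only the first block or so of the common word's list.

```c
#include <stddef.h>

#define BLOCK 64   /* hypothetical number of positions per block read */

typedef struct {
    const unsigned long *source; /* sorted word positions "on disk" */
    size_t total;                /* entries available */
    size_t loaded;               /* entries read so far */
    size_t blocks_read;          /* to measure the saving */
} MatchList;

/* Make sure entry i is loaded; return 0 if the list is shorter. */
static int ensure(MatchList *m, size_t i)
{
    while (i >= m->loaded) {
        if (m->loaded >= m->total)
            return 0;
        m->loaded += BLOCK;      /* simulate reading one more block */
        if (m->loaded > m->total)
            m->loaded = m->total;
        m->blocks_read++;
    }
    return 1;
}

/* Count places where word b immediately follows word a. */
static size_t match_pairs(MatchList *a, MatchList *b)
{
    size_t i = 0, j = 0, found = 0;
    while (ensure(a, i) && ensure(b, j)) {
        unsigned long want = a->source[i] + 1;
        if (b->source[j] < want)
            j++;
        else {
            if (b->source[j] == want)
                found++;
            i++;
        }
    }
    return found;
}
```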

[16 - implementation]
@@ use mmap for data and widindex, in segments with a cache

[18 - implementation]
@ Add a WIDIndex cache!
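A minimal direct-mapped cache sketch. It assumes a hypothetical WIDEntry layout and a stand-in read_wid_from_disk(); the real WIDIndex record is not shown. Each WID can live in exactly one slot, so a lookup costs one mask and one compare. It also assumes WID 0 is never a real word, because 0 marks an empty slot.

```c
#include <stddef.h>

typedef struct {
    unsigned long wid;      /* word identifier; 0 = empty slot */
    unsigned long offset;   /* stand-in for the real WIDIndex fields */
} WIDEntry;

#define WID_CACHE_SIZE 1024 /* a power of two, so we can mask */

static WIDEntry wid_cache[WID_CACHE_SIZE];
static unsigned long wid_misses;

/* Stand-in for the real (slow) WIDIndex read. */
static WIDEntry read_wid_from_disk(unsigned long wid)
{
    WIDEntry e;
    wid_misses++;
    e.wid = wid;
    e.offset = wid * 16;    /* fake data */
    return e;
}

/* Return the entry for wid, reading it only on a cache miss. */
static const WIDEntry *get_wid_entry(unsigned long wid)
{
    WIDEntry *slot = &wid_cache[wid & (WID_CACHE_SIZE - 1)];
    if (slot->wid != wid)
        *slot = read_wid_from_disk(wid);   /* evicts the old occupant */
    return slot;
}
```

Any code that updates a WIDIndex entry on disk would also have to update or clear the matching slot.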

[19 - implementation]
@ Close "chainend" (lasblks) on exit
    (done)
@ optionally remove chainend, with README line

[20 - implementation]
** Proper variable-based configuration, no global variables.
    (1.14: the global variables are all gone but no good config code yet)

(21 - index)
@@ 4-bit coding for when delta-block and WIB fit in 7 bits combined?
    (1.14: done)
    Or, could always use 4/4 bits, and set the top bit on each if continued;
    need to measure what numbers occur.
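For comparison, here is a sketch of the common variable-byte scheme: 7 data bits per byte, with the top bit set on every byte except the last. This is neither the 4/4-bit layout proposed above nor lq-text's current format. It only illustrates the "set the top bit if continued" idea. The function names are hypothetical.

```c
#include <stddef.h>

/* Write n low 7 bits first; out needs room for up to 10 bytes.
 * Returns the number of bytes written. */
static size_t put_vbyte(unsigned char *out, unsigned long long n)
{
    size_t len = 0;
    while (n >= 0x80) {
        out[len++] = (unsigned char) ((n & 0x7F) | 0x80); /* more follows */
        n >>= 7;
    }
    out[len++] = (unsigned char) n;                      /* last byte */
    return len;
}

/* Read a value written by put_vbyte(); *used gets the byte count. */
static unsigned long long get_vbyte(const unsigned char *in, size_t *used)
{
    unsigned long long n = 0;
    unsigned shift = 0;
    size_t len = 0;
    unsigned char c;
    do {
        c = in[len++];
        n |= (unsigned long long) (c & 0x7F) << shift;
        shift += 7;
    } while (c & 0x80);
    *used = len;
    return n;
}
```

To measure what numbers occur, one could record put_vbyte()'s return value while indexing a real collection and make a histogram.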

[22 - index]
@@ Variable WIDBLOCKSIZE to reduce wastage?  How much is wasted?
    (1.16: not very much)

(23 - implementation)
* cd/rom changes --
    [23a]  read-only database
    (done)
    [23b]  ms/dos-compatible filenames
    (done)

[24 - retrieval]
@@ find within fielded data
    e.g. "find within title"

[25 - retrieval]
* make lqsed handle overlapping matches, e.g. use only the longest.
  ?what to do with this?:
	aa bb cc dd ee ff gg
	  [1     ]1   
	      [2    ]2
  Probably turn it into
	aa bb cc dd ee ff gg
	  [1        ]2
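A sketch of the merge shown in the diagram. It assumes matches are held as hypothetical half-open byte ranges [start, end). The ranges are sorted by start, and each one is folded into the previous range if they overlap. Ranges that merely touch are kept separate, so adjacent words stay individually highlighted. That is a design choice, not current lqsed behaviour.

```c
#include <stdlib.h>

/* A highlighted match: byte offsets [start, end). */
typedef struct { long start, end; } Match;

static int cmp_match(const void *a, const void *b)
{
    const Match *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort m[0..n-1] and merge overlapping ranges in place.
 * Returns the number of ranges left. */
static size_t merge_matches(Match *m, size_t n)
{
    size_t out = 0, i;
    if (n == 0)
        return 0;
    qsort(m, n, sizeof *m, cmp_match);
    for (i = 1; i < n; i++) {
        if (m[i].start < m[out].end) {       /* overlaps the current range */
            if (m[i].end > m[out].end)
                m[out].end = m[i].end;       /* extend it */
        } else
            m[++out] = m[i];                 /* start a new range */
    }
    return out + 1;
}
```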

[25 - ui]
@@ integrate a decent command-line option parser and combine it with the
    config file.  Preferably something like X defaults, except with a
    description and type for each of them.
    Someone did this but never sent patches, and used a GPL'd lib.
    Symbol table to be added to t_Database.
    See also item 20.

[26 ?]
*@ query expansion
    (this is done with lqquery for wildcards, but not for thesaurus)

[27 ?]
* quorum ranking
   (done in lqrank I think)
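A sketch of quorum ordering as a qsort() comparator. It assumes a hypothetical per-document QuorumHit tally. Documents containing more of the distinct query terms come first, and total hits break ties. lqrank's actual method may differ.

```c
#include <stdlib.h>

typedef struct {
    unsigned long doc_id;
    unsigned terms_present;    /* distinct query terms found */
    unsigned long total_hits;  /* all matches, used to break ties */
} QuorumHit;

/* More distinct terms first; among equals, more total hits first. */
static int cmp_quorum(const void *a, const void *b)
{
    const QuorumHit *x = a, *y = b;
    if (x->terms_present != y->terms_present)
        return x->terms_present < y->terms_present ? 1 : -1;
    if (x->total_hits != y->total_hits)
        return x->total_hits < y->total_hits ? 1 : -1;
    return 0;
}
```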

[28 ?]
@@ statistical ranking a la SMART.
    see also item 5.

[29 ?]
@@ support structured documents
    see also item 24.

[30 ui]
@@ i18n ??  unicode??

[31 implementation]
@@ use an mmap() cache for LQT_ReadBlock()
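A sketch for this item and item 16. The data file is mapped in fixed-size segments, and a few mappings are kept in a small direct-mapped table. The function name, segment size and slot count are assumptions. The sketch handles one file only, is read-only, and assumes no block crosses a segment boundary. A block must also lie inside the file as it was when mapped, because touching a mapped page beyond end-of-file raises SIGBUS. Writes would still go through the existing code.

```c
#include <fcntl.h>       /* open(), for callers */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>      /* close(), for callers */

#define SEG_SIZE  (1L << 20)   /* 1 MB per mapping, a multiple of the page size */
#define SEG_SLOTS 8            /* mappings kept at once */

typedef struct {
    off_t base;    /* file offset of the mapped segment, -1 if unused */
    char *addr;
    size_t len;
} Segment;

static Segment segs[SEG_SLOTS];
static int segs_ready = 0;

/* Return a pointer to the data at byte offset `where`, mapping its
 * segment on first use; NULL on error. */
static const char *read_block_mmap(int fd, off_t where, off_t file_size)
{
    off_t base = where - where % SEG_SIZE;
    Segment *s = &segs[(base / SEG_SIZE) % SEG_SLOTS];
    int i;

    if (where < 0 || where >= file_size)
        return NULL;
    if (!segs_ready) {
        for (i = 0; i < SEG_SLOTS; i++)
            segs[i].base = -1;
        segs_ready = 1;
    }
    if (s->base != base) {
        off_t left = file_size - base;
        size_t len = (size_t) (left < SEG_SIZE ? left : SEG_SIZE);
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, base);
        if (p == MAP_FAILED)
            return NULL;
        if (s->base != -1)
            munmap(s->addr, s->len);   /* evict the previous segment */
        s->base = base;
        s->addr = p;
        s->len = len;
    }
    return s->addr + (where - base);
}
```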

[32 filters]
integrate filters with findmatchends

[33 doc]
user guide

[34 doc]
update the man pages

[35 doc]
api reference/C
    (1.14: there's a fledgling API reference, but it only documents
    functions, not data structures right now)

[37 ui]
** provide a GUI, e.g. using Motif?  Ugh.
   (1.14: I've started with Motif.)
   (1.16: I started again with perlgtk and got further)

[38 filters]
    C (used to have one of these, it seems to have got lost)
    ASCII
    man pages/troff (done!  but need work)
    SGML
    RTF?
    HTML

[39 retrieval]
proximity searching
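A sketch of the core test for proximity searching. It assumes each word's matches within one file are available as a sorted array of word positions. The walk always advances the smaller position, so it runs in O(na + nb) and returns as soon as one pair is close enough. within() is a hypothetical name.

```c
#include <stddef.h>

/* Is some occurrence of word A within `window` words of word B? */
static int within(const unsigned long *a, size_t na,
                  const unsigned long *b, size_t nb,
                  unsigned long window)
{
    size_t i = 0, j = 0;
    while (i < na && j < nb) {
        unsigned long d = a[i] > b[j] ? a[i] - b[j] : b[j] - a[i];
        if (d <= window)
            return 1;
        if (a[i] < b[j])   /* move the smaller position forward */
            i++;
        else
            j++;
    }
    return 0;
}
```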

[40 ?]
queries by file (this work started in Phrase.c)

[41 ?]
indexing of compressed and archived files

[42 ui]
Udi Manber's agrep on vocabulary.
Or maybe soundex.
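A sketch of American Soundex applied to the vocabulary. Words that share a code (Robert and Rupert both give R163) could be offered as alternatives. Only ASCII letters are coded, which sidesteps the i18n question in item 30. The soundex() name and signature are illustrative.

```c
#include <ctype.h>
#include <string.h>

/* Upper-case ASCII letter, or 0 for anything else. */
static int letter(int c)
{
    c = toupper((unsigned char) c);
    return (c >= 'A' && c <= 'Z') ? c : 0;
}

/* American Soundex: first letter plus three digits, e.g. "R163".
 * out must hold 5 bytes; it is set to "" if name has no letters. */
static void soundex(const char *name, char out[5])
{
    /* Digit for each of A..Z; '0' marks A E I O U Y H W. */
    static const char codes[] = "01230120022455012623010202";
    int n = 0;
    char last;

    while (*name && !letter(*name))
        name++;
    if (*name == '\0') {
        out[0] = '\0';
        return;
    }
    out[n++] = (char) letter(*name);
    last = codes[out[0] - 'A'];
    for (name++; *name && n < 4; name++) {
        int c = letter(*name);
        if (c == 0 || c == 'H' || c == 'W')
            continue;              /* H and W do not separate equal codes */
        if (codes[c - 'A'] != '0' && codes[c - 'A'] != last)
            out[n++] = codes[c - 'A'];
        last = codes[c - 'A'];     /* a vowel resets last */
    }
    while (n < 4)
        out[n++] = '0';            /* pad short codes, e.g. "L000" */
    out[4] = '\0';
}
```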

[43 retrieval]
complex queries

[44 ?]
lqgrep/lqegrep using preprocessor to reduce no. of files to search

[46 index/physical]
lqdbfsck?  Program to check a db is ok.  lqword -A > /dev/null is silly.

[48 ?]
lq enhancements:
    save matches, files or the index, with or without titles, to a file

[49 ?]
ship faq script

[50 ?]
@ thesaurus

[51 ?]
generalise the variables used in lqkwic to a general-purpose facility
suitable for i18n, etc.
    (1.14: this is what t_NameSpace is for, but is not yet widely used)

[52 WWW]
* CGI script interface, see also item 36

Known Bugs
==========
* lqshow does not know about file types
* there is no troff (or sqtroff) file type
	-> fixed, but the filter is still buggy: you'll get the wrong
	word highlighted quite often, I'm afraid.
* the C filter got lost in history (sigh)
write PID into block zero, also IP address of host, for exclusive access.
Use lockf or fcntl-style locking.  Need to test which ones work.
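A sketch of the block-zero owner record. It assumes a hypothetical OWNER_LEN field in block zero and records the host name instead of the IP address, since a host can have several addresses. Writing the record is not a lock by itself: two processes could both read an empty field and then both write to it. The record would therefore sit alongside an fcntl() or lockf() lock and be used mainly to report who holds it.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define OWNER_LEN 128   /* hypothetical space reserved in block zero */

/* Format "pid@host" for this process; returns 0 on success. */
static int make_owner(char out[OWNER_LEN])
{
    char host[64];
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate */
    snprintf(out, OWNER_LEN, "%ld@%s", (long) getpid(), host);
    return 0;
}

/* Is the owner recorded in block zero this process on this host? */
static int owner_is_me(const char *recorded)
{
    char me[OWNER_LEN];
    return make_owner(me) == 0 && strcmp(me, recorded) == 0;
}
```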

