KEY:
*  -- easy change
** -- harder, needs more understanding
@  -- needs understanding of internals
@@ -- mail me if you need this!

source fixes awaiting attention:
* common.c -- can this use fReadLine and ReadWord?
* docpath.c -- move docpath into db
* OpenDatabase -- should take severity, description, see liblqtext.h!

[1 - ui]
* give lqshow the ability to page a file
  (it can call up $PAGER if you use "v")
  Or, **, rewrite the curses-based front end altogether.

[2 - search,index]
@@ special treatment of dates

[3 - index]
** table of pagers for browsing by file/type
@  Make the table read from a file at run-time, and include filters
    [3b - presentation]
    ** The various Filter routines should be incorporated into lqshow and
       lqkwic.


[4 - implementation]
@@ Distributed, Multi-threaded Networked version
    * an RPC-based WID server
    **@@ make use of an RPC WID server...

[5 - retrieval]
** Better ranking of queries
   Start with lqrank, which already does some sorting.  The difficult
   thing here is deciding on what basis to do the ranking.
   E.g. docs containing the target phrases the most times come first?
   Or does the length of the document make a difference too?
   Probably has to be configurable.
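
   A minimal sketch of what "configurable" could mean here, assuming only
   two ranking bases: raw match count, and matches per word of document
   (so that length makes a difference).  RankMode, DocScore, ScoreDocument
   and RankDocuments are hypothetical names, not part of lqrank or of
   liblqtext.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ranking bases; a real version would read this from config. */
typedef enum {
    RANK_BY_COUNT,   /* documents with the most matches come first */
    RANK_BY_DENSITY  /* matches per word, so document length matters */
} RankMode;

/* Hypothetical per-document summary of one query's results. */
typedef struct {
    const char *name;
    unsigned long matchCount; /* times the target phrases occur */
    unsigned long wordCount;  /* total words in the document */
} DocScore;

static double ScoreDocument(const DocScore *d, RankMode mode)
{
    assert(d != NULL);
    if (mode == RANK_BY_DENSITY) {
        /* an empty document scores zero rather than dividing by zero */
        return d->wordCount
            ? (double) d->matchCount / (double) d->wordCount
            : 0.0;
    }
    return (double) d->matchCount;
}

/* Sort docs in place, best score first (insertion sort, for brevity). */
static void RankDocuments(DocScore *docs, size_t n, RankMode mode)
{
    size_t i, j;

    for (i = 1; i < n; i++) {
        DocScore tmp = docs[i];
        double s = ScoreDocument(&tmp, mode);

        for (j = i; j > 0 && ScoreDocument(&docs[j - 1], mode) < s; j--) {
            docs[j] = docs[j - 1];
        }
        docs[j] = tmp;
    }
}
```

   With these two modes a long file with 10 matches beats a short file
   with 5 under RANK_BY_COUNT, and loses to it under RANK_BY_DENSITY.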

[6 - doc]
**@@ write a manual
   I have started this in the doc directory.  My goal is to have
   * a user manual
   * an administration manual
   * a programmer's manual, documenting the API.
     The API reference manual is the most advanced, but an API guide is needed.

[7 - index]
**@ The entire plural code (Root.c) needs a rethink.
   I have started Plurals.c, but it's not ready yet.  Yell if you have any
   ideas, I need them! [although "this" is OK now]
   In particular, how to handle morphological analysis into lemmas in an
   internationalised application is a difficulty.
   TODO: allow user-defined stemmers (compiled-in) via config (README) file

[8 - index]
** allow dynamic definition of word start/mid/end, in README.
   Must be at least as fast as isupper() etc.
   Perhaps per-file-type rules, though?  No -- makes Phrase Matching hard or
   impossible.

[9 - index]
** Replace the common words file with three files:
   [1] a list of words not to be indexed
   [2] a list of phrases to index completely, even if some of the words
       in them occur in [1]
   [3] a list of phrases not to index at all, possibly with the ability
       to mark specific words.
   Then you can say
     don't index /the/ except in /the times/
     always index /our/ except in /Our Company Ltd/
   Tim Bray of Open Text says stop lists are a bug, and I think I agree with
   him, except you also have to say that the price of disk is a bug too.
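
   A rough sketch of how the three lists might combine, assuming words
   are already lower-cased, that list [2] overrides list [1], and that
   list [3] overrides both and drops the whole phrase (the "mark specific
   words" idea is not covered).  All the names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A phrase, or a word list, is a NULL-terminated array of words. */
typedef const char *const *Phrase;

static size_t PhraseLength(Phrase p)
{
    size_t n = 0;

    while (p[n] != NULL) n++;
    return n;
}

static int InWordList(const char *word, Phrase list)
{
    size_t i;

    for (i = 0; list[i] != NULL; i++) {
        if (strcmp(word, list[i]) == 0) return 1;
    }
    return 0;
}

/* Wherever phrase p occurs in tok[0..n-1], set keep[] for its words. */
static void MarkPhrase(const char *const *tok, size_t n, Phrase p,
                       int *keep, int value)
{
    size_t len = PhraseLength(p), i, j;

    if (len == 0 || len > n) return;
    for (i = 0; i + len <= n; i++) {
        for (j = 0; j < len && strcmp(tok[i + j], p[j]) == 0; j++)
            ;
        if (j == len) {
            for (j = 0; j < len; j++) keep[i + j] = value;
        }
    }
}

/* Fill keep[i] with 1 if tok[i] should be indexed, 0 if not. */
static void ChooseWordsToIndex(const char *const *tok, size_t n,
                               Phrase stopWords,                         /* [1] */
                               const Phrase *keepPhrases, size_t nKeep,  /* [2] */
                               const Phrase *dropPhrases, size_t nDrop,  /* [3] */
                               int *keep)
{
    size_t i;

    assert(tok != NULL && keep != NULL);
    for (i = 0; i < n; i++) keep[i] = !InWordList(tok[i], stopWords);
    for (i = 0; i < nKeep; i++) MarkPhrase(tok, n, keepPhrases[i], keep, 1);
    for (i = 0; i < nDrop; i++) MarkPhrase(tok, n, dropPhrases[i], keep, 0);
}
```

   With "the" in [1], /the times/ in [2] and /our company ltd/ in [3],
   "the" is indexed only where it is followed by "times", and "our" is
   indexed everywhere except in the company name.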

[10 - ui]
** lqshow could be made a routine (BrowseList() I suppose) that takes
   a list of Phrases with their matches...
   Hard to integrate into X.

[11 - implement.]
**@ should abandon dbm for the list of filenames.  A better approach would
   be to store path components as words in the database!  This would make
   / a common-word, though.  Needs some thought.  Or maybe as blocks in data.
   That would be fairly easy.
   A btree might be a good compromise.  For now, at least db-1.xx doesn't
   have overflow problems.

(12 - index)
**@@ the ability to delete a file.
   (done)

[13 - misc]
* README config file should be used more, allowing more configuration.

[13 - implementation]
** Better file locking
   (no file locking or signal handling at all at the moment -- I ripped it
    all out when I discovered that it was broken on many systems, and
    this gave a false sense of security.)
    Also, it was too slow, and gave console error messages on NFS!

(14 - retrieval)
* Phrase Matching would be orders of magnitude faster if it did not read
    the tables of matches until they are needed, as many of them
    won't be!  It should extend the lists of matches for each word in the
    phrase only as necessary.
    Done.  It wasn't orders of magnitude in most cases, only a factor
    of two or so.  Could possibly be improved.

(15 - index)
** save small FIDs for large files.  This would save space in the index, as
    larger files have more words, and thus their FIDs need to be stored more.
    (No, better to tell people to index big files first and let them retain
    control of the order in which results appear without having to sort them.)

[16 - implementation]
@@ use mmap for data and widindex, in segments with a cache

(17 - index)
@ use sWriteNumber for WIDINBLOCK?
    (pointless, it's only for debugging)

[18 - implementation]
@ Add a WIDIndex cache!

[19 - implementation]
@ Close "chainend" on exit
    (done)
@ optionally remove chainend, with README line

[20 - implementation]
** Proper variable-based configuration, no global variables.
    Perhaps using the xhex code?
    (1.14: the global variables are all gone but no good config code yet)

(21 - index)
@@ 4-bit coding for when delta-block and WIB fit in 7 bits combined?
    (1.14: done)
    Or, could always use 4/4 bits, and set the top bit on each if continued,
    need to measure what numbers occur.
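
    A sketch of the "always 4/4" alternative only, not of what 1.14
    actually does: each nibble carries 3 bits of value plus a top bit
    meaning "continued".  Where the remaining bits go is an assumption
    left to the caller here, and PackNibbles and UnpackNibbles are
    hypothetical names.

```c
#include <assert.h>

#define NIBBLE_MORE 0x8u  /* top bit of a nibble: more bits follow elsewhere */
#define NIBBLE_BITS 0x7u  /* low 3 bits of a nibble hold the value */

/* Pack the low 3 bits of each number into one byte, delta-block in the
 * high nibble and word-in-block in the low one.  Bits that did not fit
 * are returned through restDelta and restWord (0 if everything fitted).
 */
static unsigned char PackNibbles(unsigned long deltaBlock,
                                 unsigned long wordInBlock,
                                 unsigned long *restDelta,
                                 unsigned long *restWord)
{
    unsigned hi, lo;

    assert(restDelta != 0 && restWord != 0);
    *restDelta = deltaBlock >> 3;
    *restWord = wordInBlock >> 3;
    hi = (unsigned) (deltaBlock & NIBBLE_BITS) | (*restDelta ? NIBBLE_MORE : 0u);
    lo = (unsigned) (wordInBlock & NIBBLE_BITS) | (*restWord ? NIBBLE_MORE : 0u);
    return (unsigned char) ((hi << 4) | lo);
}

/* Split a packed byte into the two 3-bit values and their "continued" flags. */
static void UnpackNibbles(unsigned char byte,
                          unsigned *lowDelta, unsigned *lowWord,
                          int *moreDelta, int *moreWord)
{
    unsigned hi = (unsigned) byte >> 4;
    unsigned lo = (unsigned) byte & 0xFu;

    assert(lowDelta != 0 && lowWord != 0 && moreDelta != 0 && moreWord != 0);
    *lowDelta = hi & NIBBLE_BITS;
    *lowWord = lo & NIBBLE_BITS;
    *moreDelta = (hi & NIBBLE_MORE) != 0;
    *moreWord = (lo & NIBBLE_MORE) != 0;
}
```

    Counting how often either flag gets set over a real index would be one
    way to measure what numbers occur.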

[22 - index]
@@ Variable WIDBLOCKSIZE to reduce wastage?  How much is wasted?

(23 - implementation)
* cd/rom changes --
    [23a]  read-only database
    (done)
    [23b]  ms/dos-compatible filenames
    (done)

[24 - retrieval]
@@ find within fielded data
    e.g. "find within title"

[25 - retrieval]
* make lqsed handle overlapping matches, e.g. use only the longest.
  ?what to do with this?:
	aa bb cc dd ee ff gg
	  [1     ]1   
	      [2    ]2
  Probably turn it into
	aa bb cc dd ee ff gg
	  [1        ]2
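
  The "turn it into one span" option could be sketched like this, assuming
  the matches are already sorted by start position and are held as simple
  start/end pairs.  Span and MergeOverlappingSpans are hypothetical names,
  not lqsed internals.

```c
#include <assert.h>
#include <stddef.h>

/* A matched region; start is inclusive, end is exclusive. */
typedef struct {
    unsigned long start;
    unsigned long end;
} Span;

/* Merge overlapping spans in place.  The array must be sorted by start.
 * Returns the number of spans left at the front of the array.
 */
static size_t MergeOverlappingSpans(Span *s, size_t n)
{
    size_t in, out = 0;

    assert(s != NULL || n == 0);
    if (n == 0) return 0;

    for (in = 1; in < n; in++) {
        if (s[in].start < s[out].end) {
            /* overlaps the span being built: extend it if necessary */
            if (s[in].end > s[out].end) s[out].end = s[in].end;
        } else {
            /* no overlap: start a new output span */
            s[++out] = s[in];
        }
    }
    return out + 1;
}
```

  Counting words from zero, the example above is {1,3} and {2,4}, which
  merge into the single span {1,4}.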

[25 - ui]
@@ integrate a decent command-line option parser and combine it with the
    config file.  Preferably something like X defaults, except with a
    description and type for each of them.
    Symbol table to be added to t_Database.
    See also item 20.

[26 ?]
*@ query expansion

[27 ?]
** quorum ranking

[28 ?]
@@ statistical ranking a la SMART.
    see also item 5.

[29 ?]
@@ support structured documents
    see also item 24.

[30 ui]
@@ i18n ??  unicode??

[31 implementation]
@@ use an mmap() cache for LQT_ReadBlock()

[32 filters]
integrate filters with fundmatchends

[33 doc]
user guide

[34 doc]
update the man pages

[35 doc]
api reference/C
    (1.14: there's a fledgeling API reference, but it only documents
    functions, not data structures right now)

[36 ?]
* WAIS/Z39.50 support

[37 ui]
** provide a GUI, e.g. using Motif?  Ugh.
   (1.14: I've started with Motif.)

[38 filters]
    C
    ASCII
    man pages/troff (done!  but needs work)
    SGML (can't do this, not allowed to)
    RTF?
    HTML (is this too close to SGML?)

[39 retrieval]
proximity searching

[40 ?]
queries by file (this work started in Phrase.c)

[41 ?]
indexing of compressed and archived files

[42 ui]
Udi Manber's agrep on vocabulary.
Or maybe soundex.

[43 retrieval]
complex queries

[44 ?]
lqgrep/lqegrep using preprocessor to reduce no. of files to search

[45 index]
unindex (this is done, but not yet well tested) (Aug. 1994)

[46 index/physical]
lqdbfsck?  Program to check a db is ok.  lqword -A > /dev/null is silly.

[47 ?]
lqsed-like version of lqshow, perhaps using a hacked less.

[48 ?]
lq enhancements:
    save { matches/files } with(out) titles to a file
	 { index         }

[49 ?]
ship faq script

[50 ?]
thesaurus

[51 ?]
generalise the variables used in lqkwic to a general-purpose facility
suitable for i18n, etc.
    (1.14: this is what t_NameSpace is for, but is not yet widely used)

[52 WWW]
* CGI script interface, see also item 36

[53]
** Reinstate CanonicalPhrase and OriginalPhrase in StringToPhrase.
   Add accessor functions for them.

Known Bugs
==========
* lqshow does not know about file types!!!
* there is no troff (or sqtroff) file type
	-> fixed, but the filter is still buggy: you'll get the wrong
	word highlighted quite often, I'm afraid.
* the C filter got lost in history (sigh)
