A Text Retrieval Package for the Unix Operating System
Liam R. E. Quin
SoftQuad Inc. (lee at sq.com)
Note: the author of this paper has moved to liamquin at interlog dot com
- Bentley, Jon, Little Languages, in More Programming Pearls, Addison-Wesley, 1988.
A clearly-written rationale for the use of little (or embedded) languages. This column first appeared in Comm. ACM in August 1986.
- Bray, Tim, Lessons of the New Oxford English Dictionary Project, Usenix, Winter, 1989, pp. 137-199
- Burkowski, Forbes J., An algebra for hierarchically organized text-dominated databases, 1992, in Information Processing & Management 28, No. 3, pp. 333
- Cleverdon, C. W., Mills, J., and Keen, E. M., Factors Determining the Performance of Indexing Systems, Volume 1 - Design, Aslib Cranfield Research Project, Cranfield, 1966
- Cutting, Douglas R., Karger, David R., and Pedersen, Jan O., Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections, in Proc. 16th ACM SIGIR, pp. 126-131, 1993
One of a number of papers reporting work on information retrieval at Xerox PARC.
- Faloutsos, Christos, Access Methods for Text, in Computing Surveys 17, 1, pp. 49-74, March 1985
Compares text retrieval methods for office systems.
- Faloutsos, Christos and Christodoulakis, Stavros, ``Optimal Signature Extraction and Information Loss'', in ACM Trans. on Database Systems 12, 3, pp. 395-428, Sept. 1987
- Faloutsos, Christos and Christodoulakis, Stavros, ``Description and Performance Analysis of Signature File Methods'', in ACM Trans. on Office Systems 5, 3, July 1987
A good overview of signatures.
- Fawcett, Heather, PAT User's Guide, Open Text, 1989
- Fox, Edward A., France, Robert K., Sahle, Eskinder, Daoud, Amjad, and Cutter, Ben, ``Development of a Modern OPAC: From REVTOLC to MARIAN'', TR 93-06, Virginia Polytechnic Institute and State University, 1993.
Describes a client-server Online Public Access Catalogue for a library, using the NeXTStep GUI.
- Frakes, William B. and Baeza-Yates, Ricardo, Information Retrieval: Data Structures and Algorithms, Prentice-Hall, 1992.
An excellent introduction to the issues in implementing information retrieval systems. Examples in C for Unix, available by ftp from ftp://ftp.vt.edu/pub/reuse/ir-code.
- Golovchinsky, G. and Chignell, M. H., ``Queries-R-Links: Graphical Markup for Text Navigation'', in Proceedings of INTERCHI '93, Amsterdam, pp. 454-460, April 1993, ACM Press, N.Y.
Presents a conceptually simple way for users to add and subtract terms from text retrieval queries, and raises issues about the tradeoffs between predetermined hypertext links and live text retrieval queries.
- Harman, Donna, ``Overview of the First TREC Conference'', Annual ACM SIGIR Conf., 16, pp. 36, 1993.
At the SIGIR conference in 1993, some of the TREC participants reported that they had had difficulties using similarity techniques on long documents.
- Hutton, Scott, Computing Information for Indiana University Users, 1994, formerly at http://scww.ucs.indiana.edu/kb/search.html
- Knuth, Donald, The Art of Computer Programming, Vol III: Sorting and Searching, Addison-Wesley, 1981.
- Lesk, M. E., ``Some Applications of Inverted Indexes on the Unix System'', in V7 Unix Programmers' Manual, Vol 2A, Bell Laboratories, 1978
- Littman, Dan, ``AppleSearch 1.0'', Macworld, May 1994.
A review of Apple's `easy to administer, easy to use' text retrieval software. Mentions that `the indexing process required more than double the disk space of the original documents'.
- Mandelbrot, Benoit, ``An informational theory of the statistical structure of language'', in Communication Theory, Ed. Willis Jackson, Butterworths, 1953, pp. 486-502
- McKusick, Marshall Kirk, Joy, William N., Leffler, Samuel J., and Fabry, Robert S., ``A Fast File System for Unix'', CSRG Technical Report 83-147, 1983
- Meadow, Charles T., Text Information Retrieval Systems, Academic Press, Toronto, 1992.
Gives clear descriptions of full-text retrieval data structures and algorithms, although with a bias towards indexing only abstracts of books or library catalogue entries.
- Oracle Corporation, SQL*TextRetrieval Version 2 Technical Overview, 1992
- Roydhouse, Aaron, Miller, Linton, Jones, Eric K., and McGregor, James, The Design and Implementation of MetVUW Workbench Version 1.0, CS-TR-93/7, 1993
Describes a multimedia meteorological database that uses lq-text to provide text searching.
- Salton, Gerald, Automatic Text Processing, Addison-Wesley, 1988
- Seltzer, Margo and Yigit, Ozan, A New Hashing Package for Unix, Usenix '91, Dallas, TX, 1991
- International Organization for Standardization (ISO), Information Processing - Text and office systems - Standard Generalized Markup Language (SGML), ISO 8879, 1988
- Torek, Chris, Re: dbm.a and ndbm.a archives, netnews comp.unix newsgroup, 1987.
- Tsuchiya, Paul F., A Search Algorithm for Table Entries with Noncontiguous Wildcarding, Bellcore, 1991.
Unpublished(?) description of Cecelia, a package using in-memory Patricia trees with efficient update and deletion.
- Yigit, Ozan, How to roll your own dbm/ndbm, Unpublished Manuscript, 1989
- Zadeh, L. A., ``PRUF - a meaning representation language for natural languages'', in Int. J. Man-Machine Studies, 10, pp. 395-460, 1978.
One of many papers by L. A. Zadeh arguing for modeling the `pervasive imprecision of natural languages' (p. 396).
- Zimmerman, Mark, Zbrowsr implementation, 1991
Article on the para mailing list; unpublished.
- Zipf, George K., Human Behaviour and the Principle of Least Effort, Addison-Wesley, Cambridge, MA., USA, 1949.