A Text Retrieval Package for the Unix Operating System

Liam R. E. Quin

SoftQuad Inc. (lee at sq.com)

Note: the author of this paper has moved to liamquin at interlog dot com


References

Bent88
Bentley, Jon, Little Languages, in More Programming Pearls, Addison-Wesley, 1988. A clearly-written rationale for the use of little (or embedded) languages. This column first appeared in Comm. ACM in August 1986.


Bray89
Bray, Tim, Lessons of the New Oxford English Dictionary Project, Usenix, Winter, 1989, pp. 137-199

Burk92
Burkowski, Forbes J., An algebra for hierarchically organized text-dominated databases, 1992, in Information Processing & Management 28 No. 3, pp. 333

Clev66
Cleverdon, C. W., Mills, J., and Keen, E.M., Factors Determining the Performance of Indexing Systems, Volume 1 - Design, Aslib Cranfield Research Project, Cranfield, 1966

Cutt93
Cutting, Douglas R., Karger, David R., and Pedersen, Jan O., Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections, in Proc. 16th ACM SIGIR, pp. 126-131, 1993

One of a number of papers reporting work at Xerox PARC on information retrieval.


Falo85
Faloutsos, Christos, Access Methods for Text, in Computing Surveys 17, 1, pp. 49-74, March 1985

Compares text retrieval methods for office systems.


Falo87a
Faloutsos, Christos and Christodoulakis, Stavros, ``Optimal Signature Extraction and Information Loss'', in ACM Trans. on Database Systems 12, 3, pp. 395-428, Sept. 1987

Falo87b
Faloutsos, Christos and Christodoulakis, Stavros, ``Description and Performance Analysis of Signature File Methods'', in ACM Trans. on Office Information Systems 5, 3, July 1987

A good overview of signatures.


Fawc89
Fawcett, Heather, PAT User's Guide, Open Text, 1989

Fox93
Fox, Edward A., France, Robert K., Sahle, Eskinder, Daoud, Amjad, and Cutter, Ben, ``Development of a Modern OPAC: From REVTOLC to MARIAN'', TR 93-06, Virginia Polytechnic Institute and State University, 1993. A client-server Online Public Access Catalogue for a library, using the NeXTStep GUI.

Frak92
Frakes, William B. and Baeza-Yates, Ricardo, Information Retrieval: Data Structures and Algorithms, Prentice-Hall, 1992.

An excellent introduction to the issues in implementing information retrieval systems. Examples in C for Unix, available by ftp from ftp://ftp.vt.edu/pub/reuse/ir-code.


Golo93
Golovchinsky, G. and Chignell, M.H., ``Queries-R-Links: Graphical Markup for Text Navigation'', in Proceedings of INTERCHI '93, Amsterdam, pp. 454-460, April 1993, ACM Press, N.Y.
Presents a conceptually simple way for users to add and subtract terms from text retrieval queries, and raises issues about the trade-offs between pre-determined hypertext links and live text retrieval queries.


Harm93
Harman, Donna, ``Overview of the First TREC Conference'', Annual ACM SIGIR Conf., 16, pp. 36, 1993.

At the SIGIR conference in 1993, some of the TREC participants reported that they had had difficulties using similarity techniques on long documents.


Hutt94
Hutton, Scott, Computing Information for Indiana University Users, 1994, formerly at http://scww.ucs.indiana.edu/kb/search.html

Knut81
Knuth, Donald, The Art of Computer Programming, Vol III: Sorting and Searching, Addison-Wesley, 1981.

Lesk78
Lesk, M. E., ``Some Applications of Inverted Indexes on the Unix System'', in V7 Unix Programmers' Manual, Vol 2A, Bell Laboratories, 1978

Litt94
Littman, Dan, ``AppleSearch 1.0'', Macworld, May 1994.
A review of Apple's `easy to administer, easy to use' text retrieval software. Mentions that `the indexing process required more than double the disk space of the original documents'.


Mand53
Mandelbrot, Benoit, ``An informational theory of the statistical structure of language'', in Communication Theory, Ed. Willis Jackson, Butterworths, 1953, pp. 486-502

McKu83
McKusick, Marshall Kirk, Joy, William N., Leffler, Samuel J., and Fabry, Robert S., ``A Fast File System for Unix'', CSRG Technical Report 83-147, 1983

Mead92
Meadow, Charles T., Text Information Retrieval Systems, Academic Press, Toronto, 1992.

Gives clear descriptions of full-text retrieval data structures and algorithms, although with a bias towards indexing only abstracts of books or of library catalogue entries.


Orac92
Oracle Corporation, SQL*TextRetrieval Version 2 Technical Overview, 1992

Royd93
Roydhouse, Aaron, Miller, Linton, Jones, Eric K., and McGregor, James, The Design and Implementation of MetVUW Workbench Version 1.0, CS-TR-93/7, 1993

Describes a multimedia meteorological database that uses lq-text to provide text searching.


Salt88
Salton, Gerald, Automatic Text Processing, Addison-Wesley, 1988

Selt91
Seltzer, Margo and Yigit, Ozan, A New Hashing Package for Unix, Usenix '91, Dallas, TX, 1991

Stan88
International Organization for Standardization (ISO), Information Processing - Text and office systems - Standard Generalized Markup Language (SGML), ISO 8879, 1988

Tore87
Torek, Chris, Re: dbm.a and ndbm.a archives, netnews comp.unix newsgroup, 1987.

Tsuc91
Tsuchiya, Paul F., A Search Algorithm for Table Entries with Non-contiguous Wildcarding, Bellcore, 1991.

Unpublished(?) description of Cecelia, a package using in-memory Patricia trees with efficient update and deletion.


Yigi89
Yigit, Ozan, How to roll your own dbm/ndbm, Unpublished Manuscript, 1989

Zade78
Zadeh, L. A., ``PRUF - a meaning representation language for natural languages'', in Int. J. Man-Machine Studies, 10, pp. 395-460, 1978.

One of many of L. A. Zadeh's papers arguing for modeling the `pervasive imprecision of natural languages' (p. 396).

Zimm91
Zimmerman, Mark, Zbrowsr implementation, 1991

Article in para mailing list, unpublished.


Zipf49
Zipf, George K., Human Behaviour and the Principle of Least Effort, Addison-Wesley, Cambridge, MA., USA, 1949.


