lq-text: LQT_ReadWordFromStringPointer

API t_WordInfo *
LQT_ReadWordFromStringPointer(db, Stringpp, Startp, Endp, Flags)
    t_LQTEXT_Database *db;
    char **Stringpp;
    char **Startp;
    CONST char *Endp;
    unsigned int Flags;

Purpose

Returns the next natural-language word from the given NUL-terminated string.

The definition of a word for the purpose of this routine is determined partly by the definitions of LQT_StartsWord, LQT_OnlyWithinWord and LQT_EndsWord in the header file <wordrules.h>, and partly by the configuration file in the database directory, where indexnumbers, minwordlength and maxwordlength may be set.

If the arguments are all null, the effect is to reset the routine ready to start a new string, and no useful value is returned in that case.

The Flags argument may be either zero or any combination of LQT_READWORD_IGNORE_COMMON and LQT_READWORD_WILDCARDS, ORed together.
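As a rough illustration of how such a flags argument is combined and tested, here is a minimal sketch. The SKETCH_ constants and the sketch_flag_is_set helper are hypothetical and exist only for this example; the real LQT_READWORD_ values are defined by the lq-text headers and may differ.

```c
#include <assert.h>

/* Placeholder bit values for illustration only. They are NOT the real
 * LQT_READWORD_IGNORE_COMMON / LQT_READWORD_WILDCARDS definitions. */
enum {
    SKETCH_IGNORE_COMMON = 1 << 0,
    SKETCH_WILDCARDS     = 1 << 1
};

/* Returns non-zero if the given flag bit is present in flags. */
static int sketch_flag_is_set(unsigned int flags, unsigned int flag)
{
    /* Reject bits this sketch does not know about. */
    assert((flags & ~(unsigned int)(SKETCH_IGNORE_COMMON | SKETCH_WILDCARDS)) == 0);
    return (flags & flag) != 0;
}
```

A caller would build the argument as SKETCH_IGNORE_COMMON | SKETCH_WILDCARDS, or pass zero to request neither behaviour.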

Characters are read from the string, incrementing *Stringpp as each byte is processed, until a recognised word is found. If the LQT_READWORD_IGNORE_COMMON flag was set in Flags, LQT_ReadWordFromStringPointer continues until either a word is found that has not been registered as being too common to index, or the end of the string is reached.

If Startp is not a NULL pointer, *Startp is set to point to the first character of the word within the caller's string, that is, the string addressed through Stringpp (not to the malloc'd copy in the result).

If Endp is a NULL pointer, the string is considered to be terminated by the first zero byte reached. Otherwise, Endp must point to the first character not in the string; normally, Endp would be set to point to the terminating NUL byte.
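The pointer conventions described above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the lq-text implementation: sketch_next_word and sketch_at_end are hypothetical names, a word is simplified to a run of alphanumeric bytes (the real rules come from <wordrules.h> and the database configuration), and the function returns a length rather than a t_WordInfo.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* True when p has reached the end of the string: endp if one was
 * given, otherwise the first zero byte. */
static int sketch_at_end(const char *p, const char *endp)
{
    return endp != NULL ? p >= endp : *p == '\0';
}

/* Hypothetical stand-in for the real routine. Returns the length of the
 * next word, or 0 if there are no more words in the string. */
static size_t sketch_next_word(const char **stringpp, const char **startp,
                               const char *endp)
{
    const char *p;
    const char *start;

    assert(stringpp != NULL && *stringpp != NULL);
    p = *stringpp;

    /* Skip bytes that cannot start a word. */
    while (!sketch_at_end(p, endp) && !isalnum((unsigned char)*p)) {
        p++;
    }
    start = p;

    /* Consume the word itself. */
    while (!sketch_at_end(p, endp) && isalnum((unsigned char)*p)) {
        p++;
    }

    *stringpp = p;              /* the caller's pointer has advanced */
    if (p == start) {
        return 0;               /* end of string reached, no word found */
    }
    if (startp != NULL) {
        *startp = start;        /* points into the caller's string */
    }
    return (size_t)(p - start);
}
```

Calling the stand-in repeatedly with the same pointer walks through the string one word at a time. Passing an end pointer that falls inside the text cuts the final word short at that position, which mirrors the requirement that Endp point to the first character not in the string.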

If the LQT_READWORD_WILDCARDS flag is set, the wildcard characters * and ? are allowed within words. Such characters do not count as punctuation for the returned WordInfo flags.

Returns

the next WordInfo on success, or zero if there are no more words to read in the string.

Bugs

The interface to this routine is somewhat ugly, and may be changed in the next release with the addition of a Reset routine and a block offset counter.

Notes

All client programs and library routines which parse words use this routine or the companion LQT_ReadWordFromFileInfo routine. This is very important, because lq-text relies on the word counts within each block of text being the same on retrieval as they were on indexing. If different routines parsed the data each time, there would be a chance of discrepancies.