Glossary

access method
etx is an example of a secondary access method.
approximate phrase search
A search in which the search text must contain a phrase identical to the clue, or one or more words of the clue. The order of the words in the clue is not important. For example, if the clue is buy three dolls, the search engine returns documents that contain the exact phrase as well as those that contain the phrases three dolls, buy dolls, or dolls buy.
BLOB
See also smart large object.
Boolean search
A search that uses the Boolean expressions (logical operators) & (AND), | (OR), or ! and ^ (NOT). Use the & Boolean operator when you want to search for documents that contain all the words in a keyword list; use | when you want to search for documents that contain at least one word in the list; and use ! or ^ when you want to search for documents that do not contain a specified word. The Boolean operators can be combined to make more complicated expressions. This type of search is activated by setting the SEARCH_TYPE tuning parameter to BOOLEAN_SEARCH.
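
The three operators can be illustrated with a minimal Python sketch. It is not the text search engine's implementation: the helper names (keywords, matches_all, matches_any, matches_none) and the sample documents are hypothetical, and keywords are assumed to be split on whitespace only.

```python
def keywords(text):
    """Split text into a set of lowercase keywords (whitespace-delimited)."""
    return set(text.lower().split())

def matches_all(text, words):
    """& (AND): every listed word must appear in the text."""
    found = keywords(text)
    return all(word in found for word in words)

def matches_any(text, words):
    """| (OR): at least one listed word must appear in the text."""
    found = keywords(text)
    return any(word in found for word in words)

def matches_none(text, words):
    """! or ^ (NOT): none of the listed words may appear in the text."""
    return not matches_any(text, words)

docs = ["multimedia editor for video", "text editor", "video player"]

# Combined expression: editor & !multimedia
hits = [d for d in docs
        if matches_all(d, ["editor"]) and matches_none(d, ["multimedia"])]
```

Here hits contains only "text editor", the one document that has editor but not multimedia.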
CLOB
A smart large object data type that stores blocks of text items, such as ASCII or PostScript™ files.

See also smart large object.

clue
The data that you are searching for, specified as the second argument to the etx_contains() operator.
document score
A value that the text search engine assigns to each row returned by a fuzzy search, indicating the degree of similarity between your clue and that row. Scores range from 0 to 100, with 0 indicating no match and 100 indicating a perfect match. You access scoring information through the third parameter of the etx_contains() operator, a statement local variable (SLV). The data type of the SLV is etx_ReturnType, a row type defined by HCL OneDB™ that consists of three fields. The scoring information is contained in the score field.
filtering
A component that automatically strips all proprietary formatting information from a formatted document and converts the document to ASCII form.
exact phrase search
A search for text that matches your clue exactly. An exact phrase search is successful when the text search engine finds a phrase that contains all the words in the clue in the exact order that you specify.
fuzzy search
A search for text that matches your clue approximately instead of exactly. A fuzzy search takes into account substitutions, transpositions, and basic pattern matching. A search that returns a document that contains the word editer when searching for editor is an example of a fuzzy search.
hit
The result (a row) of a text search.
hitlist
A list of hits (rows).
highlighting
The process of retrieving the location of every instance of a clue in the search text. Highlighting information is returned in the form of ordered pairs of integers that describe the location and length of each occurrence of the clue in the corresponding document.
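
The idea of ordered (location, length) pairs can be sketched in Python as follows. This is an illustration only: the function name highlight is hypothetical, and zero-based character offsets with case-insensitive matching are assumptions, not documented behavior of the engine.

```python
def highlight(text, clue):
    """Return (location, length) pairs for every occurrence of clue in text.

    Assumes zero-based character offsets and case-insensitive matching.
    """
    needle = clue.lower()
    if not needle:
        return []
    lowered = text.lower()
    pairs = []
    start = lowered.find(needle)
    while start != -1:
        pairs.append((start, len(needle)))
        # Continue the scan one character past the current occurrence.
        start = lowered.find(needle, start + 1)
    return pairs

pairs = highlight("An editor edits; the editor saves.", "editor")
```

For this sample text, pairs is [(3, 6), (21, 6)]: two occurrences, each six characters long.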
index parameter
A variable that you use to specify the characteristics of an etx index to support the searches you plan to perform. An example of an index parameter is WORD_SUPPORT='EXACT'.
keyword
Any contiguous group of characters found in the search text or clue, delimited by nonindexable characters such as spaces or tabs.
keyword search
A search in which the words in the clue are treated as separate entities (keywords) instead of a single unit (phrase). When the text search engine performs a keyword search, it returns a row whenever it encounters one or more of the words in your clue.
operator class
The set of operators that the database server associates with a secondary access method. When an index is created, it is associated with a particular operator class.
pattern search
See fuzzy search.
phrase search
A search in which the words in the clue are treated as a single unit (phrase) instead of separate entities (keywords). The two types of phrase searches are exact and approximate.
proximity search
A search in which you specify the number of nonsearch words that can occur between two or more of the search words. You use a proximity search if, for example, you are searching for a phrase that contains the words editor and multimedia but do not want the two keywords separated by more than four nonsearch words. This type of search is activated by setting the tuning parameter SEARCH_TYPE equal to PROX_SEARCH.
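
A minimal Python sketch of the proximity test for two keywords follows. It is not the engine's algorithm: the function name within_proximity is hypothetical, words are split on whitespace, and the order of the two keywords is assumed not to matter.

```python
def within_proximity(text, first, second, max_gap):
    """True if first and second occur with at most max_gap words between them."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    # abs(i - j) - 1 is the number of words separating the two keywords.
    return any(abs(i - j) - 1 <= max_gap
               for i in first_positions
               for j in second_positions)

text = "the editor for rich multimedia content"
close_enough = within_proximity(text, "editor", "multimedia", 4)
too_far = within_proximity(text, "editor", "multimedia", 1)
```

In the sample text two words separate editor and multimedia, so the check succeeds with a limit of four and fails with a limit of one.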
rank
The order given to a hitlist based on the score of each of the returned rows.
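
As an illustration, ranking can be sketched in Python as a sort on the score field. The sample hitlist is invented for the example, and ordering from highest to lowest score is an assumption; the definition above does not state a direction.

```python
# Each hit is (row identifier, document score on the 0-100 scale).
hitlist = [("doc-a", 72), ("doc-b", 100), ("doc-c", 85)]

# Rank the hitlist by score, best match first (assumed direction).
ranked = sorted(hitlist, key=lambda hit: hit[1], reverse=True)
```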
root word
The word in a synonym list that has one or more synonyms defined for it. It is the leftmost word of a single row of the synonym list. When synonym matching is activated, the keyword being searched for must be a root word for its synonym to be returned instead.
row data type
A complex data type consisting of a group of ordered data elements (fields) of the same or different data types. The fields of a row type can be of any supported built-in or extended data type, including complex data types, except SERIAL, SERIAL8, and BIGSERIAL and, in certain situations, TEXT and BYTE.

There are two kinds of row data types:

  • Named row types, created with the CREATE ROW TYPE statement
  • Unnamed row types, created with the ROW constructor
sbspace
A logical storage area that contains one or more chunks that store only smart large object data.
score
See document score, word score.
search string
See clue.
search text
The data that is to be searched, stored in a column of a table.
SLV
Abbreviation for statement local variable.
smart large object
A large object that:
  • is stored in an sbspace, a logical storage area that contains one or more chunks.
  • has read, write, and seek properties similar to a UNIX™ file.
  • is recoverable.
  • obeys transaction isolation modes.
  • can be retrieved in segments by an application.

Smart large objects include CLOB and BLOB data types.

statement local variable (SLV)
A variable for storing a value that a function returns indirectly, through a pointer, in addition to the value that the function returns directly. The scope of an SLV is limited to the statement in which it is used. The optional third parameter of the etx_contains() operator is an SLV that holds scoring and highlighting information. The data type of the SLV is etx_ReturnType.
stopword
A keyword that you want excluded from your index or your search. Stopwords are typically common words such as and, or, the, and to, or any word that appears frequently in your document that you want to exclude.
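
The effect of a stopword list can be sketched in Python as a filter applied before indexing or searching. The names STOPWORDS and index_terms are hypothetical, and the list holds only the sample words from the definition above.

```python
# Sample stopword list taken from the definition; a real list is user-defined.
STOPWORDS = {"and", "or", "the", "to"}

def index_terms(text):
    """Return the keywords of text that are not stopwords, in original order."""
    return [word for word in text.lower().split() if word not in STOPWORDS]

terms = index_terms("The editor and the player")
```

Here terms is ["editor", "player"]; the stopwords the and and are dropped.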
substitution
A misspelling of a word, in which one letter has been substituted by another, incorrect one. Misspelling searck for search is an example of a substitution.
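
A substitution can be detected with a short Python check, shown here as an illustration only. The function name is_substitution is hypothetical, and this is not how the engine scores pattern matches.

```python
def is_substitution(typed, intended):
    """True if the two words have equal length and differ in exactly one letter."""
    if len(typed) != len(intended):
        return False
    differences = sum(1 for a, b in zip(typed, intended) if a != b)
    return differences == 1

result = is_substitution("searck", "search")
```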
synonym
One of two or more words or expressions that have the same or nearly the same meaning in some or all senses. The word java is a synonym of the word coffee.
text search engine
The component that calls the Text Retrieval Library (TRL) of Excalibur Technologies to perform a search. The TRL is a library of C-language object modules designed to perform fast retrieval and automatic indexing of text data. The text search engine is dynamically linked into HCL OneDB whenever a text search is performed or text data is indexed.
transposition
A misspelling of a word in which two adjacent letters switch positions. Misspelling saerch for search is an example of a transposition.
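
A transposition can be detected with a short Python check, shown here as an illustration only. The function name is_transposition is hypothetical, and this is not how the engine scores pattern matches.

```python
def is_transposition(typed, intended):
    """True if typed equals intended with exactly one pair of adjacent letters swapped."""
    if len(typed) != len(intended):
        return False
    diff = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1                # the two positions are adjacent
        and typed[diff[0]] == intended[diff[1]]   # and their letters are swapped
        and typed[diff[1]] == intended[diff[0]]
    )

result = is_transposition("saerch", "search")
```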
tuning parameter
A variable used to guide the way the text search engine conducts a search. Tuning parameters are passed to the text search engine through the second parameter of the Row() constructor of the etx_contains() operator. An example of a tuning parameter is SEARCH_TYPE = WORD.
word score
A value that the search engine assigns to each candidate match, based on its internal rules, when it uses fuzzy logic to decide whether a pattern match counts as a hit. By default, only words that match your search clue by a relative measure of at least 70 out of 100 (that is, words with a word score of 70 or better) are considered hits.
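
The threshold idea can be sketched in Python. The engine's internal scoring rules are not described here, so this sketch substitutes the similarity ratio from Python's standard difflib module as a stand-in; the names stand_in_word_score and is_hit are hypothetical, and the scores it produces will not match the engine's.

```python
from difflib import SequenceMatcher

DEFAULT_THRESHOLD = 70  # default cutoff stated in the definition above

def stand_in_word_score(candidate, clue_word):
    """Similarity on a 0-100 scale (difflib stand-in, not the engine's rules)."""
    ratio = SequenceMatcher(None, candidate.lower(), clue_word.lower()).ratio()
    return round(ratio * 100)

def is_hit(candidate, clue_word, threshold=DEFAULT_THRESHOLD):
    """A candidate counts as a hit when its score reaches the threshold."""
    return stand_in_word_score(candidate, clue_word) >= threshold

close_match = is_hit("editer", "editor")
poor_match = is_hit("player", "editor")
```

With this stand-in measure, editer passes the default threshold for the clue editor, while player does not.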