The etx_contains() operator

The etx_contains() operator executes a search that you define by using a clue, tuning parameters, and an optional statement local variable (SLV).

Syntax for COMPARISON condition

COMPARISON condition

1 (1) etx_contains  ( col_exp ,
2.1  %Quoted String(2)
2.1  %Row Exp(3)
1?  , slv
1 )
Notes:
Element Purpose Syntax
col_exp The name of the column you want to search. Syntax must conform to the column expression syntax in HCL OneDB™ Guide to SQL: Syntax.
slv A statement local variable (SLV) that the search engine uses to store the score and highlighting information of a particular row. Syntax must conform to the identifier segment; see HCL OneDB Guide to SQL: Syntax.

Syntax for quoted string

Quoted String

1  ‘
2.1 +  character
2.1  ‘ ‘
1 ‘
Element Purpose Restrictions Syntax
character A character that forms part of the quoted string When the quoted string is part of a Row() expression (see Syntax row expression usage), the following conventions apply:
  • Spaces become delimiters for keywords when performing a keyword, proximity, or Boolean search.
  • The character & corresponds to the Boolean operatorAND. The character | corresponds to the Boolean operator OR. The two characters ! and ^ both correspond to the Boolean operator NOT. If you want to search for these characters themselves, you can escape their function as Boolean operators by preceding them with a backslash.
Characters are literal values that you enter from the keyboard.

Syntax row expression usage

Row expression

1  ROW  (
2.1  %Quoted String(1)
2.1 IfxDocDesc
1  %Row expression options
1 )
Row expression options

1?  ,  ‘
2.1+ &
2.1  MATCH_SYNONYM?  = syn_name
2.1 PATTERN_ALL
2.1 PATTERN_BASIC
2.1 PATTERN_SUBS
2.1 PATTERN_TRANS
2.1  SEARCH_TYPE =
2.2.1 WORD
2.2.1 BOOLEAN_SEARCH
2.2.1  PROX_SEARCH ( limit )
2.2.1 PHRASE_EXACT
2.2.1 PHRASE_APPROX
2.1 CONSIDER_STOPWORDS
2.1  MAX_MATCHES = hits
2.1  WORD_SCORE = score
2.1 NO_HILITE
2.1 WILDCARD_CLUE(2)
1 ‘
Notes:
  • 1 See Syntax for quoted string
  • 2 Only one occurrence of each choice allowed per command invocation. However, multiple different choices can be used on the same command invocation.
Element Purpose Restrictions Syntax
hits Maximum number of hits, per index fragment, returned by a text search. The data type must be a positive integer. Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax.
limit Number of nonsearch words that can occur between two or more search words. The data type must be a nonzero integer greater than the number of search words. Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax.
score Minimum score for a word to be considered a pattern match. The data type must be a real number. Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax.
syn_name Synonym list name. The synonym list must exist. Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax.
IfxDocDesc The clue can be stored in an IfxDocDesc data type. Only the data in the location field is consulted when a search is executed. See The IfxDocDesc data type for more information about the IfxDocDesc row data type.

Tuning parameters

The following table lists the tuning parameters that you can use to guide the search engine when it performs a search.
Tuning parameter Description Restrictions Default
CONSIDER_STOPWORDS Indicates to the search engine that stopwords are to be included in the search. The etx index must have been created with the INCLUDE_STOPWORDS and STOPWORD_LIST index parameters. Disabled
MATCH_SYNONYM Enables synonym matching. If no value is specified, the default synonym list etx_thesaurus is consulted. If a value is specified, the specified synonym list is consulted instead of the default list. Follow the instructions given in Synonym lists to create a default or alternative list. If a custom synonym list is specified, it must have already been created through the etx_CreateSynWlst() routine. When used together with pattern matching, such as PATTERN_ALL, only pattern matches of root words are found, not of synonyms. Disabled
MAX_MATCHES Allows you to specify the maximum number of hits per index fragment returned by the search engine. If you want to use this tuning parameter to limit the number of rows returned from a search, set it to the larger of 1000 or 10% of the total number of rows in the table. Per index fragment means that if, for example, the etx index is fragmented into two parts and this tuning parameter is set to 1000, a possible maximum of 2000 hits might be returned. None All hits
NO_HILITE Allows you to speed up queries for which you want to return the score of the query results but you are not interested in highlighting the information. The query must include the etx_ReturnType data type, a complex row type that includes both the score for the document and a subsidiary row type with highlight information. Returning all this information can incur considerable overhead, but by specifying the NO_HILITE query parameter, the DataBlade® module skips all preparation of the highlighted data and returns just the score. Disabled
PATTERN_ALL Enables all the pattern search options: PATTERN_BASIC, PATTERN_TRANS, and PATTERN_SUBS. The etx index must have been created with the WORD_SUPPORT = PATTERN index parameter. Disabled
PATTERN_BASIC Enables the basic search option. The search returns the best pattern matches based on the value of WORD_SCORE. This might include words that are substring or superstring pattern matches of the words in the clue, transpositions, and substitutions, although it is not guaranteed. The etx index must have been created with the WORD_SUPPORT = PATTERN index parameter. Disabled
PATTERN_SUBS Indicates to the search engine that you want words returned that match the clue except for one character. The etx index must have been created with the WORD_SUPPORT = PATTERN index parameter. Disabled
PATTERN_TRANS Indicates to the search engine that you want words returned that match the clue except for a single transposition. The etx index must have been created with the WORD_SUPPORT = PATTERN index parameter. Disabled
SEARCH_TYPE Allows you to specify the type of search you want to perform.
  • To execute a keyword search, specify WORD.
  • To execute a Boolean search, specify BOOLEAN_SEARCH.
  • To execute a proximity search, specify PROX_SEARCH.
  • To execute an exact phrase search, specify PHRASE_EXACT.
  • To execute an approximate phrase search, specify PHRASE_APPROX.
One of the following values:
  • WORD
  • BOOLEAN_SEARCH
  • PROX_SEARCH (lim)
  • PHRASE_EXACT
  • PHRASE_APPROX

If set to PROX_SEARCH, PHRASE_EXACT, or PHRASE_APPROX, the etx index must have been created with the PHRASE_SUPPORT index parameter set to either MEDIUM or MAXIMUM.

If set to BOOLEAN_SEARCH and used for Boolean phrase searches, the etx index must have been created with the PHRASE_SUPPORT index parameter set to either MEDIUM or MAXIMUM.

WORD
WILDCARD_CLUE Provides an alternative to the PATTERN_* query parameters. When you include the WILDCARD_CLUE query parameter, the clue can have an * character at the beginning, end, or both of any or all words in the clue, and the * will be expanded to match any string of characters. WILDCARD_CLUE applies on a word-by-word basis within the clue. This means that after all the clue words have been expanded, the clue is treated as a keyword, a Boolean expression, or a phrase, depending on the other query parameters. An exact match of the literals in a clue when you use WILDCARD_CLUE results in a contribution of score 99 to the document score, regardless of whether the matched word has additional letters at the beginning or end. This means, for example, that the clue *worth* matches with a score of 99 all the words worth, worthy, and unworthiness. You cannot specify an * in the middle of a clue. You cannot combine the WILDCARD_CLUE query parameter with any of the PATTERN_* parameters in the set of query parameters for a given clue. If you want to execute a wildcard query for a clue that includes the character * itself as a literal, use the backslash to escape the * character.

The * must be in the character set for the DataBlade module to be able to search for it.

If you include the query parameter MATCH_SYNONYM in a wildcard query, the literals in the clue (excluding the leading or trailing *) are looked up in the synonym list.

If found, all synonyms have the wildcard expansion done for them and for the original literals in the clue.

If you created the index with a stopword list, the DataBlade module performs standard stopword processing for all clue words that result AFTER doing the wildcard expansion.

Disabled
WORD_SCORE Allows you to specify a minimum resemblance for pattern matches. The search engine counts as hits only words that meet the minimum standard set by WORD_SCORE. Must be a value from 1 through 100, inclusive. Specifying 0 indicates that you want to set the value back to the default, 70. 70

Return type

The etx_contains() operator returns BOOLEAN.

Usage

Use etx_contains() to execute a search on a document stored in a column of a table. You can use the etx_contains() operator only in the WHERE clause of an SQL statement. For example:
SELECT title FROM reports
    WHERE etx_contains (abstract, Row('multimedia')) 
    AND doc_no > 1005 ;
The etx_contains() operator has two required parameters: the name of the column containing text data that you want to search, and either a quoted clue or a Row() expression that contains the clue and optional tuning parameters. The clue can either be a quoted string or a document stored in an IfxDocDesc data type.
Important: The column that you want to search must have an etx index defined on it if you want to use etx_contains() in the WHERE clause.

The optional third parameter of etx_contains() is an SLV that returns scoring and internal highlighting information. The contents of the SLV are valid only for the life of the query. The data type of the SLV is etx_ReturnType, a row data type derived from HCL OneDB. For more information about the etx_ReturnType data type, see The etx_ReturnType data type.

Although the etx access method supports fragmented indexes, you cannot use the etx_contains() operator to fragment an index by expression.

If you do not specify any tuning parameters, the Row() constructor in the etx_contains() operator is optional. This means that the preceding example can also be specified as:
SELECT title FROM reports
    WHERE etx_contains (abstract, 'multimedia') 
    AND doc_no > 1005 ;

Typically, the clue is a quoted string of one or more words, such as the word multimedia in the preceding example. Sometimes, however, you might want to use an entire document as the clue. To do this, instead of specifying a quoted string as one of the parameters of the etx_contains() operator, specify an IfxDocDesc document.

Due to the flexibility of the LLD_Locator data type, the data type of the location field of the IfxDocDesc data type, you can specify as a clue either a document stored in the database or a document stored as a file on the operating system. Only the location field of the IfxDocDesc data type is consulted when a document is specified as a clue to the etx_contains() operator. The contents of the other fields, such as format and version, are ignored.

An example of a search that uses an IfxDocDesc document as a clue is shown in the next section.

For more information about the IfxDocDesc and LLD_Locator data types, see Data types.

Examples

The following statement searches for the specific word multimedia in the column abstract and includes other criteria in the WHERE clause:
SELECT title FROM reports
    WHERE etx_contains(abstract, Row('multimedia'))
    AND author = 'Joe Smith';
The following statement searches for either of the specific words multimedia or video in the column abstract, enables searching for letter transpositions and substitutions, and requests that synonyms from the list named my_synonymlist is to be included in the search:
SELECT title FROM reports
    WHERE etx_contains(abstract, 
    Row('multimedia video', 
        'PATTERN_TRANS & PATTERN_SUBS & MATCH_SYNONYM = my_synonymlist'));
The following statement calls for the rank of a text search to be materialized as the statement local variable rc1 and orders the returned rows by this rank:
SELECT rc1.score, title FROM reports
    WHERE etx_contains (abstract, 
       Row('video'), rc1 # etx_ReturnType)
    AND doc_no > 1005
    ORDER BY 1;
The following statement executes a search on the abstract column, but instead of specifying a quoted string as the clue, it specifies an IfxDocDesc document stored as a file on the operating system as the clue:
SELECT title FROM reports
    WHERE etx_contains (abstract,
    Row ( Row ('ASCII', '0',
          Row ('IFX_FILE', NULL::LLD_Lob,
          '/local0/excal/clue.txt')::LLD_Locator,
          NULL::LVARCHAR)::IfxDocDesc) );

The entire contents of the operating system file /local0/excal/clue.txt are automatically converted into the clue. Even though no tuning parameters are specified, the IfxDocDesc clue must still be encapsulated within a Row() constructor.

The following examples show how to use the NO_HILITE query parameter in the Row() specification of the etx_contains() operator:
select title from reports 
   where etx_contains(abstract,
   Row('dynamic server','search_type=PROX_SEARCH(3) & NO_HILITE'), 
      rc#Etx_ReturnType) order by rc.score;

select title from reports 
   where etx_contains(abstract,
   Row('build*','WILDCARD_CLUE & NO_HILITE'), rc#Etx_ReturnType);
The following example shows how to use the WILDCARD_CLUE query parameter. Consider this query with the following clue:
select title from reports
   where etx_contains(abstract,
   Row("*forgiv* drink*", "WILDCARD_CLUE &    search_type=PHRASE_EXACT"));

This query finds unforgivable drinking and forgiven drinkers. The query does not find unforgiving nondrinkers, since there is no * at the beginning of the clue word drink*.

The following example shows how to use the WILDCARD_CLUE query parameter with an index created with a stopword list. Assume that the following conditions exist:
  • The stopwords in the stopword list are the, and, a, and or.
  • Three rows in the table have the following text in the search column:
    • 1 “these are the best of times”
    • 2 “wherever you are, there you are”
    • 3 “over the rainbow”
  • You specify the following clue:
    select title from reports 
       where etx_contains(abstract, 
       Row("the*", "WILDCARD_CLUE & search_type=WORD"));

The query returns rows 1 and 2. This is because the * is expanded to these, there, and the. Then stopword processing is performed, which eliminates the from the search. This eliminates row 3 from the result.

For additional examples of the etx_contains() operator, see Concepts, and Tutorial.