User-defined character sets

The built-in character sets might not suit your text documents or the types of searches you plan to perform. For example, to index and search for hyphenated words such as English-language, the hyphen character itself must be indexed. Because the three built-in character sets index only alphanumeric characters, you must create your own character set to index the hyphen character.

You must create a user-defined character set before you use it to create an etx index. You create and drop user-defined character sets with the routines etx_CreateCharSet() and etx_DropCharSet() provided by the DataBlade® module.

The etx_CreateCharSet() routine takes two parameters: the name of the new user-defined character set and the full path name of an operating system file that contains a description of the character set. The operating system file contains a 16 x 16 matrix of hexadecimal numbers that specifies which characters are indexed. For details on building the matrix, see ids_excal_144.html#ids_excal_144.
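As a rough illustration of the 16 x 16 layout, the following Python sketch builds a matrix that indexes alphanumeric characters plus the hyphen. It rests on several assumptions that this topic does not confirm: that the cell at row r, column c corresponds to character code 16 * r + c, that an indexed cell holds the character's own two-digit hexadecimal code, and that a cell for a character that is not indexed holds 00. The helper names build_charset_matrix and write_charset_file are hypothetical. Check the referenced matrix topic before you rely on this format.

```python
import string


def build_charset_matrix(indexed_chars):
    """Return 16 text lines, each with 16 two-digit hex values (codes 0x00-0xFF)."""
    indexed = {ord(ch) for ch in indexed_chars}
    if any(code > 0xFF for code in indexed):
        raise ValueError("only single-byte characters fit in a 16 x 16 matrix")
    lines = []
    for row in range(16):
        cells = []
        for col in range(16):
            code = row * 16 + col  # assumed row-major layout
            # Assumed convention: an indexed cell holds its own code,
            # and a cell for a character that is not indexed holds 00.
            cells.append(f"{code:02X}" if code in indexed else "00")
        lines.append(" ".join(cells))
    return lines


def write_charset_file(path, lines):
    """Write the matrix lines to the operating system file at path."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(lines) + "\n")


# Alphanumeric characters plus the hyphen (0x2D).
matrix_lines = build_charset_matrix(string.ascii_letters + string.digits + "-")
```

Calling write_charset_file with the full path name that you later pass to etx_CreateCharSet() would save the matrix. Under the assumed layout, the hyphen appears as 2D in the third row, fourteenth column.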

The following example shows how to execute the etx_CreateCharSet() routine to create a user-defined character set called my_charset from the description contained in the operating system file named /local0/excal/my_char_set_file:
EXECUTE PROCEDURE etx_CreateCharSet
    ('my_charset', '/local0/excal/my_char_set_file');

The following example shows how to create an etx index that uses the user-defined character set my_charset by specifying it as the value of the CHAR_SET index parameter:
CREATE INDEX desc_idx2 ON videos (description etx_clob_ops)
    USING etx (WORD_SUPPORT = 'PATTERN', CHAR_SET = 'my_charset') 
    IN sbsp1;