Character sets

When you create an etx index, you specify the character set the search engine uses by setting the CHAR_SET index parameter. The value you specify determines which characters in your text data are indexed and which are treated as white space.
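The effect of a character set on indexing can be sketched in a few lines of Python. This is only an illustration of the idea, not the search engine's implementation: the `INDEXED` set and the `tokenize` function are hypothetical names, and the set shown here (unaccented letters and digits) is an assumption chosen for the example, not the definition of any built-in character set.

```python
import string

# Hypothetical stand-in for a character set: the characters that are indexed.
# Assumption for illustration only; not the definition of a built-in set.
INDEXED = set(string.ascii_letters + string.digits)


def tokenize(text, indexed=INDEXED):
    """Treat every character outside `indexed` as white space, then split."""
    cleaned = "".join(ch if ch in indexed else " " for ch in text)
    return cleaned.split()


print(tokenize("naïve café, est. 1999!"))
# prints ['na', 've', 'caf', 'est', '1999']
```

Because the accented letters are not in the assumed set, they act as word separators, which is why the choice of character set matters for text that contains such characters.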

HCL OneDB™ provides three built-in character sets: ASCII, ISO, and OVERLAP_ISO. You can also define your own character set if none of the built-in character sets suits your text.

You cannot change the setting of the CHAR_SET parameter after you create the index. To use a different character set, you must drop the index and then re-create it. If you do not specify a setting for CHAR_SET, the text search engine uses the ASCII character set by default.

This section contains a complete description of the following three built-in character sets:
  • ASCII
  • ISO
  • OVERLAP_ISO

The last topic in this section contains a 16 x 16 mapping of all the characters in the ISO 8859-1 character set. You might want to use this mapping as a reference when you define your own character set.
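If you want a quick local view of the ISO 8859-1 code points while working on a character set, a grid of this shape can be generated with a short Python sketch. The function name `latin1_grid` is hypothetical, and the layout (one row per high-order hexadecimal digit, one column per low-order digit) is an assumption that might not match the orientation of the mapping in this documentation. Nonprintable code points are shown as a period.

```python
def latin1_grid():
    """Return 16 rows of 16 cells covering code points 0x00 through 0xFF."""
    rows = []
    for high in range(16):
        row = []
        for low in range(16):
            code = high * 16 + low
            # Python's "latin-1" codec maps each byte to the same code point.
            ch = bytes([code]).decode("latin-1")
            # Show control and other nonprintable characters as a period.
            row.append(ch if ch.isprintable() else ".")
        rows.append(row)
    return rows


# Print the grid with the high-order hex digit as a row label.
for high, row in enumerate(latin1_grid()):
    print(f"{high:X}x  " + " ".join(row))
```

For example, the cell in row 4, column 1 is code point 0x41, the letter A.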