Supported vendor database character sets

Each HCL Compass data code page has a corresponding character set for each supported vendor database: Oracle, DB2®, and SQL Server.

To help you choose an appropriate character set for your vendor database, Table 1 lists the supported Compass data code page values and their corresponding vendor database character set values.

HCL Compass supports the UTF-8 (8-bit Unicode Transformation Format) code page 65001. This support is limited to creating new database sets. Compass does not support converting an existing database set to the UTF-8 code page.

For instructions about how to set the character set for your vendor database, see the vendor database documentation.

Table 1. Supported HCL Compass data code pages and corresponding vendor database character sets

| HCL Compass data code page | Oracle character set | DB2 code set | SQL Server collation |
|---|---|---|---|
| 932 (Japanese) | JA16SJISTILDE. See Code page 932 (Japanese) on Oracle | IBM-943 (943). See Code page 932 (Japanese) on DB2 | Japanese_* |
| 936 (Simplified Chinese) | ZHS16GBK. Limited support. See Code page 936 (Simplified Chinese) on Oracle | GBK (1386) | Chinese_PRC_* |
| 949 (Korean) | KO16MSWIN949 | 1363 | Korean_Wansung_* |
| 950 (Traditional Chinese) | ZHT16MSWIN950 | big5 (950) | Chinese_Taiwan_Bopomofo_* |
| 1250 (Eastern Europe) | EE8MSWIN1250 | 1250 | Romanian_* |
| 1251 (Cyrillic) | CL8MSWIN1251 | 1251 | Cyrillic_General_* |
| 1252 (Western Europe) | WE8MSWIN1252 | 1252 | Latin1_General_* |
| 1253 (Greek) | EL8MSWIN1253 | 1253 | Greek_* |
| 1254 (Turkish) | TR8MSWIN1254 | 1254 | Turkish_* |
| 1255 (Hebrew) | IW8MSWIN1255 | 1255 | Hebrew_* |
| 1257 (Baltic) | BLT8MSWIN1257 | 1257 | Estonian_* |
| 20127 (ASCII) | Any | Any | Any |
| 60932 (Safe Shift-JIS) | JA16EUC | eucJP (954) | N/A |
| 65001 (UTF-8) | AL32UTF8. See Code page 65001 (UTF-8) on Oracle and DB2 | UTF-8 (1208). See Code page 65001 (UTF-8) on Oracle and DB2 | N/A |

Note: For Microsoft™ Access databases, you do not need to set the vendor database code page.
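
If you script database provisioning, the Oracle and DB2 columns of Table 1 can be captured as a small lookup. The following Python sketch is only an illustration: the names VENDOR_CHARSETS and vendor_charset are hypothetical and are not part of Compass or of any vendor tool.

```python
# Oracle and DB2 values copied from Table 1, keyed by Compass data code page.
# VENDOR_CHARSETS and vendor_charset are hypothetical names for illustration.
VENDOR_CHARSETS = {
    932: ("JA16SJISTILDE", "IBM-943"),
    936: ("ZHS16GBK", "GBK"),
    949: ("KO16MSWIN949", "1363"),
    950: ("ZHT16MSWIN950", "big5"),
    1250: ("EE8MSWIN1250", "1250"),
    1251: ("CL8MSWIN1251", "1251"),
    1252: ("WE8MSWIN1252", "1252"),
    1253: ("EL8MSWIN1253", "1253"),
    1254: ("TR8MSWIN1254", "1254"),
    1255: ("IW8MSWIN1255", "1255"),
    1257: ("BLT8MSWIN1257", "1257"),
    60932: ("JA16EUC", "eucJP"),
    65001: ("AL32UTF8", "UTF-8"),
}


def vendor_charset(code_page, vendor):
    """Return the Table 1 value for a code page on 'oracle' or 'db2'."""
    if code_page == 20127:
        # Table 1 lists "Any" for the ASCII code page on every vendor.
        return "Any"
    oracle, db2 = VENDOR_CHARSETS[code_page]
    return {"oracle": oracle, "db2": db2}[vendor.lower()]


print(vendor_charset(932, "oracle"))  # JA16SJISTILDE
```

An unsupported code page or vendor name raises KeyError, which makes a mistyped value fail early instead of silently selecting a wrong character set.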

Code page 932 (Japanese) on Oracle

JA16SJISTILDE is the recommended vendor database character set for code page 932 (Japanese SJIS data) on Oracle. This is a change from the recommendation of JA16SJIS for versions of HCL Compass earlier than 7.0. The character sets JA16SJIS and JA16SJISTILDE are the same except for the way that the wave dash and the tilde are mapped to and from Unicode. Because HCL Compass versions 7.0 and later use Unicode to communicate with the database, you must use the JA16SJISTILDE character set. For information about how to convert an existing Oracle database from JA16SJIS to JA16SJISTILDE, see the Oracle documentation.
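
The mapping difference can be reproduced outside Oracle. The following Python sketch assumes that Python's shift_jis codec behaves like JA16SJIS and that its cp932 codec behaves like JA16SJISTILDE for the byte pair 0x81 0x60. That correspondence is an analogy used for illustration, not a statement about Oracle internals. The helper name round_trips is hypothetical.

```python
# The Shift-JIS byte pair whose Unicode mapping differs between the two sets.
SJIS_BYTES = b"\x81\x60"

# Assumption: "shift_jis" stands in for JA16SJIS, "cp932" for JA16SJISTILDE.
as_wave_dash = SJIS_BYTES.decode("shift_jis")    # U+301C WAVE DASH
as_fullwidth_tilde = SJIS_BYTES.decode("cp932")  # U+FF5E FULLWIDTH TILDE

print(f"U+{ord(as_wave_dash):04X}", f"U+{ord(as_fullwidth_tilde):04X}")


def round_trips(text, codec):
    """Return True if text survives encoding to codec and decoding back."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False


# A client that follows the Microsoft code page 932 definition sends U+FF5E,
# which only the cp932 mapping can store and return unchanged.
print(round_trips("\uff5e", "cp932"), round_trips("\uff5e", "shift_jis"))
```

The same bytes decode to two different Unicode characters, which is why a Unicode client and the database must agree on one mapping.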

Code page 932 (Japanese) on DB2

IBM-943 is the recommended code set for Japanese SJIS data on DB2. You must configure the database management system to use the alternate conversion table that is compatible with the Microsoft definition of code page 932. If this alternate conversion table is not used, you cannot set the Compass data code page to 932 for new schemas. Also, if you do not convert an existing DB2 database set to use the alternate conversion table, some characters in the 932 character set will be corrupted. See Alternative Unicode conversion table for CCSID 943.

Code page 936 (Simplified Chinese) on Oracle

Compass has a limitation when configured to use code page 936 on Oracle. Oracle does not supply a character set that corresponds exactly to Microsoft code page 936. The closest match is the ZHS16GBK character set, which excludes the euro character (U+20AC). You can configure your Oracle database to use ZHS16GBK with Compass. However, doing so presents several limitations:
  • If you are using the installutil setdbcodepage command, then you must use the allowconversion option. This option lets you set the Compass data code page value to 936 even though the validation for the euro character fails.
  • You cannot use the euro character in your data. If you use this character, it is stored as a replacement character in the database, effectively corrupting the data.
  • If your deployment uses HCL Compass MultiSite, use Oracle databases that are configured identically with ZHS16GBK for every database in the clan. If you mix vendor databases throughout the clan and a euro character is entered, data divergence occurs because databases other than Oracle can store the euro, while Oracle databases store the euro as a replacement character.
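
Because the euro character is silently replaced, it can help to screen data before it reaches a ZHS16GBK database. The following Python sketch assumes that Python's gbk codec, which also has no euro mapping, is a reasonable stand-in for ZHS16GBK. Oracle's character set might differ for other characters, so treat the result as a hint and not as a guarantee. The function name unsafe_for_gbk is hypothetical.

```python
def unsafe_for_gbk(text):
    """Return the characters in text that Python's 'gbk' codec cannot encode.

    Assumption: 'gbk' approximates Oracle ZHS16GBK for this check.
    """
    rejected = []
    for ch in text:
        try:
            ch.encode("gbk")
        except UnicodeEncodeError:
            rejected.append(ch)
    return rejected


# U+4EF7 U+683C is "price" in Simplified Chinese; U+20AC is the euro sign.
sample = "\u4ef7\u683c 100\u20ac"
print(unsafe_for_gbk(sample))  # ['€']
```

An empty list means that every character in the string has a GBK encoding under this assumption.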

Code page 65001 (UTF-8) on Oracle and DB2

Compass provides multilingual character entry into a code page 65001 database set. UTF-8 is one of several possible Unicode character encodings. UTF-8 encoding is a multibyte character set (MBCS) encoding that can take from one to three bytes to store one Unicode character for the languages that Compass supports. This presents several limitations:
  • Code page 65001 is not supported for the SQL Server database because SQL Server does not provide support for UTF-8 character encoding.
  • The maximum character-string length is reduced for many MBCS code pages. With code page 65001 (UTF-8), a string might hold as few as one third of the characters that an ASCII character string of the same size can hold. The reduction depends on the mixture of one-byte, two-byte, and three-byte characters that are stored in the string. (Compass also supports double-byte character sets [DBCS]. With DBCS code pages, a reduction of up to half is possible as compared with an ASCII character string.)
  • Compass does not support converting an existing Compass database set to use the new 65001 code page.
  • Compass does not support bidirectional or complex script languages for use with the 65001 code page.
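
The effect of multibyte encoding on string length can be estimated by counting encoded bytes. The following Python sketch assumes that the limit of a field is measured in bytes. The function names and the 9-byte limit in the usage lines are hypothetical and are chosen only for illustration.

```python
def utf8_byte_length(text):
    """Return the number of bytes that text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_field(text, max_bytes):
    """Return True if text fits in a field limited to max_bytes bytes.

    Assumption: the field limit is expressed in bytes, not characters.
    """
    return utf8_byte_length(text) <= max_bytes


samples = (
    "abc",                 # one byte per character
    "\u00e4\u00f6\u00fc",  # a, o, u with diaeresis: two bytes per character
    "\u65e5\u672c\u8a9e",  # "Japanese" in kanji: three bytes per character
)
for sample in samples:
    print(len(sample), "characters,", utf8_byte_length(sample), "bytes")
```

Each sample has three characters, but the samples occupy 3, 6, and 9 bytes respectively. Under the byte-limit assumption, fits_in_field("abcdefghi", 9) is true for nine ASCII characters, while a fourth kanji character would no longer fit in the same 9-byte field.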