Supporting a mixed character-set deployment

If you are working in an environment in which different computers with different character sets connect to the same HCL Compass database set, or you have selected Compass UTF-8 (8-bit Unicode Transformation Format) data code page (65001) for your database set, then you must consider the issues of character representation outlined in this topic.

  • Configuring the local character set

    The local character set is the set of characters that can be entered or displayed in the command line shell of the client operating system. On UNIX™ systems, the local character set is controlled by the LANG environment variable. On Windows™ systems, it is controlled by settings in the Regional and Language Options. For more information, see the online help for the Compass Administrator.

  • Selecting and setting the Compass data code page of the database set
  • Selecting and setting the vendor database character set of your databases

    The vendor database character set is referred to as character set or charset for Oracle, as code page or code set for IBM® DB2®, and as code page or collation for SQL Server.

  • Coding hooks and scripts to handle characters in the Compass data code page that might not be included in the local character set
  • Updating schemas and packages

In versions of HCL Compass earlier than 7.0, write operations were not allowed unless the local character set matched the Compass data code page. If the settings did not match, only read-only operations were allowed. Read-only mode was necessary because the Compass applications used the local character set of the client or Web server to write data to the database instead of the Compass data code page for the database set.

Beginning in version 7.0, HCL Compass software processes data in Unicode, and its applications use the Compass data code page to write to its databases. These applications can now connect to the Compass database in read/write mode even when the local character set does not match the Compass data code page.

Beginning in version 7.1.1, you can select a UTF-8 (Unicode) Compass data code page for Oracle or DB2 database sets. A UTF-8 data code page allows multilingual character storage in the Compass database. When you select UTF-8 as the data code page, you are working in a mixed character-set deployment unless the local code page of the operating system is also UTF-8, which is not an option on Windows systems.
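
The difference between a single-language data code page and UTF-8 can be illustrated outside the product. The following Python sketch is not Compass code: it uses only Python's standard codecs, and the helper name can_store and the sample strings are assumptions made for illustration. It shows that text mixing Japanese and Korean fits neither code page 932 nor code page 1252, but is representable in UTF-8.

```python
def can_store(text: str, code_page: str) -> bool:
    """Return True if every character in text exists in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"   # representable in code page 932 (Shift-JIS)
korean = "한국어"     # not representable in code page 932 or 1252
mixed = japanese + " / " + korean

# Report, for each code page, whether it can hold the Japanese-only
# text and the mixed Japanese and Korean text.
for code_page in ("cp1252", "cp932", "utf-8"):
    print(code_page, can_store(japanese, code_page), can_store(mixed, code_page))
```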

This functionality allows great flexibility in a Compass environment that includes computers with different local character sets. However, scripts and hooks written for these environments must handle Compass character data that might not be included in the local character set. These scripts and hooks must be written to support Unicode to take full advantage of this capability in these environments.

The Designer includes a new setting: Unicode Aware. Hooks written for HCL Compass version 7.0 can specify whether characters in strings returned from Compass API calls must be in the local character set only (RETURN_STRING_LOCAL) or can be any Unicode character (RETURN_STRING_UNICODE). Also, new API functions are available to control the return string mode. In RETURN_STRING_LOCAL mode, an API call returns an exception if the return string includes characters that cannot be represented in the local character set. In RETURN_STRING_UNICODE, an API call returns all characters without error.

To ensure that hooks and scripts handle all possible data in a mixed character-set or UTF-8 deployment, you must set the mode to RETURN_STRING_UNICODE and properly handle the Unicode characters that might be returned. Setting the return string mode to RETURN_STRING_UNICODE is not sufficient; you must verify that your code can handle Unicode characters correctly. The guidelines listed below are helpful, but ultimately, you must use the appropriate Unicode programming techniques for the scripting language.

If you are upgrading to HCL Compass version 7.0, these changes have no impact on an existing schema if all local character sets in the environment match the Compass data code page, as was common in earlier versions. The default mode is RETURN_STRING_LOCAL, which allows existing hooks and scripts to continue to function.

If you are deploying version 7.0 into an environment in which local character sets do not match the Compass data code page, you must ensure that your scripts can process Unicode character data for Compass software, set the return mode for scripts to RETURN_STRING_UNICODE, and upgrade packages to version 7.0. For a list of the Compass packages that support Unicode, see Package return string mode. Scripts that do not handle Unicode can still run, but an error is returned if the system attempts to return to the script any character data that is not included in the local character set. These scripts continue to work as long as the data that they process is restricted to the local character set of the client or Web server.

Table 1. Package return string mode

Package                                            Return string mode
AMBaseActivity                                     RETURN_STRING_LOCAL
AMStateTypes                                       RETURN_STRING_LOCAL
AMWorkActivitySchedule                             RETURN_STRING_UNICODE
AnalysisStudioSetup - not to be included           RETURN_STRING_LOCAL
ATStateTypes                                       RETURN_STRING_UNICODE
Attachments                                        RETURN_STRING_UNICODE
AuditTrail                                         RETURN_STRING_UNICODE
BaseCMActivity                                     RETURN_STRING_LOCAL
BTStateTypes                                       RETURN_STRING_UNICODE
BuildTracking                                      RETURN_STRING_UNICODE
CharacterSetValidation - not to be included        RETURN_STRING_LOCAL
ClearCase®                                         RETURN_STRING_LOCAL
ClearCaseUpgrade - not to be included              RETURN_STRING_LOCAL
CQTM                                               RETURN_STRING_UNICODE
CrossPlatformSCM                                   RETURN_STRING_LOCAL
Customer                                           RETURN_STRING_UNICODE
DefectTrackingSetup - not to be included           RETURN_STRING_LOCAL
DeploymentTracking                                 RETURN_STRING_UNICODE
DevelopmentStudioSetup - not to be included        RETURN_STRING_LOCAL
DTStateTypes                                       RETURN_STRING_UNICODE
EMail                                              RETURN_STRING_UNICODE
EmailUpgrade - not to be included                  RETURN_STRING_LOCAL
EnhancementRequest                                 RETURN_STRING_UNICODE
EnterpriseSetup - not to be included               RETURN_STRING_LOCAL
eSignature                                         RETURN_STRING_UNICODE
History                                            RETURN_STRING_UNICODE
Notes®                                             RETURN_STRING_UNICODE
PQC                                                RETURN_STRING_LOCAL
Project                                            RETURN_STRING_UNICODE
Repository                                         RETURN_STRING_LOCAL
RepositoryUpgrade - not to be included             RETURN_STRING_LOCAL
RequisitePro®                                      RETURN_STRING_LOCAL
RequisiteProSupplement - not to be included        RETURN_STRING_LOCAL
Resolution                                         RETURN_STRING_UNICODE
TeamTest                                           RETURN_STRING_LOCAL
TestStudioSetup - not to be included               RETURN_STRING_LOCAL
TPM                                                RETURN_STRING_UNICODE
UCMPolicyScripts                                   RETURN_STRING_LOCAL
UnifiedChangeManagement                            RETURN_STRING_LOCAL
UnifiedChangeManagementSetup - not to be included  RETURN_STRING_LOCAL
VisualSourceSafe                                   RETURN_STRING_LOCAL
VisualSourceSafeUpgrade - not to be included       RETURN_STRING_LOCAL

When developing an application that must handle mixed character-set deployments, you must address several considerations.
  • Return string mode

    HCL Compass software handles all data as Unicode characters. However, schema hooks (Perl and Visual Basic) and other HCL Compass API applications or integrations might not be coded to process Unicode characters. In version 7.0, a return string mode is available to handle this mismatch. Hook code can be set to Unicode Aware in the Designer script editor to indicate that the script runs in the RETURN_STRING_UNICODE return string mode. (To do so, select the Unicode Aware check box.) Alternatively, scripts can call the SetPerlReturnStringMode or SetBasicReturnStringMode method to set the return string mode to RETURN_STRING_UNICODE.

    The return string mode restricts (RETURN_STRING_LOCAL) or allows full (RETURN_STRING_UNICODE) character representation when strings are returned by the HCL Compass API for Perl or COM.

  • Unicode support in existing hook or script code

    It is a good practice to write hooks and scripts that can process Unicode characters. RETURN_STRING_LOCAL is provided as the default return string mode so that existing hooks and scripts for earlier versions of HCL Compass software can run without change. Over time, you should modify existing hooks and scripts to function in RETURN_STRING_UNICODE mode, even if you currently have no need for Unicode.

    Verify that your hook or script code can process any Unicode characters. Then mark the hook code as Unicode Aware in the Designer script editor or have the external script call the SetPerlReturnStringMode or SetBasicReturnStringMode method. The hook or script can then be used in any HCL Compass environment. For example:
    1. An HCL Compass API script in Perl is running on a Windows Latin 1 (1252) local character-set system that connects to an HCL Compass database whose data code page is set to Shift-JIS (932).
    2. The script retrieves a field value that contains Japanese text. By default, the value is returned in the local character set of the computer that is running the Perl script (1252, in this example). Because the Japanese characters stored in the Shift-JIS (932) database cannot be represented in the Latin 1 code page, an exception is generated. To process such a value, the script must be able to handle Unicode characters and must set the return string mode to RETURN_STRING_UNICODE. The exception is then not generated, and the script retrieves all the characters in the field value as Unicode characters.

    By default, an exception is thrown in step 2 when the HCL Compass API script returns with a string that includes characters outside the local character set. The exception prevents data corruption. After you review and confirm that the code can process Unicode characters, you can set the RETURN_STRING_UNICODE return string mode by using the HCL Compass API or in the script editor of the Designer. By making this change, in step 2 the HCL Compass API for Perl returns the field value string as UTF8 (UNICODE) if the string contains nonlocal character-set data, and the HCL Compass API for VBScript, Visual Basic or COM returns unrestricted Unicode characters. Characters that cannot be represented in the local character set can then be returned to the hook or script for processing as Unicode characters.
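
    The scenario in steps 1 and 2 can be simulated without a Compass installation. The Python sketch below is an illustration only: the function return_string and the exception LocalCharsetError are hypothetical stand-ins for the behavior described above and are not part of the HCL Compass API. Only the two mode names are taken from this topic.

```python
RETURN_STRING_LOCAL = "RETURN_STRING_LOCAL"
RETURN_STRING_UNICODE = "RETURN_STRING_UNICODE"


class LocalCharsetError(Exception):
    """Hypothetical error: value has characters outside the local character set."""


def return_string(value: str, mode: str, local_charset: str) -> str:
    """Mimic how a field value is handed back to a script in each mode."""
    if mode == RETURN_STRING_UNICODE:
        return value  # all characters are returned without error
    try:
        value.encode(local_charset)  # probe only; the encoded bytes are discarded
    except UnicodeEncodeError as exc:
        raise LocalCharsetError(f"not representable in {local_charset}") from exc
    return value


# A field value as it might be stored under data code page 932.
stored_bytes = "不具合".encode("cp932")
field_value = stored_bytes.decode("cp932")

# Step 2 in the default mode on a Latin 1 (1252) client: an exception.
try:
    return_string(field_value, RETURN_STRING_LOCAL, "cp1252")
except LocalCharsetError as err:
    print("LOCAL mode:", err)

# The same call in Unicode mode returns every character.
print("UNICODE mode:", return_string(field_value, RETURN_STRING_UNICODE, "cp1252"))
```

    In the simulated local mode, the error is raised before any data reaches the caller, which mirrors the way the real exception prevents data corruption.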

    Hooks and scripts must use RETURN_STRING_LOCAL if they perform an operation in their scripting language (Perl, VBScript, Visual Basic, or COM) that does not support processing characters that cannot be represented in the local character set. For example:
    • Using HCL Compass data in a Perl call that does not work with Perl UTF8 strings (such as some system calls)
    • Using HCL Compass data as a file or directory name (file and directory names must be in the local character set)
    • Writing HCL Compass data to a file without configuring the output file to support Unicode characters
    • Sending HCL Compass data to an integration that accepts only local character-set data
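
    The third item in the list, writing data to a file, is a common source of failures. Hooks are written in Perl or Visual Basic, so the following Python sketch is only an analogy; the helper write_value, the file names, and the sample value are made up for illustration. It shows the same string failing to write when the file uses the Latin 1 (1252) encoding and succeeding when the file is opened as UTF-8.

```python
import os
import tempfile

value = "担当者: Müller"  # sample text with Japanese and Latin 1 characters


def write_value(path: str, text: str, encoding: str) -> bool:
    """Write text to path; return False if the encoding cannot represent it."""
    try:
        with open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as folder:
    # A file restricted to the local character set rejects the Japanese text.
    local_ok = write_value(os.path.join(folder, "local.txt"), value, "cp1252")

    # A file configured for Unicode accepts the whole value.
    utf8_path = os.path.join(folder, "unicode.txt")
    utf8_ok = write_value(utf8_path, value, "utf-8")
    with open(utf8_path, encoding="utf-8") as handle:
        round_trip = handle.read()

print(local_ok, utf8_ok, round_trip == value)
```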

    In the RETURN_STRING_LOCAL mode, operations such as running queries can be performed and the query result sets can include Unicode characters. An exception is generated only if data is extracted from that result set by an HCL Compass API method and the characters returned from the API call are not in the local character set. For example, an integration or external application can operate on a change request if the data that is passed back to the integration contains only local character-set characters. The integration code must handle the exception thrown by an HCL Compass API method when the characters returned are not in the local character set. If the integration API is configured as RETURN_STRING_UNICODE, the exception is not thrown, but the application must correctly handle any Unicode character that is returned. In both the RETURN_STRING_LOCAL and RETURN_STRING_UNICODE modes, exceptions are also returned to the calling integration or application if the application writes characters that cannot be represented in the HCL Compass data code page.
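
    The point about writes applies in either mode. As a hedged illustration, an integration could check a value against the data code page before it attempts the write. The Python helper below is hypothetical (unrepresentable_chars is not an HCL Compass API function), and it assumes that the Python codec name, such as cp932, corresponds to the data code page configured for the database set. The vendor database mapping might differ in edge cases.

```python
def unrepresentable_chars(text: str, data_code_page: str) -> list:
    """List the distinct characters in text that data_code_page cannot encode."""
    missing = []
    for char in text:
        try:
            char.encode(data_code_page)
        except UnicodeEncodeError:
            if char not in missing:
                missing.append(char)
    return missing


headline = "Crash in 報告 module (한글 input)"

# With data code page 932, the Korean characters would be rejected on write.
print(unrepresentable_chars(headline, "cp932"))

# With a UTF-8 data code page, nothing is rejected.
print(unrepresentable_chars(headline, "utf-8"))
```

    A check of this kind lets the integration report the offending characters to the user instead of only surfacing the exception from the failed write.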

    For more information, see Setting the return string mode for hooks and scripts in the HCL Compass API Reference help.

  • Unicode support in packages and schemas

    Some packages or schemas are not designed to handle Unicode and nonlocal character-set data. The Designer script editor indicates the support that each script in each package offers: the Unicode Aware check box is selected for scripts that support Unicode. The DefectTracking and Common schemas support Unicode. However, any schema that includes a package that does not support Unicode characters cannot be used in a mixed character-set deployment. See Package return string mode.

    You can edit or add hooks that access package fields, and these hooks are considered part of the package. Those hooks inherit the default Unicode support from the package, but the Designer does not display the inherited setting correctly for the hook.

If the local character sets of all clients connected to a database set or clan match the data code page, you do not need to consider these issues. For more information about character representations and code page settings, see the online help for the HCL Compass Administrator.
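
Whether a deployment is mixed can be worked out from an inventory of client character sets. The following Python sketch is one possible way to do that and is not an HCL Compass tool: the helper names, the client names, and the inventory itself are hypothetical. It relies on Python's codecs.lookup to resolve aliases such as utf8 and windows-1252 to a single codec name before comparing.

```python
import codecs


def same_charset(local_charset: str, data_code_page: str) -> bool:
    """Compare two encoding names after resolving aliases such as 'utf8'."""
    return codecs.lookup(local_charset).name == codecs.lookup(data_code_page).name


def mixed_clients(client_charsets: dict, data_code_page: str) -> list:
    """Return the clients whose local character set differs from the data code page."""
    return sorted(
        name
        for name, charset in client_charsets.items()
        if not same_charset(charset, data_code_page)
    )


# Hypothetical inventory of client machines and their local character sets.
clients = {
    "build-server": "UTF-8",
    "tokyo-desktop": "cp932",
    "paris-desktop": "windows-1252",
}

# With a UTF-8 data code page, the two desktop clients make this a mixed deployment.
print(mixed_clients(clients, "utf8"))

# When every client matches the data code page, the list is empty.
print(mixed_clients({"a": "cp1252", "b": "windows-1252"}, "cp1252"))
```

On a given machine, Python's locale.getpreferredencoding(False) reports the local character set that could be fed into such an inventory.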