Multibyte characters

A multibyte character can hold code-point values greater than 255. A single multibyte character can be from 2 to 4 bytes long.

Asian code sets are multibyte code sets; they contain both single-byte and multibyte characters.
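Because a multibyte code set mixes single-byte and multibyte characters, code usually has to determine how many bytes a character occupies from its first byte. The following is a minimal sketch that assumes UTF-8 as the example code set. This section does not name UTF-8; it is used here only because its byte patterns are well known. The helper `utf8_char_len()` is hypothetical and is not part of the GLS library.

```c
#include <stddef.h>

/* Hypothetical helper (UTF-8 assumed): number of bytes in the character
   whose first byte is `lead`. Returns 0 for a continuation byte or for a
   lead byte that matches none of the four patterns below. This is a
   sketch: it does not reject every invalid UTF-8 byte value. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80)           return 1; /* 0xxxxxxx: single-byte (ASCII) */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2-byte character */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3-byte character */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx: 4-byte character */
    return 0;                            /* 10xxxxxx continuation or invalid */
}
```

For example, `'A'` yields 1. The first byte of "é" (0xC3) yields 2, the first byte of a common CJK ideograph such as 0xE4 yields 3, and 0xF0 yields 4.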

If your application processes multibyte characters, it can no longer make the single-byte assumption that each character occupies exactly one byte. The number of bytes in a buffer no longer equals the number of characters in the buffer. Because the number of bytes can vary from one character to the next, you can no longer rely on the C compiler to perform the following operations correctly:
  • Allocate space for a multibyte-character string
  • Traverse a multibyte-character string
  • Find the beginning of the nth character in a multibyte-character string
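The gap between byte count and character count can be shown with a short sketch. It again assumes UTF-8 as the example code set, and the helper names are hypothetical. The loop advances by a whole character at a time instead of one byte, which is the traversal that plain `char` pointer arithmetic cannot perform.

```c
#include <stddef.h>

/* Hypothetical helper (UTF-8 assumed): bytes in the character that starts
   with `lead`, or 0 if `lead` cannot start a character. */
static size_t lead_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Count characters (not bytes) in a null-terminated UTF-8 string.
   Returns (size_t)-1 on an invalid lead byte or a truncated character. */
static size_t utf8_strlen_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        size_t n = lead_len(*p);
        if (n == 0)
            return (size_t)-1;
        /* The terminator must not appear inside a character. */
        for (size_t i = 1; i < n; i++)
            if (p[i] == '\0')
                return (size_t)-1;
        p += n;   /* step over the whole character, not a single byte */
        count++;
    }
    return count;
}
```

For `"h\xC3\xA9llo"` ("héllo"), the buffer holds 6 bytes but `utf8_strlen_chars()` returns 5. Space for a copy must therefore be sized in bytes (6 + 1 for the terminator), not in characters.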

Your application cannot use the built-in scaling of the C compiler (pointer arithmetic and array indexing on char, which step one byte at a time) for multibyte-character strings. It can, however, use the macros and functions of the library to perform these operations on multibyte-character strings. To process a multibyte character, you cannot pass the entire character to a function, because the character does not fit in a single char. Instead, pass a pointer to the beginning of the character so that the called function can access the remaining bytes of the character.
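To find the nth character, code has to walk the string character by character, because `s + n` gives the nth byte, and that byte may fall in the middle of a character. The following minimal sketch assumes UTF-8 and uses hypothetical helper names rather than the library's own macros and functions. It returns a pointer to the start of the character. That pointer is what you would pass to a function that needs to read the character's remaining bytes.

```c
#include <stddef.h>

/* Hypothetical helper (UTF-8 assumed): bytes in the character that starts
   with `lead`, or 0 if `lead` cannot start a character. */
static size_t lead_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Return a pointer to the first byte of the nth (0-based) character of a
   null-terminated UTF-8 string, or NULL if the string has fewer than
   n + 1 characters or is malformed before that point. */
static const char *utf8_nth_char(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        size_t len = lead_len(*p);
        if (len == 0)
            return NULL;              /* not the start of a character */
        for (size_t i = 1; i < len; i++)
            if (p[i] == '\0')
                return NULL;          /* character truncated by terminator */
        if (n == 0)
            return (const char *)p;   /* start of the requested character */
        p += len;                     /* skip the whole character */
        n--;
    }
    return NULL;
}
```

In `"h\xC3\xA9llo"`, character 2 (the first `l`) begins at byte offset 3, not at offset 2. Offset 2 holds the second byte of "é".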

For a list of operations that the library functions can perform on multibyte characters, see Character operations. For a list of operations that the library functions can perform on multibyte-character strings, see String operations.

One single-byte assumption still applies to multibyte-character strings: no multibyte character has the null byte (0x00) as its second, third, or fourth byte. Therefore, code that checks only for the single-byte ASCII null character does not need to change to handle multibyte characters. This null character also serves as the null terminator of a multibyte-character string.
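Because 0x00 never appears after the first byte of a multibyte character, an ordinary byte loop that stops at the terminator still finds the true end of the string. The following sketch is single-byte code that needs no change. The function name is hypothetical, and the test strings assume UTF-8, where continuation bytes always lie in the range 0x80 to 0xBF.

```c
#include <stddef.h>

/* Unchanged single-byte code: count bytes up to the null terminator.
   Because no multibyte character contains 0x00 after its first byte,
   the loop cannot stop in the middle of a character. */
static size_t byte_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For the two-ideograph UTF-8 string `"\xE4\xB8\xAD\xE6\x96\x87"`, the loop stops after all 6 bytes, at the real terminator.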

The names of most functions that handle multibyte characters start with one of the following strings:
  • ifx_gl_mb: handles a multibyte character
  • ifx_gl_mbs: handles a multibyte-character string
For example, ifx_gl_mblen() determines the length of a multibyte character, and ifx_gl_mbslen() determines the length of a multibyte-character string.