Fragment multibyte strings

Sometimes you need to fragment a long character string into two or more nonadjacent buffers to meet the memory-management requirements of your application's components. When you fragment a character string that contains only single-byte characters, you can fragment at an arbitrary byte location in the string. Because each character is one byte long, each resulting fragment is still a complete character string.

However, to fragment a string that might contain even one multibyte character, you must take special measures. If you fragment at an arbitrary byte location in a multibyte-character string, you might fragment at a byte that is part of a multibyte character. In this case, one fragment might end with the first 1, 2, or 3 bytes of a character, while the next fragment starts with the remaining byte or bytes.

If the only thing that you ever do with these fragments is to concatenate them back together to form one string, you do not need to perform any special processing. However, if you need to traverse the fragments as multibyte strings, a character that is split across two fragments might cause a function to attempt to read beyond the end of one fragment or to find an illegal character at the beginning of the next fragment.

Therefore, all functions that traverse one multibyte character or a length-terminated multibyte-character string set the error number to IFX_GL_EINVAL if they detect an otherwise valid character that is cut off at the end of a fragment.
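The distinction between a character that is cut off and a character that is illegal can be sketched in plain C. The following is a minimal illustration only, not the library's implementation: it assumes UTF-8 as a stand-in code set, and the names utf8_char_len, CHAR_INCOMPLETE, and CHAR_ILLEGAL are hypothetical. The two status codes merely play the roles that IFX_GL_EINVAL and IFX_GL_EILSEQ play in the text above.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch. They mirror the roles of
 * IFX_GL_EINVAL (character cut off) and IFX_GL_EILSEQ (illegal character)
 * but are not the library's values. */
enum { CHAR_INCOMPLETE = -1, CHAR_ILLEGAL = -2 };

/* Return the byte length of the UTF-8 character that starts at p, reading
 * at most avail bytes. Simplified: it checks the lead byte and the form of
 * the continuation bytes, but not every rule of well-formed UTF-8. */
static int utf8_char_len(const unsigned char *p, size_t avail)
{
    int need;
    int i;

    if (avail == 0)
        return CHAR_INCOMPLETE;

    if (p[0] < 0x80)
        need = 1;
    else if (p[0] >= 0xC2 && p[0] <= 0xDF)
        need = 2;
    else if (p[0] >= 0xE0 && p[0] <= 0xEF)
        need = 3;
    else if (p[0] >= 0xF0 && p[0] <= 0xF4)
        need = 4;
    else
        return CHAR_ILLEGAL;        /* byte cannot start a character */

    for (i = 1; i < need; i++) {
        if ((size_t)i >= avail)
            return CHAR_INCOMPLETE; /* fragment ends inside the character */
        if ((p[i] & 0xC0) != 0x80)
            return CHAR_ILLEGAL;    /* not a continuation byte */
    }
    return need;
}
```

For the 3-byte character 0xE2 0x82 0xAC, a call with avail set to 2 returns CHAR_INCOMPLETE, while a call that starts at the second byte returns CHAR_ILLEGAL. UTF-8 happens to make stray trailing bytes recognizable; other multibyte code sets might not, so the start of a fragment cannot always be checked this way.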

Important: The functions cannot detect that the beginning of a fragment contains the remaining bytes of the last character in some previous fragment because they cannot look at the previous fragment first. Therefore, they might interpret the last 1, 2, or 3 bytes of a multibyte character as a valid character.

If you know that no fragmentation has occurred on the string, you can consider the IFX_GL_EINVAL error the same as IFX_GL_EILSEQ. However, if fragmentation might have occurred, IFX_GL_EINVAL indicates that you need to refragment the string so that each fragment is a complete string. Depending upon your application, you might take one of the following actions:
  • Make a fragment even shorter than originally intended.
  • Replace the first 1, 2, or 3 bytes of the fragmented character, which sit at the end of the fragment, with a padding character that is appropriate for your application, and shift the replaced bytes to the beginning of the next fragment.

Important: Even though the library functions can detect invalid characters after fragmentation has occurred, it is much better to avoid the situation.
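
One way to avoid the situation is to choose each fragment boundary by walking the string one character at a time and stopping before a character that would not fit whole, which amounts to making the fragment shorter than originally intended. The following is a minimal sketch under stated assumptions: it uses UTF-8 as a stand-in code set, the helper names utf8_expected_len and safe_fragment_len are hypothetical, and it inspects only the first byte of each character rather than fully validating the string.

```c
#include <stddef.h>

/* Expected byte length of a UTF-8 character, judged from its first byte.
 * Returns 0 for a byte that cannot start a character. */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;
}

/* Return the largest byte count, no greater than max_bytes, at which the
 * string s (len bytes long) can be cut without splitting a character.
 * Returns (size_t)-1 if a byte that cannot start a character is found. */
static size_t safe_fragment_len(const unsigned char *s, size_t len,
                                size_t max_bytes)
{
    size_t limit = (len < max_bytes) ? len : max_bytes;
    size_t pos = 0;

    while (pos < limit) {
        size_t n = utf8_expected_len(s[pos]);

        if (n == 0)
            return (size_t)-1;  /* illegal lead byte */
        if (pos + n > limit)
            break;              /* next character would not fit whole */
        pos += n;
    }
    return pos;
}
```

A caller would copy the first safe_fragment_len bytes into one buffer and continue from that offset for the next buffer. For the 6-byte string 'a', 'b', 0xE2, 0x82, 0xAC, 'c' and a 4-byte buffer, the function returns 2, so the 3-byte character moves intact to the next fragment.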