Character encodings are named sets of numeric values for representing characters. For example, ISO 8859-1, also known as Latin-1, is the character encoding containing the letters and symbols used by most Western European languages. If your applications send and receive messages that use only English language characters (that is, the ASCII character set), you do not need to alter your programs to handle different character encodings. The TIBCO Enterprise Message Service server and application APIs automatically handle ASCII characters in messages.
Character sets become important when your application handles messages that use non-ASCII characters (such as Japanese). Also, clients encode messages by default as UTF-8. Some character encodings use only one byte to represent each character, but UTF-8 can potentially use two bytes to represent the same character. For example, Latin-1 is a single-byte character encoding. If all strings in your messages contain only characters that appear in the Latin-1 encoding, you can potentially improve performance by specifying Latin-1 as the encoding for strings in the message.
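The size difference described above can be checked with the Java standard library alone. The following is a minimal sketch, not EMS code: the class and method names are made up for illustration, and it only compares how many bytes the same string occupies in each encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the EMS API.
class EncodingSizeDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as ISO-8859-1 (Latin-1).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9),
        // written as an escape so the source file encoding does not matter.
        String text = "caf\u00e9";
        System.out.println("UTF-8 bytes:   " + utf8Length(text));
        System.out.println("Latin-1 bytes: " + latin1Length(text));
    }
}
```

For plain ASCII text both encodings produce the same number of bytes, so any saving comes only from accented and other non-ASCII Latin-1 characters.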
TIBCO Enterprise Message Service clients can specify a variety of common character encodings for strings in messages. The character encoding for a message applies to strings that appear in any of the following places within a message:
The EMS client APIs (Java, .NET and C) include mechanisms for handling strings and specifying the character encoding used for all strings within a message. The following sections describe the implications of string character encoding for TIBCO Enterprise Message Service clients.
Each message contains the name of the character encoding used to encode strings within the message. This character encoding name is one of the canonical names for character encodings contained in the Java specification. You can obtain a list of canonical character encoding names from the following location:
Java and .NET clients use these canonical character encoding names when setting or retrieving the character encoding names. C clients have a list of macros that correspond to these canonical names. See the C API references for a list of supported character encodings in these interfaces.
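As an illustration of canonical names, the sketch below resolves a charset name or alias to its canonical name using `java.nio.charset.Charset`. It assumes the java.nio naming scheme; whether EMS expects exactly these names is not confirmed by this guide, so check the list referenced above. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the EMS API.
class CanonicalNameDemo {
    // Returns the java.nio canonical name for a charset name or alias,
    // or null when this JVM does not recognize or support the name.
    static String canonicalName(String nameOrAlias) {
        try {
            return Charset.forName(nameOrAlias).name();
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException
            // are both subclasses of IllegalArgumentException.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"latin1", "UTF8", "no-such-charset"}) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```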
When a client sends a message, the message stores the character encoding name used for strings in that message. Java clients represent strings using Unicode. A message created by a Java client that does not specify an encoding will use UTF-8 as the named encoding within the message. UTF-8 uses up to four bytes to represent each character, so a Java client can improve performance by explicitly using a single-byte character encoding, if possible.
Java clients can globally set the encoding to use with the setEncoding method, or the client can set the encoding for each message with the setMessageEncoding method. For more information about these methods, see the TIBCO Enterprise Message Service Java API Reference.
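Before choosing a single-byte encoding, a client may want to confirm that every character in the string can be represented in it. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical, and the step of passing the chosen name to the EMS methods mentioned above is deliberately left out because their signatures are documented in the Java API Reference, not here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the EMS API.
class EncodingChooser {
    // Returns "ISO-8859-1" when every character of the text can be encoded
    // in Latin-1, and falls back to "UTF-8" otherwise.
    static String chooseEncoding(String text) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // A new encoder is created per call because encoders are not thread-safe.
        boolean fits = latin1.newEncoder().canEncode(text);
        return fits ? latin1.name() : StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        System.out.println(chooseEncoding("caf\u00e9"));          // Latin-1 text
        System.out.println(chooseEncoding("\u65e5\u672c\u8a9e")); // Japanese text
    }
}
```

Checking first avoids silently replacing characters that the single-byte encoding cannot represent.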
Typically, C clients manipulate strings using the character encoding of the machine on which they are running. TIBCO Enterprise Message Service provides a character encoding library for C clients to determine the encoding in messages and convert strings to and from Unicode. C clients should explicitly set the character encoding they are using when they create and send a message. For more information, see TIBCO Enterprise Message Service C & COBOL API Reference.
Figure 9 illustrates TIBCO Enterprise Message Service clients sending messages encoded in UTF-8. Java clients use this encoding by default. C clients must explicitly set this encoding and convert strings from the local encoding to UTF-8 before sending the message.
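The conversion from a local encoding to UTF-8 can be pictured as a decode step followed by an encode step. The sketch below shows that idea in Java with the standard library. It is not the C character encoding library, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the EMS API.
class TranscodeDemo {
    // Re-encodes bytes from the local charset into UTF-8 by decoding them
    // to a Java String and then encoding that String again.
    static byte[] toUtf8(byte[] localBytes, Charset localCharset) {
        String decoded = new String(localBytes, localCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an accented final letter, as Latin-1 bytes.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " Latin-1 bytes became "
                + utf8.length + " UTF-8 bytes");
    }
}
```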
Figure 10 illustrates clients explicitly setting the encoding of strings within a message to ISO-8859-1 (Latin-1). The Java client must set this encoding explicitly for the message, but there is no need to convert the strings; this happens automatically. The C client’s local encoding is Latin-1, so there is no need to convert the strings. However, the C client must specify the encoding of the message before sending.
Each message stores the name of the character encoding the sender used. A message receiver can use this information to decode the strings in the message, if necessary.
Java automatically performs any necessary conversion and represents strings in Unicode. Java clients do not need to explicitly perform any operations to display strings stored in a message.
C clients must compare the encoding used for the message with the encoding of the local machine. If the encodings match, the C client can display the string without conversion. If the encodings do not match, the C client must use the tibconv library functions to convert the string to the local encoding before the string can be displayed.
Figure 11 illustrates TIBCO Enterprise Message Service clients receiving messages. The Java client can receive the message and display the strings without any additional conversion. The C client must determine the encoding in the message, compare it to the encoding used on the local machine, and then perform conversion, if the encodings do not match.
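The compare-then-convert logic described for the receiving C client can be sketched as follows. The example is written in Java with the standard library only, to show the decision rather than the tibconv functions themselves. All names are hypothetical, and the message payload is modeled as a plain byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the EMS API.
class ReceiveDemo {
    // True when the encoding named in the message differs from the local one.
    // Charset.forName also accepts aliases, so "latin1" matches ISO-8859-1.
    static boolean needsConversion(String messageEncoding, Charset local) {
        return !Charset.forName(messageEncoding).equals(local);
    }

    // Returns the string bytes in the local encoding, converting only when
    // the encodings differ. Characters the local encoding cannot represent
    // are replaced by that charset's default replacement.
    static byte[] toLocal(byte[] payload, String messageEncoding, Charset local) {
        if (!needsConversion(messageEncoding, local)) {
            return payload; // encodings match: use the bytes as they are
        }
        String decoded = new String(payload, Charset.forName(messageEncoding));
        return decoded.getBytes(local);
    }

    public static void main(String[] args) {
        // "cafe" with an accented final letter, as UTF-8 bytes.
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        byte[] local = toLocal(utf8, "UTF-8", StandardCharsets.ISO_8859_1);
        System.out.println("Converted to " + local.length + " local bytes");
    }
}
```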
TIBCO Enterprise Message Service™ User’s Guide Software Release 4.3, February 2006 Copyright © TIBCO Software Inc. All rights reserved www.tibco.com