Using character set translation

This section outlines how different components of Adaptive Server Anywhere handle code pages, and how character set translation enables systems where different components have different character sets to function properly.

Character set translation can be carried out among character sets that represent the same characters, but with different values. The character sets must be compatible to some degree for this to be possible. For example, character set translation is possible between the EUC-JIS and Shift-JIS character sets, but not between EUC-JIS and OEM code page 850.
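
As a rough illustration of what compatibility means here, the following Python sketch converts text between character sets using Python's built-in codecs, not the database server. The helper name translate is hypothetical, and the Python codec names euc_jp, shift_jis, and cp850 are assumed to stand in for EUC-JIS, Shift-JIS, and OEM code page 850.

```python
def translate(data, source, target):
    """Re-encode bytes from one character set to another by way of Unicode."""
    # Decoding fails if the bytes are not valid in the source character set;
    # encoding fails if the target character set lacks one of the characters.
    return data.decode(source).encode(target)


text = "日本語"  # sample Japanese text
euc_bytes = text.encode("euc_jp")

# EUC-JP and Shift-JIS represent the same characters with different byte values.
sjis_bytes = translate(euc_bytes, "euc_jp", "shift_jis")
print(euc_bytes != sjis_bytes)                 # the byte values differ
print(sjis_bytes.decode("shift_jis") == text)  # the characters are the same

# Code page 850 has no Japanese characters, so this translation cannot succeed.
try:
    translate(euc_bytes, "euc_jp", "cp850")
except UnicodeEncodeError as exc:
    print("cannot translate:", exc.reason)
```

The sketch raises an error for the incompatible pair. How the database server itself reacts to incompatible character sets is not shown here.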

To enable character-set translation, you must start the database server using the -ct command-line option. For example:

dbeng6 -ct asademo.db

Avoiding character-set translation

There is a performance cost associated with character set translation. If you can set up an environment such that no character set translation is required, then you do not have to pay this cost, and your setup is simpler to maintain.

If you work with a single-byte character set and are concerned only with seven-bit ASCII characters (values 1 through 127), then you do not need character set translation. Even if the code pages are different in the database and on the client operating system, they are compatible over this range of characters. Many English-language installations will meet these requirements.
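
The compatibility of the seven-bit range can be checked with a short Python sketch. It assumes Python's cp1252 and cp850 codecs as stand-ins for a Windows code page and an OEM code page; the helper same_bytes and the sample strings are hypothetical.

```python
ascii_range = bytes(range(1, 128))  # seven-bit values 1 through 127

# Both code pages decode the seven-bit range to the same characters.
print(ascii_range.decode("cp1252") == ascii_range.decode("cp850"))  # True


def same_bytes(text, page_a="cp1252", page_b="cp850"):
    """Return True if text is stored as identical bytes in both code pages."""
    return text.encode(page_a) == text.encode(page_b)


print(same_bytes("SELECT * FROM employee"))  # True: seven-bit ASCII only
print(same_bytes("café"))                    # False: contains an extended character
```

Text that passes this kind of check is stored as the same bytes in either code page, which is why no translation is needed for it.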

If you do require use of extended characters, there are other steps you may be able to take:

If the code page on your client machine operating system matches that used in the database, no character set translation is needed for data in the database.

For example, the WinLatin1 collation is often an appropriate choice for your database, because it corresponds to the Windows NT code page in many environments.

If you are able to use a version of Adaptive Server Anywhere built for your language, and if you use the WinLatin1 code page on your operating system, no character set translation is needed for database messages. The character set used in the Adaptive Server Anywhere message strings is code page 1252, which corresponds to WinLatin1.

Also, recall that character set translation takes place only if the database server is started using the -ct command-line option.

ODBC code page translation

Adaptive Server Anywhere provides an ODBC translation driver. This converts characters between OEM and ANSI code pages. It allows Windows applications using ANSI code pages to be compatible with databases that use OEM code pages in their collations.

Not needed if you use ANSI character sets
If you use an ANSI character set in your database, and are using ANSI character set applications, you do not need to use this translation driver.

The translation driver carries out a mapping between the OEM code page in use in the "DOS box" and the ANSI code page used in the Windows operating system. If your database uses the same code page as the OEM code page, the characters are translated properly. If your database does not use the same code page as your machine's OEM code page, you will still have compatibility problems.
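
The effect of a code page mismatch can be sketched in Python. This is not the translation driver itself: the helper oem_to_ansi is hypothetical, and the codecs cp850, cp437, and cp1252 are assumed examples of two OEM code pages and one ANSI code page.

```python
def oem_to_ansi(data, oem_page, ansi_page="cp1252"):
    """Map OEM-encoded bytes to the ANSI code page, trusting that oem_page is right."""
    return data.decode(oem_page).encode(ansi_page)


expected = "ø".encode("cp1252")  # what the Windows application should receive
stored = "ø".encode("cp850")     # the database stores the text in code page 850

# The machine's OEM code page matches the database: the character survives.
good = oem_to_ansi(stored, "cp850")
print(good == expected)

# The machine's OEM code page is 437 instead: the same stored byte is
# interpreted as a different character, so the result is wrong.
bad = oem_to_ansi(stored, "cp437")
print(bad == expected)
```

The second call does not fail. It silently yields a different character, which is the kind of compatibility problem described above.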

Embedded SQL does not provide any such code page translation mechanism.

Sybase Central and Interactive SQL code page translation

Interactive SQL and Sybase Central both carry out internal OEM to ANSI code page translation if the database uses an OEM character set. As with the ODBC translation driver, this translation assumes that the OEM code page on the local machine is the same as that in the database.

For Interactive SQL, you can turn off the translation by setting the Interactive SQL option CHAR_OEM_TRANSLATION to OFF.

For more information on OEM to ANSI character set translation in Interactive SQL, see CHAR_OEM_TRANSLATION option.

Character translation for database messages

Error and other messages from the database software are held in a language library. Localized versions of this library are provided with localized versions of Adaptive Server Anywhere. The messages for each language assume an ANSI code page.

Some database messages have placeholders that are filled in from the database. For example, if you execute a query with a column that does not exist, the returned error message is:

Column column-name not found

where column-name is filled in from the database.

If the database collation uses a character set that is different from that of the Language DLL but compatible with it (such as EUC-JIS compared to Shift-JIS), then the database server automatically translates database messages into the database collation before sending them to the client. If necessary, further character set translation from database to client is carried out to ensure that the message reaches the client in the appropriate character set.

Messages are always translated, if necessary, into the database collation character set, regardless of whether the -ct command-line option is used. The -ct command-line option affects only the character set conversion on the way to the client.

Connection strings and character sets

Connection strings present a special case for character set translation. The connection string is parsed by the client library, in order to locate or start a database server. This parsing is done with no knowledge of the server character set or language.

The connection string is parsed as follows:

1. It is broken down into its keyword = value components. This can be done independently of character set, as long as you do not use { curly braces } around CommLinks parameters. Instead, use the recommended ( parentheses ).
2. The server is located. The server name must be constructed from lower-page (seven-bit) ASCII characters.
3. The DatabaseName or DatabaseFile parameter is interpreted in the server character set.
4. Once the database is located, the remaining connection parameters are interpreted according to its character set.
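
The first two steps can be sketched in Python. This is a simplified illustration, not the client library's actual parser. The semicolon separator, the EngineName keyword, and the sample values (demo, asademo, myhost, 2638) are assumptions made for the example, and both helper functions are hypothetical.

```python
def split_connection_string(conn):
    """Split 'keyword=value;...' pairs, keeping parenthesized groups intact."""
    parts, current, depth = [], [], 0
    for ch in conn:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == ";" and depth == 0:
            # A semicolon outside parentheses ends the current component.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    params = {}
    for part in parts:
        if not part.strip():
            continue  # tolerate empty segments such as a trailing semicolon
        keyword, sep, value = part.partition("=")
        if not sep:
            raise ValueError("missing '=' in %r" % part)
        params[keyword.strip()] = value.strip()
    return params


def is_seven_bit_ascii(name):
    """Return True if every character has a value from 1 through 127."""
    return all(0 < ord(ch) < 128 for ch in name)


conn = "EngineName=demo;DatabaseName=asademo;CommLinks=tcpip(HOST=myhost;PORT=2638)"
params = split_connection_string(conn)
print(params["CommLinks"])                       # the parenthesized group stays whole
print(is_seven_bit_ascii(params["EngineName"]))  # server name check from step 2
```

Because the splitting looks only at the ASCII characters ;, =, ( and ), it does not need to know the character set of the values, which is the point of the first step.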