User's Guide
PART 1. Working with Databases
CHAPTER 12. Database Collations and International Languages
This section provides an introduction to the issues you may face when working in an environment that uses more than one character set, or when using languages other than English.
Database users working at client applications may see or access text that comes from several possible sources:
Data in the database: Strings and other text data are stored in the database. The database server processes these strings when responding to requests.
For example, the database server may be asked to supply all the last names beginning with a letter ordered less than N in a table. This request requires string comparisons to be carried out, and assumes a character set ordering.
Database server software messages: Applications can cause database errors to be generated. For example, an application may submit a query that references a column that does not exist. In this case, the database server returns a warning or error message. This message is held in a language library, which is a DLL or shared library called by Adaptive Server Anywhere.
Client application: The client application interface displays text, and internally the client application may process text.
Client software messages: The client library uses the same language library as the database server to provide messages to the client application.
Operating system: The client operating system has text displayed on its interface, and may also process text.
For a satisfactory working environment, all these sources of text must work together. Loosely speaking, they must all be working in the user's language, but what exactly is involved in this?
There are several distinct aspects to character storage and display by computer software:
Each piece of software works with a character set. A character set is a set of symbols, including letters, digits, spaces and other symbols.
To handle these characters, each piece of software employs a character set encoding, in which each character is mapped onto one or more bytes of information, typically represented as hexadecimal numbers.
Characters are printed or displayed on a screen using a font, which is a mapping between characters in the character set and their appearance.
Operating systems also use a keyboard mapping to map keys or key combinations on the keyboard to characters in the character set.
Any software that needs to sort characters (for example, list names alphabetically) uses a collation. A collation is a combination of a character encoding (a map between characters and hexadecimal numbers) and a sort order for the characters. There may be more than one sort order for each character set; for example, a case-sensitive order and a case-insensitive order.
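The difference between two sort orders over the same characters can be illustrated with a small Python sketch. This is not how Adaptive Server Anywhere implements collations; the names list is invented for illustration, and Python's code-point comparison and str.casefold stand in for a case-sensitive and a case-insensitive order respectively.

```python
# Two sort orders over the same set of strings (illustrative only).
names = ["adams", "Baker", "Nolan", "clark", "Zuse"]

# Case-sensitive order: strings are compared by code point, so every
# uppercase ASCII letter sorts before every lowercase one.
case_sensitive = sorted(names)

# Case-insensitive order: strings are compared by their case-folded form.
case_insensitive = sorted(names, key=str.casefold)

# A request for "names ordered less than N" depends on the order in use.
before_n_sensitive = [n for n in names if n < "N"]
before_n_insensitive = [n for n in names if n.casefold() < "n"]

print(case_sensitive)        # ['Baker', 'Nolan', 'Zuse', 'adams', 'clark']
print(case_insensitive)      # ['adams', 'Baker', 'clark', 'Nolan', 'Zuse']
print(before_n_sensitive)    # ['Baker']
print(before_n_insensitive)  # ['adams', 'Baker', 'clark']
```

Under the code-point order, the lowercase names fall after N and are missed by the comparison, which is why the choice of sort order affects query results as well as the order of the output.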
The database server receives strings from client applications as streams of bytes. It associates these bytes with characters according to the collation specified when the database was created. If the data is held in an indexed column, the index is sorted according to the sort order of the collation.
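As a rough illustration of why the server needs an encoding to interpret a byte stream, the following Python sketch decodes one byte sequence under two encodings. The byte values and the codec names (latin-1, cp850, utf-8) are Python's own and are assumptions chosen for the example; they are not Adaptive Server Anywhere collation labels.

```python
# One stream of bytes, as a server might receive it from a client.
raw = bytes([0x63, 0x61, 0x66, 0xE9])

# The same bytes map to different characters under different encodings.
as_latin1 = raw.decode("latin-1")  # 0xE9 is "é" in ISO 8859-1
as_cp850 = raw.decode("cp850")     # 0xE9 is "Ú" in code page 850

# In a multibyte encoding, one character may occupy more than one byte.
utf8_bytes = "café".encode("utf-8")

print(as_latin1)        # café
print(as_cp850)         # cafÚ
print(len(utf8_bytes))  # 5
```

If the client and the database disagree about the encoding, the stored bytes are unchanged but the characters they represent are not.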
It is up to the operating system of the computer on which the client application is running to handle the following aspects of character strings:
Which character is stored when a particular key on the keyboard is pressed.
What a character looks like on your computer screen.
What characters are available to the application.
What byte or combination of bytes is stored for each character.
If your character data consists only of characters in the English alphabet, you probably do not need to read this chapter.
If you have an existing setup that does not show language problems, you probably do not need to read this chapter.
Adaptive Server Anywhere provides support for each of the language issues discussed in the previous sections.
Data in the database: When you create a database, you can select a collation of your choice. By choosing a collation that is suitable for the data in your database, you ensure that query results are sorted correctly and that string comparisons are evaluated correctly.
For more information, see Choosing a database collation.
Message strings: The Adaptive Server Anywhere language library holds messages in the language of the software. For example, if you have purchased an English language version of Adaptive Server Anywhere, the messages are in English.
Character set translation: Adaptive Server Anywhere can be configured to carry out character set translation. This means that messages and query results are passed back to the client application in the character set requested by the client, even if this is different from the character set used in the database collation.
For more information, see Using character set translation.
Connection strings: Connection strings are processed outside any database, and handling them with proper attention to character sets is a separate topic.
The other sections in this chapter provide detailed descriptions of character set and language issues.