User's Guide
PART 1. Working with Databases
CHAPTER 12. Database Collations and International Languages
This section describes in more detail how Adaptive Server Anywhere carries out character set translation. Most users do not need this information. It is provided for advanced users, such as those who may be deploying applications or databases in a multi-character-set environment.
The database server and the client library recognize their language and character set environment using a locale definition. The client library uses the application locale when making requests, to determine which character set it wants results in. The server compares its own locale with the application locale to determine whether character set translation is needed. The locale consists of the following components:
Language The language is a two-character string: FR for French and so on.
Character set The character set is the code page in use. It uses a Sybase identifier.
Collation label The collation label is the Adaptive Server Anywhere collation.
Each of these is discussed in the following sections.
The language is a two-letter identifier; EN specifies English, JA specifies Japanese, and so on. It is used to determine the following behavior:
Which language library the database server and the client library load.
For the client, which language to request from the database.
The language component of the locale is determined as follows:
If a SQLLOCALE environment variable exists, it is used.
For more information, see The SQLLOCALE environment variable.
On Windows and Windows NT, get the language from the registry, as described in Registry settings on installation.
On other operating systems, get the language from the operating system.
Both application and server locale definitions have a character set. The application uses its character set when requesting character strings from the server. The database server compares its character set with that of the application to determine whether character set translation is needed.
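The comparison described above can be illustrated with a minimal sketch. It uses Python's built-in codecs as a stand-in for the server's translation tables, which is an assumption made purely for illustration; the function name `translate` is hypothetical and is not part of Adaptive Server Anywhere.

```python
def translate(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source character set to the target one.

    Hypothetical helper: Python codecs stand in for the server's own
    translation tables.
    """
    # Same character set on both sides: no translation is needed.
    if source.lower() == target.lower():
        return data
    # Otherwise decode from the source code page and re-encode in the target.
    return data.decode(source).encode(target)


# "café" stored in code page 850, requested by a client using code page 1252.
server_bytes = b"caf\x82"
client_bytes = translate(server_bytes, "cp850", "cp1252")
```

The same character occupies different byte values in the two code pages, which is why a server and an application with different character sets cannot exchange the bytes unchanged.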
The character set is also used to determine which code page to use when initializing a database, if none was specified by the application and no collation label is available.
The character set is determined as follows:
If a SQLLOCALE environment variable exists, it is used.
For more information, see The SQLLOCALE environment variable.
Get the character set from the operating system:
On Windows NT, use the GetACP system call. This returns the ANSI character set.
On UNIX, default to ISO8859-1.
On other platforms, use code page 850.
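The order of checks above can be sketched as follows. This is an illustration only, not the product's implementation: the function `resolve_charset`, the use of `os.name` values to tell platforms apart, the `cp<number>` naming of the `GetACP` result, and the decision to fall through when SQLLOCALE carries no Charset assignment are all assumptions.

```python
import os


def _windows_acp():
    # Imported lazily because ctypes.windll exists only on Windows.
    import ctypes
    return ctypes.windll.kernel32.GetACP()


def resolve_charset(environ, os_name, get_acp=_windows_acp):
    """Return a character set label, following the order described above."""
    # Step 1: a Charset assignment in SQLLOCALE takes precedence.
    for part in environ.get("SQLLOCALE", "").split(";"):
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip()

    # Step 2: otherwise ask the operating system.
    if os_name == "nt":
        # Assumption: the ANSI code page number maps to a "cp<number>" label.
        return f"cp{get_acp()}"
    if os_name == "posix":
        return "iso_1"  # the label listed for ISO 8859-1
    return "cp850"


if __name__ == "__main__":
    print(resolve_charset(os.environ, os.name))
```

Passing the environment and platform as arguments keeps the sketch easy to exercise on any machine, for example `resolve_charset({}, "nt", get_acp=lambda: 1252)`.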
Each database has its own collation. The collation label taken from the locale definition is used by the database server to determine which code page to use when initializing a database.
The collation label is determined as follows:
If a SQLLOCALE environment variable exists, it is used.
For more information, see The SQLLOCALE environment variable.
Get the language and character set from the operating system as described in the previous two sections, and use an internal table to find a collation label corresponding to that language and character set.
The SQLLOCALE environment variable is a single string that consists of three semicolon-separated assignments. It has the following form:
Charset=cslabel;Language=langlabel;CollationLabel=colabel
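A string in this form can be split into its three assignments with a few lines of code. The sketch below is a hypothetical helper (`parse_sqllocale` is not a product function), and it assumes that field names are matched without regard to case and that surrounding whitespace and a trailing semicolon are tolerated; the server's actual parsing rules may differ.

```python
def parse_sqllocale(value):
    """Split a SQLLOCALE string into a dict keyed by lowercase field name."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, such as after a trailing semicolon
        name, sep, setting = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        fields[name.strip().lower()] = setting.strip()
    return fields


# cp1252 and french are labels from the tables in this section;
# colabel is the placeholder used in the form above.
locale = parse_sqllocale("Charset=cp1252;Language=french;CollationLabel=colabel")
```

Here `locale["charset"]` holds `cp1252`, `locale["language"]` holds `french`, and `locale["collationlabel"]` holds the collation placeholder.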
The values that the labels can take are set out in the following tables.
The following table shows the valid character set label values, together with the equivalent IANA labels and a description:
Character set label | IANA label | Description |
---|---|---|
iso_1 | iso_8859-1:1987 | ISO 8859-1 Latin-1 |
cp850 | <N/A> | IBM CP850 - European code set |
cp437 | <N/A> | IBM CP437 - U.S. code set |
roman8 | hp-roman8 | HP Roman-8 |
mac | macintosh | Standard Mac coding |
sjis | shift_jis | Shift JIS (no extensions) |
eucjis | euc-jp | Sun EUC JIS encoding |
deckanji | <N/A> | DEC Unix JIS encoding |
euccns | <N/A> | EUC CNS encoding: Traditional Chinese with extensions |
eucgb | <N/A> | EUC GB encoding = Simplified Chinese |
cp932 | windows-31j | Microsoft CP932 = Win31J-DBCS |
iso88592 | iso_8859-2:1987 | ISO 8859-2 Latin-2 Eastern Europe |
iso88595 | iso_8859-5:1988 | ISO 8859-5 Latin/Cyrillic |
iso88596 | iso_8859-6:1987 | ISO 8859-6 Latin/Arabic |
iso88597 | iso_8859-7:1987 | ISO 8859-7 Latin/Greek |
iso88598 | iso_8859-8:1988 | ISO 8859-8 Latin/Hebrew |
iso88599 | iso_8859-9:1989 | ISO 8859-9 Latin-5 Turkish |
iso15 | <N/A> | ISO 8859-15 Latin1 with Euro, etc. |
mac_cyr | <N/A> | Macintosh Cyrillic |
mac_ee | <N/A> | Macintosh Eastern European |
macgrk2 | <N/A> | Macintosh Greek |
macturk | <N/A> | Macintosh Turkish |
greek8 | <N/A> | HP Greek-8 |
turkish8 | <N/A> | HP Turkish-8 |
koi8 | <N/A> | KOI-8 Cyrillic |
tis620 | <N/A> | TIS-620 Thai standard |
big5 | <N/A> | Traditional Chinese (cf. CP950) |
eucksc | <N/A> | EUC KSC Korean encoding (cf. CP949) |
cp852 | <N/A> | PC Eastern Europe |
cp855 | <N/A> | IBM PC Cyrillic |
cp856 | <N/A> | Alternate Hebrew |
cp857 | <N/A> | IBM PC Turkish |
cp860 | <N/A> | PC Portuguese |
cp861 | <N/A> | PC Icelandic |
cp862 | <N/A> | PC Hebrew |
cp863 | <N/A> | IBM PC Canadian French code page |
cp864 | <N/A> | PC Arabic |
cp865 | <N/A> | PC Nordic |
cp866 | <N/A> | PC Russian |
cp869 | <N/A> | IBM PC Greek |
cp874 | <N/A> | Microsoft Thai SB code page |
cp950 | <N/A> | PC (MS) Traditional Chinese |
cp1250 | <N/A> | MS Windows 3.1 Eastern European |
cp1251 | <N/A> | MS Windows 3.1 Cyrillic |
cp1252 | <N/A> | MS Windows 3.1 US (ANSI) |
cp1253 | <N/A> | MS Windows 3.1 Greek |
cp1254 | <N/A> | MS Windows 3.1 Turkish |
cp1255 | <N/A> | MS Windows Hebrew |
cp1256 | <N/A> | MS Windows Arabic |
cp1257 | <N/A> | MS Windows Baltic |
cp1258 | <N/A> | MS Windows Vietnamese |
utf8 | utf-8 | UTF-8 treated as a character set |
The following table shows the valid language label values, together with the equivalent ISO 639 labels:
Language label | Alternate label | ISO_639 |
---|---|---|
us_english | english | EN |
french | <N/A> | FR |
german | <N/A> | DE |
spanish | <N/A> | ES |
japanese | <N/A> | JA |
korean | <N/A> | KO |
portugese | portugue | PT |
chinese | <N/A> | ZH |
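The language table lends itself to a simple lookup. The following sketch copies the labels exactly as listed above, alternate labels included; the names `LANGUAGE_LABELS` and `iso_639_code` are hypothetical, and matching labels without regard to case is an assumption.

```python
# Language labels and alternate labels from the table above, mapped to ISO 639.
LANGUAGE_LABELS = {
    "us_english": "EN",
    "english": "EN",
    "french": "FR",
    "german": "DE",
    "spanish": "ES",
    "japanese": "JA",
    "korean": "KO",
    "portugese": "PT",  # spelled as listed in the table
    "portugue": "PT",
    "chinese": "ZH",
}


def iso_639_code(label):
    """Return the two-letter language code for a language label."""
    key = label.strip().lower()
    if key not in LANGUAGE_LABELS:
        raise ValueError(f"unknown language label: {label!r}")
    return LANGUAGE_LABELS[key]
```

For example, `iso_639_code("english")` and `iso_639_code("us_english")` both yield `EN`, because the alternate label maps to the same language.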
The collation label is a label for one of the supplied Adaptive Server Anywhere collations, as listed in Supplied collations.