Character set translation details

This section describes in more detail how Adaptive Server Anywhere carries out character set translation. Most users do not need this information. It is provided for advanced users, such as those who may be deploying applications or databases in a multi-character-set environment.

Locales

The database server and the client library recognize their language and character set environment using a locale definition. The client library uses the application locale when making requests, to determine the character set in which it wants results returned. The server compares its own locale with the application locale to determine whether character set translation is needed. The locale consists of the following components:

Language: A two-character string, such as FR for French.
Character set: The code page in use, identified by a Sybase label.
Collation label: The Adaptive Server Anywhere collation.

Each of these is discussed in the following sections.

Determining the locale language

The language is a two-letter identifier; EN specifies English, JA specifies Japanese, and so on. It is used to determine the following behavior:

Which language library the database server and the client library load.
For the client, which language to request from the database.

The language component of the locale is determined as follows. The SQLLOCALE environment variable takes precedence; otherwise, the source depends on the operating system:

If a SQLLOCALE environment variable exists, it is used. For more information, see The SQLLOCALE environment variable.
Otherwise, on Windows and Windows NT, the language is taken from the registry, as described in Registry settings on installation.
On other operating systems, the language is taken from the operating system.

Determining the locale character set

Both application and server locale definitions have a character set. The application uses its character set when requesting character strings from the server. The database server compares its character set with that of the application to determine whether character set translation is needed.
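To see why this comparison matters, the following Python sketch encodes the same character under two of the code pages listed later in this section. It uses Python's built-in codecs as stand-ins for the server's translation tables, and the function names needs_translation and translate are hypothetical; they are not part of Adaptive Server Anywhere.

```python
def needs_translation(server_charset: str, client_charset: str) -> bool:
    """Return True when the two locale character sets differ (case-insensitive)."""
    return server_charset.strip().lower() != client_charset.strip().lower()


def translate(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Re-encode bytes from one code page to another by way of Unicode."""
    return data.decode(source_charset).encode(target_charset)


# The same character has a different byte value in each code page.
e_acute_cp850 = "é".encode("cp850")    # b'\x82'
e_acute_cp1252 = "é".encode("cp1252")  # b'\xe9'

# A cp850 server talking to a cp1252 client must translate the byte.
if needs_translation("cp850", "cp1252"):
    converted = translate(e_acute_cp850, "cp850", "cp1252")
```

Without translation, the client would interpret the byte 0x82 according to its own code page and display a different character from the one stored.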

The character set is also used to determine which code page to use when initializing a database, if none was specified by the application and no collation label is available.

The character set is determined as follows:

If a SQLLOCALE environment variable exists, it is used. For more information, see The SQLLOCALE environment variable.
Otherwise, the character set is taken from the operating system:
- On Windows NT, use the GetACP system call, which returns the ANSI code page.
- On UNIX, default to ISO 8859-1.
- On other platforms, use code page 850.
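The precedence described above can be sketched in Python as follows. This is an illustration only, not the server's actual logic: the function default_charset, the os_family argument, and the get_acp callable (a stand-in for the Windows GetACP system call) are all hypothetical, as is the assumption that a SQLLOCALE value without a Charset assignment falls through to the operating system default.

```python
def default_charset(environ, os_family, get_acp=None):
    """Pick a character set label using the precedence described above.

    environ   -- mapping of environment variables (for example os.environ)
    os_family -- "windows", "unix", or anything else (hypothetical argument)
    get_acp   -- hypothetical callable standing in for GetACP; returns an int
    """
    # Step 1: a Charset assignment in SQLLOCALE wins if present.
    for assignment in environ.get("SQLLOCALE", "").split(";"):
        name, _, label = assignment.partition("=")
        if name.strip().lower() == "charset" and label.strip():
            return label.strip()

    # Step 2: otherwise fall back to the operating system.
    if os_family == "windows":
        if get_acp is None:
            raise ValueError("get_acp is required on Windows")
        return f"cp{get_acp()}"  # assumed label form, e.g. cp1252
    if os_family == "unix":
        return "iso_1"           # label for ISO 8859-1
    return "cp850"               # all other platforms
```

For example, with no SQLLOCALE set, `default_charset({}, "unix")` yields `iso_1`, while a SQLLOCALE value of `Charset=cp437` overrides the platform default on any operating system.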

Determining the locale collation label

Each database has its own collation. The collation label taken from the locale definition is used by the database server to determine which code page to use when initializing a database.

The collation label is determined as follows:

If a SQLLOCALE environment variable exists, it is used. For more information, see The SQLLOCALE environment variable.
Otherwise, the language and character set are obtained from the operating system as described in the previous two sections, and an internal table is used to find the collation label corresponding to that language and character set.

The SQLLOCALE environment variable

The SQLLOCALE environment variable is a single string that consists of three semicolon-separated assignments. It has the following form:

Charset=cslabel;Language=langlabel;CollationLabel=colabel

The values that the labels can take are set out in the following tables.
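As an illustration of the format, the following Python sketch splits a SQLLOCALE string into its three assignments. The function parse_sqllocale is hypothetical, and its handling of whitespace, letter case, and a trailing semicolon is an assumption rather than documented behavior. The collation label in the sample string is only an example value.

```python
def parse_sqllocale(value: str) -> dict:
    """Split a SQLLOCALE string into assignments, keyed by lowercase name."""
    settings = {}
    for assignment in value.split(";"):
        if not assignment.strip():
            continue  # tolerate a trailing semicolon (assumption)
        name, separator, label = assignment.partition("=")
        if not separator:
            raise ValueError(f"expected name=value, got {assignment!r}")
        settings[name.strip().lower()] = label.strip()
    return settings


# Sample value; the collation label here is illustrative only.
sample = "Charset=cp1252;Language=english;CollationLabel=1252LATIN1"
locale_settings = parse_sqllocale(sample)
```

After the call, `locale_settings["charset"]` holds `cp1252` and `locale_settings["language"]` holds `english`.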

Character set label values

The following table shows the valid character set label values, together with the equivalent IANA labels and a description:

Character set label	IANA label	Description
iso_1	iso_8859-1:1987	ISO 8859-1 Latin-1
cp850	<N/A>	IBM CP850 - European code set
cp437	<N/A>	IBM CP437 - U.S. code set
roman8	hp-roman8	HP Roman-8
mac	macintosh	Standard Mac coding
sjis	shift_jis	Shift JIS (no extensions)
eucjis	euc-jp	Sun EUC JIS encoding
deckanji	<N/A>	DEC Unix JIS encoding
euccns	<N/A>	EUC CNS encoding: Traditional Chinese with extensions
eucgb	<N/A>	EUC GB encoding = Simplified Chinese
cp932	windows-31j	Microsoft CP932 = Win31J-DBCS
iso88592	iso_8859-2:1987	ISO 8859-2 Latin-2 Eastern Europe
iso88595	iso_8859-5:1988	ISO 8859-5 Latin/Cyrillic
iso88596	iso_8859-6:1987	ISO 8859-6 Latin/Arabic
iso88597	iso_8859-7:1987	ISO 8859-7 Latin/Greek
iso88598	iso_8859-8:1988	ISO 8859-8 Latin/Hebrew
iso88599	iso_8859-9:1989	ISO 8859-9 Latin-5 Turkish
iso15	<N/A>	ISO 8859-15 Latin1 with Euro, etc.
mac_cyr	<N/A>	Macintosh Cyrillic
mac_ee	<N/A>	Macintosh Eastern European
macgrk2	<N/A>	Macintosh Greek
macturk	<N/A>	Macintosh Turkish
greek8	<N/A>	HP Greek-8
turkish8	<N/A>	HP Turkish-8
koi8	<N/A>	KOI-8 Cyrillic
tis620	<N/A>	TIS-620 Thai standard
big5	<N/A>	Traditional Chinese (cf. CP950)
eucksc	<N/A>	EUC KSC Korean encoding (cf. CP949)
cp852	<N/A>	PC Eastern Europe
cp855	<N/A>	IBM PC Cyrillic
cp856	<N/A>	Alternate Hebrew
cp857	<N/A>	IBM PC Turkish
cp860	<N/A>	PC Portuguese
cp861	<N/A>	PC Icelandic
cp862	<N/A>	PC Hebrew
cp863	<N/A>	IBM PC Canadian French code page
cp864	<N/A>	PC Arabic
cp865	<N/A>	PC Nordic
cp866	<N/A>	PC Russian
cp869	<N/A>	IBM PC Greek
cp874	<N/A>	Microsoft Thai SB code page
cp950	<N/A>	PC (MS) Traditional Chinese
cp1250	<N/A>	MS Windows 3.1 Eastern European
cp1251	<N/A>	MS Windows 3.1 Cyrillic
cp1252	<N/A>	MS Windows 3.1 US (ANSI)
cp1253	<N/A>	MS Windows 3.1 Greek
cp1254	<N/A>	MS Windows 3.1 Turkish
cp1255	<N/A>	MS Windows Hebrew
cp1256	<N/A>	MS Windows Arabic
cp1257	<N/A>	MS Windows Baltic
cp1258	<N/A>	MS Windows Vietnamese
utf8	utf-8	UTF-8 treated as a character set

Language label values

The following table shows the valid language label values, together with the equivalent ISO 639 labels:

Language label	Alternate label	ISO_639
us_english	english	EN
french	<N/A>	FR
german	<N/A>	DE
spanish	<N/A>	ES
japanese	<N/A>	JA
korean	<N/A>	KO
portugese	portugue	PT
chinese	<N/A>	ZH
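
The table above can be read as a lookup from either label to the two-letter code. A minimal Python sketch of that lookup follows; the dictionary simply transcribes the table, while the function name iso639_code and the case-insensitive matching are assumptions made for illustration.

```python
# Language labels and alternate labels from the table, mapped to ISO 639 codes.
LANGUAGE_TO_ISO639 = {
    "us_english": "EN", "english": "EN",
    "french": "FR",
    "german": "DE",
    "spanish": "ES",
    "japanese": "JA",
    "korean": "KO",
    "portugese": "PT", "portugue": "PT",  # spelled as in the table
    "chinese": "ZH",
}


def iso639_code(label: str):
    """Return the ISO 639 code for a language label, or None if unknown."""
    return LANGUAGE_TO_ISO639.get(label.strip().lower())
```

Both `iso639_code("us_english")` and `iso639_code("english")` return `EN`; a label that is not in the table returns `None`.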

Collation label values

The collation label is a label for one of the supplied Adaptive Server Anywhere collations, as listed in Supplied collations.