Choosing a database collation

User's Guide
PART 1. Working with Databases
CHAPTER 12. Database Collations and International Languages

Choosing a database collation

You choose a collation when you create a database. The collation is an option on the CREATE DATABASE statement and in the Initialization utility.

Supplied collations

The following collations are supplied with Adaptive Server Anywhere. You can obtain this list by entering the following command at a system command line:

dbinit /l

Collation name

Type

Description

internal

OEM

Based on Code page 850

437

OEM

Code page 437, ASCII, United States

850

OEM

Code page 850, ASCII, Multilingual

852

OEM

Code page 852, ASCII, Slavic/Latin II

860

OEM

Code page 860, ASCII, Portugal

863

OEM

Code page 863, ASCII, Canada-French

865

OEM

Code page 865, ASCII, Norway

EBCDIC

ANSI

EBCDIC

437EBCDIC

OEM

Code Page 437, EBCDIC

437LATIN1

OEM

Code Page 437, Latin 1

437ESP

OEM

Code Page 437, Spanish

437SVE

OEM

Code Page 437, Swedish/Finnish

819CYR

ANSI

Code Page 819, Cyrillic

819DAN

ANSI

Code Page 819, Danish

819ELL

ANSI

Code Page 819, Greek

819ESP

ANSI

Code Page 819, Spanish

819ISL

ANSI

Code Page 819, Icelandic

819LATIN1

ANSI

Code Page 819, Latin 1

819LATIN2

ANSI

Code Page 819, Latin 2

819NOR

ANSI

Code Page 819, Norwegian

819RUS

ANSI

Code Page 819, Russian

819SVE

ANSI

Code Page 819, Swedish/Finnish

819TRK

ANSI

Code Page 819, Turkish

850CYR

OEM

Code Page 850, Cyrillic

850DAN

OEM

Code Page 850, Danish

850ELL

OEM

Code Page 850, Greek

850ESP

OEM

Code Page 850, Spanish

850ISL

OEM

Code Page 850, Icelandic

850LATIN1

OEM

Code Page 850, Latin 1

850LATIN2

OEM

Code Page 850, Latin 2

850NOR

OEM

Code Page 850, Norwegian

850RUS

OEM

Code Page 850, Russian

850SVE

OEM

Code Page 850, Swedish/Finnish

850TRK

OEM

Code Page 850, Turkish

852LATIN2

OEM

Code Page 852, Latin 2

852CYR

OEM

Code Page 852, Cyrillic

855CYR

OEM

Code Page 855, Cyrillic

856HEB

OEM

Code Page 856, Hebrew

857TRK

OEM

Code Page 857, Turkish

860LATIN1

OEM

Code Page 860, Latin 1

861ISL

OEM

Code Page 861, Icelandic

862HEB

OEM

Code Page 862, Hebrew

863LATIN1

OEM

Code Page 863, Latin 1

865NOR

OEM

Code Page 865, Norwegian

866RUS

OEM

Code Page 866, Russian

869ELL

OEM

Code Page 869, Greek

920TRK

ANSI

Code page 920, Turkish

SJIS

ANSI

Japanese Shift-JIS Encoding

SJIS2

ANSI

Japanese Shift-JIS Encoding, Sybase Adaptive Server Enterprise-compatible

EUC_JAPAN

ANSI

Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding

EUC_CHINA

ANSI

Chinese GB 2312-80 Encoding

EUC_TAIWAN

ANSI

Taiwanese Big 5 Encoding

EUC_KOREA

ANSI

Korean KS C 5601-1992 Encoding

UTF8

ANSI

UCS-4 Transformation Format

ISO_1

ANSI

ISO8859-1, Latin 1

ISO_BINENG

ANSI

Binary ordering, English ISO/ASCII 7-bit letter case mappings

WIN_LATIN1

ANSI

Windows Latin 1, similar to ISO_1, ISO8859-1, with extensions

WIN_LATIN5

ANSI

A superset of ISO 8859-9 with characters also in 80-9F

Collation name	Type	Description
internal	OEM	Based on Code page 850
437	OEM	Code page 437, ASCII, United States
850	OEM	Code page 850, ASCII, Multilingual
852	OEM	Code page 852, ASCII, Slavic/Latin II
860	OEM	Code page 860, ASCII, Portugal
863	OEM	Code page 863, ASCII, Canada-French
865	OEM	Code page 865, ASCII, Norway
EBCDIC	ANSI	EBCDIC
437EBCDIC	OEM	Code Page 437, EBCDIC
437LATIN1	OEM	Code Page 437, Latin 1
437ESP	OEM	Code Page 437, Spanish
437SVE	OEM	Code Page 437, Swedish/Finnish
819CYR	ANSI	Code Page 819, Cyrillic
819DAN	ANSI	Code Page 819, Danish
819ELL	ANSI	Code Page 819, Greek
819ESP	ANSI	Code Page 819, Spanish
819ISL	ANSI	Code Page 819, Icelandic
819LATIN1	ANSI	Code Page 819, Latin 1
819LATIN2	ANSI	Code Page 819, Latin 2
819NOR	ANSI	Code Page 819, Norwegian
819RUS	ANSI	Code Page 819, Russian
819SVE	ANSI	Code Page 819, Swedish/Finnish
819TRK	ANSI	Code Page 819, Turkish
850CYR	OEM	Code Page 850, Cyrillic
850DAN	OEM	Code Page 850, Danish
850ELL	OEM	Code Page 850, Greek
850ESP	OEM	Code Page 850, Spanish
850ISL	OEM	Code Page 850, Icelandic
850LATIN1	OEM	Code Page 850, Latin 1
850LATIN2	OEM	Code Page 850, Latin 2
850NOR	OEM	Code Page 850, Norwegian
850RUS	OEM	Code Page 850, Russian
850SVE	OEM	Code Page 850, Swedish/Finnish
850TRK	OEM	Code Page 850, Turkish
852LATIN2	OEM	Code Page 852, Latin 2
852CYR	OEM	Code Page 852, Cyrillic
855CYR	OEM	Code Page 855, Cyrillic
856HEB	OEM	Code Page 856, Hebrew
857TRK	OEM	Code Page 857, Turkish
860LATIN1	OEM	Code Page 860, Latin 1
861ISL	OEM	Code Page 861, Icelandic
862HEB	OEM	Code Page 862, Hebrew
863LATIN1	OEM	Code Page 863, Latin 1
865NOR	OEM	Code Page 865, Norwegian
866RUS	OEM	Code Page 866, Russian
869ELL	OEM	Code Page 869, Greek
920TRK	ANSI	Code page 920, Turkish
SJIS	ANSI	Japanese Shift-JIS Encoding
SJIS2	ANSI	Japanese Shift-JIS Encoding, Sybase Adaptive Server Enterprise-compatible
EUC_JAPAN	ANSI	Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding
EUC_CHINA	ANSI	Chinese GB 2312-80 Encoding
EUC_TAIWAN	ANSI	Taiwanese Big 5 Encoding
EUC_KOREA	ANSI	Korean KS C 5601-1992 Encoding
UTF8	ANSI	UCS-4 Transformation Format
ISO_1	ANSI	ISO8859-1, Latin 1
ISO_BINENG	ANSI	Binary ordering, English ISO/ASCII 7-bit letter case mappings
WIN_LATIN1	ANSI	Windows Latin 1, similar to ISO_1, ISO8859-1, with extensions
WIN_LATIN5	ANSI	A superset of ISO 8859-9 with characters also in 80-9F

ANSI or OEM?

Adaptive Server Anywhere collations are based on code pages that are designated as either ANSI or OEM. In most cases, use of an ANSI code page is recommended.

If you choose to use an ANSI code page, you must not use the ODBC translation driver in the ODBC data source configuration window.

If you choose to use an OEM code page, you must do the following:

Choose a code page that matches the OEM code pages on your users' client machines.
When setting up data sources for Windows-based ODBC applications, do choose the Adaptive Server Anywhere translation driver in the ODBC data source configuration.

The translation driver converts between the OEM code page on your machine and the ANSI code page used by Windows. If the database collation is a different OEM code page than the one on your machine, results may not be perfect.

Both Interactive SQL and Sybase Central detect whether the database collation is ANSI or OEM by checking the first few characters, and either enable or disable translation as needed.

For Info For more information about code page translation in Interactive SQL, see CHAR_OEM_TRANSLATION option.

Supplied ANSI collations

Adaptive Server Anywhere provides the following ANSI collations:

920TRK
SJIS
SJIS2
EUC_JAPAN
EUC_CHINA
EUC_TAIWAN
EUC_KOREA
UTF8
ISO_1
ISO_BINENG
WIN_LATIN1
WIN_LATIN5

The ISO_1 collation

ISO_1 is provided for compatibility with the Adaptive Server Enterprise default ISO_1 collation. The differences are as follows:

The lower case letter sharp s (\xDF) sorts with the lower case s in Adaptive Server Anywhere, but after ss in Adaptive Server Enterprise.
Ligatures are two characters combined into a single character. The ligatures corresponding to AE and ae (\xC6 and \xE6) sort after A and a respectively in Adaptive Server Anywhere, but after AE and ae in Adaptive Server Enterprise.

The WIN_LATIN1 collation

WIN_LATIN1 is similar to ISO_1, except that Windows has defined characters in places where ISO_1 says "undefined", specifically the range 0x80-0xBF. The differences from Adaptive Server Enterprise's ISO_1 are as follows:

The upper case and lower case Icelandic Eth (\xD0 and \xF0) is sorted with D in Adaptive Server Anywhere, but after all other letters in Adaptive Server Enterprise.
The upper case and lower case Icelandic Thorn (\xD0 and \xF0) is sorted with T in Adaptive Server Anywhere, but after all other letters in Adaptive Server Enterprise.
The upper-case Y-diaresis (\x9F) is sorted with Y in Adaptive Server Anywhere, and case converts with lower-case Y-diaresis (\xFF). In Adaptive Server Enterprise it is undefined and sorts after \x9E
The lower case letter sharp s (\xDF) sorts with the lower case s in Adaptive Server Anywhere, but after ss in Adaptive Server Enterprise.
Ligatures are two characters combined into a single character. The ligatures corresponding to AE and ae (\xC6 and \xE6) sort after A and a respectively in Adaptive Server Anywhere, but after AE and ae in Adaptive Server Enterprise.
The ligatures corresponding to OE and oe (\x8C and \x9C) sort with O in Adaptive Server Anywhere, but after OE and oe in Adaptive Server Enterprise.
The upper case and lower case letter S with caron (\x8A and \x9A) sorts with S in Adaptive Server Anywhere, but is undefined in Adaptive Server Enterprise, sorting after \x89 and \x99.

Supported OEM collations

The following table shows the built-in collations that correspond to OEM code pages. The table and the corresponding collations were derived from several manuals from IBM concerning National Language Support, subject to the restrictions mentioned above. (This table represents the best information available at the time of writing. Due to continuing rapid geopolitical changes, the table may contain names for countries that no longer exist).

Country

Language

Primary Code Page

Primary Collation

Secondary Code Page

Secondary Collation

Argentina

Spanish

850

850ESP

437

437ESP

Australia

English

437

437LATIN1

850

850LATIN1

Austria

German

850

850LATIN1

437

437LATIN1

Belgium

Belgian Dutch

850

850LATIN1

437

437LATIN1

Belgium

Belgian French

850

850LATIN1

437

437LATIN1

Belarus

Belarussian

855

855CYR

Brazil

Portuguese

850

850LATIN1

437

437LATIN1

Bulgaria

Bulgarian

855

855CYR

850

850CYR

Canada

Cdn French

850

850LATIN1

863

863LATIN1

Canada

English

437

437LATIN1

850

850LATIN1

Croatia

Croatian

852

852LATIN2

850

850LATIN2

Czech Republic

Czech

852

852LATIN2

850

850LATIN2

Denmark

Danish

850

850DAN

Finland

Finnish

850

850SVE

437

437SVE

France

French

850

850LATIN1

437

437LATIN1

Germany

German

850

850LATIN1

437

437LATIN1

Greece

Greek

869

869ELL

850

850ELL

Hungary

Hungarian

852

852LATIN2

850

850LATIN2

Iceland

Icelandic

850

850ISL

861

861ISL

Ireland

English

850

850LATIN1

437

437LATIN1

Israel

Hebrew

862

862HEB

856

856HEB

Italy

Italian

850

850LATIN1

437

437LATIN1

Mexico

Spanish

850

850ESP

437

437ESP

Nether-lands

Dutch

850

850LATIN1

437

437LATIN1

New Zealand

English

437

437LATIN1

850

850LATIN1

Norway

Norwegian

865

865NOR

850

850NOR

Peru

Spanish

850

850ESP

437

437ESP

Poland

Polish

852

852LATIN2

850

850LATIN2

Portugal

Portuguese

850

850LATIN1

860

860LATIN1

Romania

Romanian

852

852LATIN2

850

850LATIN2

Russia

Russian

866

866RUS

850

850RUS

S. Africa

Afrikaans

437

437LATIN1

850

850LATIN1

S. Africa

English

437

437LATIN1

850

850LATIN1

Slovak Republic

Slovakian

852

852LATIN2

850

850LATIN2

Slovenia

Slovenian

852

852LATIN2

850

850LATIN2

Spain

Spanish

850

850ESP

437

437ESP

Sweden

Swedish

850

850SVE

437

437SVE

Switzer-land

French

850

850LATIN1

437

437LATIN1

Switzer-land

German

850

850LATIN1

437

437LATIN1

Switzer-land

Italian

850

850LATIN1

437

437LATIN1

Turkey

Turkish

857

857TRK

850

850TRK

UK

English

850

850LATIN1

437

437LATIN1

USA

English

437

437LATIN1

850

850LATIN1

Venezuela

Spanish

850

850ESP

437

437ESP

Yugoslavia

Maced-onian

852

852LATIN2

850

850LATIN2

Yugoslavia

Serbian Cyrillic

855

855CYR

852

852CYR

Yugoslavia

Serbian Latin

852

852LATIN2

850

850LATIN2

Country	Language	Primary Code Page	Primary Collation	Secondary Code Page	Secondary Collation
Argentina	Spanish	850	850ESP	437	437ESP
Australia	English	437	437LATIN1	850	850LATIN1
Austria	German	850	850LATIN1	437	437LATIN1
Belgium	Belgian Dutch	850	850LATIN1	437	437LATIN1
Belgium	Belgian French	850	850LATIN1	437	437LATIN1
Belarus	Belarussian	855	855CYR
Brazil	Portuguese	850	850LATIN1	437	437LATIN1
Bulgaria	Bulgarian	855	855CYR	850	850CYR
Canada	Cdn French	850	850LATIN1	863	863LATIN1
Canada	English	437	437LATIN1	850	850LATIN1
Croatia	Croatian	852	852LATIN2	850	850LATIN2
Czech Republic	Czech	852	852LATIN2	850	850LATIN2
Denmark	Danish	850	850DAN
Finland	Finnish	850	850SVE	437	437SVE
France	French	850	850LATIN1	437	437LATIN1
Germany	German	850	850LATIN1	437	437LATIN1
Greece	Greek	869	869ELL	850	850ELL
Hungary	Hungarian	852	852LATIN2	850	850LATIN2
Iceland	Icelandic	850	850ISL	861	861ISL
Ireland	English	850	850LATIN1	437	437LATIN1
Israel	Hebrew	862	862HEB	856	856HEB
Italy	Italian	850	850LATIN1	437	437LATIN1
Mexico	Spanish	850	850ESP	437	437ESP
Nether-lands	Dutch	850	850LATIN1	437	437LATIN1
New Zealand	English	437	437LATIN1	850	850LATIN1
Norway	Norwegian	865	865NOR	850	850NOR
Peru	Spanish	850	850ESP	437	437ESP
Poland	Polish	852	852LATIN2	850	850LATIN2
Portugal	Portuguese	850	850LATIN1	860	860LATIN1
Romania	Romanian	852	852LATIN2	850	850LATIN2
Russia	Russian	866	866RUS	850	850RUS
S. Africa	Afrikaans	437	437LATIN1	850	850LATIN1
S. Africa	English	437	437LATIN1	850	850LATIN1
Slovak Republic	Slovakian	852	852LATIN2	850	850LATIN2
Slovenia	Slovenian	852	852LATIN2	850	850LATIN2
Spain	Spanish	850	850ESP	437	437ESP
Sweden	Swedish	850	850SVE	437	437SVE
Switzer-land	French	850	850LATIN1	437	437LATIN1
Switzer-land	German	850	850LATIN1	437	437LATIN1
Switzer-land	Italian	850	850LATIN1	437	437LATIN1
Turkey	Turkish	857	857TRK	850	850TRK
UK	English	850	850LATIN1	437	437LATIN1
USA	English	437	437LATIN1	850	850LATIN1
Venezuela	Spanish	850	850ESP	437	437ESP
Yugoslavia	Maced-onian	852	852LATIN2	850	850LATIN2
Yugoslavia	Serbian Cyrillic	855	855CYR	852	852CYR
Yugoslavia	Serbian Latin	852	852LATIN2	850	850LATIN2

When creating a new database

A user creating a new database should find the line with the country/language that they wish to use, then pick either the primary or secondary collation, depending on which code page is in use in their computer. (The chcp command will display the current code page number.) If their particular combination is not present, then another line with a satisfactory combination may be used, or a custom collation may be required.

International aspects of case sensitivity

Adaptive Server Anywhere is case preserving and case insensitive for identifiers, such as table names and column names. This means that the names are stored in the case in which they are created, but any access to the identifiers is done in a case-insensitive manner.

For example, the names of the system tables are held in upper case (SYSDOMAIN, SYSTABLE, and so on), but access is case insensitive, so that the two following statements are equivalent:

SELECT *
FROM systable
SELECT *
FROM SYSTABLE

The equivalence of upper and lower case characters is enforced in the collation. There are some collations where particular care may be needed when assuming case insensitivity of identifiers.

Example

In the Turkish 857TRK collation, the lower case i does not have the character I as its upper case equivalent. Therefore, despite the case insensitivity of identifiers, the following two statements are not equivalent in this collation:

SELECT *
FROM sysdomain
SELECT *
FROM SYSDOMAIN

Using multibyte collations

Adaptive Server Anywhere provides collations using several multibyte character sets, including the following:

Multibyte character set

Description

SJIS

Japanese Shift-JIS Encoding

EUC_JAPAN

Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding

EUC_CHINA

Chinese GB 2312-80 Encoding

EUC_TAIWAN

Taiwanese Big 5 Encoding

EUC_KOREA

Korean KS C 5601-1992 Encoding

UTF8

UTF8 is a variable-byte encoding of 4-byte Unicode (UCS-4). UTF8 characters up to 4 bytes in length are supported, while UTF8 may use up to 6 bytes.

Multibyte character set	Description
SJIS	Japanese Shift-JIS Encoding
EUC_JAPAN	Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding
EUC_CHINA	Chinese GB 2312-80 Encoding
EUC_TAIWAN	Taiwanese Big 5 Encoding
EUC_KOREA	Korean KS C 5601-1992 Encoding
UTF8	UTF8 is a variable-byte encoding of 4-byte Unicode (UCS-4). UTF8 characters up to 4 bytes in length are supported, while UTF8 may use up to 6 bytes.

For Info For a complete listing, see Supplied collations.

This section describes how multibyte character sets are handled. The description applies to the supported collations and to any multibyte custom collations you may create.

Variable length character sets

Adaptive Server Anywhere supports variable length character sets. In these sets, some characters are represented by one byte, and some by more than one, to a maximum of four bytes. The value of the first byte in any character indicates the number of bytes used for that character, and also indicates whether the character is a space character, a digit, or an alphabetic (alpha) character.

Adaptive Server Anywhere does not support fixed-length multibyte character sets such as 2-byte Unicode or 4-byte Unicode.

Example

As an example, characters in the Shift-JIS character set are of either one or two bytes in length. If the hex value of the first byte is in the range 81-9F or E0-EF (decimal values 129-159 or 224-239); the character is a two-byte character and the subsequent byte (called a follow byte) completes the character. If the first byte is outside this range, the character is a single-byte character and the next byte is the first byte of the following character.

The properties of any Shift-JIS character can be read from its first byte also. Characters with a first byte in the (hex) range 09 to 0D, or 20, are space characters.
Characters in the ranges 41 to 5A, 61 to 7A, 81 to 9F or E0 to EF are alphabetic (letters).
Characters in the range 30 to 39 are digits.

In building custom collations, you can specify which ranges of values for the first byte signify single- and double-byte (or more) characters, and which specify space, alpha, and digit characters. However, all first bytes of value less than 40 (hex 28) must be single-byte characters, and no follow bytes may have values less than 40. This restriction is satisfied by all known current encodings.