Collection Contents Index Using character set translation Creating databases with custom collations pdf/chap12.pdf

User's Guide
   PART 1. Working with Databases
     CHAPTER 12. Database Collations and International Languages       

Choosing a database collation


You choose a collation when you create a database. The collation is an option on the CREATE DATABASE statement and in the Initialization utility.

Top of page  Supplied collations

The following collations are supplied with Adaptive Server Anywhere. You can obtain this list by entering the following command at a system command line:

dbinit /l

Collation name

Type

Description

internal

OEM

Based on Code page 850

437

OEM

Code page 437, ASCII, United States

850

OEM

Code page 850, ASCII, Multilingual

852

OEM

Code page 852, ASCII, Slavic/Latin II

860

OEM

Code page 860, ASCII, Portugal

863

OEM

Code page 863, ASCII, Canada-French

865

OEM

Code page 865, ASCII, Norway

EBCDIC

ANSI

EBCDIC

437EBCDIC

OEM

Code Page 437, EBCDIC

437LATIN1

OEM

Code Page 437, Latin 1

437ESP

OEM

Code Page 437, Spanish

437SVE

OEM

Code Page 437, Swedish/Finnish

819CYR

ANSI

Code Page 819, Cyrillic

819DAN

ANSI

Code Page 819, Danish

819ELL

ANSI

Code Page 819, Greek

819ESP

ANSI

Code Page 819, Spanish

819ISL

ANSI

Code Page 819, Icelandic

819LATIN1

ANSI

Code Page 819, Latin 1

819LATIN2

ANSI

Code Page 819, Latin 2

819NOR

ANSI

Code Page 819, Norwegian

819RUS

ANSI

Code Page 819, Russian

819SVE

ANSI

Code Page 819, Swedish/Finnish

819TRK

ANSI

Code Page 819, Turkish

850CYR

OEM

Code Page 850, Cyrillic

850DAN

OEM

Code Page 850, Danish

850ELL

OEM

Code Page 850, Greek

850ESP

OEM

Code Page 850, Spanish

850ISL

OEM

Code Page 850, Icelandic

850LATIN1

OEM

Code Page 850, Latin 1

850LATIN2

OEM

Code Page 850, Latin 2

850NOR

OEM

Code Page 850, Norwegian

850RUS

OEM

Code Page 850, Russian

850SVE

OEM

Code Page 850, Swedish/Finnish

850TRK

OEM

Code Page 850, Turkish

852LATIN2

OEM

Code Page 852, Latin 2

852CYR

OEM

Code Page 852, Cyrillic

855CYR

OEM

Code Page 855, Cyrillic

856HEB

OEM

Code Page 856, Hebrew

857TRK

OEM

Code Page 857, Turkish

860LATIN1

OEM

Code Page 860, Latin 1

861ISL

OEM

Code Page 861, Icelandic

862HEB

OEM

Code Page 862, Hebrew

863LATIN1

OEM

Code Page 863, Latin 1

865NOR

OEM

Code Page 865, Norwegian

866RUS

OEM

Code Page 866, Russian

869ELL

OEM

Code Page 869, Greek

920TRK

ANSI

Code page 920, Turkish

SJIS

ANSI

Japanese Shift-JIS Encoding

SJIS2

ANSI

Japanese Shift-JIS Encoding, Sybase Adaptive Server Enterprise-compatible

EUC_JAPAN

ANSI

Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding

EUC_CHINA

ANSI

Chinese GB 2312-80 Encoding

EUC_TAIWAN

ANSI

Taiwanese Big 5 Encoding

EUC_KOREA

ANSI

Korean KS C 5601-1992 Encoding

UTF8

ANSI

UCS-4 Transformation Format

ISO_1

ANSI

ISO8859-1, Latin 1

ISO_BINENG

ANSI

Binary ordering, English ISO/ASCII 7-bit letter case mappings

WIN_LATIN1

ANSI

Windows Latin 1, similar to ISO_1, ISO8859-1, with extensions

WIN_LATIN5

ANSI

A superset of ISO 8859-9 with characters also in 80-9F

Top of page  ANSI or OEM?

Adaptive Server Anywhere collations are based on code pages that are designated as either ANSI or OEM. In most cases, use of an ANSI code page is recommended.

If you choose to use an ANSI code page, you must not use the ODBC translation driver in the ODBC data source configuration window.

If you choose to use an OEM code page, you must do the following:

Both Interactive SQL and Sybase Central detect whether the database collation is ANSI or OEM by checking the first few characters, and either enable or disable translation as needed.

For Info     For more information about code page translation in Interactive SQL, see CHAR_OEM_TRANSLATION option.

Top of page  Supplied ANSI collations

Adaptive Server Anywhere provides the following ANSI collations:

The ISO_1 collation 

ISO_1 is provided for compatibility with the Adaptive Server Enterprise default ISO_1 collation. The differences are as follows:

The WIN_LATIN1 collation 

WIN_LATIN1 is similar to ISO_1, except that Windows has defined characters in places where ISO_1 says "undefined", specifically the range 0x80-0xBF. The differences from Adaptive Server Enterprise's ISO_1 are as follows:

Top of page  Supported OEM collations

The following table shows the built-in collations that correspond to OEM code pages. The table and the corresponding collations were derived from several manuals from IBM concerning National Language Support, subject to the restrictions mentioned above. (This table represents the best information available at the time of writing. Due to continuing rapid geopolitical changes, the table may contain names for countries that no longer exist).

Country

Language

Primary Code Page

Primary Collation

Secondary Code Page

Secondary Collation

Argentina

Spanish

850

850ESP

437

437ESP

Australia

English

437

437LATIN1

850

850LATIN1

Austria

German

850

850LATIN1

437

437LATIN1

Belgium

Belgian Dutch

850

850LATIN1

437

437LATIN1

Belgium

Belgian French

850

850LATIN1

437

437LATIN1

Belarus

Belarussian

855

855CYR

Brazil

Portuguese

850

850LATIN1

437

437LATIN1

Bulgaria

Bulgarian

855

855CYR

850

850CYR

Canada

Cdn French

850

850LATIN1

863

863LATIN1

Canada

English

437

437LATIN1

850

850LATIN1

Croatia

Croatian

852

852LATIN2

850

850LATIN2

Czech Republic

Czech

852

852LATIN2

850

850LATIN2

Denmark

Danish

850

850DAN

Finland

Finnish

850

850SVE

437

437SVE

France

French

850

850LATIN1

437

437LATIN1

Germany

German

850

850LATIN1

437

437LATIN1

Greece

Greek

869

869ELL

850

850ELL

Hungary

Hungarian

852

852LATIN2

850

850LATIN2

Iceland

Icelandic

850

850ISL

861

861ISL

Ireland

English

850

850LATIN1

437

437LATIN1

Israel

Hebrew

862

862HEB

856

856HEB

Italy

Italian

850

850LATIN1

437

437LATIN1

Mexico

Spanish

850

850ESP

437

437ESP

Nether-lands

Dutch

850

850LATIN1

437

437LATIN1

New Zealand

English

437

437LATIN1

850

850LATIN1

Norway

Norwegian

865

865NOR

850

850NOR

Peru

Spanish

850

850ESP

437

437ESP

Poland

Polish

852

852LATIN2

850

850LATIN2

Portugal

Portuguese

850

850LATIN1

860

860LATIN1

Romania

Romanian

852

852LATIN2

850

850LATIN2

Russia

Russian

866

866RUS

850

850RUS

S. Africa

Afrikaans

437

437LATIN1

850

850LATIN1

S. Africa

English

437

437LATIN1

850

850LATIN1

Slovak Republic

Slovakian

852

852LATIN2

850

850LATIN2

Slovenia

Slovenian

852

852LATIN2

850

850LATIN2

Spain

Spanish

850

850ESP

437

437ESP

Sweden

Swedish

850

850SVE

437

437SVE

Switzer-land

French

850

850LATIN1

437

437LATIN1

Switzer-land

German

850

850LATIN1

437

437LATIN1

Switzer-land

Italian

850

850LATIN1

437

437LATIN1

Turkey

Turkish

857

857TRK

850

850TRK

UK

English

850

850LATIN1

437

437LATIN1

USA

English

437

437LATIN1

850

850LATIN1

Venezuela

Spanish

850

850ESP

437

437ESP

Yugoslavia

Maced-onian

852

852LATIN2

850

850LATIN2

Yugoslavia

Serbian Cyrillic

855

855CYR

852

852CYR

Yugoslavia

Serbian Latin

852

852LATIN2

850

850LATIN2

When creating a new database 

A user creating a new database should find the line with the country/language that they wish to use, then pick either the primary or secondary collation, depending on which code page is in use in their computer. (The chcp command will display the current code page number.) If their particular combination is not present, then another line with a satisfactory combination may be used, or a custom collation may be required.

Top of page  International aspects of case sensitivity

Adaptive Server Anywhere is case preserving and case insensitive for identifiers, such as table names and column names. This means that the names are stored in the case in which they are created, but any access to the identifiers is done in a case-insensitive manner.

For example, the names of the system tables are held in upper case (SYSDOMAIN, SYSTABLE, and so on), but access is case insensitive, so that the two following statements are equivalent:

SELECT *
FROM systable
SELECT *
FROM SYSTABLE

The equivalence of upper and lower case characters is enforced in the collation. There are some collations where particular care may be needed when assuming case insensitivity of identifiers.

Example 

In the Turkish 857TRK collation, the lower case i does not have the character I as its upper case equivalent. Therefore, despite the case insensitivity of identifiers, the following two statements are not equivalent in this collation:

SELECT *
FROM sysdomain
SELECT *
FROM SYSDOMAIN

Top of page  Using multibyte collations

Adaptive Server Anywhere provides collations using several multibyte character sets, including the following:

Multibyte character set

Description

SJIS

Japanese Shift-JIS Encoding

EUC_JAPAN

Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding

EUC_CHINA

Chinese GB 2312-80 Encoding

EUC_TAIWAN

Taiwanese Big 5 Encoding

EUC_KOREA

Korean KS C 5601-1992 Encoding

UTF8

UTF8 is a variable-byte encoding of 4-byte Unicode (UCS-4). UTF8 characters up to 4 bytes in length are supported, while UTF8 may use up to 6 bytes.

For Info     For a complete listing, see Supplied collations.

This section describes how multibyte character sets are handled. The description applies to the supported collations and to any multibyte custom collations you may create.

Top of page  Variable length character sets

Adaptive Server Anywhere supports variable length character sets. In these sets, some characters are represented by one byte, and some by more than one, to a maximum of four bytes. The value of the first byte in any character indicates the number of bytes used for that character, and also indicates whether the character is a space character, a digit, or an alphabetic (alpha) character.

Adaptive Server Anywhere does not support fixed-length multibyte character sets such as 2-byte Unicode or 4-byte Unicode.

Example 

As an example, characters in the Shift-JIS character set are of either one or two bytes in length. If the hex value of the first byte is in the range 81-9F or E0-EF (decimal values 129-159 or 224-239); the character is a two-byte character and the subsequent byte (called a follow byte) completes the character. If the first byte is outside this range, the character is a single-byte character and the next byte is the first byte of the following character.

In building custom collations, you can specify which ranges of values for the first byte signify single- and double-byte (or more) characters, and which specify space, alpha, and digit characters. However, all first bytes of value less than 40 (hex 28) must be single-byte characters, and no follow bytes may have values less than 40. This restriction is satisfied by all known current encodings.

Top of page  

Collection Contents Index Using character set translation Creating databases with custom collations pdf/chap12.pdf