User's Guide
PART 1. Working with Databases
CHAPTER 12. Database Collations and International Languages
You choose a collation when you create a database. The collation is an option on the CREATE DATABASE statement and in the Initialization utility.
The following collations are supplied with Adaptive Server Anywhere. You can obtain this list by entering the following command at a system command line:
dbinit /l
Collation name |
Type |
Description |
---|---|---|
internal |
OEM |
Based on Code page 850 |
437 |
OEM |
Code page 437, ASCII, United States |
850 |
OEM |
Code page 850, ASCII, Multilingual |
852 |
OEM |
Code page 852, ASCII, Slavic/Latin II |
860 |
OEM |
Code page 860, ASCII, Portugal |
863 |
OEM |
Code page 863, ASCII, Canada-French |
865 |
OEM |
Code page 865, ASCII, Norway |
EBCDIC |
ANSI |
EBCDIC |
437EBCDIC |
OEM |
Code Page 437, EBCDIC |
437LATIN1 |
OEM |
Code Page 437, Latin 1 |
437ESP |
OEM |
Code Page 437, Spanish |
437SVE |
OEM |
Code Page 437, Swedish/Finnish |
819CYR |
ANSI |
Code Page 819, Cyrillic |
819DAN |
ANSI |
Code Page 819, Danish |
819ELL |
ANSI |
Code Page 819, Greek |
819ESP |
ANSI |
Code Page 819, Spanish |
819ISL |
ANSI |
Code Page 819, Icelandic |
819LATIN1 |
ANSI |
Code Page 819, Latin 1 |
819LATIN2 |
ANSI |
Code Page 819, Latin 2 |
819NOR |
ANSI |
Code Page 819, Norwegian |
819RUS |
ANSI |
Code Page 819, Russian |
819SVE |
ANSI |
Code Page 819, Swedish/Finnish |
819TRK |
ANSI |
Code Page 819, Turkish |
850CYR |
OEM |
Code Page 850, Cyrillic |
850DAN |
OEM |
Code Page 850, Danish |
850ELL |
OEM |
Code Page 850, Greek |
850ESP |
OEM |
Code Page 850, Spanish |
850ISL |
OEM |
Code Page 850, Icelandic |
850LATIN1 |
OEM |
Code Page 850, Latin 1 |
850LATIN2 |
OEM |
Code Page 850, Latin 2 |
850NOR |
OEM |
Code Page 850, Norwegian |
850RUS |
OEM |
Code Page 850, Russian |
850SVE |
OEM |
Code Page 850, Swedish/Finnish |
850TRK |
OEM |
Code Page 850, Turkish |
852LATIN2 |
OEM |
Code Page 852, Latin 2 |
852CYR |
OEM |
Code Page 852, Cyrillic |
855CYR |
OEM |
Code Page 855, Cyrillic |
856HEB |
OEM |
Code Page 856, Hebrew |
857TRK |
OEM |
Code Page 857, Turkish |
860LATIN1 |
OEM |
Code Page 860, Latin 1 |
861ISL |
OEM |
Code Page 861, Icelandic |
862HEB |
OEM |
Code Page 862, Hebrew |
863LATIN1 |
OEM |
Code Page 863, Latin 1 |
865NOR |
OEM |
Code Page 865, Norwegian |
866RUS |
OEM |
Code Page 866, Russian |
869ELL |
OEM |
Code Page 869, Greek |
920TRK |
ANSI |
Code page 920, Turkish |
SJIS |
ANSI |
Japanese Shift-JIS Encoding |
SJIS2 |
ANSI |
Japanese Shift-JIS Encoding, Sybase Adaptive Server Enterprise-compatible |
EUC_JAPAN |
ANSI |
Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding |
EUC_CHINA |
ANSI |
Chinese GB 2312-80 Encoding |
EUC_TAIWAN |
ANSI |
Taiwanese Big 5 Encoding |
EUC_KOREA |
ANSI |
Korean KS C 5601-1992 Encoding |
UTF8 |
ANSI |
UCS-4 Transformation Format |
ISO_1 |
ANSI |
ISO8859-1, Latin 1 |
ISO_BINENG |
ANSI |
Binary ordering, English ISO/ASCII 7-bit letter case mappings |
WIN_LATIN1 |
ANSI |
Windows Latin 1, similar to ISO_1, ISO8859-1, with extensions |
WIN_LATIN5 |
ANSI |
A superset of ISO 8859-9 with characters also in 80-9F |
Adaptive Server Anywhere collations are based on code pages that are designated as either ANSI or OEM. In most cases, use of an ANSI code page is recommended.
If you choose to use an ANSI code page, you must not use the ODBC translation driver in the ODBC data source configuration window.
If you choose to use an OEM code page, you must do the following:
Choose a code page that matches the OEM code pages on your users' client machines.
When setting up data sources for Windows-based ODBC applications, do choose the Adaptive Server Anywhere translation driver in the ODBC data source configuration.
The translation driver converts between the OEM code page on your machine and the ANSI code page used by Windows. If the database collation is a different OEM code page than the one on your machine, results may not be perfect.
Both Interactive SQL and Sybase Central detect whether the database collation is ANSI or OEM by checking the first few characters, and either enable or disable translation as needed.
For more information about code page translation in Interactive SQL, see CHAR_OEM_TRANSLATION option.
Adaptive Server Anywhere provides the following ANSI collations:
920TRK
SJIS
SJIS2
EUC_JAPAN
EUC_CHINA
EUC_TAIWAN
EUC_KOREA
UTF8
ISO_1
ISO_BINENG
WIN_LATIN1
WIN_LATIN5
ISO_1 is provided for compatibility with the Adaptive Server Enterprise default ISO_1 collation. The differences are as follows:
The lower case letter sharp s (\xDF) sorts with the lower case s in Adaptive Server Anywhere, but after ss in Adaptive Server Enterprise.
Ligatures are two characters combined into a single character. The ligatures corresponding to AE and ae (\xC6 and \xE6) sort after A and a respectively in Adaptive Server Anywhere, but after AE and ae in Adaptive Server Enterprise.
WIN_LATIN1 is similar to ISO_1, except that Windows has defined characters in places where ISO_1 says "undefined", specifically the range 0x80-0xBF. The differences from Adaptive Server Enterprise's ISO_1 are as follows:
The upper case and lower case Icelandic Eth (\xD0 and \xF0) is sorted with D in Adaptive Server Anywhere, but after all other letters in Adaptive Server Enterprise.
The upper case and lower case Icelandic Thorn (\xD0 and \xF0) is sorted with T in Adaptive Server Anywhere, but after all other letters in Adaptive Server Enterprise.
The upper-case Y-diaresis (\x9F) is sorted with Y in Adaptive Server Anywhere, and case converts with lower-case Y-diaresis (\xFF). In Adaptive Server Enterprise it is undefined and sorts after \x9E
The lower case letter sharp s (\xDF) sorts with the lower case s in Adaptive Server Anywhere, but after ss in Adaptive Server Enterprise.
Ligatures are two characters combined into a single character. The ligatures corresponding to AE and ae (\xC6 and \xE6) sort after A and a respectively in Adaptive Server Anywhere, but after AE and ae in Adaptive Server Enterprise.
The ligatures corresponding to OE and oe (\x8C and \x9C) sort with O in Adaptive Server Anywhere, but after OE and oe in Adaptive Server Enterprise.
The upper case and lower case letter S with caron (\x8A and \x9A) sorts with S in Adaptive Server Anywhere, but is undefined in Adaptive Server Enterprise, sorting after \x89 and \x99.
The following table shows the built-in collations that correspond to OEM code pages. The table and the corresponding collations were derived from several manuals from IBM concerning National Language Support, subject to the restrictions mentioned above. (This table represents the best information available at the time of writing. Due to continuing rapid geopolitical changes, the table may contain names for countries that no longer exist).
Country |
Language |
Primary Code Page |
Primary Collation |
Secondary Code Page |
Secondary Collation |
---|---|---|---|---|---|
Argentina |
Spanish |
850 |
850ESP |
437 |
437ESP |
Australia |
English |
437 |
437LATIN1 |
850 |
850LATIN1 |
Austria |
German |
850 |
850LATIN1 |
437 |
437LATIN1 |
Belgium |
Belgian Dutch |
850 |
850LATIN1 |
437 |
437LATIN1 |
Belgium |
Belgian French |
850 |
850LATIN1 |
437 |
437LATIN1 |
Belarus |
Belarussian |
855 |
855CYR |
||
Brazil |
Portuguese |
850 |
850LATIN1 |
437 |
437LATIN1 |
Bulgaria |
Bulgarian |
855 |
855CYR |
850 |
850CYR |
Canada |
Cdn French |
850 |
850LATIN1 |
863 |
863LATIN1 |
Canada |
English |
437 |
437LATIN1 |
850 |
850LATIN1 |
Croatia |
Croatian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Czech Republic |
Czech |
852 |
852LATIN2 |
850 |
850LATIN2 |
Denmark |
Danish |
850 |
850DAN |
||
Finland |
Finnish |
850 |
850SVE |
437 |
437SVE |
France |
French |
850 |
850LATIN1 |
437 |
437LATIN1 |
Germany |
German |
850 |
850LATIN1 |
437 |
437LATIN1 |
Greece |
Greek |
869 |
869ELL |
850 |
850ELL |
Hungary |
Hungarian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Iceland |
Icelandic |
850 |
850ISL |
861 |
861ISL |
Ireland |
English |
850 |
850LATIN1 |
437 |
437LATIN1 |
Israel |
Hebrew |
862 |
862HEB |
856 |
856HEB |
Italy |
Italian |
850 |
850LATIN1 |
437 |
437LATIN1 |
Mexico |
Spanish |
850 |
850ESP |
437 |
437ESP |
Nether-lands |
Dutch |
850 |
850LATIN1 |
437 |
437LATIN1 |
New Zealand |
English |
437 |
437LATIN1 |
850 |
850LATIN1 |
Norway |
Norwegian |
865 |
865NOR |
850 |
850NOR |
Peru |
Spanish |
850 |
850ESP |
437 |
437ESP |
Poland |
Polish |
852 |
852LATIN2 |
850 |
850LATIN2 |
Portugal |
Portuguese |
850 |
850LATIN1 |
860 |
860LATIN1 |
Romania |
Romanian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Russia |
Russian |
866 |
866RUS |
850 |
850RUS |
S. Africa |
Afrikaans |
437 |
437LATIN1 |
850 |
850LATIN1 |
S. Africa |
English |
437 |
437LATIN1 |
850 |
850LATIN1 |
Slovak Republic |
Slovakian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Slovenia |
Slovenian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Spain |
Spanish |
850 |
850ESP |
437 |
437ESP |
Sweden |
Swedish |
850 |
850SVE |
437 |
437SVE |
Switzer-land |
French |
850 |
850LATIN1 |
437 |
437LATIN1 |
Switzer-land |
German |
850 |
850LATIN1 |
437 |
437LATIN1 |
Switzer-land |
Italian |
850 |
850LATIN1 |
437 |
437LATIN1 |
Turkey |
Turkish |
857 |
857TRK |
850 |
850TRK |
UK |
English |
850 |
850LATIN1 |
437 |
437LATIN1 |
USA |
English |
437 |
437LATIN1 |
850 |
850LATIN1 |
Venezuela |
Spanish |
850 |
850ESP |
437 |
437ESP |
Yugoslavia |
Maced-onian |
852 |
852LATIN2 |
850 |
850LATIN2 |
Yugoslavia |
Serbian Cyrillic |
855 |
855CYR |
852 |
852CYR |
Yugoslavia |
Serbian Latin |
852 |
852LATIN2 |
850 |
850LATIN2 |
A user creating a new database should find the line with the country/language that they wish to use, then pick either the primary or secondary collation, depending on which code page is in use in their computer. (The chcp command will display the current code page number.) If their particular combination is not present, then another line with a satisfactory combination may be used, or a custom collation may be required.
Adaptive Server Anywhere is case preserving and case insensitive for identifiers, such as table names and column names. This means that the names are stored in the case in which they are created, but any access to the identifiers is done in a case-insensitive manner.
For example, the names of the system tables are held in upper case (SYSDOMAIN, SYSTABLE, and so on), but access is case insensitive, so that the two following statements are equivalent:
SELECT * FROM systable SELECT * FROM SYSTABLE
The equivalence of upper and lower case characters is enforced in the collation. There are some collations where particular care may be needed when assuming case insensitivity of identifiers.
In the Turkish 857TRK collation, the lower case i does not have the character I as its upper case equivalent. Therefore, despite the case insensitivity of identifiers, the following two statements are not equivalent in this collation:
SELECT * FROM sysdomain SELECT * FROM SYSDOMAIN
Adaptive Server Anywhere provides collations using several multibyte character sets, including the following:
Multibyte character set |
Description |
---|---|
SJIS |
Japanese Shift-JIS Encoding |
EUC_JAPAN |
Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding |
EUC_CHINA |
Chinese GB 2312-80 Encoding |
EUC_TAIWAN |
Taiwanese Big 5 Encoding |
EUC_KOREA |
Korean KS C 5601-1992 Encoding |
UTF8 |
UTF8 is a variable-byte encoding of 4-byte Unicode (UCS-4). UTF8 characters up to 4 bytes in length are supported, while UTF8 may use up to 6 bytes. |
For a complete listing, see Supplied collations.
This section describes how multibyte character sets are handled. The description applies to the supported collations and to any multibyte custom collations you may create.
Adaptive Server Anywhere supports variable length character sets. In these sets, some characters are represented by one byte, and some by more than one, to a maximum of four bytes. The value of the first byte in any character indicates the number of bytes used for that character, and also indicates whether the character is a space character, a digit, or an alphabetic (alpha) character.
Adaptive Server Anywhere does not support fixed-length multibyte character sets such as 2-byte Unicode or 4-byte Unicode.
As an example, characters in the Shift-JIS character set are of either one or two bytes in length. If the hex value of the first byte is in the range 81-9F or E0-EF (decimal values 129-159 or 224-239); the character is a two-byte character and the subsequent byte (called a follow byte) completes the character. If the first byte is outside this range, the character is a single-byte character and the next byte is the first byte of the following character.
The properties of any Shift-JIS character can be read from its first byte also. Characters with a first byte in the (hex) range 09 to 0D, or 20, are space characters.
Characters in the ranges 41 to 5A, 61 to 7A, 81 to 9F or E0 to EF are alphabetic (letters).
Characters in the range 30 to 39 are digits.
In building custom collations, you can specify which ranges of values for the first byte signify single- and double-byte (or more) characters, and which specify space, alpha, and digit characters. However, all first bytes of value less than 40 (hex 28) must be single-byte characters, and no follow bytes may have values less than 40. This restriction is satisfied by all known current encodings.