Unicode support
===============

                 Last update: 2005-01-17, version 1.4

Note: The original version of this document, which was maintained at
lanana.org as part of the Linux Assigned Names And Numbers Authority
(LANANA) project, no longer exists.  This version in the mainline
Linux kernel is now the maintained main document.

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.
This permits, for example, the use of block graphics even with a
Latin-1 font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation.  The Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone".  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as the End User Zone.

[v1.2]: The Unicode range U+F000 to U+F7FF has been hard-coded to map
directly to the loaded font, bypassing the translation table.  The
user-defined map now defaults to U+F000 to U+F0FF, emulating the
previous behaviour.  In practice, this range might be shorter; for
example, vgacon can only handle 256-character (U+F000..U+F0FF) or
512-character (U+F000..U+F1FF) fonts.
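
The direct-mapping rule above can be sketched in a few lines of Python.
This is only an illustration, not the kernel's implementation: the
function name ``direct_font_index`` and the ``font_size`` parameter are
hypothetical, and the sketch assumes that the glyph position is simply
the offset of the code point from U+F000.

```python
# Bounds of the direct-to-font area, as stated in the text above.
DIRECT_BASE = 0xF000
DIRECT_LAST = 0xF7FF


def direct_font_index(codepoint, font_size=256):
    """Return the glyph position for a direct-mapped code point.

    Hypothetical helper: assumes the glyph position is the offset from
    U+F000 and that `font_size` is the number of glyphs in the loaded
    font (for example 256 or 512 on vgacon).  Returns None when the code
    point is outside the direct-mapping area or beyond the loaded font.
    """
    if not DIRECT_BASE <= codepoint <= DIRECT_LAST:
        return None          # not in the direct-to-font area
    index = codepoint - DIRECT_BASE
    if index >= font_size:
        return None          # the loaded font has no such glyph
    return index


if __name__ == "__main__":
    print(direct_font_index(0xF041))                 # 65
    print(direct_font_index(0xF1FF))                 # None (256-glyph font)
    print(direct_font_index(0xF1FF, font_size=512))  # 511
```

Returning ``None`` instead of raising keeps the sketch easy to combine
with a fallback lookup in a translation table, which is the path taken
by code points outside the direct-mapping area.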


Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses the
new values.

[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

        http://www.kli.org/

Since the characters at the beginning of the Linux Zone have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry.  The normative reference is at:

        https://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================

Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

        https://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are pushing for the addition of Tengwar and Cirth to
Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected, so the above encoding remains official.
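
To make the split between the zones concrete, the following Python
sketch sorts a code point into the ranges named in this document: the
End User Zone (U+E000 to U+EFFF), the Linux Zone (U+F000 to U+F8FF) and,
inside the latter, the Klingon block (U+F8D0 to U+F8FF).  The function
name ``classify`` and the returned labels are hypothetical and exist
only for illustration; individual CSUR script ranges are deliberately
not modelled, since they are defined by the registry and not by this
document.

```python
# Inclusive ranges as stated in this document.
END_USER_ZONE = (0xE000, 0xEFFF)
LINUX_ZONE = (0xF000, 0xF8FF)
KLINGON_BLOCK = (0xF8D0, 0xF8FF)  # sub-range of the Linux Zone


def classify(codepoint):
    """Return a hypothetical label for the zone a code point falls in."""
    # Check the most specific range first, since the Klingon block
    # lies inside the Linux Zone.
    if KLINGON_BLOCK[0] <= codepoint <= KLINGON_BLOCK[1]:
        return "klingon"
    if LINUX_ZONE[0] <= codepoint <= LINUX_ZONE[1]:
        return "linux-zone"
    if END_USER_ZONE[0] <= codepoint <= END_USER_ZONE[1]:
        return "end-user-zone"
    return "other"


if __name__ == "__main__":
    for cp in (0xE000, 0xF810, 0xF8D0, 0x0041):
        print(f"U+{cp:04X}: {classify(cp)}")
```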