Unicode support
===============

    Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

    http://www.lanana.org/docs/unicode/admin-guide/unicode.rst

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts. By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This subtly changes the semantics of the eight-bit character tables.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.
This permits, for example, the use of block graphics even with a
Latin-1 font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode Standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation. The Unicode Standard
refers to this as a "Corporate Zone"; since that is inaccurate for
Linux, we call it the "Linux Zone". U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large
power-of-two boundary (in case 1024- or 2048-character fonts ever
become necessary). This leaves U+E000 to U+EFFF as the End User Zone.

[v1.2]: The code points from U+F000 up to U+F7FF have been hard-coded
to map directly to the loaded font, bypassing the translation table.
The user-defined map now defaults to U+F000 to U+F0FF, emulating the
previous behaviour. In practice, this range might be shorter; for
example, vgacon can only handle 256-character (U+F000..U+F0FF) or
512-character (U+F000..U+F1FF) fonts.
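
The map table and the zone layout above can be modelled in a few lines. The following is a minimal Python sketch, not kernel code: the names ``g0_designator``, ``default_user_map``, ``direct_font_index`` and ``zone`` and the ``font_size`` parameter are hypothetical, and the ranges are taken from this document rather than from the kernel sources. As the [v1.2] note says, the usable direct-to-font range depends on the loaded font.

```python
from typing import Optional

# Minimal sketch based only on this document; all names are hypothetical
# and this is not the kernel's console code.

ESC = "\x1b"

# Final character of the "ESC ( x" sequence that designates each map as G0.
G0_FINAL = {
    "LAT1_MAP": "B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": "0",   # DEC VT100 pseudographics
    "IBMPC_MAP": "U",  # IBM code page 437
    "USER_MAP": "K",   # user defined
}

END_USER_ZONE = range(0xE000, 0xF000)   # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)      # U+F000..U+F8FF
DIRECT_TO_FONT = range(0xF000, 0xF800)  # U+F000..U+F7FF, bypasses the table


def g0_designator(map_symbol: str) -> str:
    """Return the escape sequence that selects map_symbol as G0."""
    return ESC + "(" + G0_FINAL[map_symbol]


def zone(codepoint: int) -> str:
    """Classify a code point by the zones described in this document."""
    if codepoint in END_USER_ZONE:
        return "End User Zone"
    if codepoint in LINUX_ZONE:
        return "Linux Zone"
    return "other"


def default_user_map(byte: int) -> int:
    """Default USER_MAP translation: byte b becomes U+F000 + b."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return DIRECT_TO_FONT.start + byte


def direct_font_index(codepoint: int, font_size: int = 256) -> Optional[int]:
    """Return the glyph slot selected by a direct-to-font code point.

    U+F000 + n selects glyph n of the loaded font. font_size models how
    many glyphs that font has (vgacon: 256 or 512); code points outside
    the direct range or beyond the loaded font give None.
    """
    if codepoint not in DIRECT_TO_FONT:
        return None
    index = codepoint - DIRECT_TO_FONT.start
    return index if index < font_size else None
```

Chaining ``default_user_map`` into ``direct_font_index`` shows why the default user map emulates the old "straight to font" behaviour: byte ``b`` ends up selecting glyph ``b``. Writing ``g0_designator("GRAF_MAP") + "lqqk"`` to a VT100-compatible console should draw the top edge of a box, because ``l``, ``q`` and ``k`` are corner and horizontal-line characters in the DEC graphics set; ``g0_designator("LAT1_MAP")`` switches back.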

Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map. [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set. I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0
at U+23BA, U+23BB, U+23BC and U+23BD. Linux now uses the new values.

[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely ever to be added to Unicode proper,
since they are horribly vendor-specific. This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series. This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1. Thus, it remains a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

    http://www.kli.org/

Since the characters at the beginning of the Linux Zone have been more
of the dingbats/symbols/forms type, and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry.
  The normative reference is at:

      https://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters and a positional numeric
writing system with 10 digits, and is written left-to-right,
top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes differing, these differences are
considered font variants, in keeping with standard Unicode practice.

====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================

Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

    https://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and therefore
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability. For Klingon, CSUR has adopted the Linux encoding.
The CSUR maintainers are driving the addition of Tengwar and Cirth to
Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected, so the above encoding remains official.
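
The obsolete scan-line characters and the Klingon block described in this document are both easy to handle in text-processing code. The following minimal Python sketch relies only on the tables above: the hypothetical ``migrate_scan_lines`` rewrites the obsolete U+F800-range scan-line characters to their Unicode 3.2.0 code points, and the hypothetical ``to_klingon_digits`` renders an integer with Klingon digits, assuming (as "positional numeric writing system with 10 digits" suggests) a base-10 system.

```python
# Minimal sketch based only on the tables in this document; the function
# names are hypothetical, not part of any kernel or library API.

# Obsolete Linux Zone scan-line characters and their Unicode 3.2.0
# replacements (scan 5 was never assigned here; it is U+2500).
SCAN_LINE_MIGRATION = str.maketrans({
    "\uF800": "\u23BA",  # HORIZONTAL LINE SCAN 1
    "\uF801": "\u23BB",  # HORIZONTAL LINE SCAN 3
    "\uF803": "\u23BC",  # HORIZONTAL LINE SCAN 7
    "\uF804": "\u23BD",  # HORIZONTAL LINE SCAN 9
})

KLINGON_DIGIT_ZERO = 0xF8F0  # U+F8F0..U+F8F9 are KLINGON DIGIT ZERO..NINE


def migrate_scan_lines(text: str) -> str:
    """Replace obsolete U+F800/F801/F803/F804 with U+23BA..U+23BD."""
    return text.translate(SCAN_LINE_MIGRATION)


def to_klingon_digits(n: int) -> str:
    """Write a non-negative integer using Klingon digits.

    Assumes the positional system is base 10, one Klingon digit per
    decimal digit.
    """
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    return "".join(chr(KLINGON_DIGIT_ZERO + int(d)) for d in str(n))
```

Because these are private-use code points, each one takes three bytes in UTF-8 (``to_klingon_digits(1996).encode("utf-8")`` is twelve bytes, starting with ``EF A3 B1`` for U+F8F1), and the result only renders as Klingon with a font that follows the Linux/CSUR assignment.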