Unicode support
===============

                 Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

            http://www.lanana.org/docs/unicode/admin-guide/unicode.rst

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.  This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

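As a minimal sketch of how the escape codes in the table above might be
used, the following Python fragment returns the character sequence that
selects one of the four maps as G0.  The names ``G0_SELECT`` and
``select_g0`` are hypothetical and exist only for this illustration; the
escape sequences themselves are taken from the table.

```python
# Escape sequences that select a character map as G0, copied from the
# table above.  G0_SELECT and select_g0 are illustrative names only.
ESC = "\x1b"

G0_SELECT = {
    "LAT1_MAP": ESC + "(B",
    "GRAF_MAP": ESC + "(0",
    "IBMPC_MAP": ESC + "(U",
    "USER_MAP": ESC + "(K",
}


def select_g0(map_symbol: str) -> str:
    """Return the escape sequence that selects map_symbol as G0."""
    return G0_SELECT[map_symbol]
```

A program could, for instance, write ``select_g0("GRAF_MAP")`` to the
console before drawing with the pseudographics map and
``select_g0("LAT1_MAP")`` afterwards to switch back.
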
In accordance with the Unicode standard/ISO 10646 the range U+F000 to
U+F8FF has been reserved for OS-wide allocation.  The Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone".  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large
power-of-two boundary (in case 1024- or 2048-character fonts ever
become necessary).  This leaves U+E000 to U+EFFF as the End User Zone.

[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table.  The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour.  In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.

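The direct mapping described above amounts to subtracting U+F000 from
the code point.  The sketch below illustrates that arithmetic; it is not
the kernel's code.  The function name, the ``font_size`` parameter and
the choice to return ``None`` for positions the font cannot supply are
all assumptions made for this example.

```python
# Sketch of the direct-to-font mapping.  The two range constants come from
# the text; everything else is an assumption made for illustration.
DIRECT_BASE = 0xF000  # first code point of the direct-mapping area
DIRECT_LAST = 0xF7FF  # last code point of the direct-mapping area


def direct_font_position(code_point: int, font_size: int = 256):
    """Return the font position for a direct-mapped code point, or None.

    font_size is the number of glyphs in the loaded font, for example
    256 or 512 in the vgacon case mentioned above.
    """
    if not DIRECT_BASE <= code_point <= DIRECT_LAST:
        return None  # outside the direct-mapping area
    position = code_point - DIRECT_BASE
    if position >= font_size:
        return None  # the loaded font has no glyph at this position
    return position
```

With a 256-character font, U+F041 yields position 0x41 while U+F100
yields nothing; with a 512-character font, U+F100 yields position 0x100.
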
Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses the
new values.

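Text that still contains the obsolete Linux Zone code points can be
migrated with a simple translation table.  The sketch below pairs the
old and new code points in the order in which both lists appear above;
that pairing by position, and the names used, are assumptions of this
example.

```python
# Obsolete Linux Zone code points and the Unicode 3.2.0 code points that
# replaced them, each list in the order given in the text.  Pairing them
# by position is an assumption of this sketch.
OLD_SCAN_LINES = [0xF800, 0xF801, 0xF803, 0xF804]
NEW_SCAN_LINES = [0x23BA, 0x23BB, 0x23BC, 0x23BD]

SCAN_LINE_MIGRATION = dict(zip(OLD_SCAN_LINES, NEW_SCAN_LINES))


def migrate_scan_lines(text: str) -> str:
    """Replace obsolete scan-line code points with the Unicode ones."""
    return text.translate(SCAN_LINE_MIGRATION)
```

Characters outside the table pass through ``migrate_scan_lines``
unchanged.
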
[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

        http://www.kli.org/

Since the characters in the beginning of the Linux CZ have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry.  The normative reference is at:

        https://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

======  =======================================================
U+F8D0  KLINGON LETTER A
U+F8D1  KLINGON LETTER B
U+F8D2  KLINGON LETTER CH
U+F8D3  KLINGON LETTER D
U+F8D4  KLINGON LETTER E
U+F8D5  KLINGON LETTER GH
U+F8D6  KLINGON LETTER H
U+F8D7  KLINGON LETTER I
U+F8D8  KLINGON LETTER J
U+F8D9  KLINGON LETTER L
U+F8DA  KLINGON LETTER M
U+F8DB  KLINGON LETTER N
U+F8DC  KLINGON LETTER NG
U+F8DD  KLINGON LETTER O
U+F8DE  KLINGON LETTER P
U+F8DF  KLINGON LETTER Q
        - Written <q> in standard Okrand Latin transliteration
U+F8E0  KLINGON LETTER QH
        - Written <Q> in standard Okrand Latin transliteration
U+F8E1  KLINGON LETTER R
U+F8E2  KLINGON LETTER S
U+F8E3  KLINGON LETTER T
U+F8E4  KLINGON LETTER TLH
U+F8E5  KLINGON LETTER U
U+F8E6  KLINGON LETTER V
U+F8E7  KLINGON LETTER W
U+F8E8  KLINGON LETTER Y
U+F8E9  KLINGON LETTER GLOTTAL STOP

U+F8F0  KLINGON DIGIT ZERO
U+F8F1  KLINGON DIGIT ONE
U+F8F2  KLINGON DIGIT TWO
U+F8F3  KLINGON DIGIT THREE
U+F8F4  KLINGON DIGIT FOUR
U+F8F5  KLINGON DIGIT FIVE
U+F8F6  KLINGON DIGIT SIX
U+F8F7  KLINGON DIGIT SEVEN
U+F8F8  KLINGON DIGIT EIGHT
U+F8F9  KLINGON DIGIT NINE

U+F8FD  KLINGON COMMA
U+F8FE  KLINGON FULL STOP
U+F8FF  KLINGON SYMBOL FOR EMPIRE
======  =======================================================

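Because the ten digits occupy consecutive code points starting at
U+F8F0, a decimal number can be written in Klingon digits by offsetting
each decimal digit from that base.  The sketch below assumes that the
most significant digit comes first; the function name is hypothetical.

```python
# Sketch: write a non-negative integer with the Klingon digits from the
# table above.  Placing the most significant digit first is an assumption.
KLINGON_DIGIT_ZERO = 0xF8F0


def to_klingon_digits(value: int) -> str:
    """Return value written with the digits U+F8F0..U+F8F9."""
    if value < 0:
        raise ValueError("only non-negative integers are handled")
    return "".join(chr(KLINGON_DIGIT_ZERO + int(d)) for d in str(value))
```

For example, ``to_klingon_digits(1996)`` produces the four code points
U+F8F1, U+F8F9, U+F8F9 and U+F8F6.
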
Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

          https://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are driving the addition of Tengwar and Cirth to
Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected and so the above encoding remains official.