Unicode support
===============

		 Last update: 2005-01-17, version 1.4

Note: The original version of this document, which was maintained at
lanana.org as part of the Linux Assigned Names And Numbers Authority
(LANANA) project, no longer exists.  This version in the mainline
Linux kernel is therefore now the maintained main document.

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By loading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This subtly changes the semantics of the eight-bit character tables.
The four character tables are now:

=============== =============================== ================
Map symbol	Map name			Escape code (G0)
=============== =============================== ================
LAT1_MAP	Latin-1 (ISO 8859-1)		ESC ( B
GRAF_MAP	DEC VT100 pseudographics	ESC ( 0
IBMPC_MAP	IBM code page 437		ESC ( U
USER_MAP	User defined			ESC ( K
=============== =============================== ================
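
As an illustration, the escape codes in the table above can be emitted
by an ordinary user-space program writing to a Linux virtual console.
The following Python sketch assumes only that; the ``G0_FINAL``
dictionary is transcribed from the table, and the helper name
``g0_designate()`` is hypothetical.

```python
import sys

ESC = "\x1b"

# Map symbol -> final character of "ESC ( <final>" (designate G0),
# transcribed from the table above.
G0_FINAL = {
    "LAT1_MAP": "B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": "0",   # DEC VT100 pseudographics
    "IBMPC_MAP": "U",  # IBM code page 437
    "USER_MAP": "K",   # user defined
}


def g0_designate(map_symbol: str) -> str:
    """Return the escape sequence that selects map_symbol as G0."""
    try:
        final = G0_FINAL[map_symbol]
    except KeyError:
        raise ValueError(f"unknown map symbol: {map_symbol!r}") from None
    return ESC + "(" + final


if __name__ == "__main__":
    # "q" is the horizontal line in the DEC VT100 graphics set, so this
    # draws a short rule and then switches G0 back to Latin-1.
    sys.stdout.write(g0_designate("GRAF_MAP") + "qqqqqq"
                     + g0_designate("LAT1_MAP") + "\n")
```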

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.  This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).
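
In practice, having only G0 and G1 means a program designates one table
into each slot (``ESC ( x`` for G0, ``ESC ) x`` for G1) and then toggles
between them with the SO (Shift Out, 0x0E) and SI (Shift In, 0x0F)
control characters, which the Linux console treats as "activate G1" and
"activate G0".  A minimal sketch; the helper names ``designate()`` and
``in_g1()`` are hypothetical.

```python
ESC = "\x1b"
SO = "\x0e"  # Shift Out: switch to the G1 table
SI = "\x0f"  # Shift In: switch back to the G0 table

# Final characters shared by "ESC ( x" (G0) and "ESC ) x" (G1).
FINAL = {"LAT1_MAP": "B", "GRAF_MAP": "0", "IBMPC_MAP": "U", "USER_MAP": "K"}


def designate(slot: int, map_symbol: str) -> str:
    """Load map_symbol into G0 (slot 0) or G1 (slot 1)."""
    if slot not in (0, 1):
        # ISO 2022 has G0-G3, but the Linux console only has G0 and G1.
        raise ValueError("the Linux console only has G0 and G1")
    return ESC + ("(" if slot == 0 else ")") + FINAL[map_symbol]


def in_g1(text: str) -> str:
    """Render text through G1, then return to G0."""
    return SO + text + SI


# Latin-1 in G0, DEC graphics in G1: text, a short rule, more text.
setup = designate(0, "LAT1_MAP") + designate(1, "GRAF_MAP")
line = setup + "left " + in_g1("qqqq") + " right"
```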

In accordance with the Unicode standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone").  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as the End User Zone.
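
The resulting split of the BMP Private Use Area (U+E000..U+F8FF) can be
written as a small classifier.  This is only a sketch; the function name
``private_use_zone()`` and its return labels are made up for the example.

```python
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def private_use_zone(cp: int) -> str:
    """Classify a code point according to the layout described above."""
    if cp in END_USER_ZONE:
        return "End User Zone"
    if cp in LINUX_ZONE:
        return "Linux Zone"
    return "not in the BMP Private Use Area"


# U+F000 is 15 * 4096, so a direct-mapped font of 1024 or 2048 glyphs
# starting there begins on a power-of-two boundary.
assert 0xF000 % 4096 == 0
```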

[v1.2]: The Unicode range U+F000 to U+F7FF has been hard-coded to map
directly to the loaded font, bypassing the translation table.  The
user-defined map now defaults to U+F000 to U+F0FF, emulating the
previous behaviour.  In practice, this range might be shorter; for
example, vgacon can only handle 256-character (U+F000..U+F0FF) or
512-character (U+F000..U+F1FF) fonts.
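
Put differently, glyph number *n* of the loaded font is reachable as
code point U+F000 + *n*.  The sketch below computes that code point and
its UTF-8 encoding; the ``font_glyphs`` parameter (the size of the
loaded font, e.g. 256 or 512 for vgacon) and both function names are
assumptions for illustration, not a kernel interface.

```python
DIRECT_FIRST = 0xF000  # first code point mapped straight to the font
DIRECT_LAST = 0xF7FF   # last code point mapped straight to the font


def glyph_to_codepoint(glyph: int, font_glyphs: int = 256) -> int:
    """Return the direct-to-font code point for a glyph index."""
    if not 0 <= glyph < font_glyphs:
        raise ValueError(f"glyph {glyph} is outside a {font_glyphs}-glyph font")
    cp = DIRECT_FIRST + glyph
    if cp > DIRECT_LAST:
        raise ValueError("beyond the hard-coded direct-to-font range")
    return cp


def glyph_to_utf8(glyph: int, font_glyphs: int = 256) -> bytes:
    """UTF-8 bytes that, in UTF-8 mode, name the glyph directly."""
    return chr(glyph_to_codepoint(glyph, font_glyphs)).encode("utf-8")
```

For example, ``glyph_to_utf8(0x41)`` gives ``b'\xef\x81\x81'`` (U+F041).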


Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0,
at U+23BA, U+23BB, U+23BC and U+23BD (scan lines 1, 3, 7 and 9
respectively).  Linux now uses the new values.
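
Text that still carries the obsolete U+F800..U+F804 values can be
converted mechanically with ``str.translate`` and the mapping above.
The ``DEC_SCAN_LINES`` table, relating the DEC VT100 graphics characters
``o``, ``p``, ``q``, ``r`` and ``s`` to scan lines 1, 3, 5, 7 and 9,
follows the VT100 special graphics set; the helper name ``modernize()``
is hypothetical.

```python
import unicodedata

# Obsolete Linux Zone values -> Unicode 3.2.0 code points.
OBSOLETE_TO_UNICODE = {
    0xF800: 0x23BA,  # scan line 1
    0xF801: 0x23BB,  # scan line 3
    0xF803: 0x23BC,  # scan line 7
    0xF804: 0x23BD,  # scan line 9
}

# DEC VT100 graphics characters for the five horizontal scan lines;
# scan 5 ("q") is the ordinary box-drawing horizontal line.
DEC_SCAN_LINES = {
    "o": 0x23BA,
    "p": 0x23BB,
    "q": 0x2500,
    "r": 0x23BC,
    "s": 0x23BD,
}


def modernize(text: str) -> str:
    """Replace obsolete U+F800..U+F804 assignments with the new values."""
    return text.translate(OBSOLETE_TO_UNICODE)


if __name__ == "__main__":
    # List each DEC character with its Unicode code point and name.
    for dec_char, cp in DEC_SCAN_LINES.items():
        print(dec_char, f"U+{cp:04X}", unicodedata.name(chr(cp)))
```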

[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely ever to be added to Unicode proper,
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

	http://www.kli.org/

Since the characters at the beginning of the Linux Zone are more of
the dingbats/symbols/forms type and this is a language, I have located
it at the end, on a 16-cell boundary in keeping with standard Unicode
practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry.  The normative reference is at:

	https://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

======	=======================================================
U+F8D0	KLINGON LETTER A
U+F8D1	KLINGON LETTER B
U+F8D2	KLINGON LETTER CH
U+F8D3	KLINGON LETTER D
U+F8D4	KLINGON LETTER E
U+F8D5	KLINGON LETTER GH
U+F8D6	KLINGON LETTER H
U+F8D7	KLINGON LETTER I
U+F8D8	KLINGON LETTER J
U+F8D9	KLINGON LETTER L
U+F8DA	KLINGON LETTER M
U+F8DB	KLINGON LETTER N
U+F8DC	KLINGON LETTER NG
U+F8DD	KLINGON LETTER O
U+F8DE	KLINGON LETTER P
U+F8DF	KLINGON LETTER Q
	- Written <q> in standard Okrand Latin transliteration
U+F8E0	KLINGON LETTER QH
	- Written <Q> in standard Okrand Latin transliteration
U+F8E1	KLINGON LETTER R
U+F8E2	KLINGON LETTER S
U+F8E3	KLINGON LETTER T
U+F8E4	KLINGON LETTER TLH
U+F8E5	KLINGON LETTER U
U+F8E6	KLINGON LETTER V
U+F8E7	KLINGON LETTER W
U+F8E8	KLINGON LETTER Y
U+F8E9	KLINGON LETTER GLOTTAL STOP

U+F8F0	KLINGON DIGIT ZERO
U+F8F1	KLINGON DIGIT ONE
U+F8F2	KLINGON DIGIT TWO
U+F8F3	KLINGON DIGIT THREE
U+F8F4	KLINGON DIGIT FOUR
U+F8F5	KLINGON DIGIT FIVE
U+F8F6	KLINGON DIGIT SIX
U+F8F7	KLINGON DIGIT SEVEN
U+F8F8	KLINGON DIGIT EIGHT
U+F8F9	KLINGON DIGIT NINE

U+F8FD	KLINGON COMMA
U+F8FE	KLINGON FULL STOP
U+F8FF	KLINGON SYMBOL FOR EMPIRE
======	=======================================================
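
As a worked example of the table, the sketch below converts a word in
Okrand Latin transliteration into these code points and writes
non-negative integers with the Klingon digits.  It assumes Okrand's
usual spelling (``D``, ``H``, ``I``, ``S`` and ``Q`` capitalized, ``'``
for the glottal stop); the table itself only spells out ``q`` and
``Q``.  The function names are hypothetical.  Because multi-letter
spellings can overlap (``nenghep`` must be read as ``n e n gh e p``,
since a lone ``h`` is not a letter), it uses a small dynamic-programming
parse instead of a plain greedy match.

```python
# Okrand Latin spelling of each letter, in code point order from U+F8D0.
OKRAND_LETTERS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]
LETTER_BASE = 0xF8D0  # KLINGON LETTER A (on a 16-cell boundary)
DIGIT_BASE = 0xF8F0   # KLINGON DIGIT ZERO (on a 16-cell boundary)

KLINGON_LETTER = {s: LETTER_BASE + i for i, s in enumerate(OKRAND_LETTERS)}
# Try longer spellings first ("tlh" before "t", "ng" before "n").
_SPELLINGS = sorted(KLINGON_LETTER, key=len, reverse=True)


def transliterate(word: str) -> str:
    """Convert one Okrand Latin word into Linux Zone code points."""
    n = len(word)
    # parse[i] holds the code points for word[i:], or None if unparseable.
    parse = [None] * (n + 1)
    parse[n] = []
    for i in range(n - 1, -1, -1):
        for latin in _SPELLINGS:
            j = i + len(latin)
            if word.startswith(latin, i) and parse[j] is not None:
                parse[i] = [KLINGON_LETTER[latin]] + parse[j]
                break
    if parse[0] is None:
        raise ValueError(f"not valid Okrand Latin: {word!r}")
    return "".join(chr(cp) for cp in parse[0])


def klingon_numeral(value: int) -> str:
    """Write a non-negative integer with the positional Klingon digits."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(DIGIT_BASE + int(d)) for d in str(value))
```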

Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

	  https://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR maintainers are pushing for the addition of Tengwar and Cirth
to Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected, and so the above encoding remains official.