To generate or modify mapping headers
-------------------------------------
Mapping headers are generated by the Tools/unicode/genmap_*.py scripts.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  For the following characters, the big5 codec follows the cp950
  mapping instead of Unicode.org's mapping, which maps them to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other Big5 codes, round-trip compatibility is not guaranteed for
  them.
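
  A minimal sketch, assuming CPython's bundled CJK codecs under the
  codec name "big5", that checks the table above from Python. The names
  CP950_STYLE, decodes_as_table and round_trips are hypothetical helpers
  introduced here, not part of any library.

```python
# Hedged sketch: checks the cp950-style Big5 mappings from the table above
# against CPython's "big5" codec. All names here are hypothetical helpers.

# Big5 code -> Unicode code point, copied from the table above.
CP950_STYLE = {
    0xA15A: 0x2574,  # SPACING UNDERSCORE
    0xA1C3: 0xFFE3,  # SPACING HEAVY OVERSCORE
    0xA1C5: 0x02CD,  # SPACING HEAVY UNDERSCORE
    0xA1FE: 0xFF0F,  # LT DIAG UP RIGHT TO LOW LEFT
    0xA240: 0xFF3C,  # LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC: 0x5341,  # HANGZHOU NUMERAL TEN
    0xA2CE: 0x5345,  # HANGZHOU NUMERAL THIRTY
}


def decodes_as_table() -> dict:
    """Return {Big5 code: True if the codec decodes it to the tabled code point}."""
    return {
        code: code.to_bytes(2, "big").decode("big5") == chr(cp)
        for code, cp in CP950_STYLE.items()
    }


def round_trips(code: int) -> bool:
    """Return True if decoding and then re-encoding a Big5 code yields the same bytes."""
    raw = code.to_bytes(2, "big")
    return raw.decode("big5").encode("big5") == raw


if __name__ == "__main__":
    for code, ok in decodes_as_table().items():
        print(f"0x{code:04X} decodes as tabled: {ok}")
    # U+5341 and U+5345 already have their own Big5 codes, so these two
    # codes are not expected to survive a decode/encode round trip.
    for code in (0xA2CC, 0xA2CE):
        print(f"0x{code:04X} round-trips: {round_trips(code)}")
```

  Only 0xA2CC and 0xA2CE are checked for round-trip failure; which Big5
  code the encoder prefers for U+FF0F and U+FF3C is left unchecked.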


2) cp932 codec

  To match the mapping that Windows actually uses, the cp932 codec maps
  the following code points in addition to the official cp932 mapping:

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
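
  A short sketch, assuming CPython's "cp932" codec, that decodes the
  five extra bytes from the table above and encodes the resulting code
  points back. EXTRA, RAW, TEXT, decode_extras and encode_extras are
  hypothetical names used only for illustration.

```python
# Hedged sketch: exercises the Windows-compatibility single bytes of
# CPython's "cp932" codec. All names here are hypothetical.

# Single CP932 byte -> Unicode code point, copied from the table above.
EXTRA = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}

RAW = bytes(EXTRA.keys())                         # b"\x80\xa0\xfd\xfe\xff"
TEXT = "".join(chr(cp) for cp in EXTRA.values())  # "\x80\uf8f0\uf8f1\uf8f2\uf8f3"


def decode_extras() -> str:
    """Decode all five bytes at once; each one is a single-byte code here."""
    return RAW.decode("cp932")


def encode_extras() -> bytes:
    """Encode the five code points back to cp932 bytes."""
    return TEXT.encode("cp932")


if __name__ == "__main__":
    print([f"U+{ord(ch):04X}" for ch in decode_extras()])
    print("round-trips:", encode_extras() == RAW)
```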


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 (0xA1C0
  in EUC) to Unicode U+FF3C instead of U+005C, which is what
  Unicode.org's mapping uses. Because euc-jisx0213 already has REVERSE
  SOLIDUS at 0x5C and 0x2140 is displayed as a full-width character,
  mapping it to U+FF3C makes more sense.

  The euc-jisx0213 codec can also decode JIS X 0212 codes on code set 3
  (three-byte sequences introduced by SS3, 0x8F). Because JIS X 0212
  and JIS X 0213 Plane 2 do not overlap, this does not break conformance
  with either standard (JIS X 0213 Plane 2 is designed to coexist with
  JIS X 0212 this way). When encoding, the codec tries to encode kanji
  characters in this order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
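
  A hedged sketch of the behaviour above, assuming CPython's
  "euc_jisx0213" codec and using "euc_jp" only as a reference decoder
  for JIS X 0212. The sample bytes and helper names are hypothetical
  choices for illustration; the JIS X 0212 sample comes from row 16,
  which JIS X 0213 Plane 2 does not use, so it can only be decoded
  through the JIS X 0212 fallback.

```python
# Hedged sketch of the euc-jisx0213 notes above, using CPython's codecs.
# PLANE1_2140, JISX0212_SAMPLE and the helpers are hypothetical names.

# JIS X 0213 Plane 1 code 0x2140 is stored in EUC as the byte pair 0xA1 0xC0.
PLANE1_2140 = b"\xa1\xc0"

# SS3 (0x8F) followed by JIS X 0212 row 16, cell 1 (0x3021 -> 0xB0 0xA1).
# Row 16 is not used by JIS X 0213 Plane 2.
JISX0212_SAMPLE = b"\x8f\xb0\xa1"


def decode_0x2140() -> str:
    return PLANE1_2140.decode("euc_jisx0213")


def encode_fullwidth_reverse_solidus() -> bytes:
    return "\uff3c".encode("euc_jisx0213")


def decodes_jisx0212_like_euc_jp() -> bool:
    """A JIS X 0212 sequence on SS3 should decode the same way as in euc-jp."""
    return JISX0212_SAMPLE.decode("euc_jisx0213") == JISX0212_SAMPLE.decode("euc_jp")


if __name__ == "__main__":
    print(f"0x2140 -> U+{ord(decode_0x2140()):04X}")
    print("U+005C ->", "\\".encode("euc_jisx0213"))
    print("U+FF3C ->", encode_fullwidth_reverse_solidus())
    print("JIS X 0212 on SS3 decoded:", decodes_jisx0212_like_euc_jp())
```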


4) euc-jp codec

  For compatibility, the euc-jp codec adds the following mappings:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP 0xA1C0, and
     vice versa.
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5C (encoding only).
   - U+203E OVERLINE is mapped to EUC-JP 0x7E (encoding only).
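
  A minimal sketch, assuming CPython's "euc_jp" codec, that checks each
  of the three mappings listed above, including whether it applies in
  one direction or both. CASES and check_cases are hypothetical names.

```python
# Hedged sketch: checks the euc-jp compatibility mappings listed above.
# CASES and check_cases are hypothetical names.

# (character, expected EUC-JP bytes, expected to decode back to the same character?)
CASES = [
    ("\uff3c", b"\xa1\xc0", True),   # FULLWIDTH REVERSE SOLIDUS: both directions
    ("\u00a5", b"\x5c", False),      # YEN SIGN: encoding only; 0x5C decodes to U+005C
    ("\u203e", b"\x7e", False),      # OVERLINE: encoding only; 0x7E decodes to U+007E
]


def check_cases(codec: str = "euc_jp") -> list:
    """Return one bool per case: True if the codec behaves as listed."""
    results = []
    for ch, expected, two_way in CASES:
        raw = ch.encode(codec)
        round_trip = raw.decode(codec) == ch
        results.append(raw == expected and round_trip == two_way)
    return results


if __name__ == "__main__":
    for (ch, expected, _), ok in zip(CASES, check_cases()):
        print(f"U+{ord(ch):04X} -> {expected!r}: {ok}")
```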


5) shift-jis codec

  For compatibility, the shift-jis codec maps the 0x20-0x7E range
  directly to U+0020-U+007E instead of using JIS X 0201 Roman, which
  has YEN SIGN at 0x5C and OVERLINE at 0x7E. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5C.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7E.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 0x815F.
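
  A minimal sketch, assuming CPython's "shift_jis" codec, that checks
  the three mappings listed above in both directions. PAIRS and
  round_trips_all are hypothetical names.

```python
# Hedged sketch: checks the direct ASCII mapping of CPython's "shift_jis" codec.
# PAIRS and round_trips_all are hypothetical names.

# (character, expected Shift_JIS bytes); each pair should work in both directions.
PAIRS = [
    ("\\", b"\x5c"),          # REVERSE SOLIDUS, not JIS X 0201 YEN SIGN
    ("~", b"\x7e"),           # TILDE, not JIS X 0201 OVERLINE
    ("\uff3c", b"\x81\x5f"),  # FULLWIDTH REVERSE SOLIDUS (JIS X 0208 0x2140)
]


def round_trips_all(codec: str = "shift_jis") -> bool:
    """True if every pair encodes to the listed bytes and decodes back."""
    return all(ch.encode(codec) == raw and raw.decode(codec) == ch for ch, raw in PAIRS)


if __name__ == "__main__":
    for ch, raw in PAIRS:
        print(f"U+{ord(ch):04X} <-> {raw.hex()}")
    print("all round-trip:", round_trips_all())
```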