To generate or modify mapping headers
-------------------------------------
Mapping headers are generated from Tools/unicode/genmap_*.py



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does, rather
  than following the Unicode.org mapping, which maps them to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, round-trip compatibility is not guaranteed for
  them.


2) cp932 codec

  To conform to the mapping Windows actually uses, the cp932 codec maps
  the following code points in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A140
  is shown as a full-width character, mapping to U+FF3C arguably
  makes more sense.

  The euc-jisx0213 codec can also decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
  overlap each other, this does not break standard conformance
  (and JIS X 0213 Plane 2 is intended to be used this way). When
  encoding, the codec tries to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


4) euc-jp codec

  The euc-jp codec is a compatibility variant on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (both ways)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e (one way)


5) shift-jis codec

  The shift-jis codec maps the 0x20-0x7e area directly to U+20-U+7E
  instead of using JIS X 0201, for compatibility. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
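
The mappings noted above can be spot-checked from Python. The following is
a minimal sketch, not part of the codec sources: the helper names
(check_mappings, big5_round_trip) and the two tables are illustrative
assumptions, and the expected values are copied from the notes above rather
than from an independent reference.

```python
# Sketch: compare a few documented mappings with the codecs of the
# running interpreter. All names here are hypothetical helpers.

# (codec, encoded bytes, decoded text) pairs noted as mapped in both directions.
BOTH_WAYS = [
    ("cp932", b"\x80", "\x80"),
    ("cp932", b"\xa0", "\uf8f0"),
    ("cp932", b"\xfd", "\uf8f1"),
    ("euc_jp", b"\xa1\xc0", "\uff3c"),
    ("shift_jis", b"\x5c", "\\"),
    ("shift_jis", b"\x7e", "~"),
    ("shift_jis", b"\x81\x5f", "\uff3c"),
]

# (codec, text, encoded bytes) pairs noted as "one way" (encode only).
ENCODE_ONLY = [
    ("euc_jp", "\u00a5", b"\x5c"),  # YEN SIGN
    ("euc_jp", "\u203e", b"\x7e"),  # OVERLINE
]


def check_mappings():
    """Return a list of mismatch descriptions (empty if everything agrees)."""
    problems = []
    for codec, raw, text in BOTH_WAYS:
        if raw.decode(codec) != text:
            problems.append(f"{codec}: decoding {raw!r} did not give {text!r}")
        if text.encode(codec) != raw:
            problems.append(f"{codec}: encoding {text!r} did not give {raw!r}")
    for codec, text, raw in ENCODE_ONLY:
        if text.encode(codec) != raw:
            problems.append(f"{codec}: encoding {text!r} did not give {raw!r}")
        # One-way: decoding the byte must not return the original character.
        if raw.decode(codec) == text:
            problems.append(f"{codec}: {raw!r} unexpectedly decodes to {text!r}")
    return problems


def big5_round_trip(raw):
    """Decode Big5 bytes and encode the result again."""
    return raw.decode("big5").encode("big5")


if __name__ == "__main__":
    for line in check_mappings():
        print(line)
    # 0xA2CC decodes to U+5341, which the notes say is not round-trip safe.
    print(big5_round_trip(b"\xa2\xcc") == b"\xa2\xcc")
```

An empty result from check_mappings() means the running interpreter agrees
with the listed mappings; the final comparison shows whether the Big5 code
0xA2CC survives a decode/encode round trip.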