---
layout: default
title: Case Mappings
nav_order: 1
parent: Transforms
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Case Mappings
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Case mapping converts characters and strings between their uppercase, lowercase,
and title case forms for a given language. Case is a normative property of characters
in specific alphabets (e.g. Latin, Greek, Cyrillic, Armenian, and Georgian)
whereby characters are considered to be variants of a single letter. ICU refers
to these variants, which may differ markedly in shape and size, as uppercase
letters (also known as capital or majuscule) and lowercase letters (also known
as small or minuscule). Alphabets with case differences are called bicameral, and
alphabets without case differences are called unicameral.

Due to the inclusion of certain composite characters for compatibility, such as
the Latin capital letter 'DZ' (\\u01F1 'Ǳ'), there is a third case called title
case. Title case is used to capitalize the first character of a word, as in the
Latin capital letter 'D' with small letter 'z' (\\u01F2 'ǲ'). The term "title
case" can also refer to words whose first letter is an uppercase or
title case letter and whose remaining letters are lowercase. However, not all words in
the title of a document, and not all first words of a sentence, are in title case. The use
of title case words is language dependent. For example, in English, "Taming of
the Shrew" is the appropriate capitalization, not "Taming Of The
Shrew".

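The three forms of this digraph can be inspected outside ICU. The following is a minimal sketch that uses Python's built-in `str` methods and the `unicodedata` module rather than any ICU API; the constant names are chosen here for illustration. It also shows that mechanically capitalizing every word does not yield the English convention mentioned above.

```python
import unicodedata

DZ_UPPER = "\u01F1"  # Ǳ LATIN CAPITAL LETTER DZ
DZ_TITLE = "\u01F2"  # ǲ LATIN CAPITAL LETTER D WITH SMALL LETTER Z
DZ_LOWER = "\u01F3"  # ǳ LATIN SMALL LETTER DZ

# General categories: Lu = uppercase, Lt = title case, Ll = lowercase letter.
categories = [unicodedata.category(c) for c in (DZ_UPPER, DZ_TITLE, DZ_LOWER)]

# The lowercase digraph has distinct uppercase and title case equivalents.
upper_of_lower = DZ_LOWER.upper()
title_of_lower = DZ_LOWER.title()

# Capitalizing every word is not the same as English title capitalization.
mechanical = "taming of the shrew".title()

print(categories, upper_of_lower, title_of_lower, mechanical)
```
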
> :point_right: **Note**: *As of Unicode 11, Georgian has Mkhedruli (lowercase) and Mtavruli
(uppercase) letters, which form case pairs but are not used in title case.*

Sample code is available in the ICU source code repository at
[icu4c/source/samples/ustring/ustring.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/samples/ustring/ustring.cpp).

Please refer to the following sections in [The Unicode Standard](http://www.unicode.org/versions/latest/)
for more information about case mapping:

*   3.13 Default Case Algorithms
*   4.2 Case
*   5.18 Case Mappings

## Simple (Single-Character) Case Mapping

The general case mapping in ICU is language-independent and maps each character
one-to-one to a single character.

A character is considered to have a lowercase, uppercase, or title case
equivalent if there is a respective "simple" case mapping specified for the
character in the [Unicode Character Database](http://www.unicode.org/ucd/) (UnicodeData.txt).
If a character has no mapping equivalent, the result is the character itself.

The APIs provided for the general case mapping, declared in `uchar.h`, handle
only single characters of type `UChar32` and return only single characters. To
convert a string without language-specific behavior, use the APIs in either
`unistr.h` or `ustring.h` with the root locale (`""` in the C API); a `NULL`
locale argument selects the default locale, which may be language-specific.

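The lookup rule described above can be sketched in a few lines. This is an illustration only, not ICU code: it is written in Python, `SIMPLE_UPPER` is a hypothetical hand-copied excerpt of simple uppercase mappings, and `simple_upper` is a helper name invented for this sketch.

```python
# Hypothetical excerpt of simple uppercase mappings (code point -> code point).
SIMPLE_UPPER = {
    0x0061: 0x0041,  # a -> A
    0x00E9: 0x00C9,  # é -> É
    0x01F3: 0x01F1,  # ǳ -> Ǳ
}


def simple_upper(code_point: int) -> int:
    """Return the simple uppercase mapping, or the input if none is listed."""
    return SIMPLE_UPPER.get(code_point, code_point)


print(hex(simple_upper(0x0061)))  # a has a mapping
print(hex(simple_upper(0x0031)))  # the digit 1 has no case mapping
print(hex(simple_upper(0x00DF)))  # ß has no single-character uppercase mapping
```

Because the mapping is strictly one-to-one, it never changes the length of a string, which is why a character such as 'ß' is returned unchanged in this sketch.
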
## Full (Language-Specific) Case Mapping

Case mappings differ between locales. For instance, unlike in
English, the uppercase equivalent of the Latin small letter 'i' in Turkish is the Latin
capital letter 'I' with dot above (\\u0130 'İ').

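The effect of this difference can be sketched without ICU. Python's built-in `str.upper()` is locale-independent, so the hypothetical helper `upper_turkish` below hand-codes only the dotted-i rule before delegating to it. It illustrates the rule and is neither a complete Turkish tailoring nor a description of how ICU implements it.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ LATIN CAPITAL LETTER I WITH DOT ABOVE


def upper_turkish(text: str) -> str:
    """Uppercase text, mapping 'i' to 'İ' as Turkish does (sketch only)."""
    return text.replace("i", DOTTED_CAPITAL_I).upper()


default_result = "istanbul".upper()         # locale-independent mapping
turkish_result = upper_turkish("istanbul")  # Turkish dotted-i rule applied

print(default_result, turkish_result)
```
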
As with the simple case mapping API, a character is considered to have a
lowercase, uppercase, or title case equivalent if there is a respective mapping
specified for the character in the Unicode Character Database (UnicodeData.txt,
supplemented by SpecialCasing.txt for the full mappings).
If a character has no mapping equivalent, the result is the
character itself.

To convert a string using language-specific case mappings, use the APIs in `ustring.h`
and `unistr.h` and pass the intended locale.

ICU implements full Unicode string case mappings.

**In general:**

*   **case mapping can change the number of code points and/or code units of a
    string,**
*   **is language-sensitive (results may differ depending on language), and**
*   **is context-sensitive (a character in the input string may map differently
    depending on surrounding characters).**

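Two of the properties listed above can be observed with a small sketch. It uses Python's built-in `str.upper()` and `str.lower()`, which follow the locale-independent Unicode default case algorithms, instead of ICU; language sensitivity is therefore not shown here. The variable names are illustrative only.

```python
# 1. Length can change: one code point may map to several.
sharp_s_upper = "\u00DF".upper()   # ß uppercases to two letters
dotted_i_lower = "\u0130".lower()  # İ lowercases to i + combining dot above

# 2. Context matters: capital sigma lowercases to final sigma (ς) at the end
#    of a word and to medial sigma (σ) elsewhere.
word_lower = "\u039F\u0394\u039F\u03A3".lower()  # ΟΔΟΣ
lone_sigma_lower = "\u03A3".lower()              # Σ on its own

print(sharp_s_upper, len(dotted_i_lower), word_lower, lone_sigma_lower)
```
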
## Case Folding

Case folding maps strings to a canonical form where case differences are erased.
With the case folding API, ICU supports fast caseless matching in
lookups, because case-folded strings can be compared with a plain binary comparison.

The CaseFolding.txt file in the Unicode Character Database is used for
performing locale-independent case folding. This text file is generated from the
case mappings in the Unicode Character Database, using both the single-character
and the multi-character mappings. Its data maps all
characters that have different case forms to a common form. To compare two
strings case-insensitively, you can fold each string and then
use a binary comparison. There are also functions that compare two strings
case-insensitively using the same case folding data.

Unicode case folding is not context-sensitive. It is also not
language-sensitive, although there is a flag for whether to apply special
mappings for use with Turkic (Turkish/Azerbaijani) text data.

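The fold-then-compare approach can be sketched as follows. This uses Python's built-in `str.casefold()`, which applies Unicode default case folding, rather than the ICU APIs; the Turkic option mentioned above is not available there. `caseless_equal` is a helper name invented for this sketch.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after case folding both (hypothetical helper)."""
    return a.casefold() == b.casefold()


folded = "Stra\u00DFe".casefold()  # Straße
same = caseless_equal("Stra\u00DFe", "STRASSE")

# Lowercasing alone is not enough here: ß stays ß under lower().
lower_only_same = "Stra\u00DFe".lower() == "STRASSE".lower()

print(folded, same, lower_only_same)
```

Folding once and storing the result lets repeated lookups use plain binary comparison, at the cost of keeping a second copy of each key.
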
The case folding APIs are declared in:

1.  `uchar.h` for single-character folding

2.  `ustring.h` and `unistr.h` for string folding.