# Character Processing

## Use Cases

Character rules vary greatly across languages, which makes it difficult to extract the expected information from text with a single set of rules. Character processing lets you handle text with similar logic regardless of the language rules that apply.

## How to Develop


### Character Type Identification Using Character Attributes

Character attributes are used to determine the character type (for example, digit, letter, or space) and to check whether a character belongs to a right-to-left (RTL) language or is an ideographic character (for example, a Chinese, Japanese, or Korean character).

These functions are implemented by APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the character attribute.

   ```ts
   let isDigit: boolean = i18n.Unicode.isDigit('1'); // Pass in the character to check.
   ```

3. Obtain the character type. The following code snippet uses the common type as an example. For details, see the **getType** API reference.

   ```ts
   let type = i18n.Unicode.getType('a'); // Pass in the character whose type you want to obtain.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit = i18n.Unicode.isDigit('1'); // isDigit: true

// Check whether a character belongs to an RTL language.
let isRTL = i18n.Unicode.isRTL('a'); // isRTL: false

// Check whether a character is an ideographic character.
let isIdeograph = i18n.Unicode.isIdeograph('华'); // isIdeograph: true

// Obtain the character type.
let type = i18n.Unicode.getType('a'); // type: U_LOWERCASE_LETTER
```


### Transliteration

Transliteration replaces the original content with content that has a similar pronunciation in the local language. This function is implemented through the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module supports transliteration from Chinese characters to pinyin. However, it does not guarantee that polyphonic characters are transliterated correctly based on the context.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Transliterator** object, or obtain the list of supported transliteration IDs.
   ```ts
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn'); // Pass in a valid ID to create a Transliterator object.
   let ids: string[] = i18n.Transliterator.getAvailableIDs(); // Obtain the list of IDs supported by the Transliterator object.
   ```

3. Transliterate text.
   ```ts
   let res: string = transliterator.transform('中国'); // Transliterate the text content.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into the Latn format.
let transliterator = i18n.Transliterator.getInstance('Any-Latn');
let res = transliterator.transform("中国"); // res = "zhōng guó"

// Obtain the list of IDs supported by the Transliterator object.
let ids = i18n.Transliterator.getAvailableIDs(); // ids: ['ASCII-Latin', 'Accents-Any', ...]
```


### Character Normalization

Character normalization means standardizing characters according to a specified normalization form.
This function is implemented through the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object by passing in the text normalization form, which can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC); // Pass in the normalization form.
   ```

3. Normalize the text.
   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize characters in the NFC form.
let normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText = normalizer.normalize('\u1E9B\u0323'); // normalizedText: \u1E9B\u0323
```


### Line Wrapping

Line wrapping means obtaining the text break positions based on the specified text boundary and wrapping lines at those positions.
It is implemented by using the APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object.
   Pass in a valid locale to create a **BreakIterator** object. This object wraps lines based on the rules specified by the locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB'); // Pass in a valid locale.
   ```

3. Set the text to be processed.
   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the break positions of the text.
   ```ts
   let currentPos: number = iterator.current(); // Obtain the current position of BreakIterator in the text.
   let firstPos: number = iterator.first(); // Move BreakIterator to the first break point and return its position. The first break point is always at the beginning of the text, that is, firstPos = 0.
   let nextPos: number = iterator.next(2); // Move BreakIterator by the specified number of break points and return the new position. A positive number moves it toward the end of the text, and a negative number moves it toward the beginning. The default value is 1. If BreakIterator moves out of the text length range, -1 is returned.
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified position is a break point.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object.
let iterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move BreakIterator to the beginning of the text.
let firstPos = iterator.first(); // firstPos: 0

// Move BreakIterator by several break points.
let nextPos = iterator.next(2); // nextPos: 9

// Check whether a position is a break point.
let isBoundary = iterator.isBoundary(9); // isBoundary: true

// Obtain the text processed by BreakIterator.
let breakText = iterator.getLineBreakText(); // breakText: Apple is my favorite fruit.
```
<!--RP1--><!--RP1End-->

<!--no_check-->
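
The break-position steps in the Line Wrapping section can be combined into a loop that collects every break point and then splits the text into wrappable segments. The following is a minimal sketch in plain TypeScript, not part of the Localization Kit: `BreakPositionSource`, `collectBreakPositions`, `splitAtBreaks`, and `SpaceBreakSource` are hypothetical names introduced for illustration. The sketch assumes only the `first()` and `next()` behavior described above (`next()` returns -1 once the iterator leaves the text range). `SpaceBreakSource` is a simplified stand-in that breaks after runs of spaces so that the example runs without the **i18n** module. In an ArkTS project, the helper's parameter would presumably be typed as `i18n.BreakIterator` directly.

```typescript
// Hypothetical minimal shape of a line-break iterator: only the calls used here.
interface BreakPositionSource {
  first(): number;              // Move to the first break point and return its position.
  next(count?: number): number; // Move by `count` break points; -1 when out of range.
}

// Collect all break positions, starting from the beginning of the text.
function collectBreakPositions(source: BreakPositionSource): number[] {
  const positions: number[] = [];
  let pos = source.first();
  while (pos !== -1) {
    positions.push(pos);
    pos = source.next();
  }
  return positions;
}

// Cut the text into the segments delimited by consecutive break positions.
function splitAtBreaks(text: string, positions: number[]): string[] {
  const segments: string[] = [];
  for (let i = 0; i + 1 < positions.length; i++) {
    segments.push(text.slice(positions[i], positions[i + 1]));
  }
  return segments;
}

// Stand-in for illustration only: places a break after each run of spaces
// and at the end of the text. Real line-break rules are locale dependent.
class SpaceBreakSource implements BreakPositionSource {
  private readonly breaks: number[] = [0];
  private index = 0;

  constructor(text: string) {
    for (let i = 1; i < text.length; i++) {
      if (text[i - 1] === ' ' && text[i] !== ' ') {
        this.breaks.push(i);
      }
    }
    if (text.length > 0) {
      this.breaks.push(text.length);
    }
  }

  first(): number {
    this.index = 0;
    return this.breaks[0];
  }

  next(count: number = 1): number {
    const target = this.index + count;
    if (target < 0 || target >= this.breaks.length) {
      return -1; // Out of the text range.
    }
    this.index = target;
    return this.breaks[target];
  }
}

const sampleText = 'Apple is my favorite fruit.';
const breakPositions = collectBreakPositions(new SpaceBreakSource(sampleText));
const segments = splitAtBreaks(sampleText, breakPositions);
console.log(breakPositions); // [0, 6, 9, 12, 21, 27]
console.log(segments);       // ['Apple ', 'is ', 'my ', 'favorite ', 'fruit.']
```

A layout routine could then append segments to the current line until the next one no longer fits, and start a new line at that break position.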