# Character Processing

## Use Cases

Character rules vary greatly across languages, which makes it difficult to extract the expected information from text. Character processing lets you handle text with the same logic regardless of the language rules that apply.

## How to Develop


### Character Type Identification Using Character Attributes

Character attributes are used to determine the character type (for example, digit, letter, or space) and to check whether a character belongs to a right-to-left (RTL) language or is an ideographic character (for example, a Chinese, Japanese, or Korean character).

These functions are implemented by the APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Check a character attribute.

   ```ts
   let isDigit: boolean = i18n.Unicode.isDigit('1'); // Pass in the character (a string) to check.
   ```

3. Obtain the character type. The following code snippet obtains the general type of a character. For details, see the **getType** API reference.

   ```ts
   let type: string = i18n.Unicode.getType('a'); // Pass in the character (a string) whose type you want to obtain.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit = i18n.Unicode.isDigit('1'); // isDigit: true

// Check whether a character belongs to an RTL language.
let isRTL = i18n.Unicode.isRTL('a'); // isRTL: false

// Check whether a character is an ideographic character.
let isIdeograph = i18n.Unicode.isIdeograph('华'); // isIdeograph: true

// Obtain the character type.
let type = i18n.Unicode.getType('a'); // type: U_LOWERCASE_LETTER
```
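
The attribute checks above take one character at a time, so processing a whole string means applying them character by character. The following is a minimal sketch of that pattern, not part of the **i18n** module: `countMatching` and `isAsciiDigit` are hypothetical helpers, and `isAsciiDigit` is an ASCII-only stand-in so that the snippet can run outside a device.

```ts
// Hypothetical helper type: any single-character check, such as i18n.Unicode.isDigit.
type CharPredicate = (char: string) => boolean;

// Count the characters in the text that satisfy the predicate.
// Array.from splits the string by code point, so characters outside the
// Basic Multilingual Plane are not broken into surrogate halves.
function countMatching(text: string, predicate: CharPredicate): number {
  return Array.from(text).filter((char) => predicate(char)).length;
}

// ASCII-only stand-in for i18n.Unicode.isDigit (assumption: used for illustration only).
const isAsciiDigit: CharPredicate = (char) => char >= '0' && char <= '9';

const digitCount = countMatching('Room 42, floor 3', isAsciiDigit); // digitCount: 3
```

On a device, the stand-in could presumably be replaced with a wrapper around the real API, for example `(char) => i18n.Unicode.isDigit(char)`.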


### Transliteration

Transliteration replaces the original content with content in the local language that has a similar pronunciation. This function is implemented through the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module supports transliteration from Chinese characters to pinyin. However, it does not guarantee that polyphonic characters are transliterated correctly based on the context.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Transliterator** object. You can also obtain the list of supported transliteration IDs.
   ```ts
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn');  // Pass in a valid ID (a string) to create a Transliterator object.
   let ids: string[] = i18n.Transliterator.getAvailableIDs();  // Obtain the list of IDs supported by the Transliterator object.
   ```

3. Transliterate text.
   ```ts
   let res: string = transliterator.transform('中国');  // Pass in the text (a string) to transliterate.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into Latin script.
let transliterator = i18n.Transliterator.getInstance('Any-Latn');
let res = transliterator.transform("中国"); // res = "zhōng guó"

// Obtain the list of IDs supported by the Transliterator object.
let ids = i18n.Transliterator.getAvailableIDs(); // ids: ['ASCII-Latin', 'Accents-Any', ...]
```
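
Because **getInstance** expects a valid ID, it can help to check a candidate ID against the list of available IDs first. The following is a minimal sketch under that assumption. `pickAvailableId` is a hypothetical helper, `'Hypothetical-ID'` is a made-up ID, and `sampleIds` contains only the two IDs shown in the example comment above, so the real list is longer.

```ts
// Hypothetical helper: return the first candidate ID that appears in the
// list of available IDs, or undefined if none of them is available.
function pickAvailableId(availableIds: string[], candidates: string[]): string | undefined {
  const known = new Set(availableIds);
  return candidates.find((id) => known.has(id));
}

// Sample data for illustration only. On a device, this list would come from
// i18n.Transliterator.getAvailableIDs().
const sampleIds: string[] = ['ASCII-Latin', 'Accents-Any'];

// 'Hypothetical-ID' is not in the sample list, so the next candidate is chosen.
const chosenId = pickAvailableId(sampleIds, ['Hypothetical-ID', 'ASCII-Latin']); // chosenId: 'ASCII-Latin'
```

If the helper returns a string, that value can then be passed to **getInstance**. If it returns `undefined`, none of the candidates is supported.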


### Character Normalization

Character normalization standardizes characters according to the specified normalization form. This function is implemented through the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object by passing in the text normalization form, which can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC); // Pass in a NormalizerMode value.
   ```

3. Normalize the text.
   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Pass in the text (a string) to normalize.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize characters in the NFC form.
let normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText = normalizer.normalize('\u1E9B\u0323'); // normalizedText: \u1E9B\u0323
```
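
To show how the four normalization forms differ, the following sketch runs the same input through each of them. It uses the standard ECMAScript `String.prototype.normalize` method instead of the **Normalizer** class so that it can run outside a device. It is an assumption, not something this document confirms, that **Normalizer** returns the same results for the corresponding modes. `toHex` is a hypothetical helper.

```ts
// Input: LATIN SMALL LETTER LONG S WITH DOT ABOVE (U+1E9B) + COMBINING DOT BELOW (U+0323).
const source = '\u1E9B\u0323';

// Hypothetical helper: list the UTF-16 code units of a string as 4-digit uppercase hex.
function toHex(text: string): string[] {
  return text.split('').map((unit) => ('0000' + unit.charCodeAt(0).toString(16).toUpperCase()).slice(-4));
}

// Standard ECMAScript normalization, used here as a stand-in for i18n.Normalizer.
const nfc = toHex(source.normalize('NFC'));   // ['1E9B', '0323']
const nfd = toHex(source.normalize('NFD'));   // ['017F', '0323', '0307']
const nfkc = toHex(source.normalize('NFKC')); // ['1E69']
const nfkd = toHex(source.normalize('NFKD')); // ['0073', '0323', '0307']

// Typical use: two spellings of the same character compare equal only after normalization.
const composed = '\u00E9';    // Precomposed e with acute accent
const decomposed = 'e\u0301'; // Letter e followed by a combining acute accent
const equalRaw = composed === decomposed; // false
const equalNfc = composed.normalize('NFC') === decomposed.normalize('NFC'); // true
```

NFC and NFD preserve the long s, whereas the compatibility forms NFKC and NFKD replace it with a regular `s`. This is why the compatibility forms are better suited to matching and searching than to storing text unchanged.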


### Line Wrapping

Line wrapping obtains the break positions of text based on the specified text boundaries and wraps lines at those positions. It is implemented by the APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object.
   Pass in a valid locale to create a **BreakIterator** object. This object wraps lines based on the rules specified by the locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB'); // Pass in a valid locale (a string).
   ```

3. Set the text to be processed.
   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text (a string) to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the break positions of the text.
   ```ts
   let currentPos: number = iterator.current(); // Obtain the current position of BreakIterator in the text.
   let firstPos: number = iterator.first(); // Move BreakIterator to the first break point and return its position. The first break point is always at the beginning of the text, that is, firstPos = 0.
   let nextPos: number = iterator.next(2); // Move BreakIterator by the specified number of break points and return the new position. A positive number moves it toward the end of the text, and a negative number moves it toward the beginning. The default value is 1. If BreakIterator moves beyond the text range, -1 is returned.
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified position (a number) is a break point.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object.
let iterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move BreakIterator to the beginning of the text.
let firstPos = iterator.first(); // firstPos: 0

// Move BreakIterator by several break points.
let nextPos = iterator.next(2); // nextPos: 9

// Check whether a position is a break point.
let isBoundary = iterator.isBoundary(9); // isBoundary: true

// Obtain the text processed by BreakIterator.
let breakText = iterator.getLineBreakText(); // breakText: Apple is my favorite fruit.
```
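
A common follow-up is to walk through every break point, starting with `first()` and calling `next()` until it returns -1. The following is a minimal sketch of that loop. `BreakIteratorLike` is a hypothetical interface that mirrors only the two methods described above, and `StubLineIterator` is a simplified stand-in (it breaks after runs of spaces) so that the snippet can run outside a device. It is not the **i18n** implementation, and the real break positions may differ for other text or locales.

```ts
// Hypothetical minimal interface: only the two BreakIterator methods this sketch needs.
interface BreakIteratorLike {
  first(): number;
  next(index?: number): number;
}

// Collect every break position: start at the first break point and advance
// one break point at a time until next() returns -1.
function collectBreakPositions(iterator: BreakIteratorLike): number[] {
  const positions: number[] = [];
  let pos = iterator.first();
  while (pos !== -1) {
    positions.push(pos);
    pos = iterator.next();
  }
  return positions;
}

// Cut the text into the segments that lie between consecutive break positions.
function splitAtBreaks(text: string, positions: number[]): string[] {
  const segments: string[] = [];
  for (let i = 1; i < positions.length; i++) {
    segments.push(text.slice(positions[i - 1], positions[i]));
  }
  return segments;
}

// Simplified stand-in (assumption, for illustration only): the break points are
// position 0, the position after each run of spaces, and the end of the text.
class StubLineIterator implements BreakIteratorLike {
  private readonly points: number[] = [0];
  private index = 0;

  constructor(text: string) {
    for (let i = 1; i < text.length; i++) {
      if (text[i - 1] === ' ' && text[i] !== ' ') {
        this.points.push(i);
      }
    }
    if (text.length > 0) {
      this.points.push(text.length);
    }
  }

  first(): number {
    this.index = 0;
    return this.points[0];
  }

  next(count: number = 1): number {
    const target = this.index + count;
    if (target < 0 || target >= this.points.length) {
      return -1;
    }
    this.index = target;
    return this.points[target];
  }
}

const sampleText = 'Apple is my favorite fruit.';
const positions = collectBreakPositions(new StubLineIterator(sampleText)); // [0, 6, 9, 12, 21, 27]
const segments = splitAtBreaks(sampleText, positions); // ['Apple ', 'is ', 'my ', 'favorite ', 'fruit.']
```

The segments can then be packed into lines of a given width, because a line may end only at one of the collected positions.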
<!--RP1--><!--RP1End-->

<!--no_check-->