# Character Processing

## Use Cases

Character rules vary greatly across languages, so extracting the expected information from text is often difficult. Character processing lets you apply consistent processing logic to text that follows different language rules.

## How to Develop

### Character Type Identification Using Character Attributes

Character attributes identify the type of a character, for example, a digit, letter, or space. They can also indicate whether a character belongs to a right-to-left (RTL) script or whether it is an ideographic character (for example, a Chinese, Japanese, or Korean character).

These functions are implemented by the APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain a character attribute, for example, whether the character is a digit.

   ```ts
   let char: string = '1';
   let isDigit: boolean = i18n.Unicode.isDigit(char);
   ```

3. Obtain the character type. The return value is a Unicode general category such as `U_LOWERCASE_LETTER`. For the complete list of return values, see the **getType** API reference.

   ```ts
   let type: string = i18n.Unicode.getType(char);
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit = i18n.Unicode.isDigit('1'); // isDigit: true

// Check whether a character belongs to an RTL script.
let isRTL = i18n.Unicode.isRTL('a'); // isRTL: false

// Check whether a character is an ideographic character.
let isIdeograph = i18n.Unicode.isIdeograph('华'); // isIdeograph: true

// Obtain the character type.
let type = i18n.Unicode.getType('a'); // type: U_LOWERCASE_LETTER
```

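The **Unicode** APIs above each check a single character, so a string has to be split into characters before they can be applied to it. The following minimal sketch uses a hypothetical helper, `filterChars` (not part of the i18n API), that iterates by Unicode code point with `for...of`, so characters outside the Basic Multilingual Plane, which occupy two UTF-16 code units, reach the predicate intact. On a device you would pass a predicate such as `i18n.Unicode.isDigit`; the ASCII-digit predicate used here is a stand-in assumption so that the sketch runs without OpenHarmony.

```typescript
// Hypothetical helper (not part of the i18n API): returns the characters of `text`
// for which `isMatch` returns true. for...of yields one Unicode code point per step,
// so a surrogate pair is passed to `isMatch` as a single character.
function filterChars(text: string, isMatch: (ch: string) => boolean): string[] {
  const matches: string[] = [];
  for (const ch of text) {
    if (isMatch(ch)) {
      matches.push(ch);
    }
  }
  return matches;
}

// On an OpenHarmony device (assumption, not run here):
//   const digits = filterChars('Room 101', (ch) => i18n.Unicode.isDigit(ch));
// Off-device stand-in predicate that accepts ASCII digits only:
const digits = filterChars('Room 101', (ch) => ch >= '0' && ch <= '9'); // ['1', '0', '1']
```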
### Transliteration

Transliteration converts text from one script into another by replacing the original characters with characters of similar pronunciation in the target script. This function is implemented through the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module supports transliteration from Chinese characters to pinyin. However, it does not guarantee that polyphonic characters are transliterated correctly based on context.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Transliterator** object and obtain the list of supported transliteration IDs.
   ```ts
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn');  // Pass in a valid ID to create a Transliterator object.
   let ids: string[] = i18n.Transliterator.getAvailableIDs();  // Obtain the list of IDs supported by the Transliterator object.
   ```

3. Transliterate text.
   ```ts
   let res: string = transliterator.transform('中国');  // Transliterate the text content.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into the Latin script.
let transliterator = i18n.Transliterator.getInstance('Any-Latn');
let res = transliterator.transform("中国"); // res = "zhōng guó"

// Obtain the list of IDs supported by the Transliterator object.
let ids = i18n.Transliterator.getAvailableIDs(); // ids: ['ASCII-Latin', 'Accents-Any', ...]
```

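Because **getInstance** expects a valid ID, it can be useful to check candidate IDs against the list returned by **getAvailableIDs** before creating a **Transliterator** object. The following is a minimal sketch built around a hypothetical helper, `pickTransliteratorId`; it only performs the list lookup, so it runs without OpenHarmony. The ID list and the `Han-Latin` candidate in the usage lines are illustrative assumptions and may not match what a given device supports.

```typescript
// Hypothetical helper (not part of the i18n API): returns the first ID in `preferred`
// that also appears in `available`, or undefined when none of them is supported.
function pickTransliteratorId(available: string[], preferred: string[]): string | undefined {
  const supported = new Set(available);
  return preferred.find((id) => supported.has(id));
}

// On an OpenHarmony device (assumption, not run here):
//   const id = pickTransliteratorId(i18n.Transliterator.getAvailableIDs(), ['Han-Latin', 'Any-Latn']);
//   if (id !== undefined) {
//     const res = i18n.Transliterator.getInstance(id).transform('中国');
//   }
const sampleIds = ['ASCII-Latin', 'Accents-Any', 'Any-Latn']; // illustrative list only
const chosen = pickTransliteratorId(sampleIds, ['Han-Latin', 'Any-Latn']); // 'Any-Latn'
```

Checking the list up front keeps the fallback order explicit in one place instead of scattering ID strings through the code.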
### Character Normalization

Character normalization converts text into a specified Unicode normalization form, so that equivalent character sequences share the same representation. This function is implemented through the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object by passing in the normalization form, which can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
   ```

3. Normalize the text.
   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize characters in the NFC form.
let normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText = normalizer.normalize('\u1E9B\u0323'); // normalizedText: \u1E9B\u0323
```

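To see how the four normalization forms differ, or to compare strings in code that runs outside OpenHarmony (for example, unit tests on Node.js), the standard ECMAScript method `String.prototype.normalize`, which accepts the same form names, can serve as a reference. This is a swapped-in built-in, not the **Normalizer** API; results should generally agree for the same input, but the two may be based on different Unicode versions. The sketch prints the code points of the example string in each form and defines a hypothetical `sameText` helper that treats canonically equivalent strings as equal.

```typescript
// Formats a string as space-separated code points, for example "U+1E9B U+0323".
function toCodePoints(s: string): string {
  return Array.from(s, (ch) => 'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0')).join(' ');
}

const sample = '\u1E9B\u0323'; // long s with dot above, followed by a combining dot below
const forms = ['NFC', 'NFD', 'NFKC', 'NFKD'] as const;
for (const form of forms) {
  console.log(form, toCodePoints(sample.normalize(form)));
}
// Prints:
// NFC U+1E9B U+0323
// NFD U+017F U+0323 U+0307
// NFKC U+1E69
// NFKD U+0073 U+0323 U+0307

// Hypothetical helper: compares two strings after canonical (NFC) normalization.
function sameText(a: string, b: string): boolean {
  return a.normalize('NFC') === b.normalize('NFC');
}

const equal = sameText('\u00E9', 'e\u0301'); // true: precomposed é vs. e + combining acute accent
```

The compatibility forms (NFKC, NFKD) also fold the long s into a plain `s`, which is why they are useful for search and matching but lose distinctions that the canonical forms keep.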
### Line Wrapping

Line wrapping obtains the positions at which text can be broken, based on the line-breaking rules of a specified locale, and wraps lines at those positions. It is implemented by using the APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object.
   Pass in a valid locale to create a **BreakIterator** object. This object determines line breaks based on the rules of the specified locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB');
   ```

3. Set the text to be processed.
   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the break positions of the text.
   ```ts
   let currentPos: number = iterator.current(); // Obtain the current position of the BreakIterator object in the text.
   // Move to the first break point and return its position. The first break point is always at the beginning of the text, that is, firstPos is 0.
   let firstPos: number = iterator.first();
   // Move by the specified number of break points (default: 1). A positive value moves toward the end of the text, and a negative value moves toward the beginning.
   // nextPos is the position after the move, or -1 if the move goes beyond the text.
   let nextPos: number = iterator.next(1);
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified position is a break point.
   ```

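Calling **next** repeatedly until it returns -1 visits every break point in the text. The sketch below wraps this loop in a hypothetical helper, `collectBreakPositions`. It is typed against `LineBreakCursor`, a minimal interface introduced here that declares only the **first** and **next** methods described above, so the loop can be exercised with a stand-in object off-device. ArkTS does not support structural typing, so in ArkTS code you would declare the parameter as `i18n.BreakIterator` and pass the object after calling **setLineBreakText**. The stand-in offsets are assumed to be what the `en-GB` line iterator reports for the sample sentence (after each space and at the end), which is consistent with `next(2)` returning 9 in the example.

```typescript
// Minimal interface introduced for this sketch: the two BreakIterator methods the loop needs.
interface LineBreakCursor {
  first(): number;
  next(index?: number): number;
}

// Hypothetical helper: returns every break position, starting from the first break point.
function collectBreakPositions(cursor: LineBreakCursor): number[] {
  const positions: number[] = [cursor.first()];
  // next() returns -1 once the iterator moves beyond the end of the text.
  for (let pos = cursor.next(); pos !== -1; pos = cursor.next()) {
    positions.push(pos);
  }
  return positions;
}

// Stand-in cursor over precomputed offsets, used only so the sketch runs off-device.
// Assumed line-break positions of 'Apple is my favorite fruit.'.
const sampleBreaks = [0, 6, 9, 12, 21, 27];
let sampleIndex = 0;
const stubCursor: LineBreakCursor = {
  first: () => {
    sampleIndex = 0;
    return sampleBreaks[0];
  },
  next: (index: number = 1) => {
    sampleIndex += index;
    return sampleIndex >= 0 && sampleIndex < sampleBreaks.length ? sampleBreaks[sampleIndex] : -1;
  },
};

const positions = collectBreakPositions(stubCursor); // [0, 6, 9, 12, 21, 27]
```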
**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object.
let iterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move BreakIterator to the beginning of the text.
let firstPos = iterator.first(); // firstPos: 0

// Move BreakIterator by several break points.
let nextPos = iterator.next(2); // nextPos: 9

// Check whether a position is a break point.
let isBoundary = iterator.isBoundary(9); // isBoundary: true

// Obtain the text processed by BreakIterator.
let breakText = iterator.getLineBreakText(); // breakText: Apple is my favorite fruit.
```
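Break positions only mark where a line may end; where it actually ends depends on the available width. The following sketch shows one simple strategy, greedy wrapping, in a hypothetical helper `wrapAtBreaks`: each line takes as many segments as fit, and a single segment wider than the limit gets a line of its own. It assumes the break positions are sorted and include the text length, and it measures width in UTF-16 code units, which only approximates rendered width; a real layout would measure the rendered text instead. The break positions in the usage lines are assumed values for the example sentence.

```typescript
// Hypothetical helper: greedy line wrapping at precomputed break positions.
// `breaks` must be sorted ascending; width is counted in UTF-16 code units (a simplification).
function wrapAtBreaks(text: string, breaks: number[], maxWidth: number): string[] {
  const lines: string[] = [];
  let lineStart = 0;
  let candidate = -1; // last break after lineStart whose line still fits, or -1 if none yet

  // Width of text[lineStart, end) ignoring trailing whitespace, which may hang past the margin.
  const widthTo = (end: number): number => text.slice(lineStart, end).replace(/\s+$/, '').length;

  // Ends the current line at `end` and starts a new line there.
  const emit = (end: number): void => {
    lines.push(text.slice(lineStart, end).replace(/\s+$/, ''));
    lineStart = end;
    candidate = -1;
  };

  for (const pos of breaks) {
    if (pos <= lineStart) continue;
    // Too wide: close the current line at the last break that still fit.
    if (widthTo(pos) > maxWidth && candidate !== -1) emit(candidate);
    if (widthTo(pos) <= maxWidth) {
      candidate = pos;
    } else {
      emit(pos); // a single unbreakable segment wider than maxWidth gets a line of its own
    }
  }
  if (lineStart < text.length) emit(text.length);
  return lines;
}

const text = 'Apple is my favorite fruit.';
const breaks = [0, 6, 9, 12, 21, 27]; // assumed line-break positions: after each space and at the end
const wrapped = wrapAtBreaks(text, breaks, 12); // ['Apple is my', 'favorite', 'fruit.']
```

Greedy wrapping is the simplest choice; layouts that balance line lengths need to look ahead across several break points instead of committing line by line.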
<!--RP1--><!--RP1End-->

<!--no_check-->