# Character Processing

## Use Cases

Character rules vary widely across languages, which makes it difficult to extract the expected information from text with a single set of rules. Character processing lets you handle text in different languages with the same logic.

## How to Develop

### Character Type Identification Using Character Attributes

Character attributes are used to determine the character type (for example, digit, letter, or space) and to check whether a character belongs to a right-to-left (RTL) language or is an ideographic character (for example, a Chinese, Japanese, or Korean character).

These functions are implemented by APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the character attribute.

   ```ts
   let char: string = '1'; // Character to check.
   let isDigit: boolean = i18n.Unicode.isDigit(char);
   ```

3. Obtain the character type. The following code snippet uses the common type as an example. For details, see the **getType** API reference.

   ```ts
   let type: string = i18n.Unicode.getType(char);
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit = i18n.Unicode.isDigit('1'); // isDigit: true

// Check whether a character belongs to an RTL language.
let isRTL = i18n.Unicode.isRTL('a'); // isRTL: false

// Check whether a character is an ideographic character.
let isIdeograph = i18n.Unicode.isIdeograph('华'); // isIdeograph: true

// Obtain the character type.
let type = i18n.Unicode.getType('a'); // type: U_LOWERCASE_LETTER
```
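
When unit-testing text logic outside the device runtime (for example, in Node.js), similar classifications can be approximated with standard ECMAScript Unicode property escapes. The following sketch does not call `i18n.Unicode`. The helper names are hypothetical, and edge-case results may differ from those of the Localization Kit APIs.

```ts
// Hypothetical helpers built on standard RegExp Unicode property escapes.
// They approximate, but are not, the i18n.Unicode APIs.
const DECIMAL_DIGIT = new RegExp('^\\p{Nd}$', 'u');        // General category: decimal number
const IDEOGRAPHIC = new RegExp('^\\p{Ideographic}$', 'u'); // Binary property: Ideographic
const LOWERCASE_LETTER = new RegExp('^\\p{Ll}$', 'u');     // General category: lowercase letter

function isDecimalDigit(char: string): boolean {
  return DECIMAL_DIGIT.test(char);
}

function isIdeographic(char: string): boolean {
  return IDEOGRAPHIC.test(char);
}

function isLowercaseLetter(char: string): boolean {
  return LOWERCASE_LETTER.test(char);
}

// Count decimal digits of any script in a text, one code point at a time.
function countDigits(text: string): number {
  let count = 0;
  for (const ch of Array.from(text)) {
    if (isDecimalDigit(ch)) {
      count++;
    }
  }
  return count;
}

console.log(isDecimalDigit('1'));           // true
console.log(isDecimalDigit('\u0663'));      // true (Arabic-Indic digit three)
console.log(isIdeographic('华'));           // true
console.log(isLowercaseLetter('a'));        // true
console.log(countDigits('Room 12, 第3层')); // 3
```

Because the same predicates apply to every script, one code path can handle Latin, Arabic, and CJK input alike.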


### Transliteration

Transliteration replaces the original content with similarly pronounced content in the local language. This function is implemented through the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module supports transliteration from Chinese characters to pinyin. However, correct handling of polyphonic characters based on the context is not guaranteed.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Transliterator** object and obtain the list of supported transliteration IDs.
   ```ts
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn');  // Pass in a valid ID to create a Transliterator object.
   let ids: string[] = i18n.Transliterator.getAvailableIDs();  // Obtain the list of IDs supported by the Transliterator object.
   ```

3. Transliterate text.
   ```ts
   let res: string = transliterator.transform('中国');  // Transliterate the text content.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into the Latn format.
let transliterator = i18n.Transliterator.getInstance('Any-Latn');
let res = transliterator.transform("中国"); // res = "zhōng guó"

// Obtain the list of IDs supported by the Transliterator object.
let ids = i18n.Transliterator.getAvailableIDs(); // ids: ['ASCII-Latin', 'Accents-Any', ...]
```
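
Pinyin output such as `zhōng guó` carries tone marks, which can get in the way of plain-ASCII search keys or sorting. As an optional post-processing step, the following sketch removes them with the standard `String.prototype.normalize` method. `stripToneMarks` is a hypothetical helper and not part of the **Transliterator** class. Note that it also drops the diaeresis from `ü`, so it is lossy.

```ts
// Hypothetical helper: remove combining diacritical marks from transliterated text.
function stripToneMarks(pinyin: string): string {
  return pinyin
    .normalize('NFD')                 // Split each accented letter into base letter + combining mark.
    .replace(/[\u0300-\u036f]/g, ''); // Drop marks in the Combining Diacritical Marks block.
}

console.log(stripToneMarks('zhōng guó')); // zhong guo
```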


### Character Normalization

Character normalization standardizes characters according to the specified normalization form. This function is implemented through the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object by passing in the text normalization form, which can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
   ```

3. Normalize the text.
   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize characters in the NFC form.
let normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText = normalizer.normalize('\u1E9B\u0323'); // normalizedText: \u1E9B\u0323
```
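
The NFC result above equals the input, which can hide how the four forms differ. The following sketch applies each form to the same string with the standard `String.prototype.normalize` method and prints the resulting code points. It does not use the **Normalizer** class, and `toCodePoints` is a hypothetical helper. The expected values follow the example given in the Unicode Normalization Forms report.

```ts
// Hypothetical helper: render a string as a list of U+XXXX code points.
function toCodePoints(text: string): string {
  return Array.from(text)
    .map((ch) => 'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

const source = '\u1E9B\u0323'; // Long s with dot above, followed by combining dot below.

const forms: Record<string, string> = {
  NFC: toCodePoints(source.normalize('NFC')),   // U+1E9B U+0323
  NFD: toCodePoints(source.normalize('NFD')),   // U+017F U+0323 U+0307
  NFKC: toCodePoints(source.normalize('NFKC')), // U+1E69
  NFKD: toCodePoints(source.normalize('NFKD')), // U+0073 U+0323 U+0307
};

console.log(forms);
```

The compatibility forms (NFKC and NFKD) replace the long s with a plain `s`, so they suit search and comparison but change how the text looks.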


### Line Wrapping

Line wrapping obtains the break positions of text based on the specified text boundary rules so that lines can be wrapped at those positions. It is implemented by using the APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object.
   Pass a valid locale to create a **BreakIterator** object. This object wraps lines based on the rules specified by the locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB');
   ```

3. Set the text to be processed.
   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the break positions of the text.
   ```ts
   let currentPos: number = iterator.current(); // Obtain the current position of BreakIterator in the text.
   let firstPos: number = iterator.first(); // Move BreakIterator to the first break point and return its position. The first break point is always at the beginning of the text, that is, firstPos = 0.
   // Move BreakIterator by the specified number of break points (1 by default): toward the end of the text for a positive number, and toward the beginning for a negative number. The new position is returned, or -1 if BreakIterator moves out of the text range.
   let nextPos: number = iterator.next(2);
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified position is a break point.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object.
let iterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move BreakIterator to the beginning of the text.
let firstPos = iterator.first(); // firstPos: 0

// Move BreakIterator by several break points.
let nextPos = iterator.next(2); // nextPos: 9

// Check whether a position is a break point.
let isBoundary = iterator.isBoundary(9); // isBoundary: true

// Obtain the text processed by BreakIterator.
let breakText = iterator.getLineBreakText(); // breakText: Apple is my favorite fruit.
```
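
The break positions returned by **BreakIterator** can drive a simple greedy wrapping routine. The following is a minimal sketch under stated assumptions: `wrapAtBreakPoints` and its parameters are hypothetical names, width is measured in UTF-16 code units rather than rendered pixels, and the break positions are passed in as an ascending array that ends at the text length. In an app, they could be collected by calling `first()` and then `next()` until it returns -1. In the sample input, only positions 0 and 9 come from the example above; the others are assumed.

```ts
// Hypothetical helper: split text into lines no wider than maxWidth where possible,
// breaking only at the given break points.
function wrapAtBreakPoints(text: string, breakPoints: number[], maxWidth: number): string[] {
  const lines: string[] = [];
  let lineStart = 0; // Start offset of the line being built.
  let lastFit = 0;   // Most recent break point accepted for the current line.

  for (const bp of breakPoints) {
    if (bp <= lineStart) {
      continue; // Skip break points at or before the line start.
    }
    // If extending to this break point overflows, close the line at the last one that fit.
    if (bp - lineStart > maxWidth && lastFit > lineStart) {
      lines.push(text.slice(lineStart, lastFit));
      lineStart = lastFit;
    }
    lastFit = bp; // A segment longer than maxWidth stays on a line of its own.
  }

  if (lineStart < text.length) {
    lines.push(text.slice(lineStart)); // Remaining text forms the last line.
  }
  return lines;
}

const sampleText = 'Apple is my favorite fruit.';
const assumedBreaks = [0, 6, 9, 12, 21, 27]; // Assumed input, see the note above.
console.log(wrapAtBreakPoints(sampleText, assumedBreaks, 12));
// [ 'Apple is my ', 'favorite ', 'fruit.' ]
```

Trailing spaces stay at the end of each line, so joining the lines reproduces the original text exactly.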
<!--RP1--><!--RP1End-->

<!--no_check-->