---
layout: default
title: API Details
nav_order: 6
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Collation API Details
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section describes some of the usage conventions for the ICU Collation
Service API.

## Collator Instantiation

To use the Collation Service, you must instantiate a `Collator`. The
Collator defines the properties and behavior of the sort ordering. The Collator
can be repeatedly referenced until all collation activities have been performed.
The Collator can then be closed and removed.

### Instantiating the Predefined Collators

ICU comes with a large set of already predefined collators that are suited for
specific locales. Most of the ICU locales have a predefined collator. In the worst
case, the CLDR default set of rules,
which is mostly equivalent to the UCA default ordering (DUCET), is used.
The default sort order itself is designed to work well for many languages.
(For example, there are no tailorings for the standard sort orders for
English, German, French, etc.)

To instantiate a predefined collator, use the APIs `ucol_open`, `createInstance` and
`getInstance` for C, C++ and Java, respectively. The C API takes a locale ID
(or language tag) string argument, C++ takes a Locale object, and Java takes a
Locale or ULocale.

For some languages, multiple collation types are available; for example,
"de-u-co-phonebk" / "de@collation=phonebook". They can be enumerated via
`Collator::getKeywordValuesForLocale()`. See also the list of available collation
tailorings in the online [ICU Collation
Demo](https://icu4c-demos.unicode.org/icu-bin/collation.html).

Starting with ICU 54, collation attributes can be specified via locale keywords
as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in
language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive.

See the [LDML Collation spec, Collation
Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
and the [data
file](https://github.com/unicode-org/cldr/blob/main/common/bcp47/collation.xml) listing
the valid collation keywords and their values.
(The deprecated attributes
kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For the [old locale extension
syntax](http://www.unicode.org/reports/tr35/tr35.html#Old_Locale_Extension_Syntax),
the data file's alias names are used (first alias, if defined, otherwise the
name): "de@collation=phonebook;colCaseLevel=yes;kv=space"

For the language tag syntax, the non-alias names are used, and "true" values can
be omitted: "de-u-co-phonebk-kc-kv-space"

This example demonstrates the instantiation of a collator.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
if(U_SUCCESS(status)) {
  /* close the collator*/
  ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
  //close the collator
  delete coll;
}
```

**Java:**

```java
Collator col = null;
try {
  col = Collator.getInstance(Locale.US);
} catch (Exception e) {
  System.err.println("English collation creation failed.");
  e.printStackTrace();
}
```

### Instantiating Collators Using Custom Rules

If the ICU predefined collators are not appropriate for your intended usage, you
can define your own set of rules and instantiate a collator that uses them. For more
details, please see [the section on collation customization](customization/index).

This example demonstrates the instantiation of a collator from custom rules.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
U_STRING_DECL(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
UCollator *coll;

U_STRING_INIT(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
coll = ucol_openRules(rules, -1, UCOL_ON, UCOL_DEFAULT_STRENGTH, NULL, &status);
if(U_SUCCESS(status)) {
  /* close the collator*/
  ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
UnicodeString rules(u"&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E");
Collator *coll = new RuleBasedCollator(rules, status);
if(U_SUCCESS(status)) {
  //close the collator
  delete coll;
}
```

**Java:**

```java
RuleBasedCollator coll = null;
String ruleset = "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E";
try {
  coll = new RuleBasedCollator(ruleset);
} catch (Exception e) {
  System.err.println("Customized collation creation failed.");
  e.printStackTrace();
}
```

## Compare

Two of the most used functions in the ICU collation API, `ucol_strcoll` and `ucol_getSortKey`, have their counterparts in both Win32 and ANSI APIs:

ICU C | ICU C++ | ICU Java | ANSI/POSIX | WIN32
----------------- | --------------------------- | -------------------------- | ---------- | -----
`ucol_strcoll` | `Collator::compare` | `Collator.compare` | `strcoll` | `CompareString`
`ucol_getSortKey` | `Collator::getSortKey` | `Collator.getCollationKey` | `strxfrm` | `LCMapString`
 | `Collator::getCollationKey` | | |

For more sophisticated usage, such as user-controlled language-sensitive text
searching, an iterating interface to collation is provided. Please refer to the
section below on `CollationElementIterator` for more details.

The comparison functions (such as `ucol_strcoll`) compare one pair of strings at a
time. Comparing two strings is much faster than calculating sort keys for both of
them. However, if comparisons should be done repeatedly on a very large number of
strings, generating and storing sort keys can improve performance. In all other
cases (such as quick sort or bubble sort of a
moderately-sized list of strings), comparing strings works very well.

The C API used for comparing two strings is `ucol_strcoll`. It requires two
`UChar *` strings and their lengths as parameters, as well as a pointer to a valid
`UCollator` instance. The result is a `UCollationResult` constant, which can be one
of `UCOL_LESS`, `UCOL_EQUAL` or `UCOL_GREATER`.

The C++ API offers the method `Collator::compare` with several overloads.
Acceptable input arguments are `UChar *` with length of strings, or `UnicodeString`
instances. The result is a member of the `UCollationResult` or `EComparisonResult` enums.

The Java API provides the method `Collator.compare` with one overload. Acceptable
input arguments are Strings or Objects. The result is an int value, which is
less than zero if source is less than target, zero if source and target are
equal, or greater than zero if source is greater than target.
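
As a minimal sketch of the sign convention described above, the following Java
program sorts a small list by passing a collator as a `Comparator` and interprets
the result of `compare` by its sign rather than by testing for -1 or 1. It is an
illustration under assumptions, not part of the ICU samples: it uses the JDK's
`java.text.Collator` so that it runs without the ICU4J jar (ICU4J's
`com.ibm.icu.text.Collator` offers the same-named `getInstance` and `compare`
methods), and the names `CompareSketch`, `sorted` and `relation` are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Assumption: JDK java.text.Collator stands in for com.ibm.icu.text.Collator.
final class CompareSketch {
    /** Returns a copy of the input sorted with the collator of the given locale. */
    static String[] sorted(String[] input, Locale locale) {
        Collator coll = Collator.getInstance(locale);
        String[] copy = input.clone();
        // A Collator is a Comparator, so it can be handed to Arrays.sort directly.
        Arrays.sort(copy, coll);
        return copy;
    }

    /** Maps the int result of compare to a relation, looking only at its sign. */
    static String relation(Collator coll, String source, String target) {
        int result = coll.compare(source, target);
        if (result < 0) {
            return "less";
        }
        if (result > 0) {
            return "greater";
        }
        return "equal";
    }

    public static void main(String[] args) {
        String[] words = { "trucks", "fox", "Quick", "riddle", "Moving" };
        System.out.println(Arrays.toString(sorted(words, Locale.US)));
        Collator coll = Collator.getInstance(Locale.US);
        System.out.println("fox vs. Moving: " + relation(coll, "fox", "Moving"));
    }
}
```

Testing the sign (`< 0`, `> 0`) instead of a specific value keeps the code valid
for any implementation that only promises a negative, zero or positive result.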

There are also several convenience functions and methods returning a boolean
value, such as `ucol_greater`, `ucol_greaterOrEqual`, `ucol_equal` (in C),
`Collator::greater`, `Collator::greaterOrEqual`, `Collator::equal` (in C++) and
`Collator.equals` (in Java).

### Examples

**C:**

```c
UChar *s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      /* the collator is the first argument of ucol_strcoll */
      if(ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_LESS) {
        /* swap() stands for an application-defined exchange of the two entries */
        swap(s[j], s[j+1]);
      }
    }
  }
  ucol_close(coll);
}
```

**C++:**

```c++
UnicodeString s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      // the overload taking a UErrorCode returns a UCollationResult
      if(coll->compare(s[j], s[j+1], status) == UCOL_LESS) {
        swap(s[j], s[j+1]);
      }
    }
  }
  delete coll;
}
```

**Java:**

```java
String s [] = { /* list of Unicode strings */ };
try {
  Collator coll = Collator.getInstance(Locale.US);
  for (int i = s.length - 1; i >= 1; i--) {
    for (int j = 0; j < i; j++) {
      // test the sign of the result rather than a specific value
      if (coll.compare(s[j], s[j+1]) < 0) {
        String tmp = s[j];
        s[j] = s[j+1];
        s[j+1] = tmp;
      }
    }
  }
} catch (Exception e) {
  System.err.println("English collation creation failed.");
  e.printStackTrace();
}
```

## GetSortKey

The C API provides the `ucol_getSortKey` function, which requires (apart from a
pointer to a valid `UCollator` instance) an original `UChar` pointer, together with
its length. It also requires a pointer to a receiving buffer and its length.

The C++ API provides the `Collator::getSortKey` method with parameters similar to
the C version. It also provides `Collator::getCollationKey`, which produces a
`CollationKey` object instance (a wrapper around a sort key).

The Java API provides only the `Collator.getCollationKey` method, which produces a
`CollationKey` object instance (a wrapper around a sort key).

Sort keys are generally only useful in databases or other circumstances where
function calls are extremely expensive. See [Sortkeys vs
Comparison](concepts#sortkeys-vs-comparison).

### Sort Key Features

ICU writes sort keys as sequences of bytes.

Each sort key ends with one 00 byte and does not contain any other 00 byte. The
terminating 00 byte is included in the length of the sort key as returned by the
API (unlike any other ICU API where terminating NUL bytes or characters are not
counted as part of the length).

Sort key byte sequences must be compared with an unsigned-byte comparison, as
with `strcmp()`.

Comparing the sort keys of two strings from the same collator yields the same
ordering as using the collator to compare the two strings directly. That is:
`strcmp(coll.getSortKey(str1), coll.getSortKey(str2))` is equivalent to
`coll.compare(str1, str2)`.

Sort keys from different collators (different locale or strength or any other
attributes/settings) are not comparable.
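
The equivalence between sort key comparison and direct comparison can be checked
with a short Java sketch. It is only an illustration under assumptions: it uses
the JDK's `java.text.Collator` and `java.text.CollationKey` so that it runs without
the ICU4J jar (the ICU4J classes of the same names in `com.ibm.icu.text` also
provide `getCollationKey` and `toByteArray`), it relies on
`Arrays.compareUnsigned`, which requires Java 9 or later, and the names
`SortKeySketch`, `viaCompare` and `viaSortKeys` are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Assumption: JDK java.text classes stand in for their ICU4J counterparts.
final class SortKeySketch {
    /** Sign of comparing the two strings directly with the collator. */
    static int viaCompare(Collator coll, String a, String b) {
        return Integer.signum(coll.compare(a, b));
    }

    /** Sign of comparing the sort key bytes of the two strings. */
    static int viaSortKeys(Collator coll, String a, String b) {
        byte[] keyA = coll.getCollationKey(a).toByteArray();
        byte[] keyB = coll.getCollationKey(b).toByteArray();
        // Java bytes are signed, so the comparison must be explicitly unsigned.
        return Integer.signum(Arrays.compareUnsigned(keyA, keyB));
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);
        String[] words = { "Quick", "fox", "Moving", "trucks", "riddle" };
        for (String a : words) {
            for (String b : words) {
                boolean same = viaCompare(coll, a, b) == viaSortKeys(coll, a, b);
                System.out.println(a + " / " + b + " agree: " + same);
            }
        }
    }
}
```

Both keys in each comparison come from the same collator instance; keys produced
by collators with different locales or attributes would not be comparable.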

Sort keys can be "merged" as described in [UTS #10 Merging Sort
Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys), via
`ucol_mergeSortkeys()` or Java `CollationKey.merge()`.

* Since CLDR 1.9/ICU 4.6, the same effect can be achieved by concatenating
  strings with U+FFFE between them. The concatenation has the same sort order
  as the merged sort keys.
* However, it is not guaranteed that the sort key of the concatenated strings
  is the same as the merged result of the individual sort keys. (That is,
  merge(getSortKey(str1), getSortKey(str2)) may differ from getSortKey(str1 +
  '\\uFFFE' + str2).)
* In particular, a future version of ICU is likely to generate shorter sort
  keys when concatenating strings with U+FFFE between them (by using
  compression across the U+FFFE weights).
* *The recommended way to achieve "merged" sorting is via strings with
  U+FFFE.*

Any further analysis or parsing of sort keys is not supported.

Sort keys will change from one ICU version to another; therefore, if sort keys
are stored in a database or other persistent storage, then each upgrade requires
their regeneration.

* The details of the underlying data change with every Unicode and CLDR
  version.
* Sort keys are also subject to enhancements and bug fixes in the builder and
  implementation code.
* On the other hand, the sort *order* is much more stable. It is subject to
  deliberate changes to the default Unicode collation order, which is kept
  quite stable, and subject to deliberate changes in CLDR data as new data is
  added and feedback on existing data is taken into account.

Implementation notes: (Not supported as permanent constraints on sort keys)

Byte 02 was unique as a merge separator for some versions of ICU before version
ICU 53. Since ICU 53, 02 is also used in regular collation weights where there
is no conflict (to expand the number of available short weights).

Byte 01 has been unique as a level separator. This is not strictly necessary for
non-primary levels. (A level's compressible "common" weight as its level
separator would yield shorter sort keys.) However, the current implementation of
`ucol_mergeSortkeys()` relies on it. (Also, test code currently examines sort keys
for finding the strength of a comparison difference.) This may change in the
future, especially if `ucol_mergeSortkeys()` were to become deprecated.

Level separators are likely to be equivalent to single-byte weights (possibly
compressible): Multi-byte level separators would noticeably lengthen sort keys
for short strings.

The byte values used in several ICU versions for sort keys and collation
elements are documented in the [“Special Byte Values” design
doc](https://icu.unicode.org/design/collation/bytes) on the ICU site.

### Sort Key Output Buffer

`ucol_getSortKey()` can operate in 'preflighting' mode, which returns the amount
of memory needed to store the resulting sort key. This mode is automatically
activated if the output buffer size passed is set to zero. Should the sort key
become longer than the buffer provided, the function again slips into preflighting
mode. The overall performance is poorer than if the function is called with a
zero output buffer. If the size of the sort key returned is greater than the
size of the buffer provided, the content of the result buffer is undefined. In
that case, the result buffer could be reallocated to its proper size and the
sort key generator function can be used again.

The best way to generate a series of sort keys is to do the following:

1. Create a big temporary buffer on the stack. Typically, this buffer is
   allocated only once, and reused with every sort key generated. There is no
   need to keep it as small as possible. A recommended size for the temporary
   buffer is four times the length of the longest string processed.

2. Start the loop. Call `ucol_getSortKey()` to find out how big the sort key
   buffer should be, and fill in the temporary buffer at the same time.

3. If the temporary buffer is too small, allocate or reallocate more space.
   Fill in the sort key values in the overflow buffer.

4. Allocate the sort key buffer with the size returned by `ucol_getSortKey()` and
   call `memcpy` to copy the sort key content from the temp buffer to the sort
   key buffer.

5. Loop back to step 2 until you are done.

6. Delete the overflow buffer if you created one.

### Example

```c
void GetSortKeys(const UCollator* coll, const UChar*
const *source, uint32_t arrayLength)
{
  uint8_t buffer[1000]; // allocate stack buffer
  uint8_t* currBuffer = buffer;
  int32_t bufferLen = sizeof(buffer);
  int32_t expectedLen = 0;

  for (uint32_t i = 0; i < arrayLength; ++i) {
    expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    if (expectedLen > bufferLen) {
      // the key did not fit: grow the overflow buffer and generate the key again
      if (currBuffer == buffer) {
        currBuffer = (uint8_t*)malloc(expectedLen);
      } else {
        currBuffer = (uint8_t*)realloc(currBuffer, expectedLen);
      }
      bufferLen = expectedLen;
      expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    }
    // processSortKey is an application-defined function that copies or uses the key
    processSortKey(i, currBuffer, expectedLen);
  }

  if (currBuffer != buffer && currBuffer != NULL) {
    free(currBuffer);
  }
}
```

> :point_right: **Note** Although the API allows you to call
> `ucol_getSortKey` with `NULL` to see what the
> sort key length is, it is strongly recommended that you NOT determine the length
> first, then allocate and fill the sort key buffer. If you do, it requires twice
> the processing since computing the length has to do the same calculation as
> actually getting the sort key. Instead, the example shown above uses a stack buffer.

### Using Iterators for String Comparison

ICU4C's `ucol_strcollIter` API allows for comparing two strings that are supplied
as character iterators (`UCharIterator`). This is useful when you need to compare
differently encoded strings using `strcoll`. In that case, converting the strings
first would probably be wasteful, since `strcoll` usually gives the result
before whole strings are processed. This API is implemented only as a C function
in ICU4C. There are no equivalent C++ or ICU4J functions.

```c
...
/* we are arriving with two char*: utf8Source and utf8Target, with their
* lengths in utf8SourceLen and utf8TargetLen
*/
    UCharIterator sIter, tIter;
    uiter_setUTF8(&sIter, utf8Source, utf8SourceLen);
    uiter_setUTF8(&tIter, utf8Target, utf8TargetLen);
    compareResultUTF8 = ucol_strcollIter(myCollation, &sIter, &tIter, &status);
...
```

### Obtaining Partial Sort Keys

When using different sort algorithms, such as radix sort, sometimes it is useful
to process strings only as much as needed to feed into the sorting algorithm.
For that purpose, ICU provides the `ucol_nextSortKeyPart` API, which also takes
character iterators. This API allows for iterating over subsequent pieces of an
uncompressed sort key. Between calls to the API you need to save a 64-bit state.
Following is an example of simulating a string compare function using the partial
sort key API. Your usage model is bound to look much different.

```c
static UCollationResult compareUsingPartials(UCollator *coll,
                                             const UChar source[], int32_t sLen,
                                             const UChar target[], int32_t tLen,
                                             int32_t pieceSize, UErrorCode *status) {
  int32_t partialSKResult = 0;
  UCharIterator sIter, tIter;
  uint32_t sState[2], tState[2];
  int32_t sSize = pieceSize, tSize = pieceSize;
  int32_t i = 0;
  uint8_t sBuf[16384], tBuf[16384];
  if(pieceSize > 16384) {
    *status = U_BUFFER_OVERFLOW_ERROR;
    return UCOL_EQUAL;
  }
  *status = U_ZERO_ERROR;
  sState[0] = 0; sState[1] = 0;
  tState[0] = 0; tState[1] = 0;
  while(sSize == pieceSize && tSize == pieceSize && partialSKResult == 0) {
    uiter_setString(&sIter, source, sLen);
    uiter_setString(&tIter, target, tLen);
    sSize = ucol_nextSortKeyPart(coll, &sIter, sState, sBuf, pieceSize, status);
    tSize = ucol_nextSortKeyPart(coll, &tIter, tState, tBuf, pieceSize, status);
    partialSKResult = memcmp(sBuf, tBuf, pieceSize);
  }

  if(partialSKResult < 0) {
    return UCOL_LESS;
  } else if(partialSKResult > 0) {
    return UCOL_GREATER;
  } else {
    return UCOL_EQUAL;
  }
}
```

### Other Examples

A longer example is presented in the 'Examples' section. Here is an illustration
of the usage model.

**C:**

```c
#define MAX_KEY_SIZE 100
#define MAX_BUFFER_SIZE 10000
#define MAX_LIST_LENGTH 5
const char *text[] = {
   "Quick",
   "fox",
   "Moving",
   "trucks",
   "riddle"
};
UChar s [5][20];
int i;
int32_t length, expectedLen;
uint8_t temp[MAX_BUFFER_SIZE];


uint8_t *temp2 = NULL;
uint8_t keys [MAX_LIST_LENGTH][MAX_KEY_SIZE];
UErrorCode status = U_ZERO_ERROR;

temp2 = temp;

length = MAX_BUFFER_SIZE;
for( i = 0; i < 5; i++)
{
   u_uastrcpy(s[i], text[i]);
}
UCollator *coll = ucol_open("en_US",&status);
if(U_SUCCESS(status)) {
  for(i=0; i<MAX_LIST_LENGTH; i++) {
    expectedLen = ucol_getSortKey(coll, s[i], -1, temp2, length);
    if (expectedLen > length) {
      /* the key did not fit: grow the overflow buffer and generate the key again */
      if (temp2 == temp) {
        temp2 = (uint8_t*)malloc(expectedLen);
      } else {
        temp2 = (uint8_t*)realloc(temp2, expectedLen);
      }
      length = expectedLen;
      expectedLen = ucol_getSortKey(coll, s[i], -1, temp2, length);
    }
    /* copy only the bytes of this key, and only if it fits into its slot */
    if (expectedLen <= MAX_KEY_SIZE) {
      memcpy(keys[i], temp2, expectedLen);
    }
  }
}
/* sort keys contain no embedded 00 byte, so a strcmp-style comparison applies;
   strcmp is cast here to the comparator type that qsort expects */
qsort(keys, MAX_LIST_LENGTH, MAX_KEY_SIZE*sizeof(uint8_t),
      (int (*)(const void *, const void *))strcmp);
/* keys is a fixed-size array; only the overflow buffer may need to be freed */
if (temp2 != temp) {
  free(temp2);
}
ucol_close(coll);
```

**C++:**

```c++
#include <algorithm>

#define MAX_LIST_LENGTH 5
const UnicodeString s [] = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
CollationKey keys[MAX_LIST_LENGTH];
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en_US"), status);
uint32_t i;
if(U_SUCCESS(status)) {
  for(i=0; i<MAX_LIST_LENGTH; i++) {
    // fills keys[i] with the collation key of s[i]
    coll->getCollationKey(s[i], keys[i], status);
  }
  // order the keys with CollationKey::compareTo
  std::sort(keys, keys + MAX_LIST_LENGTH,
            [&status](const CollationKey &a, const CollationKey &b) {
              return a.compareTo(b, status) == UCOL_LESS;
            });
  delete coll;
}
```

**Java:**

```java
String s [] = {
5762e5b6d6dSopenharmony_ci "Quick", 5772e5b6d6dSopenharmony_ci "fox", 5782e5b6d6dSopenharmony_ci "Moving", 5792e5b6d6dSopenharmony_ci "trucks", 5802e5b6d6dSopenharmony_ci "riddle" 5812e5b6d6dSopenharmony_ci}; 5822e5b6d6dSopenharmony_ciCollationKey keys[] = new CollationKey[s.length]; 5832e5b6d6dSopenharmony_citry { 5842e5b6d6dSopenharmony_ci Collator coll = Collator.getInstance(Locale.US); 5852e5b6d6dSopenharmony_ci for (int i = 0; i < s.length; i ++) { 5862e5b6d6dSopenharmony_ci keys[i] = coll.getCollationKey(s[i]); 5872e5b6d6dSopenharmony_ci } 5882e5b6d6dSopenharmony_ci 5892e5b6d6dSopenharmony_ci Arrays.sort(keys); 5902e5b6d6dSopenharmony_ci} 5912e5b6d6dSopenharmony_cicatch (Exception e) { 5922e5b6d6dSopenharmony_ci System.err.println("Error creating English collator"); 5932e5b6d6dSopenharmony_ci e.printStackTrace(); 5942e5b6d6dSopenharmony_ci} 5952e5b6d6dSopenharmony_ci``` 5962e5b6d6dSopenharmony_ci 5972e5b6d6dSopenharmony_ci## Collation ElementIterator 5982e5b6d6dSopenharmony_ci 5992e5b6d6dSopenharmony_ciA collation element iterator can only be used in one direction. This is 6002e5b6d6dSopenharmony_ciestablished at the time of the first call to retrieve a collation element. Once 6012e5b6d6dSopenharmony_ci`ucol_next` (C), `CollationElementIterator::next` (C++) or 6022e5b6d6dSopenharmony_ci`CollationElementIterator.next` (Java) are invoked, 6032e5b6d6dSopenharmony_ci`ucol_previous` (C), 6042e5b6d6dSopenharmony_ci`CollationElementIterator::previous` (C++) or `CollationElementIterator.previous` 6052e5b6d6dSopenharmony_ci(Java) should not be used (and vice versa). 
The direction can be changed
immediately after `ucol_first`, `ucol_last`, `ucol_reset` (in C),
`CollationElementIterator::first`, `CollationElementIterator::last`,
`CollationElementIterator::reset` (in C++) or `CollationElementIterator.first`,
`CollationElementIterator.last`, `CollationElementIterator.reset` (in Java) is
called, or when the iterator reaches the end of the string while traversing it.

When `ucol_next` is called at the end of the string buffer, it returns
`UCOL_NULLORDER`, and so does every subsequent call to `ucol_next`. The same
applies to `ucol_previous`.

An example of how iterators are used is the Boyer-Moore search implementation,
which can be found in the samples section.
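
The end-of-string and reset behavior described above can be sketched in Java.
To stay free of external dependencies, this sketch assumes the JDK's
`java.text.RuleBasedCollator` and `java.text.CollationElementIterator`, whose
`next`, `previous`, `reset` and `NULLORDER` mirror the ICU4J names; the ICU4J
classes live in `com.ibm.icu.text` and may produce different element values.
The names `CollationElementSketch`, `open`, `drainForward` and `drainBackward`
are hypothetical and exist only for this illustration.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CollationElementSketch {
    // Assumption: the JDK collator for Locale.US is a RuleBasedCollator.
    static CollationElementIterator open(String text) {
        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(Locale.US);
        return coll.getCollationElementIterator(text);
    }

    // Calls next() until NULLORDER and returns the elements seen on the way.
    static List<Integer> drainForward(CollationElementIterator it) {
        List<Integer> out = new ArrayList<>();
        int ce;
        while ((ce = it.next()) != CollationElementIterator.NULLORDER) {
            out.add(ce);
        }
        return out;
    }

    // Calls previous() until NULLORDER and returns the elements seen on the way.
    static List<Integer> drainBackward(CollationElementIterator it) {
        List<Integer> out = new ArrayList<>();
        int ce;
        while ((ce = it.previous()) != CollationElementIterator.NULLORDER) {
            out.add(ce);
        }
        return out;
    }

    public static void main(String[] args) {
        CollationElementIterator it = open("abc");

        // One direction only: forward until the end of the string.
        List<Integer> first = drainForward(it);

        // At the end of the string, further next() calls keep returning NULLORDER.
        boolean stillAtEnd = it.next() == CollationElementIterator.NULLORDER;

        // Having reached the end, the direction may be switched to backward.
        List<Integer> back = drainBackward(it);

        // reset() moves the cursor back to the start, so forward iteration is allowed again.
        it.reset();
        List<Integer> second = drainForward(it);

        System.out.println("forward elements:       " + first.size());
        System.out.println("NULLORDER repeats:      " + stillAtEnd);
        System.out.println("backward elements:      " + back.size());
        System.out.println("same after reset():     " + first.equals(second));
    }
}
```

The sketch never mixes `next` and `previous` in the middle of a traversal; it
only switches direction at the end of the string or after `reset`, which are the
points the text above names as safe.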

### API Example

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
UChar text[20];
UCollationElements *collelemitr;
int32_t collelem;   /* ucol_next() returns an int32_t */

u_uastrcpy(text, "text");
collelemitr = ucol_openElements(coll, text, -1, &status);
if (U_SUCCESS(status)) {
    collelem = 0;
    do {
        collelem = ucol_next(collelemitr, &status);
    } while (collelem != UCOL_NULLORDER);
}

ucol_closeElements(collelemitr);
ucol_close(coll);
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale::getUS(), status);
// createCollationElementIterator() is declared on RuleBasedCollator, not on Collator.
RuleBasedCollator *rbc = dynamic_cast<RuleBasedCollator *>(coll);
if (U_SUCCESS(status) && rbc != nullptr) {
    UnicodeString text("text");
    CollationElementIterator *collelemitr = rbc->createCollationElementIterator(text);
    int32_t collelem = 0;
    do {
        collelem = collelemitr->next(status);
    } while (collelem != CollationElementIterator::NULLORDER);

    delete collelemitr;
}
delete coll;
```

**Java:**

```java
try {
    RuleBasedCollator coll = (RuleBasedCollator)Collator.getInstance(Locale.US);
    String text = "text";
    CollationElementIterator collelemitr = coll.getCollationElementIterator(text);
    int collelem = 0;
    do {
        collelem = collelemitr.next();
    } while (collelem != CollationElementIterator.NULLORDER);
} catch (Exception e) {
    System.err.println("Error in collation iteration");
    e.printStackTrace();
}
```

## Setting and Getting Attributes

The general attribute setting APIs are `ucol_setAttribute` (in C) and
`Collator::setAttribute` (in C++). These APIs take an attribute name and an
attribute value. If the name and the value pass a syntax and range check, the
property of the collator is changed. If they do not pass the check, the state
is not changed and the error code variable is set to an error condition. The
Java version does not provide general attribute setting APIs; instead, each
attribute has its own setter API of the form
`RuleBasedCollator.setATTRIBUTE_NAME(arguments)`.

The attribute getting APIs are `ucol_getAttribute` (C) and `Collator::getAttribute`
(C++). Both APIs require an attribute name as an argument and return an
attribute value if a valid attribute name was supplied.
If a valid attribute
name was not supplied, however, they return an undefined result and set the
error code. As with the setter APIs, the Java version provides no generic getter
API. Each attribute has its own getter API of the form
`RuleBasedCollator.getATTRIBUTE_NAME()` in the Java version.

## References

1. Ken Whistler, Markus Scherer: "Unicode Technical Standard #10, Unicode Collation
   Algorithm" (<http://www.unicode.org/reports/tr10/>)

2. ICU Design doc: "Collation v2" (<https://icu.unicode.org/design/collation/v2>)

3. Mark Davis: "ICU Collation Design Document"
   (<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/main/design/collation/ICU_collation_design.htm>)

4. The Unicode Standard, chapter 5, "Implementation guidelines"
   (<http://www.unicode.org/uni2book/ch05.pdf>)

5. Laura Werner: "Efficient text searching in Java: Finding the right string in
   any language"
   (<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>)

6. Mark Davis, Martin Dürst: "Unicode Standard Annex #15: Unicode Normalization
   Forms" (<http://www.unicode.org/reports/tr15/>)