---
layout: default
title: API Details
nav_order: 6
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Collation API Details
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section describes some of the usage conventions for the ICU Collation
Service API.

## Collator Instantiation

To use the Collation Service, you must instantiate a `Collator`. The
Collator defines the properties and behavior of the sort ordering. The Collator
can be repeatedly referenced until all collation activities have been performed.
The Collator can then be closed and removed.

### Instantiating the Predefined Collators

ICU comes with a large set of predefined collators that are suited for
specific locales. Most of the ICU locales have a predefined collator. In the worst
case, the CLDR default set of rules,
which is mostly equivalent to the UCA default ordering (DUCET), is used.
The default sort order itself is designed to work well for many languages.
(For example, there are no tailorings for the standard sort orders for
English, German, French, etc.)

To instantiate a predefined collator, use `ucol_open` in C,
`Collator::createInstance` in C++, and `Collator.getInstance` in Java. The C API
takes a locale ID (or language tag) string argument, C++ takes a `Locale` object,
and Java takes a `Locale` or `ULocale`.

For some languages, multiple collation types are available; for example,
"de-u-co-phonebk" / "de@collation=phonebook". They can be enumerated via
`Collator::getKeywordValuesForLocale()`. See also the list of available collation
tailorings in the online [ICU Collation
Demo](https://icu4c-demos.unicode.org/icu-bin/collation.html).

Starting with ICU 54, collation attributes can be specified via locale keywords
as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in
language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive.

See the [LDML Collation spec, Collation
Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
and the [data
file](https://github.com/unicode-org/cldr/blob/main/common/bcp47/collation.xml) listing
the valid collation keywords and their values. (The deprecated attributes
kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For the [old locale extension
syntax](http://www.unicode.org/reports/tr35/tr35.html#Old_Locale_Extension_Syntax),
the data file's alias names are used (first alias, if defined, otherwise the
name): "de@collation=phonebook;colCaseLevel=yes;kv=space"

For the language tag syntax, the non-alias names are used, and "true" values can
be omitted: "de-u-co-phonebk-kc-kv-space"
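
As a rough illustration of how such a language tag breaks down into keyword/value
pairs, the sketch below reads the tag with the JDK's `java.util.Locale`. This is an
assumption made so that the example runs without ICU on the classpath; it is not how
ICU itself processes the tag. The names `CollationLanguageTagDemo` and `keywordValue`
are hypothetical.

```java
import java.util.Locale;

final class CollationLanguageTagDemo {
    /**
     * Returns the value of a Unicode locale extension key in the given tag:
     * null if the key is absent, "" if the key is present without a value.
     */
    static String keywordValue(String languageTag, String key) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return locale.getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        String tag = "de-u-co-phonebk-kc-kv-space";
        for (String key : new String[] {"co", "kc", "kv"}) {
            System.out.println(key + " = \"" + keywordValue(tag, key) + "\"");
        }
    }
}
```

The `kc` key, whose "true" value was omitted, is reported with an empty value, while
`co` and `kv` carry `phonebk` and `space`.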

This example demonstrates the instantiation of a collator.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
if(U_SUCCESS(status)) {
    /* close the collator */
    ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
    // close the collator
    delete coll;
}
```

**Java:**

```java
Collator col = null;
try {
    col = Collator.getInstance(Locale.US);
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

### Instantiating Collators Using Custom Rules

If the ICU predefined collators are not appropriate for your intended usage, you
can define your own set of rules and instantiate a collator that uses them. For more
details, please see [the section on collation customization](customization/index).

This example demonstrates the instantiation of a collator from custom rules.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
U_STRING_DECL(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
UCollator *coll;

U_STRING_INIT(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
coll = ucol_openRules(rules, -1, UCOL_ON, UCOL_DEFAULT_STRENGTH, NULL, &status);
if(U_SUCCESS(status)) {
    /* close the collator */
    ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
UnicodeString rules(u"&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E");
Collator *coll = new RuleBasedCollator(rules, status);
if(U_SUCCESS(status)) {
    // close the collator
    delete coll;
}
```

**Java:**

```java
RuleBasedCollator coll = null;
String ruleset = "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E";
try {
    coll = new RuleBasedCollator(ruleset);
} catch (Exception e) {
    System.err.println("Customized collation creation failed.");
    e.printStackTrace();
}
```

## Compare

Two of the most used functions in the ICU collation API, `ucol_strcoll` and `ucol_getSortKey`, have their counterparts in both Win32 and ANSI APIs:

ICU C             | ICU C++                     | ICU Java                   | ANSI/POSIX | WIN32
----------------- | --------------------------- | -------------------------- | ---------- | -----
`ucol_strcoll`    | `Collator::compare`         | `Collator.compare`         | `strcoll`  | `CompareString`
`ucol_getSortKey` | `Collator::getSortKey`      | `Collator.getCollationKey` | `strxfrm`  | `LCMapString`
&nbsp;            | `Collator::getCollationKey` | &nbsp;                     | &nbsp;     | &nbsp;

For more sophisticated usage, such as user-controlled language-sensitive text
searching, an iterating interface to collation is provided. Please refer to the
section below on `CollationElementIterator` for more details.

The `ucol_strcoll` function compares one pair of strings at a time. Comparing two
strings is much faster than calculating sort keys for both of them. However, if
comparisons must be done repeatedly on a very large number of strings, generating
and storing sort keys can improve performance. In all other cases (such as a quick
sort or bubble sort of a
moderately-sized list of strings), comparing strings works very well.
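
For a moderately-sized list, direct comparison can simply be handed to a library
sort. The following is a minimal sketch, assuming the JDK's `java.text.Collator` as
a stand-in so that it runs without ICU4J on the classpath; ICU4J's
`com.ibm.icu.text.Collator` offers the same `getInstance` and `compare` calls. The
names `CollatorSortDemo` and `sortWithCollator` are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollatorSortDemo {
    /** Returns a copy of the input, sorted by direct collator comparisons. */
    static String[] sortWithCollator(String[] words, Locale locale) {
        Collator coll = Collator.getInstance(locale);
        String[] sorted = words.clone();
        // The JDK Collator implements Comparator, so the library sort calls
        // compare() once per pair it needs to order; no sort keys are built.
        Arrays.sort(sorted, coll);
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"Quick", "fox", "Moving", "trucks", "riddle"};
        // prints [fox, Moving, Quick, riddle, trucks]
        System.out.println(Arrays.toString(sortWithCollator(words, Locale.US)));
    }
}
```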

The C API used for comparing two strings is `ucol_strcoll`. It requires two
`UChar *` strings and their lengths as parameters, as well as a pointer to a valid
`UCollator` instance. The result is a `UCollationResult` constant, which can be one
of `UCOL_LESS`, `UCOL_EQUAL` or `UCOL_GREATER`.

The C++ API offers the method `Collator::compare` with several overloads.
Acceptable input arguments are `UChar *` with length of strings, or `UnicodeString`
instances. The result is a member of the `UCollationResult` or `EComparisonResult` enums.

The Java API provides the method `Collator.compare` with one overload. Acceptable
input arguments are Strings or Objects. The result is an int value, which is
less than zero if source is less than target, zero if source and target are
equal, or greater than zero if source is greater than target.

There are also several convenience functions and methods returning a boolean
value, such as `ucol_greater`, `ucol_greaterOrEqual`, `ucol_equal` (in C),
`Collator::greater`, `Collator::greaterOrEqual`, `Collator::equals` (in C++) and
`Collator.equals` (in Java).
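
The boolean conveniences are thin wrappers around the sign of the comparison result.
A hedged sketch in Java, again assuming the JDK's `java.text.Collator` as a stand-in
for the ICU4J class; `greater` and `greaterOrEqual` below are hypothetical helpers
written for this illustration, while `equals(String, String)` is a method of the
collator itself.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorBooleanDemo {
    /** True if a sorts after b; only the sign of compare() is meaningful. */
    static boolean greater(Collator coll, String a, String b) {
        return coll.compare(a, b) > 0;
    }

    /** True if a sorts after b or the two strings compare as equal. */
    static boolean greaterOrEqual(Collator coll, String a, String b) {
        return coll.compare(a, b) >= 0;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);
        System.out.println(greater(coll, "b", "a"));
        System.out.println(greaterOrEqual(coll, "a", "a"));
        System.out.println(coll.equals("apple", "apple"));
        // At the default strength, a case difference is still a difference.
        System.out.println(coll.equals("apple", "Apple"));
    }
}
```

Testing the sign (`> 0`, `< 0`) rather than comparing against `1` or `-1` keeps the
code correct for any implementation that returns other non-zero values.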

### Examples

**C:**

```c
UChar *s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  /* bubble sort into ascending collation order */
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      if(ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_GREATER) {
        UChar *tmp = s[j];   /* swap the two string pointers */
        s[j] = s[j+1];
        s[j+1] = tmp;
      }
    }
  }
  ucol_close(coll);
}
```

**C++:**

```c++
UnicodeString s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  // bubble sort into ascending collation order
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      if(coll->compare(s[j], s[j+1], status) == UCOL_GREATER) {
        std::swap(s[j], s[j+1]);  // needs #include <utility>
      }
    }
  }
  delete coll;
}
```

**Java:**

```java
String s [] = { /* list of Unicode strings */ };
try {
    Collator coll = Collator.getInstance(Locale.US);
    // bubble sort into ascending collation order
    for (int i = s.length - 1; i >= 1; i--) {
        for (int j = 0; j < i; j++) {
            if (coll.compare(s[j], s[j+1]) > 0) {
                String tmp = s[j];  // swap the two strings
                s[j] = s[j+1];
                s[j+1] = tmp;
            }
        }
    }
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

## GetSortKey

The C API provides the `ucol_getSortKey` function, which requires, apart from a
pointer to a valid `UCollator` instance, a pointer to the original `UChar` string
together with its length. It also requires a pointer to a receiving buffer and its
length.

The C++ API provides the `Collator::getSortKey` method with similar parameters as
the C version. It also provides `Collator::getCollationKey`, which produces a
`CollationKey` object instance (a wrapper around a sort key).

The Java API provides only the `Collator.getCollationKey` method, which produces a
`CollationKey` object instance (a wrapper around a sort key).

Sort keys are generally only useful in databases or other circumstances where
function calls are extremely expensive. See [Sortkeys vs
Comparison](concepts#sortkeys-vs-comparison).

### Sort Key Features

ICU writes sort keys as sequences of bytes.

Each sort key ends with one 00 byte and does not contain any other 00 byte. The
terminating 00 byte is included in the length of the sort key as returned by the
API (unlike any other ICU API where terminating NUL bytes or characters are not
counted as part of the length).

Sort key byte sequences must be compared with an unsigned-byte comparison, as
with `strcmp()`.

Comparing the sort keys of two strings from the same collator yields the same
ordering as using the collator to compare the two strings directly. That is:
`strcmp(coll.getSortKey(str1), coll.getSortKey(str2))` is equivalent to
`coll.compare(str1, str2)`.
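
The equivalence can be checked with a small sketch. It assumes the JDK's
`java.text.Collator` and `java.text.CollationKey` as stand-ins for the ICU4J classes
of the same names (the key bytes themselves differ between the two libraries), and
`Arrays.compareUnsigned` (Java 9 or later) for the unsigned byte comparison. The
class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class SortKeyOrderDemo {
    /** Sign of a direct string comparison. */
    static int compareDirect(Collator coll, String a, String b) {
        return Integer.signum(coll.compare(a, b));
    }

    /** Sign of an unsigned, byte-wise comparison of the two sort keys. */
    static int compareByKeyBytes(Collator coll, String a, String b) {
        byte[] keyA = coll.getCollationKey(a).toByteArray();
        byte[] keyB = coll.getCollationKey(b).toByteArray();
        // Java bytes are signed, so a plain (a[i] - b[i]) loop would misorder
        // bytes >= 0x80; compareUnsigned treats them as 0..255.
        return Integer.signum(Arrays.compareUnsigned(keyA, keyB));
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);
        String[][] pairs = {{"fox", "Quick"}, {"Moving", "trucks"}, {"riddle", "riddle"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1] + ": direct="
                + compareDirect(coll, p[0], p[1])
                + " keys=" + compareByKeyBytes(coll, p[0], p[1]));
        }
    }
}
```

Both keys in each comparison come from the same collator instance, which is the
precondition for the two results to agree.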

Sort keys from different collators (different locale or strength or any other
attributes/settings) are not comparable.

Sort keys can be "merged" as described in [UTS #10 Merging Sort
Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys), via
`ucol_mergeSortkeys()` or Java `CollationKey.merge()`.

*   Since CLDR 1.9/ICU 4.6, the same effect can be achieved by concatenating
    strings with U+FFFE between them. The concatenation has the same sort order
    as the merged sort keys.
*   However, it is not guaranteed that the sort key of the concatenated strings
    is the same as the merged result of the individual sort keys. (That is,
    merge(getSortKey(str1), getSortKey(str2)) may differ from getSortKey(str1 +
    '\\uFFFE' + str2).)
*   In particular, a future version of ICU is likely to generate shorter sort
    keys when concatenating strings with U+FFFE between them (by using
    compression across the U+FFFE weights).
*   *The recommended way to achieve "merged" sorting is via strings with
    U+FFFE.*
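
Building the concatenated string is straightforward. The sketch below only
constructs the string to be collated (for example, family name followed by given
name); it does not perform any comparison, and the resulting string is meant to be
passed to an ICU collator. `MergedFieldDemo` and `joinFields` are hypothetical names
introduced for this illustration.

```java
final class MergedFieldDemo {
    /** U+FFFE, the separator recommended above for "merged" sorting. */
    static final char FIELD_SEPARATOR = '\uFFFE';

    /** Joins the fields into one string with U+FFFE between adjacent fields. */
    static String joinFields(String... fields) {
        return String.join(String.valueOf(FIELD_SEPARATOR), fields);
    }

    public static void main(String[] args) {
        String combined = joinFields("Smith", "Al");
        // 5 + 1 + 2 characters: the separator sits at index 5
        System.out.println(combined.length());
        System.out.println(Integer.toHexString(combined.charAt(5)));
    }
}
```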

Any further analysis or parsing of sort keys is not supported.

Sort keys will change from one ICU version to another; therefore, if sort keys
are stored in a database or other persistent storage, then each upgrade requires
their regeneration.

*   The details of the underlying data change with every Unicode and CLDR
    version.
*   Sort keys are also subject to enhancements and bug fixes in the builder and
    implementation code.
*   On the other hand, the sort *order* is much more stable. It is subject to
    deliberate changes to the default Unicode collation order, which is kept
    quite stable, and subject to deliberate changes in CLDR data as new data is
    added and feedback on existing data is taken into account.

Implementation notes: (Not supported as permanent constraints on sort keys)

Byte 02 was unique as a merge separator for some versions of ICU before
ICU 53. Since ICU 53, 02 is also used in regular collation weights where there
is no conflict (to expand the number of available short weights).

Byte 01 has been unique as a level separator. This is not strictly necessary for
non-primary levels. (A level's compressible "common" weight as its level
separator would yield shorter sort keys.) However, the current implementation of
`ucol_mergeSortkeys()` relies on it. (Also, test code currently examines sort keys
for finding the strength of a comparison difference.) This may change in the
future, especially if `ucol_mergeSortkeys()` were to become deprecated.

Level separators are likely to be equivalent to single-byte weights (possibly
compressible): Multi-byte level separators would noticeably lengthen sort keys
for short strings.

The byte values used in several ICU versions for sort keys and collation
elements are documented in the [“Special Byte Values” design
doc](https://icu.unicode.org/design/collation/bytes) on the ICU site.

### Sort Key Output Buffer

`ucol_getSortKey()` can operate in 'preflighting' mode, which returns the amount
of memory needed to store the resulting sort key. This mode is automatically
activated if the output buffer size passed is set to zero. Should the sort key
become longer than the buffer provided, the function again slips into preflighting
mode. In that case, the overall performance is poorer than if the function had been
called with a zero output buffer. If the size of the sort key returned is greater
than the size of the buffer provided, the content of the result buffer is undefined.
In that case, the result buffer can be reallocated to its proper size and the
sort key generator function called again.

The best way to generate a series of sort keys is to do the following:

1.  Create a big temporary buffer on the stack. Typically, this buffer is
    allocated only once, and reused with every sort key generated. There is no
    need to keep it as small as possible. A recommended size for the temporary
    buffer is four times the length of the longest string processed.

2.  Start the loop. Call `ucol_getSortKey()` to find out how big the sort key
    buffer should be, and fill in the temporary buffer at the same time.

3.  If the temporary buffer is too small, allocate or reallocate more space.
    Fill in the sort key values in the overflow buffer.

4.  Allocate the sort key buffer with the size returned by `ucol_getSortKey()` and
    call `memcpy` to copy the sort key content from the temp buffer to the sort
    key buffer.

5.  Loop back to step 2 until you are done.

6.  Delete the overflow buffer if you created one.

### Example

```c
void GetSortKeys(const UCollator *coll, const UChar *const *source,
                 uint32_t arrayLength)
{
  uint8_t buffer[1000];              /* stack buffer, reused for every key */
  uint8_t *currBuffer = buffer;
  int32_t bufferLen = sizeof(buffer);
  int32_t expectedLen = 0;
  uint32_t i;

  for (i = 0; i < arrayLength; ++i) {
    expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    if (expectedLen > bufferLen) {
      /* key did not fit: move to (or grow) a heap buffer and regenerate it */
      uint8_t *bigger = (currBuffer == buffer)
          ? (uint8_t *)malloc(expectedLen)
          : (uint8_t *)realloc(currBuffer, expectedLen);
      if (bigger == NULL) {
        break;                       /* out of memory; stop generating keys */
      }
      currBuffer = bigger;
      bufferLen = expectedLen;
      expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    }
    /* processSortKey is supplied by the caller; it consumes or copies the key */
    processSortKey(i, currBuffer, expectedLen);
  }

  if (currBuffer != buffer) {
    free(currBuffer);
  }
}
```

> :point_right: **Note** Although the API allows you to call
> `ucol_getSortKey` with `NULL` to see what the
> sort key length is, it is strongly recommended that you NOT determine the length
> first, then allocate and fill the sort key buffer. If you do, it requires twice
> the processing since computing the length has to do the same calculation as
> actually getting the sort key. Instead, the example shown above uses a stack buffer.

### Using Iterators for String Comparison

ICU4C's `ucol_strcollIter` API allows for comparing two strings that are supplied
as character iterators (`UCharIterator`). This is useful when you need to compare
differently encoded strings using `strcoll`. In that case, converting the strings
first would probably be wasteful, since `strcoll` usually gives the result
before whole strings are processed. This API is implemented only as a C function
in ICU4C. There are no equivalent C++ or ICU4J functions.

```c
...
/* we are arriving with two char*: utf8Source and utf8Target, with their
* lengths in utf8SourceLen and utf8TargetLen
*/
    UCharIterator sIter, tIter;
    uiter_setUTF8(&sIter, utf8Source, utf8SourceLen);
    uiter_setUTF8(&tIter, utf8Target, utf8TargetLen);
    compareResultUTF8 = ucol_strcollIter(myCollation, &sIter, &tIter, &status);
...
```

### Obtaining Partial Sort Keys

When using different sort algorithms, such as radix sort, sometimes it is useful
to process strings only as much as needed to feed into the sorting algorithm.
For that purpose, ICU provides the `ucol_nextSortKeyPart` API, which also takes
character iterators. This API allows for iterating over subsequent pieces of an
uncompressed sort key. Between calls to the API you need to save a 64-bit state.
The following is an example of simulating a string compare function using the
partial sort key API. Your usage model is bound to look quite different.

```c
static UCollationResult compareUsingPartials(UCollator *coll,
                                             const UChar source[], int32_t sLen,
                                             const UChar target[], int32_t tLen,
                                             int32_t pieceSize, UErrorCode *status) {
  int32_t partialSKResult = 0;
  UCharIterator sIter, tIter;
  uint32_t sState[2], tState[2];   /* the 64-bit state saved between calls */
  int32_t sSize = pieceSize, tSize = pieceSize;
  uint8_t sBuf[16384], tBuf[16384];
  if(pieceSize > 16384) {
    *status = U_BUFFER_OVERFLOW_ERROR;
    return UCOL_EQUAL;
  }
  *status = U_ZERO_ERROR;
  sState[0] = 0; sState[1] = 0;
  tState[0] = 0; tState[1] = 0;
  /* fetch one piece of each key at a time until the pieces differ
     or one of the keys has been exhausted */
  while(sSize == pieceSize && tSize == pieceSize && partialSKResult == 0) {
    uiter_setString(&sIter, source, sLen);
    uiter_setString(&tIter, target, tLen);
    sSize = ucol_nextSortKeyPart(coll, &sIter, sState, sBuf, pieceSize, status);
    tSize = ucol_nextSortKeyPart(coll, &tIter, tState, tBuf, pieceSize, status);
    partialSKResult = memcmp(sBuf, tBuf, pieceSize);
  }

  if(partialSKResult < 0) {
    return UCOL_LESS;
  } else if(partialSKResult > 0) {
    return UCOL_GREATER;
  } else {
    return UCOL_EQUAL;
  }
}
```

### Other Examples

A longer example is presented in the 'Examples' section. Here is an illustration
of the usage model.

**C:**

```c
#define MAX_KEY_SIZE 100
#define MAX_BUFFER_SIZE 10000
#define MAX_LIST_LENGTH 5

/* qsort callback: sort keys are 00-terminated byte strings, and strcmp
   compares bytes as unsigned values, as required for sort keys */
static int compareKeys(const void *a, const void *b) {
  return strcmp((const char *)a, (const char *)b);
}

const char *text[MAX_LIST_LENGTH] = {
   "Quick",
   "fox",
   "Moving",
   "trucks",
   "riddle"
};
UChar s[MAX_LIST_LENGTH][20];
int i;
int32_t length = MAX_BUFFER_SIZE, expectedLen;
uint8_t temp[MAX_BUFFER_SIZE];
uint8_t *temp2 = temp;
uint8_t keys[MAX_LIST_LENGTH][MAX_KEY_SIZE];
UErrorCode status = U_ZERO_ERROR;

for (i = 0; i < MAX_LIST_LENGTH; i++) {
  u_uastrcpy(s[i], text[i]);
}
UCollator *coll = ucol_open("en_US", &status);
if (U_SUCCESS(status)) {
  for (i = 0; i < MAX_LIST_LENGTH; i++) {
    expectedLen = ucol_getSortKey(coll, s[i], -1, temp2, length);
    if (expectedLen > length) {
      /* temporary buffer too small: grow it, then generate the key again
         (malloc/realloc failure handling omitted for brevity) */
      if (temp2 == temp) {
        temp2 = (uint8_t *)malloc(expectedLen);
      } else {
        temp2 = (uint8_t *)realloc(temp2, expectedLen);
      }
      length = expectedLen;
      expectedLen = ucol_getSortKey(coll, s[i], -1, temp2, length);
    }
    /* this sketch assumes every key fits in MAX_KEY_SIZE bytes */
    memcpy(keys[i], temp2, expectedLen);
  }
  qsort(keys, MAX_LIST_LENGTH, MAX_KEY_SIZE, compareKeys);
  ucol_close(coll);
}
if (temp2 != temp) {
  free(temp2);   /* release the overflow buffer, if one was created */
}
```
5462e5b6d6dSopenharmony_ci
**C++:**

```c++
#include <algorithm>
#include "unicode/coll.h"
#include "unicode/sortkey.h"
#include "unicode/unistr.h"
using namespace icu;

#define MAX_LIST_LENGTH 5

void sortKeysExample() {
  const UnicodeString s[MAX_LIST_LENGTH] = {
    "Quick",
    "fox",
    "Moving",
    "trucks",
    "riddle"
  };
  CollationKey keys[MAX_LIST_LENGTH];
  UErrorCode status = U_ZERO_ERROR;
  Collator *coll = Collator::createInstance(Locale("en_US"), status);
  if (U_SUCCESS(status)) {
    for (int32_t i = 0; i < MAX_LIST_LENGTH; i++) {
      coll->getCollationKey(s[i], keys[i], status);
    }
  }
  if (U_SUCCESS(status)) {
    // CollationKey objects own their storage, so sort them with std::sort
    // and compareTo() rather than with qsort().
    std::sort(keys, keys + MAX_LIST_LENGTH,
              [](const CollationKey &a, const CollationKey &b) {
                UErrorCode ec = U_ZERO_ERROR;
                return a.compareTo(b, ec) == UCOL_LESS;
              });
  }
  delete coll;
}
```

**Java:**

```java
// Uses com.ibm.icu.text.Collator and com.ibm.icu.text.CollationKey.
String[] s = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
CollationKey[] keys = new CollationKey[s.length];
try {
    Collator coll = Collator.getInstance(Locale.US);
    for (int i = 0; i < s.length; i++) {
        keys[i] = coll.getCollationKey(s[i]);
    }

    // CollationKey implements Comparable, so the keys sort in collation order.
    Arrays.sort(keys);
}
catch (Exception e) {
    System.err.println("Error creating English collator");
    e.printStackTrace();
}
```

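The Java example above leaves the sorted keys in `keys`. As a minimal sketch of
how the sorted strings might be read back, the following class sorts the same
five strings and returns them through `CollationKey.getSourceString()`. It
assumes the JDK's `java.text` classes so that it runs without additional
libraries; ICU4J's `com.ibm.icu.text.Collator` and `CollationKey` provide the
same methods. The class name `SortKeySketch` and the helper `sortWithKeys` are
hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class SortKeySketch {
    // Hypothetical helper: sorts the input by collation key and returns
    // the source strings in sorted order.
    public static String[] sortWithKeys(Collator coll, String[] input) {
        CollationKey[] keys = new CollationKey[input.length];
        for (int i = 0; i < input.length; i++) {
            keys[i] = coll.getCollationKey(input[i]);
        }
        Arrays.sort(keys);  // CollationKey implements Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] s = { "Quick", "fox", "Moving", "trucks", "riddle" };
        Collator coll = Collator.getInstance(Locale.US);
        System.out.println(Arrays.toString(sortWithKeys(coll, s)));
    }
}
```

Because the five strings differ in their first letters, the result is ordered
by primary differences regardless of case: fox, Moving, Quick, riddle, trucks.
A plain binary sort of the strings would instead place the capitalized words
first.
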
## Collation ElementIterator

A collation element iterator can only be used in one direction. The direction is
established by the first call that retrieves a collation element. Once
`ucol_next` (C), `CollationElementIterator::next` (C++) or
`CollationElementIterator.next` (Java) has been invoked,
`ucol_previous` (C),
`CollationElementIterator::previous` (C++) or `CollationElementIterator.previous`
(Java) should not be used (and vice versa). The direction can be changed
immediately after `ucol_first`, `ucol_last`, `ucol_reset` (in C),
`CollationElementIterator::first`, `CollationElementIterator::last`,
`CollationElementIterator::reset` (in C++) or `CollationElementIterator.first`,
`CollationElementIterator.last`, `CollationElementIterator.reset` (in Java) is
called, or when the iterator reaches the end of the string while traversing it.

Once `ucol_next` reaches the end of the string buffer, it returns
`UCOL_NULLORDER`, and every subsequent call to `ucol_next` returns
`UCOL_NULLORDER` as well. The same applies to `ucol_previous`.

An example of how iterators are used is the Boyer-Moore search implementation,
which can be found in the samples section.

### API Example

**C:**

```c
UErrorCode         status = U_ZERO_ERROR;
UCollator          *coll = ucol_open("en_US", &status);
UChar              text[20];
UCollationElements *collelemitr;
int32_t            collelem;

u_uastrcpy(text, "text");
collelemitr = ucol_openElements(coll, text, -1, &status);
if (U_SUCCESS(status)) {
  /* Walk forward until the iterator reports the end of the string. */
  do {
    collelem = ucol_next(collelemitr, &status);
  } while (U_SUCCESS(status) && collelem != UCOL_NULLORDER);
}

ucol_closeElements(collelemitr);
ucol_close(coll);
```

**C++:**

```c++
UErrorCode    status = U_ZERO_ERROR;
Collator      *coll = Collator::createInstance(Locale::getUS(), status);
// createCollationElementIterator() is declared on RuleBasedCollator.
RuleBasedCollator *rbc = dynamic_cast<RuleBasedCollator *>(coll);
if (U_SUCCESS(status) && rbc != nullptr) {
  UnicodeString text("text");
  CollationElementIterator *collelemitr = rbc->createCollationElementIterator(text);
  int32_t       collelem = 0;
  do {
    collelem = collelemitr->next(status);
  } while (U_SUCCESS(status) && collelem != CollationElementIterator::NULLORDER);
  delete collelemitr;
}
delete coll;
```

**Java:**

```java
try {
    RuleBasedCollator coll = (RuleBasedCollator)Collator.getInstance(Locale.US);
    String text = "text";
    CollationElementIterator collelemitr = coll.getCollationElementIterator(text);
    int collelem = 0;
    do {
        collelem = collelemitr.next();
    } while (collelem != CollationElementIterator.NULLORDER);
} catch (Exception e) {
    System.err.println("Error in collation iteration");
    e.printStackTrace();
}
```

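The loops above only retrieve the collation elements. As a hedged sketch of
what might be done with them, the class below walks a string in the forward
direction only and collects the primary weight of each element with the static
`CollationElementIterator.primaryOrder` method, stopping at `NULLORDER`. It
assumes the JDK's `java.text` classes so that it runs standalone; ICU4J's
classes of the same names offer the same calls. `ElementIteratorSketch` and
`primaryWeights` are hypothetical names.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ElementIteratorSketch {
    // Hypothetical helper: iterates forward only and collects the primary
    // weight of every collation element until NULLORDER is returned.
    public static List<Integer> primaryWeights(RuleBasedCollator coll, String text) {
        List<Integer> weights = new ArrayList<>();
        CollationElementIterator it = coll.getCollationElementIterator(text);
        int ce;
        while ((ce = it.next()) != CollationElementIterator.NULLORDER) {
            weights.add(CollationElementIterator.primaryOrder(ce));
        }
        return weights;
    }

    public static void main(String[] args) {
        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(Locale.US);
        System.out.println(primaryWeights(coll, "ab"));
        System.out.println(primaryWeights(coll, "AB"));
    }
}
```

Letters that differ only in case share their primary weights, so the two
printed lists are expected to match; the case difference shows up only in the
tertiary weights.
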
## Setting and Getting Attributes

The general attribute setting APIs are `ucol_setAttribute` (in C) and
`Collator::setAttribute` (in C++). These APIs take an attribute name and an
attribute value. If the name and the value pass a syntax and range check, the
property of the collator is changed. Otherwise, the state is not changed and
the error code variable is set to an error condition. The Java version does not
provide general attribute setting APIs; instead, each attribute has its own
setter API of the form `RuleBasedCollator.setATTRIBUTE_NAME(arguments)`.

The attribute getting APIs are `ucol_getAttribute` (C) and `Collator::getAttribute`
(C++). Both APIs take an attribute name as an argument and return the
attribute value if a valid attribute name was supplied. Otherwise, they return
an undefined result and set the error code. As with the setters, the Java
version provides no generic getter API; each attribute has its own getter API
of the form `RuleBasedCollator.getATTRIBUTE_NAME()`.

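The per-attribute style of the Java API can be sketched with the strength
attribute. The example below is an illustration under assumptions, not the
definitive usage: it relies on the JDK's `java.text.Collator`, whose
`setStrength` and `getStrength` methods are also available on ICU4J's
`Collator`, and the names `AttributeSketch` and `compareAt` are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class AttributeSketch {
    // Hypothetical helper: compares two strings at the given strength, using
    // the per-attribute setter instead of a generic setAttribute call.
    public static int compareAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(strength);
        return coll.compare(a, b);
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(Collator.PRIMARY);  // per-attribute setter
        // Per-attribute getter reads the value back.
        System.out.println(coll.getStrength() == Collator.PRIMARY);
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC"));
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC"));
    }
}
```

At primary strength the case difference is ignored, so the first comparison
yields 0; at tertiary strength the two strings no longer compare as equal.
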
## References

1.  Ken Whistler, Markus Scherer: "Unicode Technical Standard #10, Unicode Collation
    Algorithm" (<http://www.unicode.org/reports/tr10/>)

2.  ICU Design doc: "Collation v2" (<https://icu.unicode.org/design/collation/v2>)

3.  Mark Davis: "ICU Collation Design Document"
    (<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/main/design/collation/ICU_collation_design.htm>)

4.  The Unicode Standard, chapter 5, "Implementation guidelines"
    (<http://www.unicode.org/uni2book/ch05.pdf>)

5.  Laura Werner: "Efficient text searching in Java: Finding the right string in
    any language"
    (<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>)

6.  Mark Davis, Martin Dürst: "Unicode Standard Annex #15: Unicode Normalization
    Forms" (<http://www.unicode.org/reports/tr15/>)