---
layout: default
title: String Search
nav_order: 4
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# String Search Service
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

String searching, also known as string matching, is an important subject in
the wider domain of text processing and analysis. Many software applications on
most operating systems rely on a basic string search algorithm. With the
popularity of the Internet, the quantity of available data from different parts
of the world has increased dramatically within a short time. Therefore, a string
search algorithm that is language-aware has become more important. A bitwise
match that uses the `u_strstr` (C), `UnicodeString::indexOf` (C++) or
`String.indexOf` (Java) APIs will not yield the correct result for a particular
language's requirements, because all the issues that are important to
language-sensitive collation are also applicable to text searching.
The following issues are
applicable to text searching:

1. Accented letters\
   In English, accents are treated as minor variations of a letter. In French,
   accented letters have much more significance as they can actually change the
   meaning of a word. Very often, an accented letter is actually a distinct
   letter. For example, the letter 'å' (\\u00e5) may be just a letter 'a' with an
   accent symbol to English speakers. However, it is a distinct letter in
   Danish, where searching for 'a' should generally not match 'å' and
   vice versa. In some cases, such as in traditional German, an accented letter
   is short-hand for something longer. In sorting, an 'ä' (\\u00e4) is treated
   as 'ae'. Note that primary- and secondary-level distinctions for *searching*
   may not be the same as those for sorting; in ICU, many languages provide a
   special "search" collator with the appropriate level settings for search.

2. Conjoined letters\
   Special handling is required when a single letter is treated as equivalent to
   two distinct letters and vice versa. For example, in German, the letter 'ß'
   (\\u00df) is treated as 'ss' in sorting. Also, in most languages, 'æ'
   (\\u00e6) is considered equivalent to the letter 'a' followed by the letter
   'e'. Also, ligatures are often treated as distinct letters in their own
   right.
   For example, 'ch' is treated as a distinct letter between the
   letter 'c' and the letter 'd' in Spanish.

3. Ignorable punctuation\
   As in collation, it is important that the user is able to choose to ignore
   punctuation symbols while searching for a pattern in the string. For
   example, a user may search for "blackbird" and want to include entries such
   as "black-bird".

## ICU String Search Model

The ICU string search service provides APIs similar to those of the other text
iterating services, allowing users to specify the starting position and
direction within the text string to be searched. For more information, please
see the [Boundary Analysis](../boundaryanalysis/index.md) chapter. The user can
locate one or all occurrences of a pattern in a string. For a given collator, a
pattern match is located at the offsets <start, end> in a string if the collator
finds that the sub-string between the start and end is equal to the pattern.

The string search service supports two different types of canonical match
behavior.

Let S' be the sub-string of a text string S between the offsets start and end
<start, end>.
A pattern string P matches a text string S at the offsets <start, end> if

1. option 1. P matches some canonical equivalent string of S'.
   Suppose the
   collator used for searching has tertiary collation strength, so that all
   accents are non-ignorable. If the pattern "a\\u0300" is searched in the
   target text "a\\u0325\\u0300", a match will be found, since the target text
   is canonically equivalent to "a\\u0300\\u0325".

2. option 2. P matches S' and if P starts or ends with a combining mark, there
   exists no non-ignorable combining mark before or after S' in S respectively.
   Following the example above, the pattern "a\\u0300" will not find a match in
   "a\\u0325\\u0300", since there exists a non-ignorable accent '\\u0325'
   between 'a' and '\\u0300'. Even with a target text of
   "a\\u0300\\u0325" a match will not be found because of the non-ignorable
   trailing accent \\u0325.

One restriction is to be noted for option 1. Currently there are no composite
characters that consist of a character with combining class greater than 0
before a character with combining class equal to 0. However, if such a
character exists in the future, the string search service may not work correctly
with option 1 when such characters are encountered.

Furthermore, option 1 could generate more than one "encompassing" match. For
example, in Danish, 'å' (\\u00e5) and 'aa' are considered equivalent. So the
pattern "baad" will match "a--båd--man" (a--b\\u00e5d--man) at start offset 3
and end offset 5.
However, the start offset can also be 1 or 2 and the end
offset can be 6 or 7, because "-" (hyphen) can be ignorable in a given collation.
The ICU implementation always returns the offsets of the shortest match
sub-string. To be more exact, the string search service adds a "tightest" match
condition. In other words, if the pattern matches at offsets <start, end> as
well as offsets <start + 1, end>, the offsets <start, end> are not considered a
match. Likewise, if the pattern matches at offsets <start, end> as well as
offsets <start, end + 1>, the offsets <start, end + 1> are not considered a
match. Therefore, when option 1 is chosen with a Danish collator, 'baad' will
match in the string "a--båd--man" (a--b\\u00e5d--man) ONLY at offsets <3,5>.

The default behavior is that described in option 2 above. To obtain the behavior
described in option 1, you must set the normalization mode to ON in the collator
used for search.

> :point_right: **Note**: The "tightest match" behavior described above
> is defined as "Minimal Match" in
> [Section 8 Searching and Matching in UTS #10 Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Searching).
> "Medial Match" and "Maximal Match" are not yet implemented by the ICU String Search service.
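
The canonical equivalence that option 1 relies on can be checked outside of ICU.
The following minimal sketch uses the JDK's `java.text.Normalizer`, not the ICU
string search service, to show that "a\\u0325\\u0300" and "a\\u0300\\u0325" share
one canonical decomposition while the pattern "a\\u0300" is not equivalent to
either. The class and method names are hypothetical and chosen only for
illustration.

```java
import java.text.Normalizer;

/** Illustration only: canonical equivalence via NFD, using the JDK rather than ICU. */
final class CanonicalEquivalenceSketch {

    /** Returns true if both strings have the same canonical decomposition (NFD). */
    static boolean canonicallyEquivalent(String a, String b) {
        String nfdA = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nfdB = Normalizer.normalize(b, Normalizer.Form.NFD);
        return nfdA.equals(nfdB);
    }

    public static void main(String[] args) {
        String target = "a\u0325\u0300";    // a, ring below, grave accent
        String reordered = "a\u0300\u0325"; // a, grave accent, ring below
        String pattern = "a\u0300";         // a, grave accent

        // The two mark orders are canonically equivalent.
        System.out.println(canonicallyEquivalent(target, reordered)); // true
        // The pattern on its own is not equivalent to the whole target.
        System.out.println(canonicallyEquivalent(pattern, target));   // false
    }
}
```

Within ICU this comparison is not done by the caller; it happens inside the
collator once its normalization mode is turned on, as described above.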

The string search service also supports two varieties of “asymmetric search” as
described in *[Section 8.2 Asymmetric Search in UTS #10 Unicode Collation
Algorithm](http://www.unicode.org/reports/tr10/#Asymmetric_Search)*.
With asymmetric search, for example, unaccented characters are treated as
“wildcards” that may match any character with the same primary weight. This
behavior can be applied just to characters in the search pattern, or to
characters in both the search pattern and the searched text. With the former
behavior, searching with French behavior for 'e' might match 'e', 'è', 'é', 'ê',
and so on, while searching for 'é' would only match 'é'.

Either a locale or a collator can be used to specify the language-sensitive
rules for searches. When a locale is specified, a collator will be created
internally and the `StringSearch` instance that is created owns that
collator. All the collation attributes will be considered during the
string search operation. However, users can set the collator attributes only
through the collator APIs. Normalization is usually done within collation and the
process is outside the scope of the string search service.

As in other iterator interfaces, the string search service provides APIs to
perform string matching for the first pattern occurrence, immediate next,
previous match, and the last pattern occurrence.
There are also options to allow
for overlapping matching. For example, in English, if the string is "ababab" and
the pattern is "abab", overlapping matching produces results of offsets <0, 3>
and <2, 5>. Otherwise, the mutually exclusive matching produces the result
offset <0, 3> only. To find a whole word match, the user can provide a
locale-specific `BreakIterator` object to a `StringSearch` instance to correctly
locate the word boundaries. By default, for example, searching for "c" in the
string "abc" returns a match. However, this behavior can be overridden by
supplying a word `BreakIterator`.

In the ICU string search service implementation, the minimum unit of match is
aligned to an extended grapheme cluster as defined by [UAX #29 Unicode Text
Segmentation](http://www.unicode.org/reports/tr29/). Therefore, all matches will
begin and end on extended grapheme cluster boundaries. If the given input search
pattern starts with a non-base character, no matches will be returned.
When there are contractions in the collation sequence and the contraction
happens to span across the boundary of a match, it is not considered a match.
For example, in traditional Spanish where 'ch' is a contraction, the "har"
pattern will not match in the string "uno charo".
Boundaries that are
discontiguous contractions will yield a match result similar to those described
above, where the end of the match returned will be one character before the
immediately following base letter. In addition, only the first match will be
located if a pattern contains only combining marks and the search string
contains more than one consecutive occurrence of the pattern. For example, if
the user searches for the pattern "´" (\\u00b4) in the string "A´´B"
(A\\u00b4\\u00b4B), the result will be offsets <1, 2>.

### Example

**In C:**

```c
/* Requires <stdio.h>, "unicode/ustring.h" and "unicode/usearch.h". */
const char *tgtstr = "The quick brown fox jumps over the lazy dog.";
const char *patstr = "fox";
UChar target[64];
UChar pattern[16];
int32_t pos = 0;
UErrorCode status = U_ZERO_ERROR;
UStringSearch *search = NULL;

/* Convert the invariant-character C strings to UTF-16. */
u_uastrcpy(target, tgtstr);
u_uastrcpy(pattern, patstr);

/* Open a search for the en_US locale without a break iterator. */
search = usearch_open(pattern, -1, target, -1, "en_US",
                      NULL, &status);

if (U_FAILURE(status)) {
    fprintf(stderr, "Could not create a UStringSearch.\n");
    return;
}

/* Visit every match, starting from the beginning of the target text. */
for (pos = usearch_first(search, &status);
     U_SUCCESS(status) && pos != USEARCH_DONE;
     pos = usearch_next(search, &status))
{
    fprintf(stdout, "Match found at position %d.\n", pos);
}

if (U_FAILURE(status)) {
    fprintf(stderr, "Error searching for pattern.\n");
}

/* Release the search object. */
usearch_close(search);
```

**In C++:**

```c++
// Requires <cstdio> and "unicode/stsearch.h"; assumes `using namespace icu;`.
UErrorCode status = U_ZERO_ERROR;
UnicodeString target("Jackdaws love my big sphinx of quartz.");
UnicodeString pattern("sphinx");
// No break iterator is supplied, so matches are not limited to word boundaries.
StringSearch search(pattern, target, Locale::getUS(), nullptr, status);

if (U_FAILURE(status)) {
    fprintf(stderr, "Could not create a StringSearch object.\n");
    return;
}

// Visit every match, starting from the beginning of the target text.
for (int32_t pos = search.first(status);
     U_SUCCESS(status) && pos != USEARCH_DONE;
     pos = search.next(status))
{
    fprintf(stdout, "Match found at position %d.\n", pos);
}

if (U_FAILURE(status)) {
    fprintf(stderr, "Error searching for pattern.\n");
}
```

**In Java:**

```java
// Requires ICU4J. Imports: com.ibm.icu.text.StringSearch,
// java.text.StringCharacterIterator, java.util.Locale.
StringCharacterIterator target = new StringCharacterIterator(
        "Pack my box with five dozen liquor jugs.");
String pattern = "box";

try {
    StringSearch search = new StringSearch(pattern, target, Locale.US);

    // Visit every match, starting from the beginning of the target text.
    for (int pos = search.first();
         pos != StringSearch.DONE;
         pos = search.next())
    {
        System.out.println("Match found for pattern at position " + pos);
    }
} catch (Exception e) {
    System.err.println("StringSearch failure: " + e.toString());
}
```

## Performance and Other Implications

The ICU string search service is built on top of the ICU collation
service. Therefore, all the performance implications that apply to a collator
are also applicable to the string search service. To obtain the best
performance, use the default collator attributes described in the Performance
and Storage Implications of Attributes section in the [Collation Service
Architecture](architecture#performance-and-storage-implications-of-attributes)
chapter. In addition, users need to be aware of
the following `StringSearch` specific considerations:

### Search Algorithm

ICU4C (C/C++) releases up to 3.8 used the Boyer-Moore search algorithm in the string
search service. There were some known issues in these previous releases.
(See ICU tickets [ICU-5024](https://unicode-org.atlassian.net/browse/ICU-5024),
[ICU-5382](https://unicode-org.atlassian.net/browse/ICU-5382),
[ICU-5420](https://unicode-org.atlassian.net/browse/ICU-5420).)

In ICU4C 4.0, the string search service was updated to use a simple linear search
algorithm, which locates a match by shifting a cursor through the target text one
position at a time, and these issues were fixed.

In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced as a separate API with
technology preview status. However, in ICU4C 51.1, this was removed.
(See ICU ticket [ICU-9573](https://unicode-org.atlassian.net/browse/ICU-9573).)

Similarly, in ICU4J 53 (Java) the Boyer-Moore search algorithm was replaced by the
simple linear search algorithm, ported from ICU4C. (See ICU ticket [ICU-6288](https://unicode-org.atlassian.net/browse/ICU-6288).)

The Boyer-Moore search algorithm is based on automata or combinatorial properties of
strings and pre-processes the pattern. It is known to be much faster than the
linear search for longer search patterns. According to a performance evaluation
of these two implementations, the Boyer-Moore search is faster than the
linear search when the pattern text is longer than 3 or 4 characters.
However, it is very tricky to get correct results with a collation-based Boyer-Moore search.
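
The idea of a linear, collation-aware search can be sketched in a few lines. The
example below is a deliberately naive illustration and not the ICU
implementation: it uses the JDK's `java.text.Collator` instead of ICU, compares
whole sub-strings rather than collation elements, and returns an exclusive end
offset, unlike the inclusive end offsets used earlier in this chapter. The class
and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

/** Illustration only: a naive collation-aware linear search, not ICU's implementation. */
final class LinearCollationSearchSketch {

    /**
     * Returns {start, endExclusive} of the first sub-string of text that the
     * collator considers equal to pattern, or null if there is none.
     */
    static int[] findFirst(String text, String pattern, Collator collator) {
        // Shift the start cursor through the target text one position at a time.
        for (int start = 0; start < text.length(); start++) {
            // Try the shortest candidate first so the match ends as early as possible.
            for (int end = start + 1; end <= text.length(); end++) {
                if (collator.compare(text.substring(start, end), pattern) == 0) {
                    // Tighten the start: skip leading characters the collator ignores.
                    while (start + 1 < end
                            && collator.compare(text.substring(start + 1, end), pattern) == 0) {
                        start++;
                    }
                    return new int[] {start, end};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY); // ignore case and accent differences

        int[] match = findFirst("The quick brown fox jumps.", "FOX", collator);
        System.out.println(match == null ? "no match" : match[0] + ".." + match[1]); // 16..19
    }
}
```

Because every candidate sub-string is compared in full, this sketch is far slower
than a production search, but it shows why a cursor-based approach is easy to
keep correct when the collator ignores or equates characters.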

### Change Iterating Direction

The ICU string search service provides a set of very dynamic APIs that allow
users to change the iterating direction at will. For example, users can search
for a particular word going forward by calling the `usearch_next` (C),
`StringSearch::next` (C++) or `StringSearch.next` (Java) APIs and then search
backwards at any point of the search operation by calling the `usearch_previous`
(C), `StringSearch::previous` (C++) or `StringSearch.previous` (Java) APIs. Another
way to change the iterating direction is by calling the `usearch_reset` (C),
`StringSearch::reset` (C++) or `StringSearch.reset` (Java) APIs. Though the
direction change can occur without calling the reset APIs first, this operation
comes with a reduction in speed.

> :point_right: **Note**: The backward search is not available with the
> ICU4C Boyer-Moore search technology preview introduced in ICU4C 4.0.1;
> it is only available with the linear search implementation.

### Thai and Lao Character Boundaries

In collation, certain Thai and Lao vowels are swapped with the next character.
For example, the text string "A ขเ" (A \\u0e02\\u0e40) is processed internally
in collation as
"A เข" (A \\u0e40\\u0e02).
Therefore, if the user searches for the pattern "Aเ"
(A\\u0e40) in "A ขเ" (A \\u0e02\\u0e40) the string search service will match
starting at offset 0. Since this normalization process is internal to collation,
there is no notification that the swapping has happened. The returned
offsets in this example will be <0, 2> even though the range would encompass one
extra character.

### Case Level Search

Case level string search is currently done with the strength set to tertiary.
When searching with the strength set to primary and the case level attribute
turned on, results given may not be correct. The case level attribute is
different from tertiary strength in that accents are ignored but case
differences are not. Suppose you wanted to search for “A” in the text
“ABC\\u00C5a”. Matches should be found at offsets 0 and 3 when using the case
level attribute. However, searching with the case level attribute turned on finds
matches at 0, 3, and 4, which includes the lower case 'a'. To ensure that case
level differences are not ignored, string search must be done with at least
tertiary strength.
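
The effect of collation strength on case differences can be illustrated with a
small sketch. It uses the JDK's `java.text.Collator`, which has no case level
attribute, so it only contrasts primary and tertiary strength and does not
reproduce the ICU case level behavior described above. The class and method
names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: how collation strength changes single-character matches. */
final class StrengthSearchSketch {

    /** Returns the offsets of the characters in text that compare equal to pattern. */
    static List<Integer> matchOffsets(String text, String pattern, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        // Decompose accented letters so that accents are weighed separately.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);

        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (collator.compare(text.substring(i, i + 1), pattern) == 0) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "ABC\u00C5a";
        // Primary strength ignores case, so the lower-case 'a' at offset 4 also matches.
        System.out.println("primary:  " + matchOffsets(text, "A", Collator.PRIMARY));
        // Tertiary strength keeps case and accent differences, leaving only offset 0.
        System.out.println("tertiary: " + matchOffsets(text, "A", Collator.TERTIARY));
    }
}
```

At tertiary strength only the exact upper-case 'A' is reported, which is the
reason the text above recommends at least tertiary strength whenever case
differences must not be ignored.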