---
layout: default
title: StringPrep
nav_order: 7
parent: Chars and Strings
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# StringPrep

## Overview

Comparing strings in a consistent manner becomes imperative when a large
repertoire of characters such as Unicode is used in network protocols.
StringPrep provides sets of rules for use of Unicode and syntax for prevention
of spoofing. The implementation of StringPrep and IDNA services and their usage
in ICU is described below.

## StringPrep

StringPrep, the process of preparing Unicode strings for use in network
protocols, is defined in RFC 3454 (<http://www.rfc-editor.org/rfc/rfc3454.txt>).
The RFC defines a broad framework and rules for processing the strings.

Protocols that prescribe use of StringPrep must define a profile of StringPrep,
whose applicability is limited to the protocol. Profiles are a set of rules and
data tables which describe how the strings should be prepared. The profiles
can choose to turn on or turn off normalization and checking for bidirectional
characters.
They can also choose to add or remove mappings, unassigned and
prohibited code points from the tables provided.

StringPrep uses Unicode Version 3.2 and defines a set of tables for use by the
profiles. The profiles can choose to include or exclude tables or code points
from the tables defined by the RFC.

StringPrep defines tables that can be broadly classified into:

1. *Unassigned Table*: Contains code points that are unassigned in Unicode
   Version 3.2. Unassigned code points may be allowed or disallowed in the
   output string depending on the application. The table in Appendix A.1 of the
   RFC contains the code points.

2. *Mapping Tables*: Code points that are commonly deleted from the output and
   code points that are case mapped are included in these tables. There are two
   mapping tables in the Appendix, namely B.1 and B.2.

3. *Prohibited Tables*: Contain code points that are prohibited from the
   output string. Control codes, private use area code points, non-character
   code points, surrogate code points, tagging and deprecated code points are
   included in these tables. There are nine tables in the Appendix that list
   the prohibited code points, namely C.1, C.2, C.3, C.4, C.5, C.6, C.7,
   C.8 and C.9.
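
The three table categories above can be pictured as plain lookup structures.
The following is a minimal sketch only: the class `ToyProfileTables` and its
methods are hypothetical names made up for illustration, the handful of entries
is a tiny excerpt from the RFC 3454 appendices, and this is not how ICU stores
profile data.

```java
import java.util.Map;
import java.util.Set;

/** Toy model of the StringPrep table categories (hypothetical, not ICU's format). */
final class ToyProfileTables {
    // Mapping tables: code point -> replacement ("" means "delete from the output").
    static final Map<Integer, String> MAPPINGS = Map.of(
            0x00AD, "",     // B.1: SOFT HYPHEN is mapped to nothing
            0x200B, "",     // B.1: ZERO WIDTH SPACE is mapped to nothing
            0x0041, "a",    // B.2: 'A' is case mapped to 'a'
            0x00DF, "ss");  // B.2: U+00DF is case mapped to "ss"

    // Prohibited tables: code points that must not appear in the output.
    static final Set<Integer> PROHIBITED = Set.of(
            0x0020,          // C.1.1: ASCII space
            0x0000, 0x007F); // C.2.1: two of the ASCII control characters

    // Unassigned table: code points with no assignment in Unicode 3.2.
    static final Set<Integer> UNASSIGNED = Set.of(0x0221); // from A.1

    /** Applies the toy mapping table to every code point of the input. */
    static String map(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String mapped = MAPPINGS.get(cp);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** True if the code point may appear in the output under this toy profile. */
    static boolean isAllowed(int cp, boolean allowUnassigned) {
        if (PROHIBITED.contains(cp)) {
            return false;
        }
        return allowUnassigned || !UNASSIGNED.contains(cp);
    }

    public static void main(String[] args) {
        System.out.println(map("A\u00ADb\u00DF"));    // mapped form of the input
        System.out.println(isAllowed(0x0020, true));  // space is prohibited here
        System.out.println(isAllowed(0x0221, false)); // unassigned, not allowed
    }
}
```

A real profile selects whole tables (or edits them) rather than listing single
code points; the sketch only shows how the three categories play different roles.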

The procedure for preparing strings for use can be described in the following
steps:

1. *Map*: For each code point in the input, check if it has a mapping defined in
   the mapping table. If so, replace it with the mapping in the output.

2. *Normalize*: Normalize the output of step 1 using Unicode Normalization Form
   NFKC, if the option is set. The normalization algorithm must conform to
   UAX 15.

3. *Prohibit*: For each code point in the output of step 2, check if the code
   point is present in the prohibited table. If so, fail and return an error.

4. *Check BiDi*: Check for code points with strong right-to-left directionality
   in the output of step 3. If present, check if the string satisfies the rules
   for bidirectional strings as specified.

## NamePrep

NamePrep is a profile of StringPrep for use in IDNA. This profile is defined in
RFC 3491 (<http://www.rfc-editor.org/rfc/rfc3491.txt>).

The profile specifies the following rules:

1. *Map*: Include all code point mappings specified in StringPrep.

2. *Normalize*: Normalize the output of step 1 according to NFKC.

3. *Prohibit*: Prohibit all code points specified as prohibited in StringPrep,
   except for the space (U+0020) code point, from the output of step 2.

4. *Check BiDi*: Check for bidirectional code points and process according to
   the rules specified in StringPrep.

## Punycode

Punycode is an encoding scheme for Unicode for use in IDNA. Punycode converts
Unicode text to a unique sequence of ASCII text and back to Unicode. It is an
ASCII Compatible Encoding (ACE). Punycode is described in RFC 3492
(<http://www.rfc-editor.org/rfc/rfc3492.txt>).

The Punycode algorithm is a form of the general Bootstring algorithm, which
allows strings composed of a smaller set of code points to uniquely represent
any string of code points from a larger set. Punycode represents Unicode code
points from U+0000 to U+10FFFF by using the smaller ASCII set U+0000 to U+007F.
The algorithm can also preserve case information of the code points in the
larger set while encoding and decoding. This feature, however, is not used in
IDNA.

## Internationalizing Domain Names in Applications (IDNA)

The Domain Name Service (DNS) protocol defines the procedure for matching of
ASCII strings case insensitively to the names in the lookup tables containing
mapping of IP (Internet Protocol) addresses to server names. When Unicode is
used instead of ASCII in server names then two problems arise which need to be
dealt with differently.
When the server name is displayed to the user then
Unicode text should be displayed. When Unicode text is stored in lookup tables,
for compatibility with the older DNS protocol and the resolver libraries, the
text should be the ASCII equivalent. The IDNA protocol, defined by RFC 3490
(<http://www.rfc-editor.org/rfc/rfc3490.txt>), satisfies the above
requirements.

Server names stored in the DNS lookup tables are usually formed by concatenating
domain labels with a label separator, for example `www.example.com`, where the
labels `www`, `example` and `com` are joined by the full stop (U+002E).

The protocol defines operations to be performed on domain labels before the
names are stored in the lookup tables and before the names fetched from lookup
tables are displayed to the user. The operations are:

1. ToASCII: This operation is performed on domain labels before sending the
   name to a resolver and before storing the name in the DNS lookup table. The
   domain labels are processed by the StringPrep algorithm using the rules
   specified by the NamePrep profile. The output of this step is then encoded
   using Punycode and an ACE prefix is added to denote that the text is encoded
   using Punycode. IDNA uses “xn--” before the encoded label.

2. ToUnicode: This operation is performed on domain labels before displaying
   the names to users. If the domain label is prefixed with the ACE prefix
   for IDNA, then the label excluding the prefix is decoded using Punycode.
   The output of the Punycode decoder is verified by applying the ToASCII
   operation and comparing the output with the input to the ToUnicode operation.

Unicode contains code points that are glyphically similar to the ASCII Full Stop
(U+002E). These code points must be treated as label separators when performing
the ToASCII operation. These code points are:

1. Ideographic Full Stop (U+3002)

2. Full Width Full Stop (U+FF0E)

3. Half Width Ideographic Full Stop (U+FF61)

Unassigned code points in Unicode Version 3.2 as given in StringPrep tables are
treated differently depending on how the processed string is used. For query
operations, where a registrar is asked for information regarding the
availability of a certain domain name, unassigned code points are allowed to be
present in the string. For storing the string in DNS lookup tables, unassigned
code points are prohibited from the input.

IDNA specifies that the ToUnicode and ToASCII operations have options to check
for Letter-Digit-Hyphen code points and adhere to the STD3 ASCII Rules.

IDNA specifies that domain labels are equivalent if and only if the outputs of
the ToASCII operation on the labels match using case insensitive ASCII
comparison.
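
The ToASCII, ToUnicode and equivalence rules above can be tried out with a
small sketch. To stay self-contained it uses the JDK's `java.net.IDN` class,
which implements RFC 3490, rather than the ICU API; the class name
`IdnaEquivalenceSketch` and the method `sameName` are hypothetical names chosen
for illustration.

```java
import java.net.IDN;

/** Sketch of IDNA equivalence using the JDK's RFC 3490 implementation (not ICU). */
final class IdnaEquivalenceSketch {

    /**
     * Two names are treated as equivalent if their ToASCII outputs match.
     * The ToASCII output is pure ASCII, so equalsIgnoreCase acts here as a
     * case insensitive ASCII comparison.
     */
    static boolean sameName(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }

    public static void main(String[] args) {
        // ToASCII: NamePrep, then Punycode, then the "xn--" ACE prefix.
        String ace = IDN.toASCII("b\u00FCcher.example");
        System.out.println(ace); // xn--bcher-kva.example

        // ToUnicode: decode a label that carries the ACE prefix.
        System.out.println(IDN.toUnicode(ace).equals("b\u00FCcher.example"));

        // U+3002 (Ideographic Full Stop) is accepted as a label separator.
        System.out.println(sameName("b\u00FCcher\u3002example", "b\u00FCcher.example"));

        // Equivalence is decided on the ToASCII form, ignoring ASCII case.
        System.out.println(sameName("B\u00DCCHER.Example", "xn--bcher-kva.example"));
    }
}
```

Comparing the ToASCII forms, rather than the Unicode input, is what lets names
that differ only in case or in the choice of label separator compare as equal.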

## StringPrep Service in ICU

The StringPrep service in ICU is data driven. The service is based on the
Open-Use-Close pattern. A StringPrep profile is opened, the strings are
processed according to the rules specified in the profile, and the profile is
closed once it is no longer needed.

Tools for filtering RFC 3454 and producing a rule file that can be compiled into
a binary format containing all the information required by the service are
provided.

The procedure for producing a StringPrep profile data file is as given below:

1. Run the filterRFC3454.pl Perl tool to filter the RFC file and produce a rule
   file. The text file produced can be edited by the clients to add/delete
   mappings or add/delete prohibited code points.

2. Run the gensprep tool to compile the rule file into a binary format. The
   options to turn on normalization of strings and checking of bidirectional
   code points are passed as command line options to the tool. This tool
   produces a binary profile file with the extension “spp”.

3. Open the StringPrep profile with the path to the binary and the name of the
   binary profile file as the options to the open call. The profile data files
   are memory mapped and cached for optimum performance.

### Code Snippets

> :point_right: **Note**: The code snippets demonstrate the usage of the APIs.
> Applications should keep the profile object around for reuse, instead of
> opening and closing the profile each time.

#### C++

    UErrorCode status = U_ZERO_ERROR;
    UParseError parseError;
    /* open the StringPrep profile */
    UStringPrepProfile* nameprep = usprep_open("/usr/joe/mydata",
                                               "nfscsi", &status);
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* prepare the string for use according
     * to the rules specified in the profile;
     * the profile is the first argument
     */
    int32_t retLen = usprep_prepare(nameprep, src, srcLength, dest,
                                    destCapacity, USPREP_ALLOW_UNASSIGNED,
                                    &parseError, &status);
    /* close the profile */
    usprep_close(nameprep);

#### Java

    // not final: the profile is assigned in the constructor
    private StringPrep nfscsi = null;
    //singleton instance
    private static final NFSCSIStringPrep prep=new NFSCSIStringPrep();
    private NFSCSIStringPrep() {
        try {
            InputStream nfscsiFile = TestUtil.getDataStream("nfscsi.spp");
            nfscsi = new StringPrep(nfscsiFile);
            nfscsiFile.close();
        }
        catch(IOException e) {
            throw new RuntimeException(e.toString());
        }
    }
    private static byte[] prepare(byte[] src, StringPrep prep)
            throws StringPrepParseException, UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        UCharacterIterator iter = UCharacterIterator.getInstance(s);
        StringBuffer out = prep.prepare(iter,StringPrep.DEFAULT);
        return out.toString().getBytes("UTF-8");
    }

## IDNA API in ICU

ICU provides APIs for performing the ToASCII, ToUnicode and compare operations
as defined by RFC 3490. Convenience methods for comparing IDNs are also
provided. These APIs follow ICU policies for string manipulation and coding
guidelines.

### Code Snippets

> :point_right: **Note**: The code snippets demonstrate the usage of the APIs.
> Applications should keep the profile object around for reuse, instead of
> opening and closing the profile each time.

### ToASCII operation

***C***

    UChar* dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
    destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                            UIDNA_DEFAULT, &parseError, &status);
    if(status == U_BUFFER_OVERFLOW_ERROR) {
        status = U_ZERO_ERROR;
        destCapacity = destLen + 1; /* for the terminating Null */
        free(dest); /* free the memory */
        /* allocate the new capacity, including the terminating Null */
        dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
        destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                                UIDNA_DEFAULT, &parseError, &status);
    }
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* do interesting stuff with output*/

***Java***

    try {
        StringBuffer out= IDNA.convertToASCII(inBuf,IDNA.DEFAULT);
    } catch(StringPrepParseException ex) {
        /*handle the exception*/
    }

### ToUnicode operation

***C***

    UChar * dest = (UChar *) malloc(destCapacity * U_SIZEOF_UCHAR);
    destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                              UIDNA_DEFAULT,
                              &parseError, &status);
    if(status == U_BUFFER_OVERFLOW_ERROR) {
        status = U_ZERO_ERROR;
        destCapacity = destLen + 1; /* for the terminating Null */
        /* free the memory */
        free(dest);
        /* allocate the new capacity, including the terminating Null */
        dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
        destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                                  UIDNA_DEFAULT, &parseError, &status);
    }
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* do interesting stuff with output*/

***Java***

    try {
        StringBuffer out= IDNA.convertToUnicode(inBuf,IDNA.DEFAULT);
    } catch(StringPrepParseException ex) {
        // handle the exception
    }

### compare operation

***C***

    int32_t rc = uidna_compare(source1, length1,
                               source2, length2,
                               UIDNA_DEFAULT,
                               &status);
    if(rc==0) {
        /* the IDNs are same ... do something interesting */
    } else {
        /* the IDNs are different ...
           do something */
    }

***Java***

    try {
        int retVal = IDNA.compare(s1,s2,IDNA.DEFAULT);
        // do something interesting with retVal
    } catch(StringPrepParseException e) {
        // handle the exception
    }

## Design Considerations

StringPrep profiles exhibit the following characteristics:

1. The profiles contain information about code points. StringPrep allows
   profiles to add/delete code points or mappings.

2. Options such as turning normalization and checking for bidirectional code
   points on or off are the properties of the profiles.

3. The StringPrep algorithm is not overridden by the profile.

4. Once defined, the profiles do not change.

The StringPrep profiles are used in network protocols so runtime performance is
important.

Many profiles have been and are being defined, so applications should be able to
plug in arbitrary profiles and get the desired result out of the framework.

ICU is designed for this usage by providing build-time tools for arbitrary
StringPrep profile definitions, and loading them from application-supplied data
in binary form with data structures optimized for runtime use.

## Demo

A web application at <https://icu4c-demos.unicode.org/icu-bin/idnbrowser>
illustrates the use of the IDNA API. The source code for the application is
available at <https://github.com/unicode-org/icu-demos/tree/main/idnbrowser>.

## Appendix

#### NFS Version 4 Profiles

Network File System Version 4, defined by RFC 3530
(<http://www.rfc-editor.org/rfc/rfc3530.txt>), defines use of Unicode text in
the protocol. ICU provides the requisite profiles as part of the test suite and
code for processing the strings according to the profiles as a part of samples.

The RFC defines three profiles:

1. *nfs4_cs_prep Profile*: This profile is used for preparing file and path
   name strings. Normalization of code points and checking for bidirectional
   code points are turned off. Case mappings are included if the NFS
   implementation supports case insensitive file and path names.

2. *nfs4_cis_prep Profile*: This profile is used for preparing NFS server
   names. Normalization of code points and checking for bidirectional code
   points are turned on. This profile is equivalent to the NamePrep profile.

3. *nfs4_mixed_prep Profile*: This profile is used for preparing strings in the
   Access Control Entries of NFS servers.
   These strings consist of two parts,
   prefix and suffix, separated by '@' (U+0040). The prefix is processed with
   case mappings turned off and the suffix is processed with case mappings
   turned on. Normalization of code points and checking for bidirectional code
   points are turned on.

#### XMPP Profiles

Extensible Messaging and Presence Protocol (XMPP) is an XML based protocol for
near real-time extensible messaging and presence. This protocol defines use of
two StringPrep profiles:

1. *ResourcePrep Profile*: This profile is used for processing the resource
   identifiers within XMPP. Normalization of code points and checking of
   bidirectional code points are turned on. Case mappings are excluded. The
   space code point (U+0020) is excluded from the prohibited code points set.

2. *NodePrep Profile*: This profile is used for processing the node identifiers
   within XMPP. Normalization of code points and checking of bidirectional code
   points are turned on. Case mappings are included. All code points specified
   as prohibited in StringPrep are prohibited. Additional code points are added
   to the prohibited set.
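
The prefix and suffix handling of the nfs4_mixed_prep profile described in the
NFS Version 4 section above can be sketched as follows. This is an illustration
only and makes several assumptions that do not come from the RFC or from ICU:
the class `MixedPrepSketch` and its methods are hypothetical, the string is
split at the first '@', a string without '@' is treated as a prefix only,
`toLowerCase` stands in for the StringPrep case mapping table, and the JDK's
NFKC (current Unicode, not Version 3.2) stands in for the normalization step.
The prohibited and BiDi checks are left out.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Rough sketch of the prefix/suffix split in a mixed profile (hypothetical names). */
final class MixedPrepSketch {

    /** Stand-in for preparing one part with case mappings turned on or off. */
    static String prepare(String part, boolean caseMap) {
        // Assumption: lower casing approximates the case mapping table.
        String mapped = caseMap ? part.toLowerCase(Locale.ROOT) : part;
        // Assumption: the JDK's NFKC approximates the normalization step.
        return Normalizer.normalize(mapped, Normalizer.Form.NFKC);
    }

    /** Prepares "prefix@suffix": prefix without case mapping, suffix with it. */
    static String mixedPrepare(String s) {
        int at = s.indexOf('@'); // assumption: split at the first '@' (U+0040)
        if (at < 0) {
            return prepare(s, false); // assumption: no '@' means prefix only
        }
        String prefix = prepare(s.substring(0, at), false);
        String suffix = prepare(s.substring(at + 1), true);
        return prefix + "@" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(mixedPrepare("Alice@EXAMPLE.ORG")); // Alice@example.org
    }
}
```

With ICU, each part would instead be passed to a prepare call on an opened
profile, one profile with case mappings and one without.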