---
layout: default
title: StringPrep
nav_order: 7
parent: Chars and Strings
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# StringPrep

## Overview

Comparing strings in a consistent manner becomes imperative when a large
repertoire of characters such as Unicode is used in network protocols.
StringPrep provides sets of rules for the use of Unicode and syntax for the
prevention of spoofing. The implementation of the StringPrep and IDNA services
and their usage in ICU are described below.

## StringPrep

StringPrep, the process of preparing Unicode strings for use in network
protocols, is defined in RFC 3454 (<http://www.rfc-editor.org/rfc/rfc3454.txt>).
The RFC defines a broad framework and rules for processing the strings.

Protocols that prescribe the use of StringPrep must define a profile of
StringPrep whose applicability is limited to the protocol. A profile is a set of
rules and data tables that describe how the strings should be prepared. A
profile can turn normalization and the checking of bidirectional characters on
or off. It can also add or remove mappings, unassigned code points and
prohibited code points from the tables provided.

StringPrep uses Unicode Version 3.2 and defines a set of tables for use by the
profiles. A profile can choose to include or exclude tables, or individual code
points from the tables, defined by the RFC.

StringPrep defines tables that can be broadly classified into:

1.  *Unassigned Table*: Contains code points that are unassigned in Unicode
    Version 3.2. Unassigned code points may be allowed or disallowed in the
    output string depending on the application. The table in Appendix A.1 of the
    RFC contains the code points.

2.  *Mapping Tables*: Contain code points that are commonly deleted from the
    output and code points that are case mapped. There are two mapping tables
    in the Appendix, namely B.1 and B.2.

3.  *Prohibited Tables*: Contain code points that are prohibited from the
    output string. Control codes, private use area code points, non-character
    code points, surrogate code points, tagging and deprecated code points are
    included in these tables. There are nine prohibited tables in the Appendix,
    namely C.1, C.2, C.3, C.4, C.5, C.6, C.7, C.8 and C.9.

The procedure for preparing strings for use can be described in the following
steps:

1.  *Map*: For each code point in the input, check whether it has a mapping
    defined in the mapping table. If so, replace it with the mapping in the
    output.

2.  *Normalize*: Normalize the output of step 1 using Unicode Normalization Form
    NFKC, if the option is set. The normalization algorithm must conform to
    UAX 15.

3.  *Prohibit*: For each code point in the output of step 2, check whether the
    code point is present in the prohibited table. If so, fail and return an
    error.

4.  *Check BiDi*: Check for code points with strong right-to-left directionality
    in the output of step 3. If any are present, check that the string satisfies
    the rules for bidirectional strings specified in the RFC.

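The four steps can be sketched in Java as follows. This is a minimal
illustration under stated assumptions, not ICU's implementation: the class
`StringPrepSketch` and its toy tables are hypothetical, `Character.toLowerCase`
only approximates the case mappings of table B.2, the prohibited set is a small
subset of the C tables, and the JDK's `java.text.Normalizer` follows the Unicode
version of the running JDK rather than Unicode 3.2.

```java
import java.text.Normalizer;

final class StringPrepSketch {
    /** Runs the four StringPrep steps with toy tables; throws on rejection. */
    static String prepare(String input) {
        // Step 1 (Map): drop two code points that table B.1 maps to nothing
        // and lowercase the rest as a rough stand-in for B.2 case mapping.
        StringBuilder mapped = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp != 0x00AD && cp != 0x200B) {
                mapped.appendCodePoint(Character.toLowerCase(cp));
            }
        });

        // Step 2 (Normalize): NFKC, as a profile with normalization on would do.
        String text = Normalizer.normalize(mapped, Normalizer.Form.NFKC);

        int[] cps = text.codePoints().toArray();
        boolean hasRandAL = false;
        boolean hasL = false;
        for (int cp : cps) {
            // Step 3 (Prohibit): toy set of controls, private use, surrogates.
            int type = Character.getType(cp);
            if (type == Character.CONTROL || type == Character.PRIVATE_USE
                    || type == Character.SURROGATE) {
                throw new IllegalArgumentException(
                        String.format("prohibited code point U+%04X", cp));
            }
            hasRandAL |= isRandAL(cp);
            hasL |= Character.getDirectionality(cp)
                    == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
        }

        // Step 4 (Check BiDi): a string with right-to-left characters must not
        // contain left-to-right ones and must start and end right-to-left.
        if (hasRandAL
                && (hasL || !isRandAL(cps[0]) || !isRandAL(cps[cps.length - 1]))) {
            throw new IllegalArgumentException("BiDi check failed");
        }
        return text;
    }

    private static boolean isRandAL(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }
}
```

For example, `StringPrepSketch.prepare("Ex\u00ADample")` yields `example`, while
an input that mixes a Hebrew letter with a Latin letter is rejected in step 4.
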
## NamePrep

NamePrep is a profile of StringPrep for use in IDNA. This profile is defined in
RFC 3491 (<http://www.rfc-editor.org/rfc/rfc3491.txt>).

The profile specifies the following rules:

1.  *Map*: Include all code point mappings specified in StringPrep.

2.  *Normalize*: Normalize the output of step 1 according to NFKC.

3.  *Prohibit*: Prohibit all code points specified as prohibited in StringPrep,
    except for the space (U+0020) code point, from the output of step 2.

4.  *Check BiDi*: Check for bidirectional code points and process according to
    the rules specified in StringPrep.

## Punycode

Punycode is an encoding scheme for Unicode for use in IDNA. Punycode converts
Unicode text to a unique sequence of ASCII text and back to Unicode. It is an
ASCII Compatible Encoding (ACE). Punycode is described in RFC 3492
(<http://www.rfc-editor.org/rfc/rfc3492.txt>).

The Punycode algorithm is a form of the general Bootstring algorithm, which
allows strings composed of a smaller set of code points to uniquely represent
any string of code points from a larger set. Punycode represents Unicode code
points from U+0000 to U+10FFFF by using the smaller ASCII set U+0000 to U+007F.
The algorithm can also preserve case information of the code points in the
larger set while encoding and decoding. This feature, however, is not used in
IDNA.

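The encoding direction of Bootstring with the Punycode parameters from RFC 3492
can be sketched as below. The class name `PunycodeSketch` is hypothetical, and
the sketch makes simplifying assumptions: decoding, overflow checks and case
preservation are left out, so it is an illustration rather than a production
encoder.

```java
final class PunycodeSketch {
    // Bootstring parameters that RFC 3492 fixes for Punycode.
    private static final int BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38,
            DAMP = 700, INITIAL_BIAS = 72, INITIAL_N = 128;

    /** Bias adaptation function from RFC 3492. */
    private static int adapt(int delta, int numPoints, boolean firstTime) {
        delta = firstTime ? delta / DAMP : delta / 2;
        delta += delta / numPoints;
        int k = 0;
        while (delta > ((BASE - TMIN) * TMAX) / 2) {
            delta /= BASE - TMIN;
            k += BASE;
        }
        return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
    }

    /** Maps a digit value 0..35 to 'a'..'z', '0'..'9'. */
    private static char digit(int d) {
        return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
    }

    /** Encodes one label; the "xn--" prefix is not added here. */
    static String encode(String input) {
        int[] cps = input.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int cp : cps) {
            if (cp < INITIAL_N) {
                out.appendCodePoint(cp);   // basic (ASCII) code points come first
            }
        }
        int basic = out.length();
        int handled = basic;
        if (basic > 0) {
            out.append('-');               // delimiter after the basic code points
        }
        int n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
        while (handled < cps.length) {
            int m = Integer.MAX_VALUE;     // smallest code point not yet handled
            for (int cp : cps) {
                if (cp >= n && cp < m) {
                    m = cp;
                }
            }
            delta += (m - n) * (handled + 1);
            n = m;
            for (int cp : cps) {
                if (cp < n) {
                    delta++;
                }
                if (cp == n) {
                    int q = delta;         // emit delta as a variable-length integer
                    for (int k = BASE; ; k += BASE) {
                        int t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias);
                        if (q < t) {
                            break;
                        }
                        out.append(digit(t + (q - t) % (BASE - t)));
                        q = (q - t) / (BASE - t);
                    }
                    out.append(digit(q));
                    bias = adapt(delta, handled + 1, handled == basic);
                    delta = 0;
                    handled++;
                }
            }
            delta++;
            n++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("b\u00FCcher")); // prints bcher-kva
    }
}
```

With the IDNA prefix added, the label "bücher" therefore becomes
`xn--bcher-kva`.
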
## Internationalizing Domain Names in Applications (IDNA)

The Domain Name Service (DNS) protocol defines the procedure for matching
ASCII strings case insensitively against the names in the lookup tables that
map IP (Internet Protocol) addresses to server names. When Unicode is used
instead of ASCII in server names, two requirements arise that need to be dealt
with differently. When the server name is displayed to the user, Unicode text
should be displayed. When Unicode text is stored in lookup tables, the text
should be the ASCII equivalent, for compatibility with the older DNS protocol
and the resolver libraries. The IDNA protocol, defined by RFC 3490
(<http://www.rfc-editor.org/rfc/rfc3490.txt>), satisfies the above
requirements.

Server names stored in the DNS lookup tables are usually formed by concatenating
domain labels with a label separator, for example `www.example.com`.

The protocol defines operations to be performed on domain labels before the
names are stored in the lookup tables and before the names fetched from lookup
tables are displayed to the user. The operations are:

1.  ToASCII: This operation is performed on domain labels before sending the
    name to a resolver and before storing the name in the DNS lookup table. The
    domain labels are processed by the StringPrep algorithm using the rules
    specified by the NamePrep profile. The output of this step is then encoded
    using Punycode, and an ACE prefix is added to denote that the text is
    encoded using Punycode. IDNA uses “xn--” before the encoded label.

2.  ToUnicode: This operation is performed on domain labels before displaying
    the names to users. If the domain label is prefixed with the ACE prefix
    for IDNA, then the label excluding the prefix is decoded using Punycode. The
    output of the Punycode decoder is verified by applying the ToASCII operation
    and comparing the output with the input to the ToUnicode operation.

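The two operations can be tried out with the JDK's `java.net.IDN` class, which
implements RFC 3490. This is a stand-in chosen so that the example runs without
ICU on the class path; it is not the ICU API described later on this page, and
the class and method names `IdnaRoundTrip`, `storedForm` and `displayForm` are
hypothetical.

```java
import java.net.IDN;

final class IdnaRoundTrip {
    /** ToASCII: the form sent to a resolver or stored in DNS tables. */
    static String storedForm(String name) {
        return IDN.toASCII(name);
    }

    /** ToUnicode: the form shown to users; labels without "xn--" pass through. */
    static String displayForm(String name) {
        return IDN.toUnicode(name);
    }

    public static void main(String[] args) {
        String ascii = storedForm("b\u00FCcher.example");
        System.out.println(ascii); // prints xn--bcher-kva.example
        System.out.println(displayForm(ascii).equals("b\u00FCcher.example")); // prints true
    }
}
```
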
Unicode contains code points that are visually similar to the ASCII Full Stop
(U+002E). These code points must be treated as label separators when performing
the ToASCII operation. These code points are:

1.  Ideographic Full Stop (U+3002)

2.  Fullwidth Full Stop (U+FF0E)

3.  Halfwidth Ideographic Full Stop (U+FF61)

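A small sketch of the separator handling follows. The helper names
`LabelSeparators`, `labels` and `toAsciiName` are hypothetical; the split uses a
plain regular expression, and the conversion again relies on the JDK's
`java.net.IDN` as a stand-in for an RFC 3490 implementation.

```java
import java.net.IDN;
import java.util.Arrays;
import java.util.List;

final class LabelSeparators {
    /** Splits a name at U+002E, U+3002, U+FF0E or U+FF61. */
    static List<String> labels(String name) {
        return Arrays.asList(name.split("[.\u3002\uFF0E\uFF61]", -1));
    }

    /** ToASCII on a whole name; every separator comes out as U+002E. */
    static String toAsciiName(String name) {
        return IDN.toASCII(name);
    }

    public static void main(String[] args) {
        String name = "www\u3002example\uFF0Ecom";
        System.out.println(labels(name));      // prints [www, example, com]
        System.out.println(toAsciiName(name)); // prints www.example.com
    }
}
```
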
Unassigned code points in Unicode Version 3.2, as given in the StringPrep
tables, are treated differently depending on how the processed string is used.
For query operations, where a registrar is asked for information regarding the
availability of a certain domain name, unassigned code points are allowed to be
present in the string. For storing the string in DNS lookup tables, unassigned
code points are prohibited from the input.

IDNA specifies that the ToUnicode and ToASCII operations have options to check
for Letter-Digit-Hyphen code points and adhere to the STD3 ASCII Rules.

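Both options can be observed through the flags of the JDK's `java.net.IDN`,
used here as a stand-in for an RFC 3490 implementation. The helper
`IdnaOptions.accepts` is hypothetical, and U+0221 is used on the assumption that
it is listed as unassigned in table A.1; the output for that code point is
printed rather than asserted.

```java
import java.net.IDN;

final class IdnaOptions {
    /** Returns true if ToASCII succeeds for the name with the given flags. */
    static boolean accepts(String name, int flags) {
        try {
            IDN.toASCII(name, flags);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the JDK reports IDNA failures with this exception
        }
    }

    public static void main(String[] args) {
        // STD3 ASCII rules: only letters, digits and hyphens are allowed.
        System.out.println(accepts("under_score.example", 0));
        System.out.println(accepts("under_score.example", IDN.USE_STD3_ASCII_RULES));

        // Unassigned code points: query form versus stored form.
        String withUnassigned = "a\u0221.example";
        System.out.println(accepts(withUnassigned, IDN.ALLOW_UNASSIGNED));
        System.out.println(accepts(withUnassigned, 0));
    }
}
```
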
IDNA specifies that domain labels are equivalent if and only if the outputs of
the ToASCII operation on the labels match using case-insensitive ASCII
comparison.

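The equivalence rule amounts to a few lines of code. The sketch below assumes
the JDK's `java.net.IDN` for the ToASCII step; `IdnaCompare.sameName` is a
hypothetical helper, not an ICU function.

```java
import java.net.IDN;

final class IdnaCompare {
    /** Names are equivalent if their ToASCII forms match ignoring ASCII case. */
    static boolean sameName(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("B\u00DCCHER.example", "xn--bcher-kva.EXAMPLE")); // prints true
        System.out.println(sameName("bucher.example", "b\u00FCcher.example"));        // prints false
    }
}
```

The first pair matches because NamePrep lowercases the Unicode label before it
is encoded, and the remaining ASCII case difference is ignored by the
comparison.
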
## StringPrep Service in ICU

The StringPrep service in ICU is data driven. The service is based on the
Open-Use-Close pattern. A StringPrep profile is opened, the strings are
processed according to the rules specified in the profile, and the profile is
closed once it is no longer needed.

Tools are provided for filtering RFC 3454 and producing a rule file that can be
compiled into a binary format containing all the information required by the
service.

The procedure for producing and using a StringPrep profile data file is as
follows:

1.  Run the filterRFC3454.pl Perl tool to filter the RFC file and produce a rule
    file. The text file produced can be edited by clients to add/delete
    mappings or add/delete prohibited code points.

2.  Run the gensprep tool to compile the rule file into a binary format. The
    options to turn on normalization of strings and checking of bidirectional
    code points are passed as command line options to the tool. This tool
    produces a binary profile file with the extension “spp”.

3.  Open the StringPrep profile with the path to the binary and the name of the
    binary profile file as the arguments to the open call. The profile data
    files are memory mapped and cached for optimum performance.

### Code Snippets

> :point_right: **Note**: The code snippets demonstrate the usage of the APIs. Applications should
> keep the profile object around for reuse, instead of opening and closing the
> profile each time.

#### C++

    UErrorCode status = U_ZERO_ERROR;
    UParseError parseError;
    /* open the StringPrep profile */
    UStringPrepProfile* nameprep = usprep_open("/usr/joe/mydata",
                                               "nfscsi", &status);
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* prepare the string for use according
     * to the rules specified in the profile;
     * the profile is the first argument
     */
    int32_t retLen = usprep_prepare(nameprep, src, srcLength, dest,
                                    destCapacity, USPREP_ALLOW_UNASSIGNED,
                                    &parseError, &status);
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* close the profile */
    usprep_close(nameprep);

#### Java

    // profile loaded once by the constructor; not final static, because it
    // is assigned there
    private StringPrep nfscsi = null;
    //singleton instance
    private static final NFSCSIStringPrep prep=new NFSCSIStringPrep();
    private NFSCSIStringPrep() {
        try {
            InputStream nfscsiFile = TestUtil.getDataStream("nfscsi.spp");
            nfscsi = new StringPrep(nfscsiFile);
            nfscsiFile.close();
        } catch(IOException e) {
            throw new RuntimeException(e.toString());
        }
    }
    private static byte[] prepare(byte[] src, StringPrep strprep)
            throws StringPrepParseException, UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        UCharacterIterator iter = UCharacterIterator.getInstance(s);
        StringBuffer out = strprep.prepare(iter,StringPrep.DEFAULT);
        return out.toString().getBytes("UTF-8");
    }

## IDNA API in ICU

ICU provides APIs for performing the ToASCII, ToUnicode and compare operations
as defined by RFC 3490. Convenience methods for comparing IDNs are also
provided. These APIs follow ICU policies for string manipulation and coding
guidelines.

### Code Snippets

> :point_right: **Note**: The code snippets demonstrate the usage of the APIs. Applications should
> keep the profile object around for reuse, instead of opening and closing the
> profile each time.

### ToASCII operation

***C***

    UChar* dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
    destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                            UIDNA_DEFAULT, &parseError, &status);
    if(status == U_BUFFER_OVERFLOW_ERROR) {
        status = U_ZERO_ERROR;
        destCapacity = destLen + 1; /* for the terminating Null */
        free(dest); /* free the memory */
        /* allocate the full new capacity, including the terminator */
        dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
        destLen = uidna_toASCII(src, srcLen, dest, destCapacity,
                                UIDNA_DEFAULT, &parseError, &status);
    }
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* do interesting stuff with output*/

***Java***

    try {
        StringBuffer out= IDNA.convertToASCII(inBuf,IDNA.DEFAULT);
    } catch(StringPrepParseException ex) {
        /*handle the exception*/
    }

### ToUnicode operation

***C***

    UChar * dest = (UChar *) malloc(destCapacity * U_SIZEOF_UCHAR);
    destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                              UIDNA_DEFAULT,
                              &parseError, &status);
    if(status == U_BUFFER_OVERFLOW_ERROR) {
        status = U_ZERO_ERROR;
        destCapacity = destLen + 1; /* for the terminating Null */
        /* free the memory */
        free(dest);
        /* allocate the full new capacity, including the terminator */
        dest = (UChar*) malloc(destCapacity * U_SIZEOF_UCHAR);
        destLen = uidna_toUnicode(src, srcLen, dest, destCapacity,
                                  UIDNA_DEFAULT, &parseError, &status);
    }
    if(U_FAILURE(status)) {
        /* handle the error */
    }
    /* do interesting stuff with output*/

***Java***

    try {
        StringBuffer out= IDNA.convertToUnicode(inBuf,IDNA.DEFAULT);
    } catch(StringPrepParseException ex) {
        // handle the exception
    }

### compare operation

***C***

    int32_t rc = uidna_compare(source1, length1,
                               source2, length2,
                               UIDNA_DEFAULT,
                               &status);
    if(U_FAILURE(status)) {
        /* handle the error; rc is not meaningful in this case */
    } else if(rc==0) {
        /* the IDNs are same ... do something interesting */
    } else {
        /* the IDNs are different ... do something */
    }

***Java***

    try {
        int retVal = IDNA.compare(s1,s2,IDNA.DEFAULT);
        // do something interesting with retVal
    } catch(StringPrepParseException e) {
       // handle the exception
    }

## Design Considerations

StringPrep profiles exhibit the following characteristics:

1.  The profiles contain information about code points. StringPrep allows
    profiles to add/delete code points or mappings.

2.  Options such as turning normalization and checking for bidirectional code
    points on or off are properties of the profiles.

3.  The StringPrep algorithm is not overridden by the profile.

4.  Once defined, the profiles do not change.

The StringPrep profiles are used in network protocols, so runtime performance is
important.

Many profiles have been and are being defined, so applications should be able to
plug in arbitrary profiles and get the desired result out of the framework.

ICU is designed for this usage by providing build-time tools for arbitrary
StringPrep profile definitions, and loading them from application-supplied data
in binary form with data structures optimized for runtime use.

## Demo

A web application at <https://icu4c-demos.unicode.org/icu-bin/idnbrowser>
illustrates the use of the IDNA API. The source code for the application is
available at <https://github.com/unicode-org/icu-demos/tree/main/idnbrowser>.

## Appendix

#### NFS Version 4 Profiles

Network File System Version 4, defined by RFC 3530
(<http://www.rfc-editor.org/rfc/rfc3530.txt>), defines the use of Unicode text
in the protocol. ICU provides the requisite profiles as part of the test suite,
and code for processing the strings according to the profiles as part of the
samples.

The RFC defines three profiles:

1.  *nfs4_cs_prep Profile*: This profile is used for preparing file and path
    name strings. Normalization of code points and checking for bidirectional
    code points are turned off. Case mappings are included if the NFS
    implementation supports case insensitive file and path names.

2.  *nfs4_cis_prep Profile*: This profile is used for preparing NFS server
    names. Normalization of code points and checking for bidirectional code
    points are turned on. This profile is equivalent to the NamePrep profile.

3.  *nfs4_mixed_prep Profile*: This profile is used for preparing strings in the
    Access Control Entries of NFS servers. These strings consist of two parts,
    prefix and suffix, separated by '@' (U+0040). The prefix is processed with
    case mappings turned off and the suffix is processed with case mappings
    turned on. Normalization of code points and checking for bidirectional code
    points are turned on.

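The prefix and suffix handling of the mixed profile can be pictured with the toy
sketch below. It is not the ICU sample code: `MixedPrepSketch` is hypothetical,
splitting at the last '@' is an assumption, `toLowerCase` stands in for the real
case mapping tables, and the prohibited and bidirectional checks are left out.

```java
import java.text.Normalizer;
import java.util.Locale;

final class MixedPrepSketch {
    /** Prepares "prefix@suffix": case is kept in the prefix, folded in the suffix. */
    static String prepareMixed(String s) {
        int at = s.lastIndexOf('@'); // assumption: the last '@' separates the parts
        if (at < 0) {
            throw new IllegalArgumentException("missing '@' separator");
        }
        // Prefix: normalization on, case mappings off.
        String prefix = Normalizer.normalize(s.substring(0, at), Normalizer.Form.NFKC);
        // Suffix: case mappings on (approximated), then normalization.
        String suffix = Normalizer.normalize(
                s.substring(at + 1).toLowerCase(Locale.ROOT), Normalizer.Form.NFKC);
        return prefix + "@" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(prepareMixed("Alice@EXAMPLE.COM")); // prints Alice@example.com
    }
}
```
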
#### XMPP Profiles

Extensible Messaging and Presence Protocol (XMPP) is an XML based protocol for
near real-time extensible messaging and presence. This protocol defines use of
two StringPrep profiles:

1.  *ResourcePrep Profile*: This profile is used for processing the resource
    identifiers within XMPP. Normalization of code points and checking of
    bidirectional code points are turned on. Case mappings are excluded. The
    space code point (U+0020) is excluded from the prohibited code points set.

2.  *NodePrep Profile*: This profile is used for processing the node identifiers
    within XMPP. Normalization of code points and checking of bidirectional code
    points are turned on. Case mappings are included. All code points specified
    as prohibited in StringPrep are prohibited. Additional code points are added
    to the prohibited set.