<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>ICU's Unicode Tools Read Me</title>
 <meta name="COPYRIGHT" content=
 "Copyright (c) 2004-2006 IBM Corporation and others. All Rights Reserved." />
<style>
<!--
li { margin-top: 0.5em; margin-bottom: 0.5em }
-->
</style>
</head>

<body>

<h1>UnicodeTools</h1>
<p>This file provides instructions for building and running the UnicodeTools, which
can be used to:</p>
<ul>
 <li>build the Derived Unicode files in the UCD (Unicode Character Database),</li>
 <li>build the transformed UCA (Unicode Collation Algorithm) files needed by ICU,</li>
 <li>run consistency checks on beta releases of the UCD and the UCA,</li>
 <li>build 4 chart folders on the Unicode site.</li>
</ul>
<p><font color="#FF0000"><b>WARNING!!</b></font></p>
<ul>
 <li>This is NOT production-level code, and should never be used in programs.</li>
 <li>The API is subject to change without notice, and will not be maintained.</li>
 <li>The source is uncommented, and has many warts; since it is not production code, it has not
 been worth the time to clean it up.</li>
 <li>It will probably
not work on Unix or Mac without changing the file separator.</li>
 <li>Currently it uses hard-coded directory names.</li>
 <li>The contents of multiple versions of the UCD must be copied to a local directory, as described
 below.</li>
</ul>
<h2>Instructions:</h2>
<h3>0. You will need to get ICU4J on your system, using CVS.</h3>
<p>The rest of these instructions assume that you have set up CVS so that the ICU4J project is checked out into
C:\ICU4J<br>
<br>
You need both the main icu4j project and a subproject called unicodetools. See:
<a href="http://www.ibm.com/software/globalization/icu/repository.jsp">
http://www.ibm.com/software/globalization/icu/repository.jsp</a>. Inside unicodetools, look at com/ibm/text. The
main directories of interest are UCD, UCA and utility.</p>
<h4>0a.
If you are using Eclipse for your IDE, look at the instructions at
<a href="http://icu.sourceforge.net/docs/eclipse_howto/eclipse_howto.html">
http://icu.sourceforge.net/docs/eclipse_howto/eclipse_howto.html</a> </h4>
<p>Set up Eclipse to build two projects, ICU4J and UnicodeTools:<br>
<br>
<b>Project Name: </b>ICU4J<br>
<b>Directory: </b>C:\ICU4J\icu4j<br>
<b>Default Output Folder: </b>ICU4J/classes<br>
<br>
<b>Project Name: </b>unicodetools<br>
<b>Create project from existing source: </b>C:\ICU4J\unicodetools<br>
<b>Default Output Folder: </b>unicodetools/classes<br>
<br>
After Eclipse is set up with these, exclude certain files from unicodetools:<br>
<br>
Right-Click UnicodeTools > Properties > Java Build Path > Exclusions<br>
com/ibm/rbm/<br>
com/ibm/text/utility/UnicodeMapInt.java<br>
com/ibm/text/utility/TestUtility.java<br>
com/ibm/text/UCD/GenerateThaiBreaks-old.java<br>
com/ibm/text/UCD/ProcessUnihan.java<br>
com/ibm/text/UCA/WriteHTMLCollation.java<br>
<br>
UnicodeTools must also include the ICU4J project, with<br>
<br>
Right-Click UnicodeTools > Properties > Java Build Path > Projects</p>
<h3>1.
In UCD, edit the top of UCD_Types.java to set the directories for the build:</h3>
<p>public static final String DATA_DIR = "C:\\DATA\\";<br>
public static final String UCD_DIR = BASE_DIR + "UCD\\";<br>
public static final String BIN_DIR = DATA_DIR + "BIN\\";<br>
public static final String GEN_DIR = DATA_DIR + "GEN\\";<br>
<br>
Make sure that each of these directories exists. Also make sure that the following<br>
exist:<br>
<br>
&lt;GEN_DIR&gt;/DerivedData<br>
&lt;GEN_DIR&gt;/DerivedData/ExtractedProperties<br>
&lt;UCD_DIR&gt;/EXTRAS-Update</p>
<h3>2. Download all of the UnicodeData files for each version into UCD_DIR.</h3>
<p>The folder names must be of the form "3.2.0-Update", so rename the folders downloaded from the<br>
Unicode site to this format. <span style="background-color: #FFFF00">If the
folder contains a ucd subdirectory, move the contents of that subdirectory up so that they become the contents of
the x.x.x-Update directory. That is, each x.x.x-Update directory must directly contain files
like PropList....txt</span></p>
<h4>2a. Ensure Complete Release</h4>
<p>If you are downloading any "incomplete" release (one that does not contain a complete set of data
files for that release), you also need to download the previous complete release.
Most of the N.M-Update
directories are complete, except:</p>
<p>4.0-Update, which does not contain a copy of Unihan.txt and some other files<br>
3.1-Update, which does not contain a copy of BidiMirroring.txt</p>
<p>Also, make the following changes to UnicodeData for 1.1.5:</p>
<p><b>Delete:</b></p>
<pre>3400;HANGUL SYLLABLE KIYEOK A;Lo;0;L;1100 1161;;;;N;;;;;
...
4DFF;HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH;Lo;0;L;1106 116F 11B4;;;;N;;;;;
4E00;&lt;CJK IDEOGRAPH REPRESENTATIVE&gt;;Lo;0;L;;;;;N;;;;;</pre>
<p><b>Add:</b></p>
<pre>4E00;&lt;CJK Ideograph, First&gt;;Lo;0;L;;;;;N;;;;;
9FA5;&lt;CJK Ideograph, Last&gt;;Lo;0;L;;;;;N;;;;;
E000;&lt;Private Use, First&gt;;Co;0;L;;;;;N;;;;;
F8FF;&lt;Private Use, Last&gt;;Co;0;L;;;;;N;;;;;</pre>
<p><b>And from a later version of Unicode, add:</b></p>
<pre>F900;CJK COMPATIBILITY IDEOGRAPH-F900;Lo;0;L;8C48;;;;N;;;;;
...
FA2D;CJK COMPATIBILITY IDEOGRAPH-FA2D;Lo;0;L;9DB4;;;;N;;;;;</pre>
<h4>2b. UCA data</h4>
<p>If you are building any of the UCA tools, you need to get a copy of the UCA data file<br>
from http://www.unicode.org/reports/tr10/#AllKeys. The default location for this is:<br>
<br>
BASE_DIR + "Collation\allkeys" + VERSION + ".txt".<br>
<br>
If you have it in a different location, change the value of KEYS in UCA.java, and<br>
the value of BASE_DIR.</p>
<h4>2c.
Here is an example of the default directory structure with files. All of
the yellow ones should exist.</h4>
<pre>C:/DATA/

    BIN/

<span style="background-color: #FFFF00">    Collation/
        allkeys-3.1.1.txt
</span>
    GEN/
        DerivedData/
<span style="background-color: #FFFF00">    </span><span style="background-color: #FFFF00">UCD/
        3.0.0-Update/
            Unihan-3.2.0.txt
            ...
        3.0.1-Update/
            ...
        3.1.0-Update/
            ...
        3.1.1-Update/
            ...
        3.2.0-Update/
            ...
        4.0.0-Update/
            ArabicShaping-4.0.0d14b.txt
            BidiMirroring-4.0.0d1b.txt
            ...
        EXTRAS-Update/</span></pre>
<h3>3. Versions</h3>
<p>All of the following accept "version X" among the options you give to Java (either on the
command line, or in the Eclipse 'run' options). If you want a specific version like 3.1.0, then you
would write "version 3.1.0". If you want the latest version (4.1.0), you can omit the "version X".</p>
<h3>4.
Building Files</h3>
<ol>
 <li><b>Setup</b><ol>
 <li>In Eclipse, open the Package Explorer (use Window>Show View if you
 don't see it).</li>
 <li>Open UnicodeTools<ul>
 <li>com.ibm.text.UCD<ul>
 <li>MakeUnicodeFiles.<span style="background-color: #FFFF00">txt</span><p>This file drives the production of
 the derived Unicode files. The first three lines contain
 parameters that you may want to modify from time to time:</p>
 <pre>Generate: <b>.*script.*</b> <i>// this is a regular expression. Use .* for all files</i>
DeltaVersion: <b>10</b> <i> // This gets appended to the file name. Pick 1 + the highest value in Public</i>
CopyrightYear: <b>2006</b> <i> // Pick the current year</i></pre>
 </li>
 </ul>
 </li>
 </ul>
 </li>
 <li>Open in Package Explorer
 <ul>
 <li>com.ibm.text.UCD<ul>
 <li>Main</li>
 </ul>
 </li>
 </ul>
 </li>
 <li>Run>Run As...<ol>
 <li>Choose Java Application<ul>
 <li>It will fail the first time; don't worry, you still need to set some parameters.</li>
 </ul>
 </li>
 </ol>
 </li>
 <li>Run>Run...<ul>
 <li>Select the Arguments tab, and fill in the following<ul>
 <li>Program
arguments:<pre>build 5.0<span style="background-color: #FFFF00">.0</span> MakeUnicodeFiles</pre>
 </li>
 <li>VM arguments:
 <pre>-Xms512m -Xmx512m</pre>
 </li>
 </ul>
 </li>
 <li>Close and Save</li>
 </ul>
 </li>
 </ol>
 </li>
 <li><b>Run</b><ol>
 <li>You'll see it build the 5.0 files, with something like the following
 results:<pre>Writing UCD_Data5.0.0
Data Size: 109,802
Wrote Data 109802</pre>
 </li>
 <li>For each version, the tools build a set of binary data in BIN that
 contains the information for that release. This is done automatically, or
 you can do it manually with the Program Arguments<pre>version X build</pre>
 <p>This builds a compressed format of all the UCD data (except blocks
 and Unihan) into the BIN directory.
Don't worry about the voluminous
 console messages, unless one says "FAIL".</p>
 <p><font color="#FF0000"><i>You have to do this manually if you change
 any of the data files in that version!</i></font></p>
 <p>Note: if for any reason you modify the binary format of the BIN files, you also have to bump the
BINARY_FORMAT value in the source:</p>
 <pre>static final byte BINARY_FORMAT = 8; // bumped if binary format of UCD changes</pre>
 </li>
 </ol>
 </li>
 <li>Results in <a href="file:///C:/DATA/GEN/DerivedData">
 C:\DATA\GEN\DerivedData</a><ol>
 <li>The files will be in this directory.</li>
 <li>There are also DIFF folders that contain BAT files, which you can run
 on Windows with CompareIt. (You can modify the code to build BATs for
 another Diff program if you want.)<ol>
 <li>For any file with a significant difference, it will build two
 BAT files, such as the first two below.<pre>Diff_PropList-5.0.0d10.txt.bat
OLDER-Diff_PropList-5.0.0d10.txt.bat

UNCHANGED-Diff_PropertyValueAliases-5.0.0d10.txt.bat</pre>
 </li>
 </ol>
 </li>
 <li>Any files without significant changes will have "UNCHANGED" as a
 prefix: ignore them.
The OLDER prefix marks a comparison against the
 previous version of Unicode.</li>
 <li>On Windows you can run these BATs to compare files.</li>
 </ol>
 </li>
 <li><span style="background-color: #FFFF00">NFSkippable</span><ol>
 <li><span style="background-color: #FFFF00">ICU needs one more file, which is
 generated with the same tool. Just use the input parameter "NFSkippable" to
 generate the file NFSafeSets.txt, also in </span>
 <a href="file:///C:/DATA/GEN"><span style="background-color: #FFFF00">
 file:///C:/DATA/GEN</span></a></li>
</ol>
 </li>
</ol>
<h3>5. Invariant Checking</h3>
<ol>
 <li>Setup<ol>
 <li>Open in Package Explorer<ul>
 <li>com.ibm.text.UCD<ul>
 <li>TestUnicodeInvariants.java</li>
 </ul>
 </li>
 </ul>
 </li>
 <li>Run>Run As... Java Application<br>
 This creates the following file of results:<pre><a href="file:///C:/DATA/GEN/UnicodeInvariantResults.txt/">C:\DATA\GEN\UnicodeInvariantResults.txt\</a></pre>
 <p>The console will also report whether any problems were found.
Thus in
 the following case there was one failure:</p>
 <pre>ParseErrorCount=0
TestFailureCount=1</pre>
 </li>
 <li>The header of the result file explains the syntax of the tests.</li>
 <li>Open that file and search for "**** START Error Info ****". Each such
 point provides a dump of comparison information.<ol>
 <li>Failures print a list of differences between the two sets being
 compared. So if A and B are being compared, it prints all the items in
 A-B, then in B-A, then in A&B.</li>
 <li>For example, here is a listing of a problem that must be corrected.
 Note that usually there is a comment that explains what the following
 line or lines are supposed to test.
Next comes the word false (indicating
 that the test failed), followed by the detailed error report.<pre><span style="font-size: 9pt"># Canonical decompositions (minus exclusions) must be identical across releases
[$Decomposition_Type:Canonical - $Full_Composition_Exclusion] = [$&times;Decomposition_Type:Canonical - $&times;Full_Composition_Exclusion]

false
**** START Error Info ****

In [$&times;Decomposition_Type:Canonical - $&times;Full_Composition_Exclusion], but not in [$Decomposition_Type:Canonical - $Full_Composition_Exclusion] :

# Total code points: 0

Not in [$&times;Decomposition_Type:Canonical - $&times;Full_Composition_Exclusion], but in [$Decomposition_Type:Canonical - $Full_Composition_Exclusion] :
1B06 # Lo BALINESE LETTER AKARA TEDUNG
1B08 # Lo BALINESE LETTER IKARA TEDUNG
1B0A # Lo BALINESE LETTER UKARA TEDUNG
1B0C # Lo BALINESE LETTER RA REPA TEDUNG
1B0E # Lo BALINESE LETTER LA LENGA TEDUNG
1B12 # Lo BALINESE LETTER OKARA TEDUNG
1B3B # Mc BALINESE VOWEL SIGN RA REPA TEDUNG
1B3D # Mc BALINESE VOWEL SIGN LA LENGA TEDUNG
1B40..1B41 # Mc [2] BALINESE VOWEL SIGN TALING TEDUNG..BALINESE VOWEL SIGN TALING REPA TEDUNG
1B43 # Mc BALINESE VOWEL SIGN PEPET TEDUNG

# Total code points: 11

In both [$&times;Decomposition_Type:Canonical - $&times;Full_Composition_Exclusion], and in [$Decomposition_Type:Canonical - $Full_Composition_Exclusion] :
00C0..00C5 # L& [6] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER A WITH RING ABOVE
00C7..00CF # L& [9] LATIN CAPITAL LETTER C WITH CEDILLA..LATIN CAPITAL LETTER I WITH DIAERESIS
00D1..00D6 # L& [6] LATIN CAPITAL LETTER N WITH TILDE..LATIN CAPITAL LETTER O WITH DIAERESIS
...
30F7..30FA # Lo [4] KATAKANA LETTER VA..KATAKANA LETTER VO
30FE # Lm KATAKANA VOICED ITERATION MARK
AC00..D7A3 # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH

# Total code points: 12089
**** END Error Info ****</span></pre>
 </li>
 </ol>
 </li>
 <li>Options:<ol>
 <li>-r Print the failures as a range list.</li>
 <li>-fxxx Use a different input file, such as -fInvariantTest.txt</li>
 </ol>
 </li>
 </ol>
 </li>
</ol>
<h3>6. Options</h3>
<ol>
 <li>If you want to see which files are opened while processing, do the
 following:<ol>
 <li>Run>Run</li>
 <li>Select the Arguments tab, and add the following<ol>
 <li>VM arguments:
 <pre>-DSHOW_FILES</pre>
 </li>
 </ol>
 </li>
 </ol>
 </li>
</ol>
<h3>7.
UCA</h3>
<ol>
 <li>You will use com.ibm.text.UCA.Main as your main class, creating a run configuration along
 the same lines as above.</li>
 <li>To test whether the UCA files are valid, use the
 <span style="font-weight: 400">option (<i>note: you must also build the ICU
 files below, since they test other aspects</i>).</span><pre>writeCollationValidityLog</pre>
 <p>It will create a file:</p>
 <pre><a href="file:///C:/DATA/GEN/collation/5.0.0/CheckCollationValidity.html">C:\DATA\GEN\collation\5.0.0\CheckCollationValidity.html</a></pre>
 <ol>
 <li>Review this file. It will list errors. Some of those are actually
 warnings, and indicate possible problems (this is indicated in the text,
 such as by: "These are not necessarily errors, but should be examined for
 <i>possible</i> errors"). In those cases, the items should be reviewed to make
 sure that there are no inadvertent problems.</li>
 <li>If an item is not so marked, it is a true error, and must be fixed.</li>
 <li>At the end, there is section <b>11. Coverage</b>. It has two parts:<ol>
 <li>In UCDxxx, but not in allkeys. Check this over to make sure that these
 are all the characters that should get <b><i>implicit</i></b> weights.</li>
 <li>In allkeys, but not in UCD. These should be <b><i>only</i></b>
 contractions.
Check them over to make sure they also look right.</li>
 </ol></li>
 </ol></li>
 <li>
 <h4><span style="font-weight: 400">To build all the charts (including for
 the UCA), use the options: </span></h4>
 <pre>normalizationChart caseChart scriptChart indexChart</pre>
 <p>They will be built into</p>
 <pre><a href="file:///C:/DATA/GEN/charts">C:\DATA\GEN\charts</a></pre>
 <p><b>Once the UCA is released, copy those files to the right places on
 the Unicode site:</b></p><ul>
 <li>
 <pre><a href="http://www.unicode.org/charts/normalization/">http://www.unicode.org/charts/normalization/</a></pre>
 </li>
 <li>
 <pre><a href="http://www.unicode.org/charts/collation/">http://www.unicode.org/charts/collation/</a> </pre>
 </li>
 <li>
 <pre><a href="http://www.unicode.org/charts/case/">http://www.unicode.org/charts/case/</a> </pre>
 </li>
 <li>
 <pre><a href="http://www.unicode.org/charts/collation/">http://www.unicode.org/charts/collation/</a> </pre>
 </li>
 </ul>
 </li>
 <li>
 <h4><span style="font-weight: 400">To build all the UCA files used by ICU, use the
 option:</span></h4>
 <pre>ICU</pre>
 <p>They will be built into:</p>
 <pre><a
href="file:///C:/DATA/GEN/collation/5.0.0">C:\DATA\GEN\collation\5.0.0</a></pre>
 </li>
 <li>You should then build a set of the ICU files for the previous version,
 if you don't have them. Use the options:<pre>version 4.1.0 ICU</pre>
 <p>(Or whatever the previous version was.)</p></li>
 <li>Now you will want to compare versions. The key file is
 UCA_Rules_NoCE.txt. It contains the rules expressed in ICU format, which
 allows for comparison across versions of the UCA without spurious variations in
 the numbers getting in the way.<ol>
 <li>Do a Diff between the previous and current versions of these files, and
 verify that all the differences are either new characters, or changes that were
 authorized by the UTC.</li>
 </ol></li>
</ol>

</body>

</html>