---
layout: default
title: ICU Data
nav_order: 1600
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# ICU Data
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

ICU makes use of a wide variety of data tables to provide many of its services.
Examples include converter mapping tables, collation rules, transliteration
rules, break iterator rules and dictionaries, and other locale data. Additional
data can be provided by users, either as customizations of ICU's data or as new
data altogether.

This section describes how ICU data is stored and located at run time. It also
describes how ICU data can be customized to suit the needs of a particular
application.

For simple use of ICU's predefined data, this section on data management can
safely be skipped. The data is built into a library that is loaded along with
the rest of ICU.
No specific action or setup is required of either the
application program or the execution environment.

Update: as of ICU 64, the standard data library is over 20 MB in size. We have
introduced a new tool, the [ICU Data Build Tool](./buildtool.md),
to give you more control over what goes into your ICU locale data file.

> :point_right: **Note**: ICU for C by default comes with pre-built data.
> The source data files are included as an "icu\*data.zip" file starting in ICU4C 49.
> Previously, they were not included unless ICU was downloaded from the [source repository](https://icu.unicode.org/repository).

## ICU and CLDR Data

Most of ICU's data is sourced from [CLDR](http://cldr.unicode.org), the [Common
Locale Data Repository](http://cldr.unicode.org) project. Do not file bugs
against ICU to request data changes in CLDR; see the CLDR project's page instead.
Also note that most ICU data files are therefore autogenerated from CLDR, and so
manually editing them is not usually recommended.

Data which is NOT sourced from CLDR includes:

* [Conversion Data](../conversion/data.md)
* Break Iterator Dictionary Data (Thai, CJK, etc.)
* Break Iterator Rule Data (as of this writing, it is manually kept in sync
  with the CLDR datasets)

For information on building ICU data from CLDR, see the
[cldr-icu-readme](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt).

## ICU Data Directory

The ICU data directory is the default location for all ICU data. Any requests
for data items that do not include an explicit directory path will be resolved
to files located in the ICU data directory.

The ICU data directory is determined as follows:

1. If the application has called the function `u_setDataDirectory()`, use the
   directory specified there, otherwise:

2. If the environment variable `ICU_DATA` is set, use that, otherwise:

3. If the C preprocessor variable `ICU_DATA_DIR` was set at the time ICU was
   built, use its compiled-in value.

4. Otherwise, the ICU data directory is an empty string. This is the default
   behavior for ICU using a shared library for its data and provides the
   highest data loading performance.
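
The precedence described in the list above can be modeled in a few lines of C. This is only a sketch of the documented order, not ICU's implementation: `resolve_data_dir` and its three parameters are hypothetical names standing in for the three possible sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an ICU API. Each argument is NULL when the
 * corresponding source is not set:
 *   app_dir      - directory passed to u_setDataDirectory()
 *   env_dir      - value of the ICU_DATA environment variable
 *   compiled_dir - ICU_DATA_DIR value compiled into ICU */
const char *resolve_data_dir(const char *app_dir, const char *env_dir,
                             const char *compiled_dir) {
    if (app_dir != NULL) return app_dir;           /* step 1 */
    if (env_dir != NULL) return env_dir;           /* step 2 */
    if (compiled_dir != NULL) return compiled_dir; /* step 3 */
    return "";                                     /* step 4: empty string */
}

/* Usage: an explicit application setting wins over the other sources. */
void resolve_data_dir_example(void) {
    assert(strcmp(resolve_data_dir("/opt/app/icu", "/env/icu", NULL),
                  "/opt/app/icu") == 0);
    assert(resolve_data_dir(NULL, NULL, NULL)[0] == '\0');
}
```

The empty-string result corresponds to the case where ICU relies only on the linked-in data library.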

> :point_right: **Note**: `u_setDataDirectory()` is not thread-safe. Call it
> *before* calling ICU APIs from multiple threads. If you use both
> `u_setDataDirectory()` and `u_init()`, then use `u_setDataDirectory()` first.
>
> *Earlier versions of ICU supported two additional schemes: setting a data
> directory relative to the location of the ICU shared libraries, and on Windows,
> taking a location from the registry. These have both been removed to make the
> behavior more predictable and easier to understand.*

The ICU data directory does not need to be set in order to reference the
standard built-in ICU data. Applications that just use standard ICU capabilities
(converters, locales, collation, etc.) but do not build and reference their own
data do not need to specify an ICU data directory.

### Multiple-Item ICU Data Directory Values

The ICU data directory string can contain multiple directories as well as .dat
path/filenames. They must be separated by the path separator that is used on the
platform, for example a semicolon (`;`) on Windows. Data files are searched for in
all directories and .dat package files in the order of the directory string. For
details, see the example in [How Data Loading Works](#how-data-loading-works).
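
As an illustration, a multiple-item value can be assembled by joining the entries with the platform's path separator. The sketch below is plain C with no ICU dependency; `join_data_path` and `DATA_PATH_SEP` are hypothetical names, and the assumption is that the separator is `;` on Windows and `:` elsewhere (ICU's `putil.h` provides `U_PATH_SEP_CHAR` for the same purpose).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for ICU's U_PATH_SEP_CHAR so the sketch needs no ICU headers. */
#if defined(_WIN32)
#define DATA_PATH_SEP ';'
#else
#define DATA_PATH_SEP ':'
#endif

/* Joins `count` directory or .dat entries into `out`, separated by `sep`.
 * Returns 0 on success, -1 if the result does not fit into `cap` bytes. */
int join_data_path(char *out, size_t cap, const char *const *items,
                   size_t count, char sep) {
    size_t used = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(items[i]);
        size_t need = len + (i > 0 ? 1 : 0);    /* entry plus separator */
        if (used + need + 1 > cap) return -1;   /* +1 for the terminator */
        if (i > 0) out[used++] = sep;
        memcpy(out + used, items[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

The resulting string would then be passed to `u_setDataDirectory()` or exported as `ICU_DATA`; entries are searched in the order in which they appear.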

## Default ICU Data

The default ICU data consists of the data needed for the converters, collators,
locales, etc. that are provided with ICU. Default data must be present in order
for ICU to function.

The default data is most commonly built into a shared library that is installed
with the other ICU libraries. Nothing is required of the application for this
mechanism to work. ICU provides additional options for loading the default data
if more flexibility is required.

Here are the steps followed by ICU to locate its default data. This procedure
happens only once per process, at the time an ICU data item is first requested.

1. If the application has called the function `udata_setCommonData()`, use the
   data that was provided. The application specifies the address in memory of
   an image of an ICU common format data file (either in shared-library format
   or .dat package file format).

2. Examine the contents of the default ICU data shared library. If it contains
   data, use that data. If the data library is empty (a stub library), proceed
   to the next step. (A data shared library must always be present in order for
   ICU to successfully link and load. A stub data library is used when the
   actual ICU common data is to be provided from another source.)

3. Dynamically load (memory map, typically) a common format (.dat) file
   containing the default ICU data. Loading is described in the section
   [How Data Loading Works](#how-data-loading-works). The path to
   the data is of the form "icudt\<version\>\<flag\>", where \<version\> is
   the two-digit ICU version number, and \<flag\> is a letter indicating the
   internal format of the file (see the
   [Sharing ICU Data Between Platforms](#sharing-icu-data-between-platforms)
   section).

Once the default ICU data has been located, loading of individual data items
proceeds as described in the section
[How Data Loading Works](#how-data-loading-works).

## Building and Linking against ICU data

When using ICU's configure or runConfigureICU tool to build, several different
methods of packaging are available.

> :point_right: **Note**: in all cases, you **must** link all ICU tools and
> applications against a "data library": either a data library containing the ICU
> data, or against the "stubdata" library located in icu/source/stubdata. For
> example, even if ICU is built in "files" mode, you must still link against the
> "stubdata" library or an undefined symbol error occurs.

* `--with-data-packaging=library`
  This mode builds a shared library (DLL or .so). This is the simplest mode to
  use, and is the default.
  To use: link your application against the common and data libraries.
  This is the only directly supported behavior on Windows builds.
* `--with-data-packaging=static`
  This option builds ICU data as a single (large) static library. This mode is
  more complex to use. If you encounter errors, you may need to build ICU
  multiple times.
* `--with-data-packaging=files`
  With this option, ICU outputs separate individual files (.res, .cnv, etc)
  which will be loaded at runtime. Read the rest of this document, especially
  the sections that discuss the ICU directory path.
* `--with-data-packaging=archive`
  With this option, ICU outputs a single "icudt__.dat" file containing ICU
  data. Read the rest of this document, especially the sections that discuss
  the ICU directory path.

## Time Zone Data

Because time zone data requires frequent updates in response to countries
changing their transition dates for daylight saving time, ICU provides
additional options for loading time zone data from separate files, thus avoiding
the need to update a combined ICU data package. Further information is found
under [Time Zones](../datetime/timezone/index.md).

## Application Data

ICU-based applications can ship and use their own data for localized strings,
custom conversion tables, etc.
Each data item file must have a package name as a
prefix, and this package name must match the basename of a .dat package file, if
one is used. The package name must be used in ICU APIs, for example in
`udata_setAppData()` (instead of `udata_setCommonData()` which is only used for
ICU's own data) and in the pathname argument of `ures_open()`.

The only real difference from ICU's own data is that application data cannot be
loaded simply by specifying a NULL value for the path arguments of ICU APIs, and
application data will not be used by APIs that do not have path/package name
arguments at all.

The most important APIs that allow application data to be used are for Resource
Bundles, which are most often used for localized strings and other data. There
are also functions like `ucnv_openPackage()` that allow application data to be
specified, and the `udata.h` API can be used to load any data with minimum
requirements on the binary format, and without ICU interpreting the contents of
the data.

The `pkgdata` tool, which is used to package the data into various formats (e.g.
shared library), has an option (`--without-assembly` or `-w`) to not use
assembly code when building and packaging the application-specific data into a
shared library.
Building the data with assembly code, which is enabled by
default, is faster and more efficient; however, there are some
platform-specific issues that may arise. The `--without-assembly` option may be
necessary on certain platforms (e.g. Linux) which have trouble properly loading
application data when it was built with assembly code and is packaged as a
shared library.

## Alignment

ICU data is designed to be 16-aligned, with natural alignment of values inside
the data structure, so that the data is usable as is when memory-mapped.
("16-aligned" means that the start address is a multiple of 16 bytes.)

Memory-mapping (as well as memory allocation) provides at least 16-alignment on
modern platforms. Some CPUs require n-alignment of types of size n bytes (and
crash on unaligned reads); other CPUs usually operate faster on data that is
aligned properly.

Some of the ICU code explicitly checks for proper alignment.

The `icupkg` tool places data items into the .dat file at start offsets that are
multiples of 16 bytes.

When using `genccode` to directly write a .o/.obj file, or to write assembler
code, it specifies at least 16-alignment. When using `genccode` to write C code,
it prepends the data with a double value which should yield at least 8-alignment
on most platforms (usually `sizeof(double)=8`).
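
To make the 16-alignment rule concrete, here is a minimal sketch of the two checks involved: testing whether an address is a multiple of 16, and rounding an offset up to the next multiple of 16, as a packaging tool would when placing items. The function names are hypothetical; this is not the code ICU or `icupkg` uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address `p` is a multiple of 16 bytes, otherwise 0. */
int is_16_aligned(const void *p) {
    assert(p != NULL);
    return ((uintptr_t)p & 0xF) == 0;
}

/* Rounds `offset` up to the next multiple of 16 (unchanged if already one). */
size_t align_up_16(size_t offset) {
    return (offset + 15u) & ~(size_t)15u;
}
```

An application that loads a .dat image into its own buffer before calling `udata_setCommonData()` or `udata_setAppData()` could use a check like `is_16_aligned()` on the buffer's start address.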

## Flexibility vs. Installation vs. Performance

There are choices that affect ICU data loading and depend on application
requirements.

### Data in Shared Libraries/DLLs vs. .dat package files

Building ICU data into shared libraries (`--with-data-packaging=library`) is the
most convenient packaging method because shared libraries (DLLs) are easily
found if they are in the same directory as the application libraries, or if they
are on the system library path. The application installer usually just copies
the ICU shared libraries into the same place. On the other hand, shared libraries
are not portable.

Packaging data into .dat files (`--with-data-packaging=archive`) allows them to
be shared across platforms, but they must either be loaded by the application
and set with `udata_setCommonData()` or `udata_setAppData()`, or they must be
in a known location that is included in the ICU data directory string. This
requires the application installer, or the application itself at runtime, to
locate the ICU and/or application data by setting the ICU data directory (see
the [ICU Data Directory](#icu-data-directory) section above) or by
loading the data and providing it to one of the `udata_setXYZData()` functions.

Unlike shared libraries, .dat package files can be taken apart into separate
data item files with the `decmn` ICU tool. This allows post-installation
modification of a package file. The `gencmn` and `pkgdata` ICU tools can then be
used to reassemble the .dat package file.

For more information about .dat package files see the section [Sharing ICU Data
Between Platforms](#sharing-icu-data-between-platforms) below.

### Data Overriding vs. Loading Performance

If the ICU data directory string is empty, then ICU will not attempt to load
data from the file system. It is then only possible to load data from the
linked-in shared library or via `udata_setCommonData()` and
`udata_setAppData()`. This is inflexible but provides the highest performance.

If the ICU data directory string is not empty, then data items are searched in
all directories and matching .dat files mentioned before checking in
already-loaded package files. This allows overriding of packaged data items with
single files after installation but costs some time for filesystem accesses.
This is usually done only once per data item; see
[User Data Caching](#user-data-caching) below.

### Single Data Files vs. Packages

Single data files (`--with-data-packaging=files`) are easy to replace and can
override items inside data packages. However, it is usually desirable to reduce
the number of files during installation, and package files use less disk space
than many small files.

## How Data Loading Works

ICU data items are referenced by three names - a path, a name and a type. The
following are some examples:

path                         | name     | type
-----------------------------|----------|-------
 c:\\some\\path\\dataLibName | test     | dat
 no path                     | cnvalias | icu
 no path                     | cp1252   | cnv
 no path                     | en       | res
 no path                     | uprops   | icu


Items with 'no path' specified are loaded from the default ICU data.

Application data items include a path, and will be loaded from user data files,
not from the ICU default data. For application data, the path argument need not
contain an actual directory, but must contain the application data's package
name after the last directory separator character (or by itself if there is no
directory). If the path argument contains a directory, then it is logically
prepended to the ICU data directory string and searched first for data. The path
argument can contain at most one directory. (Path separators like semicolon (;)
are not handled here.)

> :point_right: **Note**: The ICU data directory string itself may
> contain multiple directories and path/filenames to .dat package files. See the
> [ICU Data Directory](#icu-data-directory) section.

It is recommended to not include the directory in the path argument but to make
sure via setting the application data or the ICU data directory string that the
data can be located. This simplifies program maintenance and improves
robustness.

See the API descriptions for the functions `udata_open()` and
`udata_openChoice()` for additional information on opening ICU data from within
an application.

Data items can exist as individual files, or a number of them can be packaged
together in a single file for greater efficiency in loading and convenience of
distribution. The combined files are called Common Files.

Based on the supplied path and name, ICU searches several possible locations
when opening data.
To make things more concrete in the following descriptions,
the following values of path, name and type are used:

```
path = "c:\\some\\path\\dataLibName"
name = "test"
type = "res"
```

In this case, "dataLibName" is the "package name" part of the path argument, and
"c:\\some\\path\\" is the directory part of it.

The search sequence for the data for "test.res" is as follows (the first
successful loading attempt wins):

1. Try to load the file "dataLibName_test.res" from c:\\some\\path\\.

2. Try to load the file "dataLibName_test.res" from each of the directories in
   the ICU data directory string.

3. Try to locate the data package for the package name "dataLibName".

   1. Try to locate the data package in the internal cache.

   2. Try to load the package file "dataLibName.dat" from c:\\some\\path\\.

   3. Try to load the package file "dataLibName.dat" from each of the directories
      in the ICU data directory string.

The first steps, loading the data item from an individual file, are omitted if
no directory is specified in either the path argument or the ICU data directory
string.
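
The file names used in this search sequence follow mechanically from the path, name and type. The sketch below derives them in plain C; `plan_lookup` and `struct data_lookup` are hypothetical names for illustration and are not part of ICU, which performs this work internally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Names derived from one (path, name, type) request. */
struct data_lookup {
    char dir[128];       /* directory part with trailing separator, or "" */
    char item_file[128]; /* "<package>_<name>.<type>" */
    char pkg_file[128];  /* "<package>.dat" */
};

/* Splits `path` at the last `sep` and builds the two candidate file names.
 * Returns 0 on success, -1 if the package name is empty or a buffer is too small. */
int plan_lookup(const char *path, const char *name, const char *type,
                char sep, struct data_lookup *out) {
    const char *last = strrchr(path, sep);
    const char *pkg = last ? last + 1 : path;   /* package name after last separator */
    size_t dir_len = (size_t)(pkg - path);
    int n;
    assert(out != NULL);
    if (*pkg == '\0' || dir_len >= sizeof out->dir) return -1;
    memcpy(out->dir, path, dir_len);
    out->dir[dir_len] = '\0';
    n = snprintf(out->item_file, sizeof out->item_file, "%s_%s.%s", pkg, name, type);
    if (n < 0 || (size_t)n >= sizeof out->item_file) return -1;
    n = snprintf(out->pkg_file, sizeof out->pkg_file, "%s.dat", pkg);
    if (n < 0 || (size_t)n >= sizeof out->pkg_file) return -1;
    return 0;
}
```

For the example values, `dir` plus `item_file` is the first candidate, each ICU data directory entry plus `item_file` comes next, and the same two locations are then tried with `pkg_file` once the internal cache has been checked.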

Package files are loaded at most once and then cached. They are identified only
by their package name. Whenever a data item is requested from a package and that
package has been loaded before, then the cached package is used immediately
instead of searching through the filesystem.

> :point_right: **Note**: ICU versions before 2.2 always searched data packages
> before looking for individual files, which made it impossible to override
> packaged data items. See the ICU 2.2 download page and the readme for more
> information about the changes.

## User Data Caching

Once loaded, data package files are cached, and stay loaded for the duration of
the process. Any requests for data items from an already loaded data package
file are routed directly to the cached data. No additional search for loadable
files is made.

The user data cache is keyed by the base file name portion of the requested
path, with any directory portion stripped off and ignored. Using the previous
example, for the path name "c:\\some\\path\\dataLibName", the cache key is
"dataLibName". After this is cached, a subsequent request for "dataLibName", no
matter what directory path is specified, will resolve to the cached data.

Data can be explicitly added to the cache of common format data by means of the
`udata_setAppData()` function.
This function takes as input the path (name) and
a pointer to a memory image of a .dat file. The data is added to the cache,
causing any subsequent requests for data items from that file name to be routed
to the cache.

Only data package files are cached. Separate data files that contain just a
single data item are not cached; for these, multiple requests to ICU to open the
data will result in multiple requests to the operating system to open the
underlying file.

However, most ICU services (Resource Bundles, conversion, etc.) themselves cache
loaded data, so that data is usually loaded only once until the end of the
process (or until `u_cleanup()` or `ucnv_flushCache()` or similar are called.)

There is no mechanism for removing or updating cached data files.

## Directory Separator Characters

If a directory separator (generally '/' or '\\') is needed in a path parameter,
use the form that is native to the platform. The ICU header `"putil.h"` defines
`U_FILE_SEP_CHAR` appropriately for the platform.

> :point_right: **Note**: On Windows, the directory separator must be '\\' for
> any paths passed to ICU APIs. This is different from native Windows APIs, which
> generally allow either '/' or '\\'.

## Sharing ICU Data Between Platforms

ICU's default data is (at the time of this writing) about 8 MB in size. Because
it is normally built as a shared library, the file format is specific to each
platform (operating system). The data libraries cannot be shared between
platforms even though the actual data contents are identical.

By distributing the default data in the form of common format .dat files rather
than as shared libraries, a single data file can be shared among multiple
platforms. This is beneficial if a single distribution of the application (a CD,
for example) includes binaries for many platforms, and the size requirements for
replicating the ICU data for each platform are a problem.

ICU common format data files are not completely interchangeable between
platforms. The format depends on these properties of the platform:

1. Byte Ordering (little endian vs. big endian)

2. Base character set - ASCII or EBCDIC

This means, for example, that ICU data files are interchangeable between Windows
and Linux on X86 (both are ASCII little endian), or between Macintosh and
Solaris on SPARC (both are ASCII big endian), but not between Solaris on SPARC
and Solaris on X86 (different byte ordering).
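
Both properties can be probed from C, and together they select the format letter that appears in the data file names listed in this section. The following is a sketch with hypothetical function names; it is not how ICU itself determines the letter (ICU's build configuration fixes it at compile time).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of the value 1 */
    return first == 1;
}

/* Returns 1 if the execution character set is ASCII-based, 0 otherwise.
 * 'A' is 0x41 in ASCII and 0xC1 in EBCDIC. */
int host_is_ascii_family(void) {
    return 'A' == 0x41;
}

/* Maps the two properties to the letter used in the default data file name.
 * Returns '?' for little-endian EBCDIC, a combination the guide says does not exist. */
char icu_data_flag(int little_endian, int ascii_family) {
    if (ascii_family) return little_endian ? 'l' : 'b';
    return little_endian ? '?' : 'e';
}

/* Usage: the letter for the machine running this code. */
char host_data_flag(void) {
    char flag = icu_data_flag(host_is_little_endian(), host_is_ascii_family());
    assert(flag != '?');
    return flag;
}
```

Two platforms can share a .dat file only when this letter matches on both.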

The single letter following the version number in the file name of the default
ICU data file encodes the properties of the file as follows:

```
icudt19l.dat    Little Endian, ASCII
icudt19b.dat    Big Endian, ASCII
icudt19e.dat    Big Endian, EBCDIC
```

(There are no little endian EBCDIC systems. All non-EBCDIC encodings include an
invariant subset of ASCII that is sufficient to enable these files to
interoperate.)

The packaging of the default ICU data as a .dat file rather than as a shared
library is requested by using an option in the configure script at build time.
Nothing is required at run time; ICU finds and uses whatever form of the data is
available.

> :point_right: **Note**: When the ICU data is built in the form of shared
> libraries, the library names have platform-specific prefixes and suffixes. On
> Unix-style platforms, all the libraries have the "lib" prefix and one of the
> usual (".dll", ".so", ".sl", etc.) suffixes. Other than these prefixes and
> suffixes, the library names are the same as the above .dat files.

## Customizing ICU's Data Library

ICU includes a standard library of data that is about 16 MB in size. Most of
this consists of conversion tables and locale information.
The data itself is normally placed into a single shared library.

Update: as of ICU 64, the standard data library is over 20 MB in size. We have
introduced a new tool, the [ICU Data Build Tool](./buildtool.md),
to replace the makefiles explained below and give you more control over what
goes into your ICU locale data file.

### Adding Converters to ICU

The first step is to obtain or create a .ucm (source) mapping data file for the
desired converter. A large archive of converter data is maintained by the ICU
team at <https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm>

We will use `solaris-eucJP-2.7.ucm`, available from the repository mentioned
above, as an example.

#### Build the Converter

Converter source files are compiled into binary converter files (.cnv files) by
using the ICU tool `makeconv`. For the example, you can use this command:

```
makeconv -v solaris-eucJP-2.7.ucm
```

Some of the .ucm files from the repository will need additional header
information before they can be built. Use the error messages from the `makeconv`
tool, .ucm files for similar converters, and the ICU user guide documentation of
.ucm files as a guide when making changes.
For the `solaris-eucJP-2.7.ucm`
example, we will borrow the missing header fields from
`source/data/mappings/ibm-33722_P12A-2000.ucm`, which is the standard ICU eucJP
converter data.

The ucm file format is described in the
["Conversion Data" chapter](../conversion/data.md) of this user guide.

After adjustment, the header of the `solaris-eucJP-2.7.ucm` file contains these
items:

```
<code_set_name>  "solaris-eucJP-2.7"
<subchar>        \x3F
<uconv_class>    "MBCS"

<mb_cur_max>     3
<mb_cur_min>     1

<icu:state>      0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1
<icu:state>      a1-fe
<icu:state>      a1-e4
<icu:state>      a1-fe:1, a1:4, a3-af:4, b6:4, d6:4, da-db:4, ed-f2:4
<icu:state>      a1-fe
```

The binary converter file produced by the `makeconv` tool is
`solaris-eucJP-2.7.cnv`.

#### Installation

Copy the new .cnv file to the desired location for use. Set the environment
variable `ICU_DATA` to the directory containing the data, or, alternatively,
from within an application, tell ICU the location of the new data with the
function `u_setDataDirectory()` before using the new converter.

If ICU is already obtaining data from files rather than a shared library,
install the new file in the same location as the existing ICU data file(s), and
don't change/set the environment variable or data directory.

If you do not want to add a converter to ICU's base data, you can also generate
a conversion table with `makeconv`, use `pkgdata` to generate your own package,
and use `ucnv_openPackage()` to open a converter with that conversion table
from the generated package.

#### Building the new converter into ICU

The need to install a separate file and inform ICU of the data directory can be
avoided by building the new converter into ICU's standard data library. Here is
the procedure for doing so:

1. Move the .ucm file(s) for the converter(s) to be added
   (`solaris-eucJP-2.7.ucm` for our example) into the directory
   `source/data/mappings/`

2. Create, or edit, if it already exists, the file
   `source/data/mappings/ucmlocal.mk`. Add this line:

   ```
   UCM_SOURCE_LOCAL = solaris-eucJP-2.7.ucm
   ```

   Any number of converters can be listed. Extend the list to new lines with a
   backslash at the end of the line.
   The `ucmlocal.mk` file is described in
   more detail in `source/data/mappings/ucmfiles.mk`. (Even though they use very
   different build systems, `ucmlocal.mk` is used for both the Windows and UNIX
   builds.)

3. Add the converter name and aliases to `source/data/mappings/convrtrs.txt`.
   This will allow your converter to be shown in the list of available
   converters when you call the `ucnv_getAvailableName()` function. The file
   syntax is described within the file.

4. Rebuild the ICU data.
   For Windows, from MSVC choose the makedata project from the GUI, then build
   the project.
   For UNIX, `cd icu/source/data; gmake`

When opening an ICU converter (`ucnv_open()`), the converter name cannot be
qualified with a path that indicates the directory or common data file
containing the corresponding converter data. The required data must be present
either in the main ICU data library or as a separate .cnv file located in the
ICU data directory. This is different from opening resources or other types of
ICU data, which do allow a path.

### Adding Locale Data to ICU's Data

If you have data for a locale that is not included in ICU's standard build, then
you can add it to the build in a very similar way as with conversion tables
above.
The ICU project provides a large number of additional locales in its
[locale
repository](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/locales/)
on the web. Most of this locale data is derived from the CLDR ([Common Locale
Data Repository](http://www.unicode.org/cldr/)) project.

Dropping the txt file into the correct place in the source tree is sufficient to
add it to your ICU build. You will need to re-configure in order to pick it up.

## Customizing ICU's Data Library for ICU 63 or earlier

The ICU data library can be easily customized, either by adding additional
converters or locales, or by removing some of the standard ones for the purpose
of saving space.

> :point_right: **Note**: ICU for C by default comes with pre-built data.
The source data files are included as an "icu\*data.zip" file starting in
ICU4C 49. Previously, they were not included unless ICU was downloaded from the
[source repository](https://github.com/unicode-org/icu). Alternatively, the
[Data Customizer](http://apps.icu-project.org/datacustom/) may be used to
customize the pre-built data.

ICU can load data from individual data files as well as from its default
library, so building a customized library when adding additional data is not
strictly necessary.
Adding to ICU's library can simplify application
installation by eliminating the need to include separate files with an
application distribution, and the need to tell ICU where they are installed.

Reducing the size of ICU's data by eliminating unneeded resources can make
sense on small systems with limited or no disk, but for desktop or server
systems there is no real advantage to trimming. ICU's data is memory mapped
into an application's address space, and only those portions of the data
actually being used are ever paged in, so there are no significant RAM savings.
As for disk space, with the large size of today's hard drives, saving a few MB
is not worth the bother.

By default, ICU builds with a large set of converters and with all available
locales. This means that any extra items added must be provided by the
application developer. There is no extra ICU-supplied data that could be
specified.

### Details

The converters and resources that ICU builds are listed in the following
configuration files. They are only available when building from ICU's source
code repository. Normally, the standard ICU distribution does not include these
files.

File                              | Description
----------------------------------|--------------
source/data/locales/resfiles.mk   | The standard set of locale data resource bundles
source/data/locales/reslocal.mk   | User-provided file with additional resource bundles
source/data/coll/colfiles.mk      | The standard set of collation data resource bundles
source/data/coll/collocal.mk      | User-provided file with additional collation resource bundles
source/data/brkitr/brkfiles.mk    | The standard set of break iterator data resource bundles
source/data/brkitr/brklocal.mk    | User-provided file with additional break iterator resource bundles
source/data/translit/trnsfiles.mk | The standard set of transliterator resource files
source/data/translit/trnslocal.mk | User-provided file with a set of additional transliterator resource files
source/data/mappings/ucmcore.mk   | Core set of conversion tables for MIME/Unix/Windows
source/data/mappings/ucmfiles.mk  | Additional, large set of conversion tables for a wide range of uses
source/data/mappings/ucmebcdic.mk | Large set of EBCDIC conversion tables
source/data/mappings/ucmlocal.mk  | User-provided file with additional conversion tables
source/data/misc/miscfiles.mk     | Miscellaneous data, like timezone information

These files function identically for both Windows and UNIX builds of ICU.
ICU
will automatically update the list of installed locales returned by
`uloc_getAvailable()` whenever `resfiles.mk` or `reslocal.mk` is updated and
the ICU data library is rebuilt. These files are only needed while building ICU.
If any of these files are removed or renamed, the size of the ICU data library
will be reduced.

The optional files `reslocal.mk` and `ucmlocal.mk` are not included as part of
a standard ICU distribution. Thus these customization files do not need to be
merged or updated when updating versions of ICU.

Both `reslocal.mk` and `ucmlocal.mk` are makefile includes, so the usual rules
for makefiles apply. A line may be continued by ending it with a backslash.
Lines beginning with a # are comments. See
`ucmfiles.mk` and `resfiles.mk` for additional information.

### Reducing the Size of ICU's Data: Conversion Tables

The size of the ICU data file in the standard build configuration is about 8 MB.
The majority of this is used for conversion tables. ICU comes with so many
conversion tables because many ICU users need to support many encodings from
many platforms. There are conversion tables for EBCDIC and DOS codepages, for
ISO 2022 variants, and for small variations of popular encodings.

> :point_right: **Important**: ICU provides full internationalization
functionality without **any** conversion table data. The common library
contains code to handle several important encodings algorithmically: US-ASCII,
ISO-8859-1, UTF-7/8/16/32, SCSU, BOCU-1, CESU-8, and IMAP-mailbox-name (i.e.,
US-ASCII, ISO-8859-1, and all Unicode charsets; see
source/data/mappings/convrtrs.txt for the current list).

Therefore, the easiest way to reduce the size of ICU's data by a lot (without
limitation of I18N support) is to reduce the number of conversion tables that
are built into the data file.

The conversion tables are listed for the build process in several makefiles
`source/data/mappings/ucm*.mk`, roughly grouped by how commonly they are used.
If you remove or rename any of these files, then the ICU build will exclude the
conversion tables that are listed in that file. Beginning with ICU 2.0, all of
these makefiles including the main one are optional. If you remove all of them,
then ICU will include only very few conversion tables for "fallback" encodings
(see note below).

If you remove or rename all `ucm*.mk` files, then ICU's data is reduced to
about 3.6 MB. If you remove all these files except for `ucmcore.mk`, then ICU's
data is reduced to about 4.7 MB, while keeping support for a core set of common
MIME/Unix/Windows encodings.

> :point_right: **Note**: If you remove the conversion table for an encoding
that could be a default encoding on one of your platforms, then ICU will not be
able to instantiate a default converter. In this case, ICU 2.0 and up will
automatically fall back to a "lowest common denominator" and load a converter
for US-ASCII (or, on EBCDIC platforms, for codepages 37 or 1047). This will be
good enough for converting strings that contain only "ASCII" characters (see the
comment about "invariant characters" in `utypes.h`).
*When ICU is built with a reduced set of conversion tables, then some tests will
fail that test the behavior of the converters based on known features of some
encodings. Also, building the testdata will fail if you remove some conversion
tables that are necessary for that (to test non-ASCII/Unicode resource bundle
source files, for example). You can ignore these failures. Build with the
standard set of conversion tables, if you want to run the tests.*

### Reducing the Size of ICU's Data: Locale Data

If you need to reduce the size of ICU's data even further, then you need to
remove other files or parts of files from the build as well.

There are a number of different subdirectories of 'data' containing locale data
split out by section. Each subdirectory has its own **.mk** file listing the
locales which will be built.
Subdirectories include **lang** for language names
and **curr** for currency names.

You can remove data for entire locales by removing their files from
`source/data/locales/resfiles.mk` or the appropriate other .mk file. ICU will
then use the data of the parent locale instead, which is root.txt. If you
remove all resource bundles for a given language and its country/region/variant
sublocales, **do not remove root.txt!** Also, do not remove a parent locale if
child locales exist. For example, do not remove "en" while retaining "en_US".

### Reducing the Size of ICU's Data: Collation Data

Collation data (for sorting, searching and alphabetic indexes) is also large,
especially the collation data for East Asian languages because they define
multiple orderings of tens of thousands of Han characters. You can remove the
collation data for those languages by removing references to those locales from
`source/data/coll/colfiles.mk`. When you do that, the collation for those
languages will fall back to the root collator, that is, you lose
language-specific behavior.

A much less radical approach is to keep the collation data tables but remove the
tailoring rule strings from which they were built. Those rule strings are
rarely used at runtime.
For documentation about their use and how to remove
them, see the section "Building on Existing Locales" in the
[Collation Customization chapter](../collation/customization/index.md).

### Adding Locale Data to ICU's Data

To add a locale, write a resource bundle file for it with a structure like the
existing locale resource bundles (e.g. `source/data/locales/ja.txt`, `ru_RU.txt`,
`kok_IN.txt`) and add it by writing a file `source/data/locales/reslocal.mk`
just like above. In this file, define the list of additional resource bundles as

```
GENRB_SOURCE_LOCAL=myLocale.txt other.txt ...
```

Starting in ICU 2.2, these added locales are automatically listed by
`uloc_getAvailable()`.

## ICU Data File Formats

ICU uses several kinds of data files with specific source (plain text) and
binary data formats. The following lists provide links to descriptions of those
formats.

Each ICU data object begins with a header before the actual, specific data. The
header consists of a 16-bit header length value, the two "magic" bytes DA 27 and
a [UDataInfo](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/structUDataInfo.html#_details)
structure which specifies the data object's endianness, charset family, format,
data version, etc.
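
As a rough illustration of that header, the following sketch reads the fields
from a byte buffer without linking against ICU. The `IcuHeaderSummary` type and
the `parse_icu_header()` function are hypothetical names. The byte offsets are
an assumption based on the field order of the public `UDataInfo` structure
(size, reserved word, isBigEndian, charsetFamily, sizeofUChar, reserved byte,
then three 4-byte arrays); check them against the `udata.h` of your ICU version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the leading bytes of an ICU data object. */
typedef struct {
    uint16_t header_size;        /* 16-bit header length value */
    int      is_big_endian;      /* nonzero if the data is big endian */
    int      charset_family;     /* assumed: 0 = ASCII, 1 = EBCDIC */
    uint8_t  data_format[4];
    uint8_t  format_version[4];
    uint8_t  data_version[4];
} IcuHeaderSummary;

/* Returns 0 on success, -1 if the buffer is too short or the magic bytes
 * DA 27 are missing. Offsets are assumptions, see the text above. */
int parse_icu_header(const uint8_t *bytes, size_t length, IcuHeaderSummary *out) {
    if (bytes == NULL || out == NULL || length < 24) {
        return -1;
    }
    if (bytes[2] != 0xDA || bytes[3] != 0x27) {
        return -1;  /* not an ICU data object */
    }
    /* UDataInfo is assumed to start at offset 4; its first 4 bytes are
     * the structure size and a reserved word. */
    out->is_big_endian = bytes[8];
    out->charset_family = bytes[9];
    /* The header length is stored in the byte order of the data itself. */
    out->header_size = out->is_big_endian
        ? (uint16_t)((bytes[0] << 8) | bytes[1])
        : (uint16_t)((bytes[1] << 8) | bytes[0]);
    memcpy(out->data_format, bytes + 12, 4);
    memcpy(out->format_version, bytes + 16, 4);
    memcpy(out->data_version, bytes + 20, 4);
    return 0;
}
```

Reading the endianness flag before decoding the header length lets the same
code inspect the header of a file built for a different platform.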

(This is not the case for the trie structures, which are not stand-alone,
loadable data objects.)

### Public Data Files

#### ICU .dat package files
* Source format: (list of files provided as input to the icupkg tool, or
  on the gencmn tool command line)
* Binary format: .dat:
  [source/tools/toolutil/pkg_gencmn.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/toolutil/pkg_gencmn.cpp)
* Generator tool:
  [icupkg](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/icupkg)
  or
  [gencmn](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencmn)

#### Resource bundles
* Source format: .txt:
  [icuhtml/design/bnf_rb.txt](https://github.com/unicode-org/icu-docs/blob/main/design/bnf_rb.txt)
* Binary format: .res:
  [source/common/uresdata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/uresdata.h)
* Generator tool:
  [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb)

#### Unicode conversion mapping tables
* Source format: .ucm: [Conversion Data chapter](../conversion/data.md)
* Binary format: .cnv:
  [source/common/ucnvmbcs.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvmbcs.h)
* Generator tool:
  [makeconv](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/makeconv)

#### Conversion (charset) aliases
* Source format:
  [source/data/mappings/convrtrs.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/convrtrs.txt):
  contains format description. The command "uconv -l --canon" will also
  generate the alias table from the currently used copy of ICU.
* Binary format: cnvalias.icu:
  [source/common/ucnv_io.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnv_io.cpp)
* Generator tool:
  [gencnval](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencnval)

#### Unicode Character Data (Properties; for Java only: hardcoded in C common library)
* Source format:
  [source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/ppucd.txt):
  [Preparsed UCD](https://icu.unicode.org/design/props/ppucd)
* Binary format: uprops.icu:
  [tools/unicode/c/genprops/corepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/corepropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (Case mappings; for Java only: hardcoded in C common library)
* Source format:
  [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
  [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: ucase.icu:
  [tools/unicode/c/genprops/casepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/casepropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (BiDi, and Arabic shaping; for Java only: hardcoded in C common library)
* Source format:
  [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
  [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: ubidi.icu:
  [tools/unicode/c/genprops/bidipropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/bidipropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (Normalization since ICU 4.4) & custom normalization data
* Source format:
  [source/data/unidata/norm2/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/norm2):
  Files derived from the [Unicode Character
  Database](https://www.unicode.org/onlinedat/online.html), or custom data.
* Binary format: .nrm:
  [source/common/normalizer2impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/normalizer2impl.h)
* Generator tool:
  [gennorm2](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gennorm2)

#### Unicode Character Data (Character names)
* Source format:
  [source/data/unidata/UnicodeData.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/UnicodeData.txt):
  [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: unames.icu:
  [tools/unicode/c/genprops/namespropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/namespropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (Property [value] aliases since ICU 4.8; for Java only: hardcoded in C common library since ICU 4.8)
* Source format: [UCD Property*Aliases.txt](http://www.unicode.org/Public/UNIDATA/):
  [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: pnames.icu:
  [source/common/propname.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/propname.h)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (Text layout properties since ICU 64)
* Source format:
  [source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/ppucd.txt):
  [Preparsed UCD](https://icu.unicode.org/design/props/ppucd)
* Binary format: ulayout.icu:
  [tools/unicode/c/genprops/layoutpropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/layoutpropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Unicode Character Data (Emoji properties since ICU 70)
Emoji properties of code points were moved out of uprops.icu, and
emoji properties of strings were added.

* Source format:
  [source/data/unidata/emoji-sequences.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/emoji-sequences.txt) and
  [source/data/unidata/emoji-zwj-sequences.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/emoji-zwj-sequences.txt):
  [UTS #51 Data Files](https://www.unicode.org/reports/tr51/#Data_Files)
* Binary format: uemoji.icu:
  [tools/unicode/c/genprops/emojipropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/emojipropsbuilder.cpp)
* Generator tool:
  [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)

#### Collation data (root collation & tailorings; ICU 53 & later)
* Source format: Original data from allkeys_CLDR.txt in
  [CLDR
Root Collation Data Files](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Data_Files) 8812e5b6d6dSopenharmony_ci processed into 8822e5b6d6dSopenharmony_ci [source/data/unidata/FractionalUCA.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/FractionalUCA.txt) 8832e5b6d6dSopenharmony_ci by 8842e5b6d6dSopenharmony_ci [tool at unicode.org maintained by Mark Davis](https://sites.google.com/site/unicodetools/#TOC-UCA) 8852e5b6d6dSopenharmony_ci (call the Main class with option writeFractionalUCA); source tailorings (text rules) in 8862e5b6d6dSopenharmony_ci [source/data/coll/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/coll) 8872e5b6d6dSopenharmony_ci resource bundles: [Collation Customization chapter](../collation/customization/index.md). 8882e5b6d6dSopenharmony_ci* Binary format: ucadata.icu & binary tailorings in resource bundles: 8892e5b6d6dSopenharmony_ci [source/i18n/collationdatareader.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/collationdatareader.h) 8902e5b6d6dSopenharmony_ci* Generator tool: 8912e5b6d6dSopenharmony_ci [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca), 8922e5b6d6dSopenharmony_ci [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb) 8932e5b6d6dSopenharmony_ci 8942e5b6d6dSopenharmony_ci#### Rule-based break iterator data 8952e5b6d6dSopenharmony_ci* Source format: .txt: [Boundary Analysis chapter](boundaryanalysis/index.md) 8962e5b6d6dSopenharmony_ci* Binary format: .brk: 8972e5b6d6dSopenharmony_ci [source/common/rbbidata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/rbbidata.h) 8982e5b6d6dSopenharmony_ci* Generator tool: 8992e5b6d6dSopenharmony_ci [genbrk](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genbrk) 9002e5b6d6dSopenharmony_ci 9012e5b6d6dSopenharmony_ci#### Dictionary-based break iterator data (ICU 50 & later) 9022e5b6d6dSopenharmony_ci* Source format: txt: 
  [gendict.cpp comments](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gendict/gendict.cpp)
* Binary format: .dict: see
  [source/common/dictionarydata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/dictionarydata.h)
* Generator tool:
  [gendict](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gendict)

#### Rule-based transform (transliterator) data
* Source format: .txt (in resource bundles): [Transform Rule Tutorial chapter](transforms/general/rules.md)
* Binary format: Uses genrb to make binary format
* Generator tool: Does not apply

#### Time zone data (ICU 4.4 & later)
* Source format:
  [source/data/misc/zoneinfo64.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/misc/zoneinfo64.txt):
  ftp://elsie.nci.nih.gov/pub/ `tzdata<year><rev>.tar.gz`
* Binary format: zoneinfo64.res (generated by genrb and
  [tzcode tools](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/readme.txt)).
* Generator tool: Does not apply

#### StringPrep profile data
* Source format:
  [source/data/sprep/rfc3491.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/sprep/rfc3491.txt)
* Binary format: .spp:
  [source/tools/gensprep/store.c](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gensprep/store.c)
* Generator tool:
  [gensprep](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gensprep)

#### Confusables data
* Source format:
  [source/data/unidata/confusables.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/confusables.txt),
  [source/data/unidata/confusablesWholeScript.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/confusablesWholeScript.txt)
* Binary format: .cfu:
  [confusables.cfu: source/i18n/uspoof_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/uspoof_impl.h)
* Generator tool: [gencfu](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencfu)

### Public Data Files (old versions)

#### Unicode Character Data (Normalization before ICU 4.4; for Java only: was hardcoded in C common library)
* Source format:
  [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
  [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: unorm.icu:
  [source/common/unormimp.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unormimp.h)
* Generator tool: gennorm

#### Unicode Character Data (Property [value] aliases before ICU 4.8)
* Source format: source/data/unidata/Property*Aliases.txt: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
* Binary format: pnames.icu: source/common/propname.h (ICU 4.6)
* Generator tool: genpname

#### Collation data (UCA, code points to weights; ICU 52 & earlier)
* Source format: Same as in ICU 53
* Binary format: ucadata.icu & binary tailorings in resource bundles: source/i18n/ucol_imp.h (ICU 52)
* Generator tool:
  [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca),
  [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb)

#### Collation data (Inverse UCA, weights->code points; ICU 52 & earlier)
* Source format: Processed from FractionalUCA.txt like ICU 52 ucadata.icu
* Binary format: invuca.icu: source/i18n/ucol_imp.h (ICU 52)
* Generator tool:
  [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca)

#### Dictionary-based break iterator data (ICU 49 & earlier)
* Source format: .txt: genctd.cpp comments
* Binary format: .ctd: see CompactTrieHeader in source/common/triedict.cpp
* Generator tool: genctd

#### Time zone data (Before ICU 4.4)
* Source format: source/data/misc/zoneinfo.txt (ICU 4.2): ftp://elsie.nci.nih.gov/pub/ `tzdata<year><rev>.tar.gz`
* Binary format: zoneinfo.res (generated by genrb and
  [tzcode tools](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/readme.txt)).
* Generator tool: Does not apply

### Non-File API Binary Data

#### Converter selector data
* Source format: none
* Binary format:
  [source/common/ucnvsel.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvsel.cpp)
* Generator tool:
  [ucnvsel_open()](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvsel.cpp)

### Test-Only Data Files

#### test.icu (for udata API testing)
* Source format: none (fixed output from gentest when not using -r or -j options)
* Binary format: test.icu: see `createData()` in
  [source/tools/gentest/gentest.c](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gentest/gentest.c)
* Generator tool:
  [gentest](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gentest/gentest.c)

### Other Data Structures

#### UCPTrie (C)/CodePointTrie (Java) (maps code points to integers)
* Source format: (public builder API)
* Binary format:
  [ICU Code Point Tries design doc](https://icu.unicode.org/design/struct/utrie),
  [icu4c/source/common/ucptrie_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucptrie_impl.h)
* Generator tool: (builder class)

#### UTrie2 (C)/Trie2 (Java) (maps code points to integers)
* Source format: (internal builder API)
* Binary format:
  [ICU Code Point Tries design doc](https://icu.unicode.org/design/struct/utrie),
  [icu4c/source/common/utrie2_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/utrie2_impl.h)
* Generator tool: (builder class)

#### BytesTrie (maps byte sequences to 32-bit integers)
* Source format: (public builder API)
* Binary format:
  [BytesTrie design doc](https://icu.unicode.org/design/struct/tries/bytestrie),
  [icu4c/source/common/unicode/bytestrie.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/bytestrie.h)
* Generator tool: (builder class)

#### UCharsTrie (C++)/CharsTrie (Java) (maps 16-bit-Unicode strings to 32-bit integers)
* Source format: (public builder API)
* Binary format:
  [UCharsTrie design doc](https://icu.unicode.org/design/struct/tries/ucharstrie),
  [icu4c/source/common/unicode/ucharstrie.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/ucharstrie.h)
* Generator tool: (builder class)

## ICU4J Resource Information

Starting with release 2.1, ICU4J includes its own resource information which is
completely independent of the JRE resource information. (Note: in ICU4J 2.8 to
3.4, time zone information depends on the underlying JRE.) The new ICU4J
information is equivalent to the information in ICU4C and many resources are,
in fact, the same binary files that ICU4C uses.

By default the ICU4J distribution includes all of the standard resource
information. It is located under the directory `com/ibm/icu/impl/data`.
Depending on the service, the data is in different locations and in different
formats. Note: This will continue to change from release to release, so clients
should not depend on the exact organization of the data in ICU4J.

1. The primary **locale data** is under the directory `icudt38b`, as a set of
   ".res" files whose names are the locale identifiers. Locale naming is
   documented in the `com.ibm.icu.util.ULocale` class, and the use of these
   names in searching for resources is documented in
   `com.ibm.icu.util.UResourceBundle`.

2. The **collation data** is under the directory `icudt38b/coll`, as a set of
   ".res" files.

3. The **rule-based transliterator data** is under the directory
   `icudt38b/translit` as a set of ".res" files.
   (**Note:** the Han
   transliterator test data is no longer included in the core icu4j.jar file by
   default.)

4. The **rule-based number format data** is under the directory `icudt38b/rbnf`
   as a set of ".res" files.

5. The **break iterator data** is directly under the data directory, as a set
   of ".brk" files, named according to the type of break and the locale where
   there are locale-specific versions.

6. The **holiday data** is under the data directory, as a set of ".class"
   files, named "HolidayBundle_" followed by the locale ID.

7. The **character property data**, assorted **normalization data**, and
   default **Unicode Collation Algorithm (UCA) data** are found under the
   data directory as a set of ".icu" files.

8. The **character set converter data** is under the directory `icudt38b/`, as
   a set of ".cnv" files. These files are currently included only in
   icu-charset.jar.

9. The **time zone data** is named `zoneinfo.res` under the directory
   `icudt38b`.

Some of the data files alias or otherwise reference data from other data files.
One reason for this is that some locale names have changed. For example,
he_IL used to be iw_IL.
In order to support both names but not duplicate the
data, one of the resource files refers to the other file's data. In other cases,
a file may alias a portion of another file's data in order to save space.
Currently ICU4J provides no tool for revealing these dependencies.

> :point_right: **Note**: Java's Locale class silently converts the language
> code "he" to "iw" when you construct the Locale (for versions of Java through
> Java 5). Thus Java cannot be used to locate resources that use the "he" language
> code. ICU, on the other hand, does not perform this conversion in ULocale, and
> instead uses aliasing in the locale data to represent the same set of data under
> different locale ids.

Resource files that use locale ids form a hierarchy, with up to four levels: a
root, language, region (country), and variant. Searches for locale data attempt
to match as far down the hierarchy as possible. For example, "he_IL" will match
he_IL, but "he_US" will match he (since there is no US variant for he), and
"xx_YY" will match root (the default fallback locale) since there is no xx
language code in the locale hierarchy. Again, see `java.util.ResourceBundle` for
more information.

Currently ICU4J provides no tool for revealing these dependencies between data
files, so trimming the data directly in the ICU4J project is a hit-or-miss
affair.
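
The fallback search described above can be illustrated with a minimal sketch. It is not ICU4J's implementation: the class `LocaleFallbackSketch` and its methods are hypothetical names introduced here for illustration, and the sketch assumes that fallback simply strips trailing `_`-separated segments before falling back to `root`. It ignores aliases between files (such as iw and he) and any other special cases the real lookup may apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the ICU4J API.
public class LocaleFallbackSketch {

    /** Lists candidate resource names from most to least specific, ending with "root". */
    public static List<String> fallbackChain(String localeId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('_');
            // Drop the last segment (variant, then region), or stop at the language.
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        if (!chain.contains("root")) {
            chain.add("root");
        }
        return chain;
    }

    /** Returns the first candidate that exists in the available set, or null if none does. */
    public static String resolve(String localeId, Set<String> available) {
        for (String candidate : fallbackChain(localeId)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed set of resource names, standing in for root.res, he.res, he_IL.res.
        Set<String> available = Set.of("root", "he", "he_IL");
        System.out.println(resolve("he_IL", available)); // he_IL
        System.out.println(resolve("he_US", available)); // he
        System.out.println(resolve("xx_YY", available)); // root
    }
}
```

Reading the chain in reverse also shows which files sit lower in the hierarchy than a given file, which is the name-based part of the dependency problem; alias references between files cannot be derived from names alone.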
The key point when you remove data is to make sure to remove everything
that depends on that data as well. For example, if you remove he.res, you need
to remove he_IL.res, since it is lower in the hierarchy, and you must remove
iw.res, since it references he.res, and iw_IL.res, since it depends on iw.res
(and also references he_IL.res).

Unfortunately, the jar tool in the JDK provides no way to remove items from a
jar file. Thus you have to extract the resources, remove the ones you don't
want, and then create a new jar file with the remaining resources. See the jar
tool information for how to do this. Before re-jarring the files, be sure to
thoroughly test your application with the remaining resources, making sure each
required resource is present.

#### Using additional resource files with ICU4J

> :point_right: **Note**: Resource file formats can change across releases of ICU4J!
>
> *The format of ICU4J resources is not part of the API. Clients who develop their
> own resources for use with ICU4J should be prepared to regenerate them when they
> move to new releases of ICU4J.*

We are still developing ICU4J's resource mechanism. Currently it is not possible
to mix ICU's new binary .res resources with traditional Java-style .class or
.txt resources.
We might allow for this in a future release, but since the
resource data and format are not formally supported, you run the risk of
incompatibilities with future releases of ICU4J.

Resource data in ICU4J is checked in to the repository as a jar file containing
the resource binaries, icudata.jar. This means that inspecting the contents of
these resources is difficult. They currently are compiled from ICU4C .txt file
data. You can view the contents of the ICU4C text resource files to understand
the contents of the ICU4J resources.

The files in icudata.jar get extracted to com/ibm/icu/impl/data in the build
directory when the 'core' target is built. Building the 'resources' target will
force the resources to once again be extracted. Extraction will overwrite any
corresponding resource files already in that directory.

### Building ICU4J Resources from ICU4C

#### Requirements

1. [ICU4C](https://icu.unicode.org/download)

2. Compilers and tools required for [building ICU4C](../icu4c/build.md).

3. J2SE SDK version 5 or above

#### Procedure

1. Download and build ICU4C on a Windows or Linux machine. For instructions on
   downloading and building ICU4C, see
   [Building ICU4C](../icu4c/build.md).

2. Follow the remaining instructions in
   the [ICU4J Readme](../icu4j/).