---
layout: default
title: ICU Data
nav_order: 1600
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# ICU Data
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

ICU makes use of a wide variety of data tables to provide many of its services.
Examples include converter mapping tables, collation rules, transliteration
rules, break iterator rules and dictionaries, and other locale data. Additional
data can be provided by users, either as customizations of ICU's data or as new
data altogether.

This section describes how ICU data is stored and located at run time. It also
describes how ICU data can be customized to suit the needs of a particular
application.

For simple use of ICU's predefined data, this section on data management can
safely be skipped. The data is built into a library that is loaded along with
the rest of ICU. No specific action or setup is required of either the
application program or the execution environment.

Update: as of ICU 64, the standard data library is over 20 MB in size. We have
introduced a new tool, the [ICU Data Build Tool](./buildtool.md),
to give you more control over what goes into your ICU locale data file.

> :point_right: **Note**: ICU for C by default comes with pre-built data.
> The source data files are included as an "icu\*data.zip" file starting in ICU4C 49.
> Previously, they were not included unless ICU was downloaded from the [source repository](https://icu.unicode.org/repository).

## ICU and CLDR Data

Most of ICU's data is sourced from [CLDR](http://cldr.unicode.org), the [Common
Locale Data Repository](http://cldr.unicode.org) project. Do not file bugs
against ICU to request data changes in CLDR; instead, see the CLDR project's
page. Because most ICU data files are autogenerated from CLDR, manually editing
them is not usually recommended.

Data which is NOT sourced from CLDR includes:

*   [Conversion Data](conversion/data.md)
*   Break Iterator Dictionary Data (Thai, CJK, etc.)
*   Break Iterator Rule Data (as of this writing, it is manually kept in sync
    with the CLDR datasets)

For information on building ICU data from CLDR, see the
[cldr-icu-readme](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt).

## ICU Data Directory

The ICU data directory is the default location for all ICU data. Any requests
for data items that do not include an explicit directory path will be resolved
to files located in the ICU data directory.

The ICU data directory is determined as follows:

1.  If the application has called the function `u_setDataDirectory()`, use the
    directory specified there, otherwise:

2.  If the environment variable `ICU_DATA` is set, use that, otherwise:

3.  If the C preprocessor variable `ICU_DATA_DIR` was set at the time ICU was
    built, use its compiled-in value.

4.  Otherwise, the ICU data directory is an empty string. This is the default
    behavior for ICU using a shared library for its data and provides the
    highest data loading performance.

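The precedence above can be summarized in a few lines of C. This is a minimal
sketch of the decision order only, not ICU's implementation: the function
`resolve_data_directory()` and its parameter names are hypothetical, and each
argument is assumed to be `NULL` when the corresponding source is absent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model of the lookup order described above; not ICU source code.
 * Pass NULL for any source that has not been set.
 */
const char *resolve_data_directory(const char *set_by_app,   /* value given to u_setDataDirectory() */
                                   const char *env_icu_data, /* value of the ICU_DATA variable */
                                   const char *compiled_in)  /* value of ICU_DATA_DIR at build time */
{
    const char *result = "";            /* 4. default: empty string */

    if (set_by_app != NULL) {
        result = set_by_app;            /* 1. explicit call by the application */
    } else if (env_icu_data != NULL) {
        result = env_icu_data;          /* 2. environment variable */
    } else if (compiled_in != NULL) {
        result = compiled_in;           /* 3. compiled-in value */
    }

    assert(result != NULL);             /* the model never returns NULL */
    return result;
}
```
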
> :point_right: **Note**: `u_setDataDirectory()` is not thread-safe. Call it
> *before* calling ICU APIs from multiple threads. If you use both
> `u_setDataDirectory()` and `u_init()`, then use `u_setDataDirectory()` first.
>
> *Earlier versions of ICU supported two additional schemes: setting a data
> directory relative to the location of the ICU shared libraries, and on Windows,
> taking a location from the registry. These have both been removed to make the
> behavior more predictable and easier to understand.*

The ICU data directory does not need to be set in order to reference the
standard built-in ICU data. Applications that just use standard ICU capabilities
(converters, locales, collation, etc.) but do not build and reference their own
data do not need to specify an ICU data directory.

### Multiple-Item ICU Data Directory Values

The ICU data directory string can contain multiple directories as well as .dat
path/filenames. They must be separated by the path separator that is used on the
platform, for example a semicolon (`;`) on Windows. Data files will be searched in
all directories and .dat package files in the order of the directory string. For
details, see the example below.

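As an illustration of how such a string breaks down, the following sketch
splits a data directory string at a separator character and keeps the entries
in their original order, which is the order in which they are searched. The
helper `split_data_directory()` and the `MAX_ENTRY_LEN` limit are assumptions
made for this example and are not part of ICU. For the platform's separator,
ICU's `putil.h` defines `U_PATH_SEP_CHAR`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRY_LEN 260   /* arbitrary limit chosen for this sketch */

/*
 * Copies the entries of `dirs`, split at `sep`, into `out` in search order.
 * Empty entries and entries that do not fit in MAX_ENTRY_LEN are skipped.
 * Returns the number of entries stored (at most `max_entries`).
 */
size_t split_data_directory(const char *dirs, char sep,
                            char out[][MAX_ENTRY_LEN], size_t max_entries)
{
    size_t count = 0;

    assert(dirs != NULL && out != NULL && sep != '\0');
    while (*dirs != '\0' && count < max_entries) {
        const char *end = strchr(dirs, sep);
        size_t len = (end != NULL) ? (size_t)(end - dirs) : strlen(dirs);

        if (len > 0 && len < MAX_ENTRY_LEN) {
            memcpy(out[count], dirs, len);
            out[count][len] = '\0';
            ++count;
        }
        dirs += len;
        if (*dirs == sep) {
            ++dirs;         /* step over the separator */
        }
    }
    return count;
}
```
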
## Default ICU Data

The default ICU data consists of the data needed for the converters, collators,
locales, etc. that are provided with ICU. Default data must be present in order
for ICU to function.

The default data is most commonly built into a shared library that is installed
with the other ICU libraries. Nothing is required of the application for this
mechanism to work. ICU provides additional options for loading the default data
if more flexibility is required.

Here are the steps followed by ICU to locate its default data. This procedure
happens only once per process, at the time an ICU data item is first requested.

1.  If the application has called the function `udata_setCommonData()`, use the
    data that was provided. The application specifies the address in memory of
    an image of an ICU common format data file (either in shared-library format
    or .dat package file format).

2.  Examine the contents of the default ICU data shared library. If it contains
    data, use that data. If the data library is empty (that is, a stub
    library), proceed to the next step. (A data shared library must always be
    present in order for ICU to successfully link and load. A stub data library
    is used when the actual ICU common data is to be provided from another
    source.)

3.  Dynamically load (memory map, typically) a common format (.dat) file
    containing the default ICU data. Loading is described in the section
    [How Data Loading Works](#how-data-loading-works). The path to
    the data is of the form "icudt\<version\>\<flag\>", where \<version\> is
    the two-digit ICU version number, and \<flag\> is a letter indicating the
    internal format of the file (see the
    [Sharing ICU Data Between Platforms](#sharing-icu-data-between-platforms)
    section).

Once the default ICU data has been located, loading of individual data items
proceeds as described in the section
[How Data Loading Works](#how-data-loading-works).

## Building and Linking against ICU data

When using ICU's configure or runConfigureICU tool to build, several different
methods of packaging are available.

> :point_right: **Note**: in all cases, you **must** link all ICU tools and
applications against a "data library": either a data library containing the ICU
data, or the "stubdata" library located in icu/source/stubdata. For
example, even if ICU is built in "files" mode, you must still link against the
"stubdata" library or an undefined symbol error occurs.

*   `--with-data-packaging=library`
    This mode builds a shared library (DLL or .so). This is the simplest mode to
    use, and is the default.
    To use: link your application against the common and data libraries.
    This is the only directly supported behavior on Windows builds.
*   `--with-data-packaging=static`
    This option builds ICU data as a single (large) static library. This mode is
    more complex to use. If you encounter errors, you may need to build ICU
    multiple times.
*   `--with-data-packaging=files`
    With this option, ICU outputs separate individual files (.res, .cnv, etc.)
    which will be loaded at runtime. Read the rest of this document, especially
    the sections that discuss the ICU directory path.
*   `--with-data-packaging=archive`
    With this option, ICU outputs a single "icudt__.dat" file containing ICU
    data. Read the rest of this document, especially the sections that discuss
    the ICU directory path.

## Time Zone Data

Because time zone data requires frequent updates in response to countries
changing their transition dates for daylight saving time, ICU provides
additional options for loading time zone data from separate files, thus avoiding
the need to update a combined ICU data package. Further information is found
under [Time Zones](../datetime/timezone/index.md).

## Application Data

ICU-based applications can ship and use their own data for localized strings,
custom conversion tables, etc. Each data item file must have a package name as a
prefix, and this package name must match the basename of a .dat package file, if
one is used. The package name must be used in ICU APIs, for example in
`udata_setAppData()` (instead of `udata_setCommonData()`, which is used only for
ICU's own data) and in the pathname argument of `ures_open()`.

The only real difference from ICU's own data is that application data cannot be
loaded simply by specifying a NULL value for the path arguments of ICU APIs, and
application data will not be used by APIs that do not have path/package name
arguments at all.

The most important APIs that allow application data to be used are for Resource
Bundles, which are most often used for localized strings and other data. There
are also functions like `ucnv_openPackage()` that allow application data to be
specified, and the `udata.h` API can be used to load any data with minimum
requirements on the binary format, and without ICU interpreting the contents of
the data.

The `pkgdata` tool, which is used to package the data into various formats (e.g.
shared library), has an option (`--without-assembly` or `-w`) to not use
assembly code when building and packaging the application-specific data into a
shared library. Building the data with assembly code, which is enabled by
default, is faster and more efficient; however, there are some
platform-specific issues that may arise. The `--without-assembly` option may be
necessary on certain platforms (e.g. Linux) which have trouble properly loading
application data when it was built with assembly code and is packaged as a
shared library.

## Alignment

ICU data is designed to be 16-aligned, with natural alignment of values inside
the data structure, so that the data is usable as is when memory-mapped.
("16-aligned" means that the start address is a multiple of 16 bytes.)

Memory-mapping (as well as memory allocation) provides at least 16-alignment on
modern platforms. Some CPUs require n-alignment of types of size n bytes (and
crash on unaligned reads); other CPUs usually operate faster on data that is
aligned properly.

Some of the ICU code explicitly checks for proper alignment.

The `icupkg` tool places data items into the .dat file at start offsets that are
multiples of 16 bytes.

When using `genccode` to directly write a .o/.obj file, or to write assembler
code, it specifies at least 16-alignment. When using `genccode` to write C code,
it prepends the data with a double value which should yield at least 8-alignment
on most platforms (usually `sizeof(double)=8`).

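To make the arithmetic concrete, the following sketch checks whether an address
is 16-aligned and rounds an offset up to the next multiple of 16, which is the
calculation a packaging tool needs in order to place items at 16-byte
boundaries. The function names are hypothetical and the code is independent of
ICU; it is not taken from `icupkg` or `genccode`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the address `p` is a multiple of 16 bytes. */
int is_16_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Rounds `offset` up to the next multiple of 16 (unchanged if already one). */
size_t align_up_16(size_t offset)
{
    assert(offset <= SIZE_MAX - 15);    /* guard against wrap-around */
    return (offset + 15) & ~(size_t)15;
}
```
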
## Flexibility vs. Installation vs. Performance

There are choices that affect ICU data loading and depend on application
requirements.

### Data in Shared Libraries/DLLs vs. .dat package files

Building ICU data into shared libraries (`--with-data-packaging=library`) is the
most convenient packaging method because shared libraries (DLLs) are easily
found if they are in the same directory as the application libraries, or if they
are on the system library path. The application installer usually just copies
the ICU shared libraries to the same place. On the other hand, shared libraries
are not portable.

Packaging data into .dat files (`--with-data-packaging=archive`) allows them to
be shared across platforms, but they must either be loaded by the application
and set with `udata_setCommonData()` or `udata_setAppData()`, or they must be
in a known location that is included in the ICU data directory string. This
requires the application installer, or the application itself at runtime, to
locate the ICU and/or application data by setting the ICU data directory (see
the [ICU Data Directory](#icu-data-directory) section above) or by
loading the data and providing it to one of the `udata_setXYZData()` functions.

Unlike shared libraries, .dat package files can be taken apart into separate
data item files with the `decmn` ICU tool. This allows post-installation
modification of a package file. The `gencmn` and `pkgdata` ICU tools can then be
used to reassemble the .dat package file.

For more information about .dat package files see the section [Sharing ICU Data
Between Platforms](#sharing-icu-data-between-platforms) below.

### Data Overriding vs. Loading Performance

If the ICU data directory string is empty, then ICU will not attempt to load
data from the file system. It is then only possible to load data from the
linked-in shared library or via `udata_setCommonData()` and
`udata_setAppData()`. This is inflexible but provides the highest performance.

If the ICU data directory string is not empty, then ICU searches for data items
in all of the listed directories and matching .dat files before checking
already-loaded package files. This allows overriding of packaged data items with
single files after installation but costs some time for filesystem accesses.
This is usually done only once per data item; see
[User Data Caching](#user-data-caching) below.

### Single Data Files vs. Packages

Single data files (`--with-data-packaging=files`) are easy to replace and can
override items inside data packages. However, it is usually desirable to reduce
the number of files during installation, and package files use less disk space
than many small files.

## How Data Loading Works

ICU data items are referenced by three names: a path, a name and a type. The
following are some examples:

path                         |   name   | type
-----------------------------|----------|-------
 c:\\some\\path\\dataLibName | test     | dat
 no path                     | cnvalias | icu
 no path                     | cp1252   | cnv
 no path                     | en       | res
 no path                     | uprops   | icu

Items with 'no path' specified are loaded from the default ICU data.

Application data items include a path, and will be loaded from user data files,
not from the ICU default data. For application data, the path argument need not
contain an actual directory, but must contain the application data's package
name after the last directory separator character (or by itself if there is no
directory). If the path argument contains a directory, then it is logically
prepended to the ICU data directory string and searched first for data. The path
argument can contain at most one directory. (Path separators like semicolon (;)
are not handled here.)

> :point_right: **Note**: The ICU data directory string itself may
contain multiple directories and path/filenames to .dat package files. See the
[ICU Data Directory](#icu-data-directory) section.

We recommend not including the directory in the path argument. Instead, set the
application data or the ICU data directory string so that the data can be
located. This simplifies program maintenance and improves robustness.

See the API descriptions for the functions `udata_open()` and
`udata_openChoice()` for additional information on opening ICU data from within
an application.

Data items can exist as individual files, or a number of them can be packaged
together in a single file for greater efficiency in loading and convenience of
distribution. The combined files are called Common Files.

Based on the supplied path and name, ICU searches several possible locations
when opening data. To make things more concrete in the following descriptions,
the following values of path, name and type are used:

```
path = "c:\\some\\path\\dataLibName"
name = "test"
type = "res"
```

In this case, "dataLibName" is the "package name" part of the path argument, and
"c:\\some\\path\\" is the directory part of it.

The search sequence for the data for "test.res" is as follows (the first
successful loading attempt wins):

1.  Try to load the file "dataLibName_test.res" from c:\\some\\path\\.

2.  Try to load the file "dataLibName_test.res" from each of the directories in
    the ICU data directory string.

3.  Try to locate the data package for the package name "dataLibName".

    1.  Try to locate the data package in the internal cache.

    2.  Try to load the package file "dataLibName.dat" from c:\\some\\path\\.

    3.  Try to load the package file "dataLibName.dat" from each of the
        directories in the ICU data directory string.

The first steps, loading the data item from an individual file, are omitted if
no directory is specified in either the path argument or the ICU data directory
string.

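The file names used in this sequence can be derived mechanically from the three
arguments. The sketch below, which is not ICU code, takes the package name from
the end of the path argument and builds the individual file name and the package
file name. The function `data_item_candidates()` and its parameters are
hypothetical names chosen for this example, and the directory separator is
passed in explicitly rather than taken from the platform.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds the two file names that the search sequence looks for:
 *   item_file    = "<package>_<name>.<type>"
 *   package_file = "<package>.dat"
 * `path` is the path argument, with or without a directory part.
 * Returns a pointer to the package name inside `path`.
 * Output is truncated if a buffer is too small.
 */
const char *data_item_candidates(const char *path, char dir_sep,
                                 const char *name, const char *type,
                                 char *item_file, size_t item_size,
                                 char *package_file, size_t package_size)
{
    const char *last_sep;
    const char *package;

    assert(path != NULL && name != NULL && type != NULL);
    assert(item_file != NULL && package_file != NULL);

    /* The package name is whatever follows the last directory separator. */
    last_sep = strrchr(path, dir_sep);
    package = (last_sep != NULL) ? last_sep + 1 : path;

    snprintf(item_file, item_size, "%s_%s.%s", package, name, type);
    snprintf(package_file, package_size, "%s.dat", package);
    return package;
}
```
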
Package files are loaded at most once and then cached. They are identified only
by their package name. Whenever a data item is requested from a package and that
package has been loaded before, then the cached package is used immediately
instead of searching through the filesystem.

> :point_right: **Note**: ICU versions before 2.2 always searched data packages
before looking for individual files, which made it impossible to override
packaged data items. See the ICU 2.2 download page and the readme for more
information about the changes.

## User Data Caching

Once loaded, data package files are cached, and stay loaded for the duration of
the process. Any requests for data items from an already loaded data package
file are routed directly to the cached data. No additional search for loadable
files is made.

The user data cache is keyed by the base file name portion of the requested
path, with any directory portion stripped off and ignored. Using the previous
example, for the path name "c:\\some\\path\\dataLibName", the cache key is
"dataLibName". After this is cached, a subsequent request for "dataLibName", no
matter what directory path is specified, will resolve to the cached data.

Data can be explicitly added to the cache of common format data by means of the
`udata_setAppData()` function. This function takes as input the path (name) and
a pointer to a memory image of a .dat file. The data is added to the cache,
causing any subsequent requests for data items from that file name to be routed
to the cache.

Only data package files are cached. Separate data files that contain just a
single data item are not cached; for these, multiple requests to ICU to open the
data will result in multiple requests to the operating system to open the
underlying file.

However, most ICU services (Resource Bundles, conversion, etc.) themselves cache
loaded data, so that data is usually loaded only once until the end of the
process (or until `u_cleanup()`, `ucnv_flushCache()`, or a similar function is
called).

There is no mechanism for removing or updating cached data files.

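The following is a simplified model of this caching behavior, not ICU's
implementation: entries are keyed by package name with the directory stripped,
and an existing entry is never replaced. All names here (`cache_find()`,
`cache_add()`, `MAX_PACKAGES`) are hypothetical. The model stores a pointer into
the caller's path string, so it assumes that string outlives the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PACKAGES 8      /* arbitrary capacity for this sketch */

struct package_entry {
    const char *key;        /* package name, directory part stripped */
    const void *data;       /* memory image of the package file */
};

static struct package_entry cache[MAX_PACKAGES];
static size_t cache_count = 0;

/* The key is whatever follows the last directory separator in `path`. */
static const char *cache_key(const char *path, char dir_sep)
{
    const char *last_sep = strrchr(path, dir_sep);
    return (last_sep != NULL) ? last_sep + 1 : path;
}

/* Returns the cached data for the package named by `path`, or NULL. */
const void *cache_find(const char *path, char dir_sep)
{
    const char *key;
    size_t i;

    assert(path != NULL);
    key = cache_key(path, dir_sep);
    for (i = 0; i < cache_count; ++i) {
        if (strcmp(cache[i].key, key) == 0) {
            return cache[i].data;
        }
    }
    return NULL;
}

/*
 * Adds `data` under the package name of `path`. Returns 1 if stored, or 0 if
 * the package is already cached (entries are never updated) or the cache is full.
 */
int cache_add(const char *path, char dir_sep, const void *data)
{
    assert(path != NULL && data != NULL);
    if (cache_find(path, dir_sep) != NULL || cache_count == MAX_PACKAGES) {
        return 0;
    }
    cache[cache_count].key = cache_key(path, dir_sep);
    cache[cache_count].data = data;
    ++cache_count;
    return 1;
}
```
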
## Directory Separator Characters

If a directory separator (generally '/' or '\\') is needed in a path parameter,
use the form that is native to the platform. The ICU header `"putil.h"` defines
`U_FILE_SEP_CHAR` appropriately for the platform.

> :point_right: **Note**: On Windows, the directory separator must be '\\' for
any paths passed to ICU APIs. This is different from native Windows APIs, which
generally allow either '/' or '\\'.

4132e5b6d6dSopenharmony_ci## Sharing ICU Data Between Platforms
4142e5b6d6dSopenharmony_ci
4152e5b6d6dSopenharmony_ciICU's default data is (at the time of this writing) about 8 MB in size. Because
4162e5b6d6dSopenharmony_ciit is normally built as a shared library, the file format is specific to each
4172e5b6d6dSopenharmony_ciplatform (operating system). The data libraries can not be shared between
4182e5b6d6dSopenharmony_ciplatforms even though the actual data contents are identical.
4192e5b6d6dSopenharmony_ci
4202e5b6d6dSopenharmony_ciBy distributing the default data in the form of common format .dat files rather
4212e5b6d6dSopenharmony_cithan as shared libraries, a single data file can be shared among multiple
4222e5b6d6dSopenharmony_ciplatforms. This is beneficial if a single distribution of the application (a CD,
4232e5b6d6dSopenharmony_cifor example) includes binaries for many platforms, and the size requirements for
4242e5b6d6dSopenharmony_cireplicating the ICU data for each platform are a problem.
4252e5b6d6dSopenharmony_ci
4262e5b6d6dSopenharmony_ciICU common format data files are not completely interchangeable between
4272e5b6d6dSopenharmony_ciplatforms. The format depends on these properties of the platform:
4282e5b6d6dSopenharmony_ci
4292e5b6d6dSopenharmony_ci1.  Byte Ordering (little endian vs. big endian)
4302e5b6d6dSopenharmony_ci
4312e5b6d6dSopenharmony_ci2.  Base character set - ASCII or EBCDIC
4322e5b6d6dSopenharmony_ci
4332e5b6d6dSopenharmony_ciThis means, for example, that ICU data files are interchangeable between Windows
4342e5b6d6dSopenharmony_ciand Linux on X86 (both are ASCII little endian), or between Macintosh and
4352e5b6d6dSopenharmony_ciSolaris on SPARC (both are ASCII big endian), but not between Solaris on SPARC
4362e5b6d6dSopenharmony_ciand Solaris on X86 (different byte ordering).
4372e5b6d6dSopenharmony_ci
4382e5b6d6dSopenharmony_ciThe single letter following the version number in the file name of the default
4392e5b6d6dSopenharmony_ciICU data file encodes the properties of the file as follows:
4402e5b6d6dSopenharmony_ci
4412e5b6d6dSopenharmony_ci```
4422e5b6d6dSopenharmony_ciicudt19l.dat Little Endian, ASCII
4432e5b6d6dSopenharmony_ciicudt19b.dat Big Endian, ASCII
4442e5b6d6dSopenharmony_ciicudt19e.dat Big Endian, EBCDIC
4452e5b6d6dSopenharmony_ci```
4462e5b6d6dSopenharmony_ci
4472e5b6d6dSopenharmony_ci(There are no little endian EBCDIC systems. All non-EBCDIC encodings include an
4482e5b6d6dSopenharmony_ciinvariant subset of ASCII that is sufficient to enable these files to
4492e5b6d6dSopenharmony_ciinteroperate.)
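The naming rule above can be sketched as a small helper. This is only an illustration in Python and is not part of ICU: the function names `data_type_letter` and `default_data_file_name` are hypothetical, and the version number is passed in by the caller rather than discovered.

```python
import sys


def data_type_letter(big_endian: bool, ebcdic: bool) -> str:
    """Return the letter that follows the version number in the file name."""
    if ebcdic:
        if not big_endian:
            # The text above notes that there are no little endian EBCDIC systems.
            raise ValueError("little endian EBCDIC is not a supported combination")
        return "e"
    return "b" if big_endian else "l"


def default_data_file_name(version: int, big_endian: bool, ebcdic: bool = False) -> str:
    """Build a name such as icudt19l.dat from the version and platform properties."""
    return f"icudt{version}{data_type_letter(big_endian, ebcdic)}.dat"


if __name__ == "__main__":
    # Name expected on the machine running this script, assuming an ASCII platform.
    print(default_data_file_name(19, big_endian=(sys.byteorder == "big")))
```

A packaging script could use such a helper to decide which of several .dat files to ship for a given target.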
4502e5b6d6dSopenharmony_ci
4512e5b6d6dSopenharmony_ciThe packaging of the default ICU data as a .dat file rather than as a shared
4522e5b6d6dSopenharmony_cilibrary is requested by using an option in the configure script at build time.
4532e5b6d6dSopenharmony_ciNothing is required at run time; ICU finds and uses whatever form of the data is
4542e5b6d6dSopenharmony_ciavailable.
4552e5b6d6dSopenharmony_ci
> :point_right: **Note**: When the ICU data is built in the form of shared
libraries, the library names have platform-specific prefixes and suffixes. On
Unix-style platforms, all the libraries have the "lib" prefix. Each platform
uses one of the usual suffixes (".dll", ".so", ".sl", etc.). Other than these
prefixes and suffixes, the library names are the same as the above .dat files.
4612e5b6d6dSopenharmony_ci
4622e5b6d6dSopenharmony_ci## Customizing ICU's Data Library
4632e5b6d6dSopenharmony_ci
4642e5b6d6dSopenharmony_ciICU includes a standard library of data that is about 16 MB in size. Most of
4652e5b6d6dSopenharmony_cithis consists of conversion tables and locale information. The data itself is
4662e5b6d6dSopenharmony_cinormally placed into a single shared library.
4672e5b6d6dSopenharmony_ci
4682e5b6d6dSopenharmony_ciUpdate: as of ICU 64, the standard data library is over 20 MB in size. We have
4692e5b6d6dSopenharmony_ciintroduced a new tool, the [ICU Data Build Tool](./buildtool.md),
4702e5b6d6dSopenharmony_cito replace the makefiles explained below and give you more control over what
4712e5b6d6dSopenharmony_cigoes into your ICU locale data file.
4722e5b6d6dSopenharmony_ci
4732e5b6d6dSopenharmony_ci### Adding Converters to ICU
4742e5b6d6dSopenharmony_ci
The first step is to obtain or create a .ucm (source) mapping data file for the
desired converter. A large archive of converter data is maintained by the ICU
team at <https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm>.
4782e5b6d6dSopenharmony_ci
4792e5b6d6dSopenharmony_ciWe will use `solaris-eucJP-2.7.ucm`, available from the repository mentioned
4802e5b6d6dSopenharmony_ciabove, as an example.
4812e5b6d6dSopenharmony_ci
4822e5b6d6dSopenharmony_ci#### Build the Converter
4832e5b6d6dSopenharmony_ci
Converter source files are compiled into binary converter files (.cnv files) by
using the ICU tool `makeconv`. For the example, you can use this command:
4862e5b6d6dSopenharmony_ci
4872e5b6d6dSopenharmony_ci```
4882e5b6d6dSopenharmony_cimakeconv -v solaris-eucJP-2.7.ucm
4892e5b6d6dSopenharmony_ci```
4902e5b6d6dSopenharmony_ci
4912e5b6d6dSopenharmony_ciSome of the .ucm files from the repository will need additional header
4922e5b6d6dSopenharmony_ciinformation before they can be built. Use the error messages from the makeconv
4932e5b6d6dSopenharmony_citool, .ucm files for similar converters, and the ICU user guide documentation of
4942e5b6d6dSopenharmony_ci.ucm files as a guide when making changes. For the `solaris-eucJP-2.7.ucm`
4952e5b6d6dSopenharmony_ciexample, we will borrow the missing header fields from
4962e5b6d6dSopenharmony_ci`source/data/mappings/ibm-33722_P12A-2000.ucm`, which is the standard ICU eucJP
4972e5b6d6dSopenharmony_ciconverter data.
4982e5b6d6dSopenharmony_ci
4992e5b6d6dSopenharmony_ciThe ucm file format is described in the
5002e5b6d6dSopenharmony_ci["Conversion Data" chapter](../conversion/data.md) of this user guide.
5012e5b6d6dSopenharmony_ci
5022e5b6d6dSopenharmony_ciAfter adjustment, the header of the `solaris-eucJP-2.7.ucm` file contains these
5032e5b6d6dSopenharmony_ciitems:
5042e5b6d6dSopenharmony_ci
5052e5b6d6dSopenharmony_ci```
5062e5b6d6dSopenharmony_ci<code_set_name>   "solaris-eucJP-2.7"
<subchar>         \x3F
5082e5b6d6dSopenharmony_ci<uconv_class>     "MBCS"
5092e5b6d6dSopenharmony_ci
5102e5b6d6dSopenharmony_ci<mb_cur_max>      3
5112e5b6d6dSopenharmony_ci<mb_cur_min>      1
5122e5b6d6dSopenharmony_ci
5132e5b6d6dSopenharmony_ci<icu:state>       0-8d, 8e:2, 8f:3, 90-9f, a1-fe:1
5142e5b6d6dSopenharmony_ci<icu:state>       a1-fe
5152e5b6d6dSopenharmony_ci<icu:state>       a1-e4
5162e5b6d6dSopenharmony_ci<icu:state>       a1-fe:1, a1:4, a3-af:4, b6:4, d6:4, da-db:4, ed-f2:4
5172e5b6d6dSopenharmony_ci<icu:state>       a1-fe
5182e5b6d6dSopenharmony_ci```
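Before running `makeconv`, it can help to see which header fields a .ucm file already contains. The following is a minimal sketch in Python, not an ICU tool: the function name `read_ucm_header` is hypothetical, and it assumes that header lines have the form `<name> value`, that `#` starts a comment, and that the header ends at the `CHARMAP` line.

```python
import re

# A header line looks like: <name>   value
HEADER_LINE = re.compile(r"^<([^>]+)>\s+(.*\S)\s*$")


def read_ucm_header(text: str) -> dict:
    """Collect <name> value pairs that appear before the CHARMAP section."""
    fields: dict = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' begins a comment
        if not line:
            continue
        if line == "CHARMAP":
            break  # mapping lines follow; they are not header fields
        match = HEADER_LINE.match(line)
        if match:
            name, value = match.groups()
            # Fields such as <icu:state> repeat, so every value is kept in a list.
            fields.setdefault(name, []).append(value.strip('"'))
    return fields
```

Comparing the result for a new file against the result for a similar converter that already builds shows which fields still need to be borrowed.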
5192e5b6d6dSopenharmony_ci
5202e5b6d6dSopenharmony_ciThe binary converter file produced by the `makeconv` tool is
5212e5b6d6dSopenharmony_ci`solaris-eucJP-2.7.cnv`.
5222e5b6d6dSopenharmony_ci
5232e5b6d6dSopenharmony_ci#### Installation
5242e5b6d6dSopenharmony_ci
5252e5b6d6dSopenharmony_ciCopy the new .cnv file to the desired location for use. Set the environment
5262e5b6d6dSopenharmony_civariable `ICU_DATA` to the directory containing the data, or, alternatively,
5272e5b6d6dSopenharmony_cifrom within an application, tell ICU the location of the new data with the
5282e5b6d6dSopenharmony_cifunction `u_setDataDirectory()` before using the new converter.
5292e5b6d6dSopenharmony_ci
5302e5b6d6dSopenharmony_ciIf ICU is already obtaining data from files rather than a shared library,
5312e5b6d6dSopenharmony_ciinstall the new file in the same location as the existing ICU data file(s), and
5322e5b6d6dSopenharmony_cidon't change/set the environment variable or data directory.
5332e5b6d6dSopenharmony_ci
If you do not want to add a converter to ICU's base data, you can also generate
a conversion table with `makeconv`, use `pkgdata` to generate your own package,
and use `ucnv_openPackage()` to open a converter with that conversion table
from the generated package.
5382e5b6d6dSopenharmony_ci
5392e5b6d6dSopenharmony_ci#### Building the new converter into ICU
5402e5b6d6dSopenharmony_ci
5412e5b6d6dSopenharmony_ciThe need to install a separate file and inform ICU of the data directory can be
5422e5b6d6dSopenharmony_ciavoided by building the new converter into ICU's standard data library. Here is
5432e5b6d6dSopenharmony_cithe procedure for doing so:
5442e5b6d6dSopenharmony_ci
1.  Move the .ucm file(s) for the converter(s) to be added
    (`solaris-eucJP-2.7.ucm` for our example) into the directory
    `source/data/mappings/`.
5482e5b6d6dSopenharmony_ci
5492e5b6d6dSopenharmony_ci2.  Create, or edit, if it already exists, the file
5502e5b6d6dSopenharmony_ci    `source/data/mappings/ucmlocal.mk`. Add this line:
5512e5b6d6dSopenharmony_ci    
5522e5b6d6dSopenharmony_ci    ```
5532e5b6d6dSopenharmony_ci    UCM_SOURCE_LOCAL = solaris-eucJP-2.7.ucm
5542e5b6d6dSopenharmony_ci    ```
5552e5b6d6dSopenharmony_ci    
    Any number of converters can be listed. Extend the list to new lines with a
    backslash at the end of the line. The `ucmlocal.mk` file is described in
    more detail in `source/data/mappings/ucmfiles.mk`. (Even though they use
    very different build systems, `ucmlocal.mk` is used for both the Windows
    and UNIX builds.)
5612e5b6d6dSopenharmony_ci
3.  Add the converter name and aliases to `source/data/mappings/convrtrs.txt`.
    This will allow your converter to be shown in the list of available
    converters when you call the `ucnv_getAvailableName()` function. The file
    syntax is described within the file.
5662e5b6d6dSopenharmony_ci
5672e5b6d6dSopenharmony_ci4.  Rebuild the ICU data.
5682e5b6d6dSopenharmony_ci    For Windows, from MSVC choose the makedata project from the GUI, then build
5692e5b6d6dSopenharmony_ci    the project.
5702e5b6d6dSopenharmony_ci    For UNIX, `cd icu/source/data; gmake`
5712e5b6d6dSopenharmony_ci
When opening an ICU converter (`ucnv_open()`), the converter name cannot be
qualified with a path that indicates the directory or common data file
containing the corresponding converter data. The required data must be present
either in the main ICU data library or as a separate .cnv file located in the
ICU data directory. This is different from opening resources or other types of
ICU data, which do allow a path.
5782e5b6d6dSopenharmony_ci
5792e5b6d6dSopenharmony_ci### Adding Locale Data to ICU's Data
5802e5b6d6dSopenharmony_ci
5812e5b6d6dSopenharmony_ciIf you have data for a locale that is not included in ICU's standard build, then
5822e5b6d6dSopenharmony_ciyou can add it to the build in a very similar way as with conversion tables
5832e5b6d6dSopenharmony_ciabove. The ICU project provides a large number of additional locales in its
5842e5b6d6dSopenharmony_ci[locale
5852e5b6d6dSopenharmony_cirepository](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/locales/)
5862e5b6d6dSopenharmony_cion the web. Most of this locale data is derived from the CLDR ([Common Locale
5872e5b6d6dSopenharmony_ciData Repository](http://www.unicode.org/cldr/)) project.
5882e5b6d6dSopenharmony_ci
5892e5b6d6dSopenharmony_ciDropping the txt file into the correct place in the source tree is sufficient to
5902e5b6d6dSopenharmony_ciadd it to your ICU build. You will need to re-configure in order to pick it up.
5912e5b6d6dSopenharmony_ci
5922e5b6d6dSopenharmony_ci## Customizing ICU's Data Library for ICU 63 or earlier
5932e5b6d6dSopenharmony_ciThe ICU data library can be easily customized, either by adding additional converters or locales, or by removing some of the standard ones for the purpose of saving space.
5942e5b6d6dSopenharmony_ci
> :point_right: **Note**: ICU for C by default comes with pre-built data.
The source data files are included as an "icu\*data.zip" file starting in ICU4C
49. Previously, they were not included unless ICU was downloaded from the
[source repository](https://github.com/unicode-org/icu). Alternatively, the
[Data Customizer](http://apps.icu-project.org/datacustom/) may be used to
customize the pre-built data.
6012e5b6d6dSopenharmony_ci
6022e5b6d6dSopenharmony_ciICU can load data from individual data files as well as from its default
6032e5b6d6dSopenharmony_cilibrary, so building a customized library when adding additional data is not
6042e5b6d6dSopenharmony_cistrictly necessary. Adding to ICU's library can simplify application
6052e5b6d6dSopenharmony_ciinstallation by eliminating the need to include separate files with an
6062e5b6d6dSopenharmony_ciapplication distribution, and the need to tell ICU where they are installed.
6072e5b6d6dSopenharmony_ci
6082e5b6d6dSopenharmony_ciReducing the size of ICU's data by eliminating unneeded resources can make
6092e5b6d6dSopenharmony_cisense on small systems with limited or no disk, but for desktop or server
6102e5b6d6dSopenharmony_cisystems there is no real advantage to trimming. ICU's data is memory mapped
6112e5b6d6dSopenharmony_ciinto an application's address space, and only those portions of the data
6122e5b6d6dSopenharmony_ciactually being used are ever paged in, so there are no significant RAM savings.
6132e5b6d6dSopenharmony_ciAs for disk space, with the large size of today's hard drives, saving a few MB
6142e5b6d6dSopenharmony_ciis not worth the bother.
6152e5b6d6dSopenharmony_ci
6162e5b6d6dSopenharmony_ciBy default, ICU builds with a large set of converters and with all available
6172e5b6d6dSopenharmony_cilocales. This means that any extra items added must be provided by the
6182e5b6d6dSopenharmony_ciapplication developer. There is no extra ICU-supplied data that could be
6192e5b6d6dSopenharmony_cispecified.
6202e5b6d6dSopenharmony_ci
6212e5b6d6dSopenharmony_ci### Details
6222e5b6d6dSopenharmony_ci
The converters and resources that ICU builds are listed in the following
configuration files. They are only available when building from ICU's source
code repository. Normally, the standard ICU distribution does not include
these files.
6262e5b6d6dSopenharmony_ci
6272e5b6d6dSopenharmony_ciFile                              | Description
6282e5b6d6dSopenharmony_ci----------------------------------|--------------
6292e5b6d6dSopenharmony_cisource/data/locales/resfiles.mk   | The standard set of locale data resource bundles
6302e5b6d6dSopenharmony_cisource/data/locales/reslocal.mk   | User-provided file with additional resource bundles
6312e5b6d6dSopenharmony_cisource/data/coll/colfiles.mk      | The standard set of collation data resource bundles
6322e5b6d6dSopenharmony_cisource/data/coll/collocal.mk      | User-provided file with additional collation resource bundles
6332e5b6d6dSopenharmony_cisource/data/brkitr/brkfiles.mk    | The standard set of break iterator data resource bundles
6342e5b6d6dSopenharmony_cisource/data/brkitr/brklocal.mk    | User-provided file with additional break iterator resource bundles
6352e5b6d6dSopenharmony_cisource/data/translit/trnsfiles.mk | The standard set of transliterator resource files
6362e5b6d6dSopenharmony_cisource/data/translit/trnslocal.mk | User-provided file with a set of additional transliterator resource files
6372e5b6d6dSopenharmony_cisource/data/mappings/ucmcore.mk   | Core set of conversion tables for MIME/Unix/Windows
6382e5b6d6dSopenharmony_cisource/data/mappings/ucmfiles.mk  | Additional, large set of conversion tables for a wide range of uses
6392e5b6d6dSopenharmony_cisource/data/mappings/ucmebcdic.mk | Large set of EBCDIC conversion tables
6402e5b6d6dSopenharmony_cisource/data/mappings/ucmlocal.mk  | User-provided file with additional conversion tables
6412e5b6d6dSopenharmony_cisource/data/misc/miscfiles.mk     | Miscellaneous data, like timezone information 
6422e5b6d6dSopenharmony_ci
6432e5b6d6dSopenharmony_ciThese files function identically for both Windows and UNIX builds of ICU. ICU
6442e5b6d6dSopenharmony_ciwill automatically update the list of installed locales returned by
6452e5b6d6dSopenharmony_ci`uloc_getAvailable()` whenever `resfiles.mk` or `reslocal.mk` are updated and
6462e5b6d6dSopenharmony_cithe ICU data library is rebuilt. These files are only needed while building ICU.
6472e5b6d6dSopenharmony_ciIf any of these files are removed or renamed, the size of the ICU data library
6482e5b6d6dSopenharmony_ciwill be reduced.
6492e5b6d6dSopenharmony_ci
6502e5b6d6dSopenharmony_ciThe optional files `reslocal.mk` and `ucmlocal.mk` are not included as part of
6512e5b6d6dSopenharmony_cia standard ICU distribution. Thus these customization files do not need to be
6522e5b6d6dSopenharmony_cimerged or updated when updating versions of ICU.
6532e5b6d6dSopenharmony_ci
Both `reslocal.mk` and `ucmlocal.mk` are makefile includes, so the usual rules
for makefiles apply. A line is continued onto the next one by ending it with a
backslash. Lines beginning with a # are comments. See `ucmfiles.mk` and
`resfiles.mk` for additional information.
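When the list of local files is long or generated by a script, the include file can be written programmatically. The following Python sketch is an illustration only: `render_local_mk` is a hypothetical helper, not an ICU tool. It assumes a single variable assignment per file, such as the `UCM_SOURCE_LOCAL` variable shown earlier, and applies the comment and line continuation rules described above.

```python
from typing import Sequence


def render_local_mk(variable: str, files: Sequence[str], comment: str = "") -> str:
    """Render a makefile include that assigns a list of files to one variable."""
    if not files:
        raise ValueError("at least one file name is required")
    lines = []
    if comment:
        lines.append(f"# {comment}")
    head = f"{variable} = "
    indent = " " * len(head)  # align continuation lines under the first name
    for i, name in enumerate(files):
        prefix = head if i == 0 else indent
        # Every line except the last ends with a backslash so the list continues.
        suffix = " \\" if i < len(files) - 1 else ""
        lines.append(f"{prefix}{name}{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_local_mk("UCM_SOURCE_LOCAL", ["solaris-eucJP-2.7.ucm"]), end="")
```

The output would be saved as `ucmlocal.mk` (or the corresponding local file for another data type) before the data is rebuilt.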
6582e5b6d6dSopenharmony_ci
6592e5b6d6dSopenharmony_ci### Reducing the Size of ICU's Data: Conversion Tables
6602e5b6d6dSopenharmony_ci
6612e5b6d6dSopenharmony_ciThe size of the ICU data file in the standard build configuration is about 8 MB.
6622e5b6d6dSopenharmony_ciThe majority of this is used for conversion tables. ICU comes with so many
6632e5b6d6dSopenharmony_ciconversion tables because many ICU users need to support many encodings from
6642e5b6d6dSopenharmony_cimany platforms. There are conversion tables for EBCDIC and DOS codepages, for
6652e5b6d6dSopenharmony_ciISO 2022 variants, and for small variations of popular encodings.
6662e5b6d6dSopenharmony_ci
6672e5b6d6dSopenharmony_ci> :point_right: **Important**: ICU provides full internationalization
6682e5b6d6dSopenharmony_cifunctionality without **any** conversion table data. The common library
6692e5b6d6dSopenharmony_cicontains code to handle several important encodings algorithmically: US-ASCII,
6702e5b6d6dSopenharmony_ciISO-8859-1, UTF-7/8/16/32, SCSU, BOCU-1, CESU-8, and IMAP-mailbox-name (i.e.,
6712e5b6d6dSopenharmony_ciUS-ASCII, ISO-8859-1, and all Unicode charsets; see
6722e5b6d6dSopenharmony_cisource/data/mappings/convrtrs.txt for the current list).
6732e5b6d6dSopenharmony_ci
6742e5b6d6dSopenharmony_ciTherefore, the easiest way to reduce the size of ICU's data by a lot (without
6752e5b6d6dSopenharmony_cilimitation of I18N support) is to reduce the number of conversion tables that
6762e5b6d6dSopenharmony_ciare built into the data file.
6772e5b6d6dSopenharmony_ci
The conversion tables are listed for the build process in several makefiles
`source/data/mappings/ucm*.mk`, roughly grouped by how commonly they are used.
If you remove or rename any of these files, then the ICU build will exclude the
conversion tables that are listed in that file. Beginning with ICU 2.0, all of
these makefiles including the main one are optional. If you remove all of them,
then ICU will include only very few conversion tables for "fallback" encodings
(see note below).

If you remove or rename all `ucm*.mk` files, then ICU's data is reduced to
about 3.6 MB. If you remove all these files except for `ucmcore.mk`, then ICU's
data is reduced to about 4.7 MB, while keeping support for a core set of common
MIME/Unix/Windows encodings.
6902e5b6d6dSopenharmony_ci
6912e5b6d6dSopenharmony_ci> :point_right: **Note**: If you remove the conversion table for an encoding
6922e5b6d6dSopenharmony_cithat could be a default encoding on one of your platforms, then ICU will not be
6932e5b6d6dSopenharmony_ciable to instantiate a default converter. In this case, ICU 2.0 and up will
6942e5b6d6dSopenharmony_ciautomatically fall back to a "lowest common denominator" and load a converter
6952e5b6d6dSopenharmony_cifor US-ASCII (or, on EBCDIC platforms, for codepages 37 or 1047). This will be
6962e5b6d6dSopenharmony_cigood enough for converting strings that contain only "ASCII" characters (see the
6972e5b6d6dSopenharmony_cicomment about "invariant characters" in `utypes.h`).
6982e5b6d6dSopenharmony_ci*When ICU is built with a reduced set of conversion tables, then some tests will
6992e5b6d6dSopenharmony_cifail that test the behavior of the converters based on known features of some
7002e5b6d6dSopenharmony_ciencodings. Also, building the testdata will fail if you remove some conversion
7012e5b6d6dSopenharmony_citables that are necessary for that (to test non-ASCII/Unicode resource bundle
7022e5b6d6dSopenharmony_cisource files, for example). You can ignore these failures. Build with the
7032e5b6d6dSopenharmony_cistandard set of conversion tables, if you want to run the tests.* 
7042e5b6d6dSopenharmony_ci
7052e5b6d6dSopenharmony_ci### Reducing the Size of ICU's Data: Locale Data
7062e5b6d6dSopenharmony_ci
7072e5b6d6dSopenharmony_ciIf you need to reduce the size of ICU's data even further, then you need to
7082e5b6d6dSopenharmony_ciremove other files or parts of files from the build as well.
7092e5b6d6dSopenharmony_ci
7102e5b6d6dSopenharmony_ciThere are a number of different subdirectories of 'data' containing locale data
7112e5b6d6dSopenharmony_cisplit out by section. Each subdirectory has its own **.mk** file listing the
7122e5b6d6dSopenharmony_cilocales which will be built. Subdirectories include **lang** for language names
7132e5b6d6dSopenharmony_ciand **curr** for currency names.
7142e5b6d6dSopenharmony_ci
7152e5b6d6dSopenharmony_ciYou can remove data for entire locales by removing their files from
7162e5b6d6dSopenharmony_ci`source/data/locales/resfiles.mk` or the appropriate other .mk file. ICU will
7172e5b6d6dSopenharmony_cithen use the data of the parent locale instead, which is root.txt. If you
7182e5b6d6dSopenharmony_ciremove all resource bundles for a given language and its country/region/variant
7192e5b6d6dSopenharmony_cisublocales, **do not remove root.txt!** Also, do not remove a parent locale if
7202e5b6d6dSopenharmony_cichild locales exist. For example, do not remove "en" while retaining "en_US".
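The rule about keeping parent locales can be checked before a rebuild. The following Python sketch is an illustration only: `missing_parents` is a hypothetical helper, and it assumes that a locale's parent is found by dropping the last underscore-separated subtag, with `root` as the final ancestor. ICU's actual parent relationships may differ for some locales, so treat the result as a hint.

```python
from typing import Iterable, List, Tuple


def ancestors(locale: str) -> List[str]:
    """List the assumed ancestors of a locale, nearest first, ending at root."""
    chain = []
    current = locale
    while "_" in current:
        current = current.rsplit("_", 1)[0]  # en_US -> en
        chain.append(current)
    chain.append("root")
    return chain


def missing_parents(kept: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (locale, missing ancestor) pairs for the locales that are kept."""
    kept_set = set(kept)
    problems = []
    for locale in sorted(kept_set):
        if locale == "root":
            continue
        for parent in ancestors(locale):
            if parent not in kept_set:
                problems.append((locale, parent))
    return problems
```

For example, keeping `root`, `ja`, and `en_US` while dropping `en` reports the pair `("en_US", "en")`.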
7212e5b6d6dSopenharmony_ci
7222e5b6d6dSopenharmony_ci### Reducing the Size of ICU's Data: Collation Data
7232e5b6d6dSopenharmony_ci
Collation data (for sorting, searching and alphabetic indexes) is also large,
especially the collation data for East Asian languages because they define
multiple orderings of tens of thousands of Han characters. You can remove the
collation data for those languages by removing references to those locales from
the `source/data/coll/colfiles.mk` file. When you do that, the collation for
those languages will fall back to the root collator; that is, you lose
language-specific behavior.
7312e5b6d6dSopenharmony_ci
7322e5b6d6dSopenharmony_ciA much less radical approach is to keep the collation data tables but remove the
7332e5b6d6dSopenharmony_citailoring rule strings from which they were built. Those rule strings are
7342e5b6d6dSopenharmony_cirarely used at runtime. For documentation about their use and how to remove
7352e5b6d6dSopenharmony_cithem see the section "Building on Existing Locales" in the
7362e5b6d6dSopenharmony_ci[Collation Customization chapter](collation/customization/index.md).
7372e5b6d6dSopenharmony_ci
7382e5b6d6dSopenharmony_ci### Adding Locale Data to ICU's Data
To add data for a new locale, write a resource bundle file for it with a
structure like the existing locale resource bundles (e.g.
`source/data/locales/ja.txt`, `ru_RU.txt`, `kok_IN.txt`) and add it by writing
a file `source/data/locales/reslocal.mk`, in the same way that `ucmlocal.mk` is
used for converters. In this file, define the list of additional resource
bundles as follows:
7442e5b6d6dSopenharmony_ci```
7452e5b6d6dSopenharmony_ciGENRB_SOURCE_LOCAL=myLocale.txt other.txt ...
7462e5b6d6dSopenharmony_ci```
7472e5b6d6dSopenharmony_ci
7482e5b6d6dSopenharmony_ciStarting in ICU 2.2, these added locales are automatically listed by
7492e5b6d6dSopenharmony_ci`uloc_getAvailable()`.
7502e5b6d6dSopenharmony_ci
7512e5b6d6dSopenharmony_ci## ICU Data File Formats
7522e5b6d6dSopenharmony_ci
ICU uses several kinds of data files with specific source (plain text) and
binary data formats. The following lists provide links to descriptions of those
formats.
7562e5b6d6dSopenharmony_ci
7572e5b6d6dSopenharmony_ciEach ICU data object begins with a header before the actual, specific data. The
7582e5b6d6dSopenharmony_ciheader consists of a 16-bit header length value, the two "magic" bytes DA 27 and
7592e5b6d6dSopenharmony_cia [UDataInfo](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/structUDataInfo.html#_details)
7602e5b6d6dSopenharmony_cistructure which specifies the data object's endianness, charset family, format,
7612e5b6d6dSopenharmony_cidata version, etc.
7622e5b6d6dSopenharmony_ci
7632e5b6d6dSopenharmony_ci(This is not the case for the trie structures, which are not stand-alone,
7642e5b6d6dSopenharmony_ciloadable data objects.)
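The header layout described above can be read with a few lines of code. The following Python sketch is an illustration only, not ICU code: `parse_data_header` and `DataHeader` are hypothetical names, and the byte offsets are an assumption based on the `UDataInfo` field order in ICU4C's `udata.h` (size, reserved word, `isBigEndian`, `charsetFamily`, `sizeofUChar`, reserved byte, then three 4-byte arrays).

```python
import struct
from typing import NamedTuple, Tuple

MAGIC = b"\xda\x27"
MIN_HEADER_BYTES = 24  # 4 bytes of length and magic plus a 20-byte UDataInfo


class DataHeader(NamedTuple):
    header_size: int
    is_big_endian: bool
    charset_family: int  # assumed values: 0 = ASCII, 1 = EBCDIC
    data_format: bytes
    format_version: Tuple[int, ...]
    data_version: Tuple[int, ...]


def parse_data_header(buf: bytes) -> DataHeader:
    """Read the fixed part of the header at the start of an ICU data object."""
    if len(buf) < MIN_HEADER_BYTES:
        raise ValueError("buffer too short for an ICU data header")
    if buf[2:4] != MAGIC:
        raise ValueError("magic bytes DA 27 not found")
    # The endianness flag is a single byte, so it can be read before the
    # byte order of the 16-bit header length is known.
    is_big_endian = buf[8] != 0
    order = ">" if is_big_endian else "<"
    (header_size,) = struct.unpack_from(order + "H", buf, 0)
    return DataHeader(
        header_size=header_size,
        is_big_endian=is_big_endian,
        charset_family=buf[9],
        data_format=bytes(buf[12:16]),
        format_version=tuple(buf[16:20]),
        data_version=tuple(buf[20:24]),
    )
```

Reading the first 24 bytes of a data file and passing them to this function is enough to tell whether the file matches the byte order and charset family of the target platform.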
7652e5b6d6dSopenharmony_ci
7662e5b6d6dSopenharmony_ci### Public Data Files
7672e5b6d6dSopenharmony_ci
#### ICU .dat package files
*   Source format: (list of files provided as input to the icupkg tool, or
    on the gencmn tool command line)
*   Binary format: .dat:
    [source/tools/toolutil/pkg_gencmn.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/toolutil/pkg_gencmn.cpp)
*   Generator tool:
    [icupkg](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/icupkg)
    or
    [gencmn](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencmn)
7772e5b6d6dSopenharmony_ci         
7782e5b6d6dSopenharmony_ci#### Resource bundles
7792e5b6d6dSopenharmony_ci*   Source format: .txt:
7802e5b6d6dSopenharmony_ci    [icuhtml/design/bnf_rb.txt](https://github.com/unicode-org/icu-docs/blob/main/design/bnf_rb.txt)
7812e5b6d6dSopenharmony_ci*   Binary format: .res:
7822e5b6d6dSopenharmony_ci    [source/common/uresdata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/uresdata.h)
7832e5b6d6dSopenharmony_ci*   Generator tool:
7842e5b6d6dSopenharmony_ci    [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb)
7852e5b6d6dSopenharmony_ci
7862e5b6d6dSopenharmony_ci#### Unicode conversion mapping tables
7872e5b6d6dSopenharmony_ci*   Source format: .ucm: [Conversion Data chapter](../conversion/data.md)
7882e5b6d6dSopenharmony_ci*   Binary format: .cnv:
7892e5b6d6dSopenharmony_ci    [source/common/ucnvmbcs.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvmbcs.h)
7902e5b6d6dSopenharmony_ci*   Generator tool:
7912e5b6d6dSopenharmony_ci    [makeconv](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/makeconv)
7922e5b6d6dSopenharmony_ci
7932e5b6d6dSopenharmony_ci#### Conversion (charset) aliases
7942e5b6d6dSopenharmony_ci*   Source format:
7952e5b6d6dSopenharmony_ci    [source/data/mappings/convrtrs.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/convrtrs.txt):
7962e5b6d6dSopenharmony_ci    contains format description. The command "uconv -l --canon" will also
7972e5b6d6dSopenharmony_ci    generate the alias table from the currently used copy of ICU.
7982e5b6d6dSopenharmony_ci*   Binary format: cnvalias.icu:
7992e5b6d6dSopenharmony_ci    [source/common/ucnv_io.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnv_io.cpp)
8002e5b6d6dSopenharmony_ci*   Generator tool:
8012e5b6d6dSopenharmony_ci    [gencnval](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencnval)
8022e5b6d6dSopenharmony_ci
8032e5b6d6dSopenharmony_ci#### Unicode Character Data (Properties; for Java only: hardcoded in C common library)
8042e5b6d6dSopenharmony_ci*   Source format:
8052e5b6d6dSopenharmony_ci    [source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/ppucd.txt):
8062e5b6d6dSopenharmony_ci    [Preparsed UCD](https://icu.unicode.org/design/props/ppucd)
8072e5b6d6dSopenharmony_ci*   Binary format: uprops.icu:
8082e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/corepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/corepropsbuilder.cpp)
8092e5b6d6dSopenharmony_ci*   Generator tool:
8102e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8112e5b6d6dSopenharmony_ci
8122e5b6d6dSopenharmony_ci#### Unicode Character Data (Case mappings; for Java only: hardcoded in C common library)
8132e5b6d6dSopenharmony_ci*   Source format:
8142e5b6d6dSopenharmony_ci    [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
8152e5b6d6dSopenharmony_ci    [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
8162e5b6d6dSopenharmony_ci*   Binary format: ucase.icu:
8172e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/casepropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/casepropsbuilder.cpp)
8182e5b6d6dSopenharmony_ci*   Generator tool:
8192e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8202e5b6d6dSopenharmony_ci
8212e5b6d6dSopenharmony_ci#### Unicode Character Data (BiDi, and Arabic shaping; for Java only: hardcoded in C common library)
8222e5b6d6dSopenharmony_ci*   Source format:
8232e5b6d6dSopenharmony_ci    [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
8242e5b6d6dSopenharmony_ci    [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
8252e5b6d6dSopenharmony_ci*   Binary format: ubidi.icu:
8262e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/bidipropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/bidipropsbuilder.cpp)
8272e5b6d6dSopenharmony_ci*   Generator tool:
8282e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8292e5b6d6dSopenharmony_ci
8302e5b6d6dSopenharmony_ci#### Unicode Character Data (Normalization since ICU 4.4) & custom normalization data
8312e5b6d6dSopenharmony_ci*   Source format:
8322e5b6d6dSopenharmony_ci    [source/data/unidata/norm2/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/norm2):
8332e5b6d6dSopenharmony_ci    Files derived from the [Unicode Character
8342e5b6d6dSopenharmony_ci    Database](https://www.unicode.org/onlinedat/online.html), or custom data.
8352e5b6d6dSopenharmony_ci*   Binary format: .nrm:
8362e5b6d6dSopenharmony_ci    [source/common/normalizer2impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/normalizer2impl.h)
8372e5b6d6dSopenharmony_ci*   Generator tool:
8382e5b6d6dSopenharmony_ci    [gennorm2](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gennorm2)
8392e5b6d6dSopenharmony_ci
8402e5b6d6dSopenharmony_ci#### Unicode Character Data (Character names)
8412e5b6d6dSopenharmony_ci*   Source format:
8422e5b6d6dSopenharmony_ci    [source/data/unidata/UnicodeData.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/UnicodeData.txt):
8432e5b6d6dSopenharmony_ci    [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
8442e5b6d6dSopenharmony_ci*   Binary format: unames.icu:
8452e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/namespropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/namespropsbuilder.cpp)
8462e5b6d6dSopenharmony_ci*   Generator tool:
8472e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8482e5b6d6dSopenharmony_ci
8492e5b6d6dSopenharmony_ci#### Unicode Character Data (Property [value] aliases since ICU 4.8; for Java only: hardcoded in C common library since ICU 4.8)
*   Source format: [UCD Property*Aliases.txt](http://www.unicode.org/Public/UNIDATA/):
    [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
8522e5b6d6dSopenharmony_ci*   Binary format: pnames.icu:
8532e5b6d6dSopenharmony_ci    [source/common/propname.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/propname.h)
8542e5b6d6dSopenharmony_ci*   Generator tool:
8552e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8562e5b6d6dSopenharmony_ci
8572e5b6d6dSopenharmony_ci#### Unicode Character Data (Text layout properties since ICU 64)
8582e5b6d6dSopenharmony_ci*   Source format:
8592e5b6d6dSopenharmony_ci    [source/data/unidata/ppucd.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/ppucd.txt):
8602e5b6d6dSopenharmony_ci    [Preparsed UCD](https://icu.unicode.org/design/props/ppucd)
8612e5b6d6dSopenharmony_ci*   Binary format: ulayout.icu:
8622e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/layoutpropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/layoutpropsbuilder.cpp)
8632e5b6d6dSopenharmony_ci*   Generator tool:
8642e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8652e5b6d6dSopenharmony_ci
8662e5b6d6dSopenharmony_ci#### Unicode Character Data (Emoji properties since ICU 70)
8672e5b6d6dSopenharmony_ciEmoji properties of code points moved out of uprops.icu.
8682e5b6d6dSopenharmony_ciEmoji properties of strings added.
8692e5b6d6dSopenharmony_ci*   Source format:
8702e5b6d6dSopenharmony_ci    [source/data/unidata/emoji-sequences.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/emoji-sequences.txt) and
8712e5b6d6dSopenharmony_ci    [source/data/unidata/emoji-zwj-sequences.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/emoji-zwj-sequences.txt):
8722e5b6d6dSopenharmony_ci    [UTS #51 Data Files](https://www.unicode.org/reports/tr51/#Data_Files)
8732e5b6d6dSopenharmony_ci*   Binary format: uemoji.icu:
8742e5b6d6dSopenharmony_ci    [tools/unicode/c/genprops/emojipropsbuilder.cpp](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops/emojipropsbuilder.cpp)
8752e5b6d6dSopenharmony_ci*   Generator tool:
8762e5b6d6dSopenharmony_ci    [genprops](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genprops)
8772e5b6d6dSopenharmony_ci
8782e5b6d6dSopenharmony_ci#### Collation data (root collation & tailorings; ICU 53 & later)
8792e5b6d6dSopenharmony_ci*   Source format: Original data from allkeys_CLDR.txt in
8802e5b6d6dSopenharmony_ci    [CLDR Root Collation Data Files](http://www.unicode.org/reports/tr35/tr35-collation.html#Root_Data_Files)
8812e5b6d6dSopenharmony_ci    processed into
8822e5b6d6dSopenharmony_ci    [source/data/unidata/FractionalUCA.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/FractionalUCA.txt)
8832e5b6d6dSopenharmony_ci    by
8842e5b6d6dSopenharmony_ci    [tool at unicode.org maintained by Mark Davis](https://sites.google.com/site/unicodetools/#TOC-UCA)
8852e5b6d6dSopenharmony_ci    (call the Main class with option writeFractionalUCA); source tailorings (text rules) in
8862e5b6d6dSopenharmony_ci    [source/data/coll/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/coll)
8872e5b6d6dSopenharmony_ci    resource bundles: [Collation Customization chapter](../collation/customization/index.md).
8882e5b6d6dSopenharmony_ci*   Binary format: ucadata.icu & binary tailorings in resource bundles:
8892e5b6d6dSopenharmony_ci    [source/i18n/collationdatareader.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/collationdatareader.h)
8902e5b6d6dSopenharmony_ci*   Generator tool:
8912e5b6d6dSopenharmony_ci    [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca),
8922e5b6d6dSopenharmony_ci    [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb)
8932e5b6d6dSopenharmony_ci
8942e5b6d6dSopenharmony_ci#### Rule-based break iterator data
*   Source format: .txt: [Boundary Analysis chapter](../boundaryanalysis/index.md)
8962e5b6d6dSopenharmony_ci*   Binary format: .brk:
8972e5b6d6dSopenharmony_ci    [source/common/rbbidata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/rbbidata.h)
8982e5b6d6dSopenharmony_ci*   Generator tool:
8992e5b6d6dSopenharmony_ci    [genbrk](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genbrk)
9002e5b6d6dSopenharmony_ci
9012e5b6d6dSopenharmony_ci#### Dictionary-based break iterator data (ICU 50 & later)
*   Source format: .txt: [gendict.cpp
    comments](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gendict/gendict.cpp)
*   Binary format: .dict: see
    [source/common/dictionarydata.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/dictionarydata.h)
9062e5b6d6dSopenharmony_ci*   Generator tool:
9072e5b6d6dSopenharmony_ci    [gendict](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gendict)
9082e5b6d6dSopenharmony_ci
9092e5b6d6dSopenharmony_ci#### Rule-based transform (transliterator) data
*   Source format: .txt (in resource bundles): [Transform Rule Tutorial chapter](../transforms/general/rules.md)
9112e5b6d6dSopenharmony_ci*   Binary format: Uses genrb to make binary format
9122e5b6d6dSopenharmony_ci*   Generator tool: Does not apply
9132e5b6d6dSopenharmony_ci
9142e5b6d6dSopenharmony_ci#### Time zone data (ICU 4.4 & later)
9152e5b6d6dSopenharmony_ci*   Source format:
9162e5b6d6dSopenharmony_ci    [source/data/misc/zoneinfo64.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/misc/zoneinfo64.txt):
    `ftp://elsie.nci.nih.gov/pub/` `tzdata<year><rev>.tar.gz`
9182e5b6d6dSopenharmony_ci*   Binary format: zoneinfo64.res (generated by genrb and
9192e5b6d6dSopenharmony_ci    [tzcode tools](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/readme.txt)).
9202e5b6d6dSopenharmony_ci*   Generator tool: Does not apply
9212e5b6d6dSopenharmony_ci
9222e5b6d6dSopenharmony_ci#### StringPrep profile data
9232e5b6d6dSopenharmony_ci*   Source format:
    [source/data/sprep/rfc3491.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/sprep/rfc3491.txt)
9252e5b6d6dSopenharmony_ci*   Binary format: .spp:
9262e5b6d6dSopenharmony_ci    [source/tools/gensprep/store.c](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gensprep/store.c)
9272e5b6d6dSopenharmony_ci*   Generator tool:
9282e5b6d6dSopenharmony_ci    [gensprep](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gensprep)
9292e5b6d6dSopenharmony_ci
9302e5b6d6dSopenharmony_ci#### Confusables data
9312e5b6d6dSopenharmony_ci*   Source format:
9322e5b6d6dSopenharmony_ci    [source/data/unidata/confusables.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/confusables.txt),
9332e5b6d6dSopenharmony_ci    [source/data/unidata/confusablesWholeScript.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/confusablesWholeScript.txt)
*   Binary format: confusables.cfu:
    [source/i18n/uspoof_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/uspoof_impl.h)
9362e5b6d6dSopenharmony_ci*   Generator tool: [gencfu](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gencfu)
9372e5b6d6dSopenharmony_ci
9382e5b6d6dSopenharmony_ci### Public Data Files (old versions)
9392e5b6d6dSopenharmony_ci
9402e5b6d6dSopenharmony_ci#### Unicode Character Data (Normalization before ICU 4.4; for Java only: was hardcoded in C common library)
9412e5b6d6dSopenharmony_ci*   Source format:
    [source/data/unidata/*.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata):
9432e5b6d6dSopenharmony_ci    [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
9442e5b6d6dSopenharmony_ci*   Binary format: unorm.icu:
9452e5b6d6dSopenharmony_ci    [source/common/unormimp.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unormimp.h)
9462e5b6d6dSopenharmony_ci*   Generator tool: gennorm
9472e5b6d6dSopenharmony_ci
9482e5b6d6dSopenharmony_ci#### Unicode Character Data (Property [value] aliases before ICU 4.8)
9492e5b6d6dSopenharmony_ci*   Source format: source/data/unidata/Property*Aliases.txt: [Unicode Character Database](http://www.unicode.org/onlinedat/online.html)
9502e5b6d6dSopenharmony_ci*   Binary format: pnames.icu: source/common/propname.h (ICU 4.6)
9512e5b6d6dSopenharmony_ci*   Generator tool: genpname
9522e5b6d6dSopenharmony_ci
9532e5b6d6dSopenharmony_ci#### Collation data (UCA, code points to weights; ICU 52 & earlier)
9542e5b6d6dSopenharmony_ci*   Source format: Same as in ICU 53
9552e5b6d6dSopenharmony_ci*   Binary format: ucadata.icu & binary tailorings in resource bundles: source/i18n/ucol_imp.h (ICU 52)
9562e5b6d6dSopenharmony_ci*   Generator tool:
9572e5b6d6dSopenharmony_ci    [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca),
9582e5b6d6dSopenharmony_ci    [genrb](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/genrb)
9592e5b6d6dSopenharmony_ci
9602e5b6d6dSopenharmony_ci#### Collation data (Inverse UCA, weights->code points; ICU 52 & earlier)
9612e5b6d6dSopenharmony_ci*   Source format: Processed from FractionalUCA.txt like ICU 52 ucadata.icu
9622e5b6d6dSopenharmony_ci*   Binary format: invuca.icu: source/i18n/ucol_imp.h (ICU 52)
9632e5b6d6dSopenharmony_ci*   Generator tool:
9642e5b6d6dSopenharmony_ci    [genuca](https://github.com/unicode-org/icu/blob/main/tools/unicode/c/genuca)
9652e5b6d6dSopenharmony_ci
9662e5b6d6dSopenharmony_ci#### Dictionary-based break iterator data (ICU 49 & earlier)
9672e5b6d6dSopenharmony_ci*   Source format: .txt: genctd.cpp comments
*   Binary format: .ctd: see CompactTrieHeader in source/common/triedict.cpp
9692e5b6d6dSopenharmony_ci*   Generator tool: genctd
9702e5b6d6dSopenharmony_ci
9712e5b6d6dSopenharmony_ci#### Time zone data (Before ICU 4.4)
*   Source format: source/data/misc/zoneinfo.txt (ICU 4.2): `ftp://elsie.nci.nih.gov/pub/` `tzdata<year><rev>.tar.gz`
*   Binary format: zoneinfo.res (generated by genrb and
9742e5b6d6dSopenharmony_ci    [tzcode tools](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/readme.txt)).
9752e5b6d6dSopenharmony_ci*   Generator tool: Does not apply
9762e5b6d6dSopenharmony_ci
9772e5b6d6dSopenharmony_ci### Non-File API Binary Data
9782e5b6d6dSopenharmony_ci
9792e5b6d6dSopenharmony_ci#### Converter selector data
9802e5b6d6dSopenharmony_ci*   Source format: none
9812e5b6d6dSopenharmony_ci*   Binary format:
9822e5b6d6dSopenharmony_ci    [source/common/ucnvsel.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvsel.cpp)
9832e5b6d6dSopenharmony_ci*   Generator tool:
9842e5b6d6dSopenharmony_ci    [ucnvsel_open()](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucnvsel.cpp)
9852e5b6d6dSopenharmony_ci
9862e5b6d6dSopenharmony_ci### Test-Only Data Files
9872e5b6d6dSopenharmony_ci
9882e5b6d6dSopenharmony_ci#### test.icu (for udata API testing)
9892e5b6d6dSopenharmony_ci*   Source format: none (fixed output from gentest when not using -r or -j options)
9902e5b6d6dSopenharmony_ci*   Binary format: test.icu: see `createData()` in
9912e5b6d6dSopenharmony_ci                   [source/tools/gentest/gentest.c](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gentest/gentest.c)
9922e5b6d6dSopenharmony_ci*   Generator tool:
9932e5b6d6dSopenharmony_ci    [gentest](https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/gentest/gentest.c)
9942e5b6d6dSopenharmony_ci
9952e5b6d6dSopenharmony_ci### Other Data Structures
9962e5b6d6dSopenharmony_ci
9972e5b6d6dSopenharmony_ci#### UCPTrie (C)/CodePointTrie (Java) (maps code points to integers)
9982e5b6d6dSopenharmony_ci*   Source format: (public builder API)
9992e5b6d6dSopenharmony_ci*   Binary format:
10002e5b6d6dSopenharmony_ci    [ICU Code Point Tries design doc](https://icu.unicode.org/design/struct/utrie),
10012e5b6d6dSopenharmony_ci    [icu4c/source/common/ucptrie_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ucptrie_impl.h)
10022e5b6d6dSopenharmony_ci*   Generator tool: (builder class)
10032e5b6d6dSopenharmony_ci
10042e5b6d6dSopenharmony_ci#### UTrie2 (C)/Trie2 (Java) (maps code points to integers)
10052e5b6d6dSopenharmony_ci*   Source format: (internal builder API)
10062e5b6d6dSopenharmony_ci*   Binary format:
10072e5b6d6dSopenharmony_ci    [ICU Code Point Tries design doc](https://icu.unicode.org/design/struct/utrie),
10082e5b6d6dSopenharmony_ci    [icu4c/source/common/utrie2_impl.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/utrie2_impl.h)
10092e5b6d6dSopenharmony_ci*   Generator tool: (builder class)
10102e5b6d6dSopenharmony_ci
10112e5b6d6dSopenharmony_ci#### BytesTrie (maps byte sequences to 32-bit integers)
10122e5b6d6dSopenharmony_ci*   Source format: (public builder API)
10132e5b6d6dSopenharmony_ci*   Binary format:
10142e5b6d6dSopenharmony_ci    [BytesTrie design doc](https://icu.unicode.org/design/struct/tries/bytestrie),
10152e5b6d6dSopenharmony_ci    [icu4c/source/common/unicode/bytestrie.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/bytestrie.h)
10162e5b6d6dSopenharmony_ci*   Generator tool: (builder class)
10172e5b6d6dSopenharmony_ci
10182e5b6d6dSopenharmony_ci#### UCharsTrie (C++)/CharsTrie (Java) (maps 16-bit-Unicode strings to 32-bit integers)
10192e5b6d6dSopenharmony_ci*   Source format: (public builder API)
10202e5b6d6dSopenharmony_ci*   Binary format:
10212e5b6d6dSopenharmony_ci    [UCharsTrie design doc](https://icu.unicode.org/design/struct/tries/ucharstrie),
10222e5b6d6dSopenharmony_ci    [icu4c/source/common/unicode/ucharstrie.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/ucharstrie.h)
10232e5b6d6dSopenharmony_ci*   Generator tool: (builder class)
10242e5b6d6dSopenharmony_ci
10252e5b6d6dSopenharmony_ci## ICU4J Resource Information
10262e5b6d6dSopenharmony_ci
Starting with release 2.1, ICU4J includes its own resource information, which is
completely independent of the JRE resource information. (Note: in ICU4J 2.8
through 3.4, time zone information depends on the underlying JRE.) The ICU4J
information is equivalent to the information in ICU4C, and many resources are,
in fact, the same binary files that ICU4C uses.
10322e5b6d6dSopenharmony_ci
10332e5b6d6dSopenharmony_ciBy default the ICU4J distribution includes all of the standard resource
10342e5b6d6dSopenharmony_ciinformation. It is located under the directory `com/ibm/icu/impl/data`.
10352e5b6d6dSopenharmony_ciDepending on the service, the data is in different locations and in different
10362e5b6d6dSopenharmony_ciformats. Note: This will continue to change from release to release, so clients
10372e5b6d6dSopenharmony_cishould not depend on the exact organization of the data in ICU4J.
10382e5b6d6dSopenharmony_ci
1.  The primary **locale data** is under the directory `icudt38b`, as a set of
    ".res" files whose names are the locale identifiers. Locale naming is
    documented in the `com.ibm.icu.util.ULocale` class, and the use of these
    names in searching for resources is documented in
    `com.ibm.icu.util.UResourceBundle`.
10442e5b6d6dSopenharmony_ci
10452e5b6d6dSopenharmony_ci2.  The **collation data** is under the directory `icudt38b/coll`, as a set of
10462e5b6d6dSopenharmony_ci    ".res" files.
10472e5b6d6dSopenharmony_ci
10482e5b6d6dSopenharmony_ci3.  The **rule-based transliterator data** is under the directory
10492e5b6d6dSopenharmony_ci    `icudt38b/translit` as a set of ".res" files. (**Note:** the Han
10502e5b6d6dSopenharmony_ci    transliterator test data is no longer included in the core icu4j.jar file by
10512e5b6d6dSopenharmony_ci    default.)
10522e5b6d6dSopenharmony_ci
10532e5b6d6dSopenharmony_ci4.  The **rule-based number format data** is under the directory `icudt38b/rbnf`
10542e5b6d6dSopenharmony_ci    as a set of ".res" files.
10552e5b6d6dSopenharmony_ci
10562e5b6d6dSopenharmony_ci5.  The **break iterator data** is directly under the data directory, as a set
10572e5b6d6dSopenharmony_ci    of ".brk" files, named according to the type of break and the locale where
10582e5b6d6dSopenharmony_ci    there are locale-specific versions.
10592e5b6d6dSopenharmony_ci
10602e5b6d6dSopenharmony_ci6.  The **holiday data** is under the data directory, as a set of ".class"
10612e5b6d6dSopenharmony_ci    files, named "HolidayBundle_" followed by the locale ID.
10622e5b6d6dSopenharmony_ci
7.  The **character property data**, as well as assorted **normalization data**
    and default **Unicode Collation Algorithm (UCA) data**, are found under the
    data directory as a set of ".icu" files.
10662e5b6d6dSopenharmony_ci
10672e5b6d6dSopenharmony_ci8.  The **character set converter data** is under the directory `icudt38b/`, as
10682e5b6d6dSopenharmony_ci    a set of ".cnv" files. These files are currently included only in
10692e5b6d6dSopenharmony_ci    icu-charset.jar.
10702e5b6d6dSopenharmony_ci
10712e5b6d6dSopenharmony_ci9.  The **time zone data** is named `zoneinfo.res` under the directory
10722e5b6d6dSopenharmony_ci    `icudt38b`.
10732e5b6d6dSopenharmony_ci
Some of the data files alias or otherwise reference data from other data files.
One reason is that some locale names have changed. For example, he_IL used to
be iw_IL. To support both names without duplicating the data, one of the
resource files refers to the other file's data. In other cases, a file may
alias a portion of another file's data in order to save space. Currently ICU4J
provides no tool for revealing these dependencies.
10802e5b6d6dSopenharmony_ci
10812e5b6d6dSopenharmony_ci> :point_right: **Note**: Java's Locale class silently converts the language
10822e5b6d6dSopenharmony_cicode "he" to "iw" when you construct the Locale (for versions of Java through
10832e5b6d6dSopenharmony_ciJava 5). Thus Java cannot be used to locate resources that use the "he" language
10842e5b6d6dSopenharmony_cicode. ICU, on the other hand, does not perform this conversion in ULocale, and
10852e5b6d6dSopenharmony_ciinstead uses aliasing in the locale data to represent the same set of data under
10862e5b6d6dSopenharmony_cidifferent locale ids.
10872e5b6d6dSopenharmony_ci
Resource files that use locale ids form a hierarchy, with up to four levels: a
root, language, region (country), and variant. Searches for locale data attempt
to match as far down the hierarchy as possible. For example, "he_IL" will match
he_IL, but "he_US" will match he (since there is no US variant for he), and
"xx_YY" will match root (the default fallback locale) since there is no xx
language code in the locale hierarchy. See `java.util.ResourceBundle` for more
information.
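
The lookup order described above can be sketched in a few lines of plain Java.
This is a simplified illustration, not ICU4J's actual implementation: the class
and method names (`LocaleFallbackSketch`, `candidates`, `resolve`) are
hypothetical, and the sketch ignores aliasing and any default-locale step. It
only shows how a locale id is truncated level by level until a bundle is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the fallback order; not part of the ICU4J API. */
final class LocaleFallbackSketch {

    /** Returns candidate bundle names, most specific first, ending with "root". */
    static List<String> candidates(String localeId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            chain.add(current);
            // Drop the last "_xx" segment to move one level up the hierarchy.
            int cut = current.lastIndexOf('_');
            current = (cut < 0) ? "" : current.substring(0, cut);
        }
        chain.add("root");
        return chain;
    }

    /** Picks the first candidate that exists among the available bundles. */
    static String resolve(String localeId, Set<String> available) {
        for (String name : candidates(localeId)) {
            if (available.contains(name)) {
                return name;
            }
        }
        return "root"; // assumption: a root bundle is always present
    }

    public static void main(String[] args) {
        Set<String> available = Set.of("root", "he", "he_IL");
        System.out.println(resolve("he_IL", available)); // he_IL
        System.out.println(resolve("he_US", available)); // he
        System.out.println(resolve("xx_YY", available)); // root
    }
}
```

The three calls in `main` mirror the three cases in the paragraph above.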
10952e5b6d6dSopenharmony_ci
Currently ICU4J provides no tool for revealing these dependencies between data
files, so trimming the data directly in the ICU4J project is a hit-or-miss
affair. The key point when you remove data is to also remove everything that
depends on that data. For example, if you remove he.res, you need to remove
he_IL.res, since it is lower in the hierarchy, and you must remove iw.res,
since it references he.res, and iw_IL.res, since it depends on iw.res (and
also references he_IL.res).
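
Because no tool reports these dependencies, any dependency map has to be
assembled by hand. Assuming such a hand-written map exists, the set of files
that must go together can be computed as a simple transitive closure. The
sketch below is hypothetical (`ResourceRemovalSketch` and `removalSet` are not
ICU4J names), and the map in `main` encodes only the he/iw example from the
paragraph above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; the dependency map must be supplied by the caller. */
final class ResourceRemovalSketch {

    /**
     * dependsOn maps a resource file to the files it needs (its parent in the
     * hierarchy and any alias targets). Returns the given file plus every file
     * that directly or indirectly depends on it.
     */
    static Set<String> removalSet(String removed, Map<String, List<String>> dependsOn) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(removed);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!result.add(current)) {
                continue; // already handled
            }
            // Queue every file that lists the current file as a dependency.
            for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                if (e.getValue().contains(current)) {
                    pending.push(e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
                "he_IL.res", List.of("he.res"),
                "iw.res", List.of("he.res"),
                "iw_IL.res", List.of("iw.res", "he_IL.res"),
                "en.res", List.of());
        // Prints [he.res, he_IL.res, iw.res, iw_IL.res]
        System.out.println(removalSet("he.res", dependsOn));
    }
}
```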
11032e5b6d6dSopenharmony_ci
Unfortunately, the jar tool in the JDK provides no way to remove items from a
jar file. Thus you have to extract the resources, remove the ones you don't
want, and then create a new jar file with the remaining resources. See the jar
tool information for how to do this. Before re-creating the jar file, be sure
to thoroughly test your application with the remaining resources, making sure
each required resource is present.
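
As an alternative to extracting and re-archiving by hand, the same result can
be approximated with the JDK's `java.util.zip` classes by copying a jar entry
by entry and skipping the unwanted names. This is a different technique from
the jar-tool procedure above, shown only as a sketch: `JarTrimSketch` and
`copyWithout` are hypothetical names, and the code makes no attempt to handle
signed jars or to preserve entry timestamps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: writes a copy of a jar without the excluded entries. */
final class JarTrimSketch {

    static void copyWithout(Path source, Path target, Set<String> excluded) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (excluded.contains(entry.getName())) {
                    continue; // skip the unwanted resource
                }
                // Start a fresh entry so sizes and checksums are recomputed.
                out.putNextEntry(new ZipEntry(entry.getName()));
                in.transferTo(out); // copies only the current entry's bytes
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: JarTrimSketch <source.jar> <target.jar> [entry-to-drop ...]");
            return;
        }
        Set<String> excluded = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        copyWithout(Path.of(args[0]), Path.of(args[1]), excluded);
    }
}
```

Entry names are full paths inside the jar, so a locale file would be named
along the lines of `com/ibm/icu/impl/data/icudt38b/he.res`; check the actual
layout of your ICU4J release before relying on a specific path.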
11102e5b6d6dSopenharmony_ci
11112e5b6d6dSopenharmony_ci#### Using additional resource files with ICU4J
11122e5b6d6dSopenharmony_ci
11132e5b6d6dSopenharmony_ci> :point_right: **Note**: Resource file formats can change across releases of ICU4J!
11142e5b6d6dSopenharmony_ci> 
11152e5b6d6dSopenharmony_ci> *The format of ICU4J resources is not part of the API. Clients who develop their
11162e5b6d6dSopenharmony_ci> own resources for use with ICU4J should be prepared to regenerate them when they
11172e5b6d6dSopenharmony_ci> move to new releases of ICU4J.*
11182e5b6d6dSopenharmony_ci
We are still developing ICU4J's resource mechanism. Currently it is not possible
to mix ICU's new binary .res resources with traditional Java-style .class or
.txt resources. We might allow for this in a future release, but since the
resource data and format are not formally supported, you run the risk of
incompatibilities with future releases of ICU4J.
11242e5b6d6dSopenharmony_ci
11252e5b6d6dSopenharmony_ciResource data in ICU4J is checked in to the repository as a jar file containing
11262e5b6d6dSopenharmony_cithe resource binaries, icudata.jar. This means that inspecting the contents of
11272e5b6d6dSopenharmony_cithese resources is difficult. They currently are compiled from ICU4C .txt file
11282e5b6d6dSopenharmony_cidata. You can view the contents of the ICU4C text resource files to understand
11292e5b6d6dSopenharmony_cithe contents of the ICU4J resources.
11302e5b6d6dSopenharmony_ci
11312e5b6d6dSopenharmony_ciThe files in icudata.jar get extracted to com/ibm/icu/impl/data in the build
11322e5b6d6dSopenharmony_cidirectory when the 'core' target is built. Building the 'resources' target will
11332e5b6d6dSopenharmony_ciforce the resources to once again be extracted. Extraction will overwrite any
11342e5b6d6dSopenharmony_cicorresponding resource files already in that directory.
11352e5b6d6dSopenharmony_ci
11362e5b6d6dSopenharmony_ci### Building ICU4J Resources from ICU4C
11372e5b6d6dSopenharmony_ci
11382e5b6d6dSopenharmony_ci#### Requirements
11392e5b6d6dSopenharmony_ci
11402e5b6d6dSopenharmony_ci1.  [ICU4C](https://icu.unicode.org/download)
11412e5b6d6dSopenharmony_ci
11422e5b6d6dSopenharmony_ci2.  Compilers and tools required for [building ICU4C](../icu4c/build.md).
11432e5b6d6dSopenharmony_ci
11442e5b6d6dSopenharmony_ci3.  J2SE SDK version 5 or above
11452e5b6d6dSopenharmony_ci
11462e5b6d6dSopenharmony_ci#### Procedure
11472e5b6d6dSopenharmony_ci
1.  Download and build ICU4C on a Windows or Linux machine. For instructions,
    see [Building ICU4C](../icu4c/build.md).
11502e5b6d6dSopenharmony_ci
11512e5b6d6dSopenharmony_ci2.  Follow the remaining instructions in
11522e5b6d6dSopenharmony_ci    the [ICU4J Readme](../icu4j/).