---
layout: default
title: Coding Guidelines
nav_order: 1
parent: Contributors
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->
11
12# Coding Guidelines
13{: .no_toc }
14
15## Contents
16{: .no_toc .text-delta }
17
181. TOC
19{:toc}
20
21---
22
23## Overview
24
25This section provides the guidelines for developing C and C++ code, based on the
26coding conventions used by ICU programmers in the creation of the ICU library.
27
28## Details about ICU Error Codes
29
When calling an ICU API function, a `UErrorCode` variable is often passed in,
by pointer (C) or by reference (C++). This variable is allocated by
the caller and must pass the test `U_SUCCESS()` before the function call.
Otherwise, the function will return immediately, taking no action. Normally, an
error code variable is initialized with `U_ZERO_ERROR`.
35
36`UErrorCode` is passed around and used this way, instead of using C++ exceptions
37for the following reasons:
38
39* It is useful in the same form for C also
40* Some C++ compilers do not support exceptions
41
> :point_right: **Note**: *This error code mechanism, in fact, works similarly to
> exceptions. If users call several ICU functions in a sequence, as soon as one
> sets a failure code, the functions that follow will not do any work. This
> prevents the API functions from processing data that is not valid in
> the sequence of function calls and relieves the caller from checking the error
> code after each call. It is somewhat similar to how an exception terminates a
> function block or try block early.*
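
The chaining behavior described in the note can be sketched as follows. This is a minimal illustration, not ICU code: `parseDigit()` and `sumDigits()` are hypothetical functions, and the `UErrorCode` and `U_FAILURE()` definitions are stand-ins so that the sketch compiles without the ICU headers.

```c++
#include <cstdint>

// Stand-ins so that this sketch compiles without the ICU headers.
// Real code gets UErrorCode and U_FAILURE() from "unicode/utypes.h".
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical function: converts one ASCII digit to its numeric value.
int32_t parseDigit(char c, UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;  // An earlier call failed: do nothing.
    }
    if (c < '0' || c > '9') {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    return c - '0';
}

// Hypothetical function: adds up the digits in s[0..length[.
int32_t sumDigits(const char *s, int32_t length, UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;
    }
    int32_t sum = 0;
    for (int32_t i = 0; i < length; ++i) {
        // No check between calls: once one call sets a failure code,
        // the remaining calls return immediately.
        sum += parseDigit(s[i], errorCode);
    }
    return sum;
}
```

A caller would initialize one `UErrorCode` with `U_ZERO_ERROR`, call `sumDigits()`, and test `U_FAILURE()` once at the end.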
49
50Functions with a UErrorCode parameter will typically check it as the very first
51thing, returning immediately in case of failure. An exception to this general
52rule occurs with functions that adopt, or take ownership of other objects.
53See [Adoption of Objects](#adoption-of-objects) for further information.
54The following code shows the inside of an ICU function implementation:
55
```c++
U_CAPI const UBiDiLevel * U_EXPORT2
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
    int32_t start, length;

    if(U_FAILURE(*pErrorCode)) {
        return nullptr;
    } else if(pBiDi==nullptr || (length=pBiDi->length)<=0) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return nullptr;
    }

    ...
    return result;
}
```
72
73Note: We have decided that we do not want to test for `pErrorCode==NULL`. Some
74existing code does this, but new code should not.
75
76Note: *Callers* (as opposed to implementers) of ICU APIs can simplify their code
77by defining and using a subclass of `icu::ErrorCode`. ICU implementers can use the
78`IcuTestErrorCode` class in intltest code.
79
It is not necessary to check for `U_FAILURE()` immediately before calling a
function that takes a `UErrorCode` parameter, because that function is supposed to
check for failure. Exception: If the failure comes from object allocation or
creation, then you probably have a `NULL` object pointer and must not call any
method on that object, not even one with a `UErrorCode` parameter.
85
86### Sample Function with Error Checking
87
```c++
    U_CAPI int32_t U_EXPORT2
    uplrules_select(const UPluralRules *uplrules,   // Do not check
                                                    // "this"/uplrules vs. NULL.
                    double number,
                    UChar *keyword, int32_t capacity,
                    UErrorCode *status)             // Do not check status!=NULL.
    {
        if (U_FAILURE(*status)) {                   // Do check for U_FAILURE()
                                                    // before setting *status
            return 0;                               // or calling UErrorCode-less
                                                    // select(number).
        }
        if (keyword == NULL ? capacity != 0 : capacity < 0) {
                                                    // Standard destination buffer
                                                    // checks.
            *status = U_ILLEGAL_ARGUMENT_ERROR;
            return 0;
        }
        UnicodeString result = ((PluralRules*)uplrules)->select(number);
        return result.extract(keyword, capacity, *status);
    }
```
111
112### New API Functions
113
114If the API function is non-const, then it should have a `UErrorCode` parameter.
115(Not the other way around: Some const functions may need a `UErrorCode` as well.)
116
117Default C++ assignment operators and copy constructors should not be used (they
118should be declared private and not implemented). Instead, define an `assign(Class
119&other, UErrorCode &errorCode)` function. Normal constructors are fine, and
120should have a `UErrorCode` parameter.
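
As a rough sketch of this pattern, the hypothetical class below (not part of ICU) declares its copy constructor and assignment operator private without implementing them, and offers an `assign()` function that reports failures via a `UErrorCode`. Taking `other` by const reference is an assumption of this sketch, and the `UErrorCode` and `U_FAILURE()` definitions are stand-ins so that it compiles without the ICU headers.

```c++
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-ins; real code gets these from "unicode/utypes.h".
enum UErrorCode {
    U_ZERO_ERROR = 0,
    U_ILLEGAL_ARGUMENT_ERROR = 1,
    U_MEMORY_ALLOCATION_ERROR = 7
};
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical class, for illustration only.
class ByteBuffer {
public:
    // Normal constructor with a UErrorCode parameter.
    ByteBuffer(int32_t initialCapacity, UErrorCode &errorCode)
            : bytes(nullptr), capacity(0) {
        if (U_FAILURE(errorCode)) {
            return;
        }
        if (initialCapacity <= 0) {
            errorCode = U_ILLEGAL_ARGUMENT_ERROR;
            return;
        }
        bytes = static_cast<uint8_t *>(
            std::calloc(static_cast<std::size_t>(initialCapacity), 1));
        if (bytes == nullptr) {
            errorCode = U_MEMORY_ALLOCATION_ERROR;
            return;
        }
        capacity = initialCapacity;
    }

    ~ByteBuffer() { std::free(bytes); }

    // Used instead of copy construction and operator=():
    // copying can fail, and the failure is reported via errorCode.
    ByteBuffer &assign(const ByteBuffer &other, UErrorCode &errorCode) {
        if (U_FAILURE(errorCode) || this == &other) {
            return *this;
        }
        if (other.bytes == nullptr) {
            errorCode = U_ILLEGAL_ARGUMENT_ERROR;
            return *this;
        }
        uint8_t *copy = static_cast<uint8_t *>(
            std::malloc(static_cast<std::size_t>(other.capacity)));
        if (copy == nullptr) {
            errorCode = U_MEMORY_ALLOCATION_ERROR;
            return *this;
        }
        std::memcpy(copy, other.bytes, static_cast<std::size_t>(other.capacity));
        std::free(bytes);
        bytes = copy;
        capacity = other.capacity;
        return *this;
    }

    int32_t getCapacity() const { return capacity; }

private:
    ByteBuffer(const ByteBuffer &other);             // declared, not implemented
    ByteBuffer &operator=(const ByteBuffer &other);  // declared, not implemented

    uint8_t *bytes;
    int32_t capacity;
};
```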
121
122### Warning Codes
123
124Some `UErrorCode` values do not indicate a failure but an additional informational
125return value. Their enum constants have the `_WARNING` suffix and they pass the
126`U_SUCCESS()` test.
127
128However, experience has shown that they are problematic: They can get lost
129easily because subsequent function calls may set their own "warning" codes or
130may reset a `UErrorCode` to `U_ZERO_ERROR`.
131
132The source of the problem is that the `UErrorCode` mechanism is designed to mimic
133C++/Java exceptions. It prevents ICU function execution after a failure code is
134set, but like exceptions it does not work well for non-failure information
135passing.
136
Therefore, we recommend using warning codes very carefully:
138
139* Try not to rely on any warning codes.
140* Use real APIs to get the same information if possible.
141  For example, when a string is completely written but cannot be
142  NUL-terminated, then `U_STRING_NOT_TERMINATED_WARNING` indicates this, but so
143  does the returned destination string length (which will have the same value
144  as the destination capacity in this case). Checking the string length is
145  safer than checking the warning code. (It is even safer to not rely on
146  NUL-terminated strings but to use the length.)
147* If warning codes must be used, then the best is to set the `UErrorCode` to
148  `U_ZERO_ERROR` immediately before calling the function in question, and to
149  check for the expected warning code immediately after the function returns.
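
The recommendations above might look like this in practice. The sketch uses a hypothetical `copyName()` function modeled loosely on ICU string-extraction functions; the `UErrorCode` constants and `U_FAILURE()` are stand-ins (their values are illustrative) so that it compiles without the ICU headers.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-ins; real code gets these from "unicode/utypes.h".
enum UErrorCode {
    U_STRING_NOT_TERMINATED_WARNING = -124,
    U_ZERO_ERROR = 0,
    U_ILLEGAL_ARGUMENT_ERROR = 1,
    U_BUFFER_OVERFLOW_ERROR = 15
};
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical function: copies name into dest and returns its length.
int32_t copyName(const char *name, char *dest, int32_t capacity,
                 UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;
    }
    if (name == nullptr || (dest == nullptr ? capacity != 0 : capacity < 0)) {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    int32_t length = static_cast<int32_t>(std::strlen(name));
    if (length > capacity) {
        errorCode = U_BUFFER_OVERFLOW_ERROR;
        return length;
    }
    if (length > 0) {
        std::memcpy(dest, name, static_cast<std::size_t>(length));
    }
    if (length < capacity) {
        dest[length] = 0;
    } else {
        // Completely written, but no room for the terminating NUL.
        errorCode = U_STRING_NOT_TERMINATED_WARNING;
    }
    return length;
}

// Caller-side pattern: reset the error code immediately before the call,
// then compare the returned length with the capacity instead of relying
// on the warning code.
bool copyIsTerminated(const char *name, char *dest, int32_t capacity) {
    UErrorCode errorCode = U_ZERO_ERROR;
    int32_t length = copyName(name, dest, capacity, errorCode);
    return !U_FAILURE(errorCode) && length < capacity;
}
```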
150
151Future versions of ICU will not introduce new warning codes, and will provide
152real API replacements for all existing warning codes.
153
154### Bogus Objects
155
156Some objects, for example `UnicodeString` and `UnicodeSet`, can become "bogus". This
157is used when methods that create or modify the object fail (mostly due to an
158out-of-memory condition) but do not take a `UErrorCode` parameter and can
159therefore not otherwise report the failure.
160
161* A bogus object appears as empty.
162* A bogus object cannot be modified except with assignment-like functions.
163* The bogus state of one object does not transfer to another. For example,
164  adding a bogus `UnicodeString` to a `UnicodeSet` does not make the set bogus.
165  (It would be hard to make propagation consistent and test it well. Also,
166  propagation among bogus states and error codes would be messy.)
167* If a bogus object is passed into a function that does have a `UErrorCode`
168  parameter, then the function should set the `U_ILLEGAL_ARGUMENT_ERROR` code.
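
The rules above can be illustrated with a toy class. `TinyText` and `getTextLength()` are hypothetical and only mimic the behavior described here; the length limit simulates an out-of-memory condition, and `std::string` is used purely for brevity. The `UErrorCode` and `U_FAILURE()` definitions are stand-ins so that the sketch compiles without the ICU headers.

```c++
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-ins; real code gets these from "unicode/utypes.h".
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical class, for illustration only.
class TinyText {
public:
    TinyText() : bogus(false) {}

    // Modifier without a UErrorCode parameter:
    // a failure can only be reported by making the object bogus.
    TinyText &append(const std::string &s) {
        if (bogus) {
            return *this;  // A bogus object cannot be modified.
        }
        if (text.size() + s.size() > kMaxLength) {
            setToBogus();  // Simulated allocation failure.
        } else {
            text += s;
        }
        return *this;
    }

    // Assignment-like function: may be used on a bogus object.
    TinyText &setTo(const std::string &s) {
        if (s.size() > kMaxLength) {
            setToBogus();
        } else {
            text = s;
            bogus = false;
        }
        return *this;
    }

    bool isBogus() const { return bogus; }

    // A bogus object appears as empty.
    int32_t length() const { return static_cast<int32_t>(text.size()); }

private:
    static constexpr std::size_t kMaxLength = 8;

    void setToBogus() {
        text.clear();
        bogus = true;
    }

    std::string text;
    bool bogus;
};

// A function with a UErrorCode parameter rejects bogus input.
int32_t getTextLength(const TinyText &t, UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;
    }
    if (t.isBogus()) {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    return t.length();
}
```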
169
170## API Documentation
171
172"API" means any public class, function, or constant.
173
174### API status tag
175
176Aside from documenting an API's functionality, parameters, return values etc. we
177also mark every API with whether it is `@draft`, `@stable`, `@deprecated` or
178`@internal`. (Where `@internal` is used when something is not actually supported
179API but needs to be physically public anyway.) A new API is usually marked with
180"`@draft ICU 4.8`". For details of how we mark APIs see the "ICU API
181compatibility" section of the [ICU Architectural Design](../design.md) page. In
182Java, also see existing @draft APIs for complete examples.
183
184Functions that override a base class or interface definition take the API status
185of the base class function. For C++, use the `@copydoc base::function()` tag to
186copy both the description and the API status from the base function definition.
187For Java methods the status tags must be added by hand; use the `{@inheritDoc}`
188JavaDoc tag to pick up the rest of the base function documentation.
189Documentation should not be manually replicated in overriding functions; it is
190too hard to keep multiple copies synchronized.
191
192The policy for the treatment of status tags in overriding functions was
193introduced with ICU 64 for C++, and with ICU 59 for Java. Earlier code may
194deviate.
195
196### Coding Example
197
198Coding examples help users to understand the usage of each API. Whenever
199possible, it is encouraged to embed a code snippet illustrating the usage of an
200API along with the functional specification.
201
202#### Embedding Coding Examples in ICU4J - JCite
203
Since ICU4J 49M2, the ICU4J ant build target "doc" utilizes an external tool
called [JCite](https://arrenbrecht.ch/jcite/). The tool allows us to cite a
fragment of existing source code in a JavaDoc comment using a tag. For example,
`{@.jcite com.ibm.icu.samples.util.timezone.BasicTimeZoneExample:---getNextTransitionExample}`
will be replaced by the fragment of code marked by the comment lines
`// ---getNextTransitionExample` in `BasicTimeZoneExample.java` in package
`com.ibm.icu.samples.util.timezone`. When embedding a code snippet using JCite, we
recommend following these guidelines:
213
* Sample code should be placed in the `<icu4j_root>/samples/src` directory,
  although you can cite any source fragment from source files in
  `<icu4j_root>/demos/src`, `<icu4j_root>/main/core/*/src`,
  `<icu4j_root>/main/test/*/src`.
* Sample code should use the package name
  `com.ibm.icu.samples.<subpackage>.<facility>`. `<subpackage>` corresponds
  to the target ICU API class's package, that is, one of lang/math/text/util.
  `<facility>` is the name of the facility, which is usually the base class of the
  service. For example, use package `com.ibm.icu.samples.text.dateformat` for
  samples related to ICU's date format service, and
  `com.ibm.icu.samples.util.timezone` for samples related to the time zone service.
* Sample code should be as self-contained as possible (use only JDK and
  ICU public APIs if possible). This allows readers to cut & paste a code
  snippet to try it out easily.
* The citing comment should start with three consecutive hyphens followed by
  a lower camel case token, for example, "`// ---compareToExample`".
* Keep in mind that the JCite tag `{@.jcite ...}` is not resolved without JCite.
  Avoid placing a code snippet within a sentence. Instead,
  place a code snippet using JCite in an independent paragraph.
233
234#### Embedding Coding Examples in ICU4C
235
236Also since ICU4C 49M2, ICU4C docs (using the [\\snippet command](http://www.doxygen.nl/manual/commands.html#cmdsnippet)
237which is new in Doxygen 1.7.5) can cite a fragment of existing sample or test code.
238
239Example in `ucnv.h`:
240
```c++
 /**
  * \snippet samples/ucnv/convsamp.cpp ucnv_open
  */
 ucnv_open( ... ) ...
```
247
248This cites code in `icu4c/source/samples/ucnv/convsamp.cpp` as follows:
249
```c++
  //! [ucnv_open]
  conv = ucnv_open("koi8-r", &status);
  //! [ucnv_open]
```
255
256Notice the tag "`ucnv_open`" which must be the same in all three places (in
257the header file, and twice in the cited file).
258
259## C and C++ Coding Conventions Overview
260
261The ICU group uses the following coding guidelines to create software using the
262ICU C++ classes and methods as well as the ICU C methods.
263
264### C/C++ Hiding Un-@stable APIs
265
266In C/C++, we enclose `@draft` and such APIs with `#ifndef U_HIDE_DRAFT_API` or
267similar as appropriate. When a draft API becomes stable, we need to remove the
268surrounding `#ifndef`.
269
270Note: The `@system` tag is *in addition to* the
271`@draft`/`@stable`/`@deprecated`/`@obsolete` status tag.
272
273Copy/paste the appropriate `#ifndef..#endif` pair from the following:
274
```c++
#ifndef U_HIDE_DRAFT_API
#endif  // U_HIDE_DRAFT_API

#ifndef U_HIDE_DEPRECATED_API
#endif  // U_HIDE_DEPRECATED_API

#ifndef U_HIDE_OBSOLETE_API
#endif  // U_HIDE_OBSOLETE_API

#ifndef U_HIDE_SYSTEM_API
#endif  // U_HIDE_SYSTEM_API

#ifndef U_HIDE_INTERNAL_API
#endif  // U_HIDE_INTERNAL_API
```
291
292We `#ifndef` `@draft`/`@deprecated`/... APIs as much as possible, including C
293functions, many C++ class methods (see exceptions below), enum constants (see
294exceptions below), whole enums, whole classes, etc.
295
296We do not `#ifndef` APIs where that would be problematic:
297
298* struct/class members where that would modify the object layout (non-static
299  struct/class fields, virtual methods)
300* enum constants where that would modify the numeric values of following
301  constants
302  * actually, best to use `#ifndef` together with explicitly defining the
303    numeric value of the next constant
304* C++ class boilerplate (e.g., default/copy constructors), if
305  the compiler would auto-create public functions to replace `#ifndef`’ed ones
306  * For example, the compiler automatically creates a default constructor if
307    the class does not specify any other constructors.
308* private class members
309* definitions in internal/test/tools header files (that would be pointless;
310  they should probably not have API tags in the first place)
311* forward or friend declarations
312* definitions that are needed for other definitions that would not be
313  `#ifndef`'ed (e.g., for public macros or private methods)
314* platform macros (mostly in `platform.h`/`umachine.h` & similar) and
315  user-configurable settings (mostly in `uconfig.h`)
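
For the enum case in the list above, one possible shape is the following. The enum, its constants, and the version numbers in the tags are made up for illustration; the point is only that the constant after the guarded one has an explicit numeric value, so hiding the draft constant does not renumber it.

```c++
// Hypothetical enum, for illustration only.
typedef enum UExampleOption {
    /** @stable ICU 4.0 */
    UEXAMPLE_OPTION_A,
#ifndef U_HIDE_DRAFT_API
    /** @draft ICU 4.8 */
    UEXAMPLE_OPTION_B,
#endif  // U_HIDE_DRAFT_API
    /**
     * Explicit value: stays 2 whether or not the draft constant is hidden.
     * @stable ICU 4.0
     */
    UEXAMPLE_OPTION_C = 2
} UExampleOption;
```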
316
317More handy copy-paste text:
318
```c++
    // Do not enclose the protected default constructor with #ifndef U_HIDE_INTERNAL_API
    // or else the compiler will create a public default constructor.

    // Do not enclose protected default/copy constructors with #ifndef U_HIDE_INTERNAL_API
    // or else the compiler will create public ones.
```
326
327### C and C++ Type and Format Convention Guidelines
328
329The following C and C++ type and format conventions are used to maximize
330portability across platforms and to provide consistency in the code:
331
332#### Constants (#define, enum items, const)
333
334Use uppercase letters for constants. For example, use `UBREAKITERATOR_DONE`,
335`UBIDI_DEFAULT_LTR`, `ULESS`.
336
337For new enum types (as opposed to new values added to existing types), do not
338define enum types in C++ style. Instead, define C-style enums with U... type
339prefix and `U_`/`UMODULE_` constants. Define such enum types outside the ICU
340namespace and outside any C++ class. Define them in C header files if there are
341appropriate ones.
342
343#### Variables and Functions
344
345Use mixed-case letters that start with a lowercase letter for variables and
346functions. For example, use `getLength()`.
347
348#### Types (class, struct, enum, union)
349
Use mixed-case letters that start with an uppercase letter for types. For example, use
class `DateFormatSymbols`.
352
353#### Function Style
354
355Use the `getProperty()` and `setProperty()` style for functions where a lowercase
356letter begins the first word and the second word is capitalized without a space
357between it and the first word. For example, `UnicodeString`
358`getSymbol(ENumberFormatSymbol symbol)`,
359`void setSymbol(ENumberFormatSymbol symbol, UnicodeString value)` and
360`getLength()`, `getSomethingAt(index/offset)`.
361
362#### Common Parameter Names
363
364In order to keep function parameter names consistent, the following are
365recommendations for names or suffixes (usual "Camel case" applies):
366
367* "start": the index (of the first of several code units) in a string or array
* "limit": the index (of the **first code unit after** a specified range) in a
  string or array (the number of units is (limit-start))
370* name the length (for the number of code units in a (range of a) string or
371  array) either "length" or "somePrefixLength"
372* name the capacity (for the number of code units available in an output
373  buffer) either "capacity" or "somePrefixCapacity"
374
375#### Order of Source/Destination Arguments
376
377Many ICU function signatures list source arguments before destination arguments,
378as is common in C++ and Java APIs. This is the preferred order for new APIs.
379(Example: `ucol_getSortKey(const UCollator *coll, const UChar *source,
380int32_t sourceLength, uint8_t *result, int32_t resultLength)`)
381
382Some ICU function signatures list destination arguments before source arguments,
383as is common in C standard library functions. This should be limited to
384functions that closely resemble such C standard library functions or closely
385related ICU functions. (Example: `u_strcpy(UChar *dst, const UChar *src)`)
386
387#### Order of Include File Includes
388
389Include system header files (like `<stdio.h>`) before ICU headers followed by
390application-specific ones. This assures that ICU headers can use existing
391definitions from system headers if both happen to define the same symbols. In
392ICU files, all used headers should be explicitly included, even if some of them
393already include others.
394
395Within a group of headers, place them in alphabetical order.
396
397#### Style for ICU Includes
398
399All ICU headers should be included using ""-style includes (like
400`"unicode/utypes.h"` or `"cmemory.h"`) in source files for the ICU library, tools,
401and tests.
402
403#### Pointer Conversions
404
405Do not cast pointers to integers or integers to pointers. Also, do not cast
406between data pointers and function pointers. This will not work on some
407compilers, especially with different sizes of such types. Exceptions are only
408possible in platform-specific code where the behavior is known.
409
410Please use C++-style casts, at least for pointers, for example `const_cast`.
411
412* For conversion between related types, for example from a base class to a
413  subclass (when you *know* that the object is of that type), use
414  `static_cast`. (When you are not sure if the object has the subclass type,
415  then use a `dynamic_cast`; see a later section about that.)
416* Also use `static_cast`, not `reinterpret_cast`, for conversion from `void *`
417  to a specific pointer type. (This is accepted and recommended because there
418  is an implicit conversion available for the opposite conversion.) See
419  [ICU-9434](https://unicode-org.atlassian.net/browse/ICU-9434) for details.
420* For conversion between unrelated types, for example between `char *` and
421  `uint8_t *`, or between `Collator *` and `UCollator *`, use a
422  `reinterpret_cast`.
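
The three cases above can be sketched as follows. All types and functions here are hypothetical and exist only to show which cast applies where.

```c++
#include <cstdint>

// Hypothetical types, for illustration only.
struct Shape {
    virtual ~Shape() {}
};
struct Circle : public Shape {
    int32_t radius;
};

// Base class to subclass, when the caller guarantees the type: static_cast.
int32_t getRadius(const Shape *shape) {
    return static_cast<const Circle *>(shape)->radius;
}

// void * to a specific pointer type: static_cast, not reinterpret_cast.
void incrementCounter(void *context) {
    int32_t *counter = static_cast<int32_t *>(context);
    ++*counter;
}

// Unrelated pointer types (char * and uint8_t *): reinterpret_cast.
int32_t sumBytes(const char *s, int32_t length) {
    const uint8_t *bytes = reinterpret_cast<const uint8_t *>(s);
    int32_t sum = 0;
    for (int32_t i = 0; i < length; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```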
423
424#### Returning a Number of Items
425
426To return a number of items, use `countItems()`, **not** `getItemCount()`, even if
427there is no need to actually count using that member function.
428
429#### Ranges of Indexes
430
Specify a range of indexes by having start and limit parameters with names or
suffix conventions that represent the index. A range should contain indexes from
start to limit-1, that is, an interval that is left-closed and right-open. Using
mathematical notation, this is represented as: \[start..limit\[.
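
A minimal sketch of the start/limit convention, using a hypothetical helper function: the loop visits indexes start through limit-1, the range length is limit-start, and an empty range has start equal to limit.

```c++
#include <cstdint>

// Hypothetical helper: counts the spaces in the range [start..limit[ of s.
int32_t countSpaces(const char16_t *s, int32_t start, int32_t limit) {
    int32_t count = 0;
    for (int32_t i = start; i < limit; ++i) {
        if (s[i] == u' ') {
            ++count;
        }
    }
    return count;
}
```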
435
436#### Functions with Buffers
437
For functions that take a buffer (pointer) and a length argument with a default
value, set the default value to -1 so that the function determines the length
of the input itself (for text, by calling `u_strlen()`). Any other negative or
undefined value constitutes an error.
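
The -1 convention might be handled as in this sketch. `countUpper()` is a hypothetical function, the hand-written loop stands in for `u_strlen()`, and the `UErrorCode` and `U_FAILURE()` definitions are stand-ins so that the sketch compiles without the ICU headers. The length parameter has no default value here because a non-defaulted error code parameter follows it.

```c++
#include <cstdint>

// Stand-ins; real code gets these from "unicode/utypes.h".
enum UErrorCode { U_ZERO_ERROR = 0, U_ILLEGAL_ARGUMENT_ERROR = 1 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical function: counts ASCII uppercase letters in s.
// length == -1 means that s is NUL-terminated.
int32_t countUpper(const char16_t *s, int32_t length, UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;
    }
    if (s == nullptr ? length != 0 : length < -1) {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    if (length < 0) {
        // Determine the length ourselves; real code would call u_strlen().
        length = 0;
        while (s[length] != 0) {
            ++length;
        }
    }
    int32_t count = 0;
    for (int32_t i = 0; i < length; ++i) {
        if (s[i] >= u'A' && s[i] <= u'Z') {
            ++count;
        }
    }
    return count;
}
```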
442
443#### Primitive Types
444
445Primitive types are defined by the `unicode/utypes.h` file or a header file that
446includes other header files. The most common types are `uint8_t`, `uint16_t`,
447`uint32_t`, `int8_t`, `int16_t`, `int32_t`, `char16_t`,
448`UChar` (same as `char16_t`), `UChar32` (signed, 32-bit), and `UErrorCode`.
449
450The language built-in type `bool` and constants `true` and `false` may be used
451internally, for local variables and parameters of internal functions. The ICU
452type `UBool` must be used in public APIs and in the definition of any persistent
453data structures. `UBool` is guaranteed to be one byte in size and signed; `bool`
454is not. **Except**: Starting with ICU 70 (2021q4), `operator==()` and
455`operator!=()` must return `bool`, not `UBool`, because of a change in C++20,
456see [ICU-20973](https://unicode-org.atlassian.net/browse/ICU-20973).
457
458Traditionally, ICU4C has defined its own `FALSE`=0 / `TRUE`=1 macros for use with `UBool`.
459Starting with ICU 68 (2020q4), we no longer define these in public header files
460(unless `U_DEFINE_FALSE_AND_TRUE`=1),
461in order to avoid name collisions with code outside ICU defining enum constants and similar
462with these names.
463Starting with ICU 72 (2022q4), we no longer use these anywhere in ICU.
464
465Instead, the versions of the C and C++ standards we require now do define type `bool`
466and values `false` & `true`, and we and our users can use these values.
467
468As of ICU 70, we are not changing ICU4C API from `UBool` to `bool`, except on
469equality operators (see above).
470Doing so in C API, or in structs that cross the library boundary,
471would break binary compatibility.
472Doing so only in other places in C++ could be confusingly inconsistent.
473We may revisit this.
474
475Note that the details of type `bool` (e.g., `sizeof`) depend on the compiler and
476may differ between C and C++.
477
478#### File Names (.h, .c, .cpp, data files if possible, etc.)
479
480Limit file names to 31 lowercase ASCII characters. (Older versions of MacOS have
481that length limit.)
482
483Exception: The layout engine uses mixed-case file names.
484
485(We have abandoned the 8.3 naming standard although we do not change the names
486of old header files.)
487
488#### Language Extensions and Standards
489
490Proprietary features, language extensions, or library functions, must not be
491used because they will not work on all C or C++ compilers.
492In Microsoft Visual C++, go to Project Settings(alt-f7)->All Configurations->
493C/C++->Customize and check Disable Language Extensions.
494
495Exception: some Microsoft headers will not compile without language extensions
496being enabled, which in turn requires some ICU files be built with language
497extensions.
498
499#### Tabs and Indentation
500
501Save files with spaces instead of tab characters (\\x09). The indentation size
502is 4.
503
504#### Documentation
505
Use Javadoc-style in-file documentation created with
[doxygen](http://www.doxygen.org/).
508
509#### Multiple Statements
510
511Place multiple statements in multiple lines. `if()` or loop heads must not be
512followed by their bodies on the same line.
513
514#### Placements of `{}` Curly Braces
515
516Place curly braces `{}` in reasonable and consistent locations. Each of us
517subscribes to different philosophies. It is recommended to use the style of a
518file, instead of mixing different styles. It is requested, however, to not have
519`if()` and loop bodies without curly braces.
520
521#### `if() {...}` and Loop Bodies
522
523Use curly braces for `if()` and else as well as loop bodies, etc., even if there
524is only one statement.
525
526#### Function Declarations
527
Put the return type, together with all import, extern, and export declarations,
on one line, and place the function name and function signature at the
beginning of the next line.
531
532Function declarations need to be in the form `U_CAPI` return-type `U_EXPORT2` to
533satisfy all the compilers' requirements.
534
535For example, use the following
536convention:
537
```c++
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
```
542
543> :point_right: **Note**: The `U_CAPI`/`U_DEPRECATED` and `U_EXPORT2` qualifiers
544> are required for both the declaration and the definition of *exported C and
545> static C++ functions*. Use `U_CAPI` (or `U_DEPRECATED`) before and `U_EXPORT2`
546> after the return type of *exported C and static C++ functions*.
547> 
548> Internal functions that are visible outside a compilation unit need a `U_CFUNC`
549> before the return type.
550> 
551> *Non-static C++ class member functions* do *not* get `U_CAPI`/`U_EXPORT2`
552> because they are exported and declared together with their class exports.
553
554> :point_right: **Note**: Before ICU 68 (2020q4) we used to use alternate qualifiers
555> like `U_DRAFT`, `U_STABLE` etc. rather than `U_CAPI`,
556> but keeping these in sync with API doc tags `@draft` and guard switches like `U_HIDE_DRAFT_API`
557> was tedious and error-prone and added no value.
558> Since ICU 68 (ICU-9961) we only use `U_CAPI` and `U_DEPRECATED`.
559
#### Use Anonymous Namespaces or Static For File Scope
561
562Use anonymous namespaces or `static` for variables, functions, and constants that
563are not exported explicitly by a header file. Some platforms are confused if
564non-static symbols are not explicitly declared extern. These platforms will not
565be able to build ICU nor link to it.
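
A small sketch of both options follows. All names are hypothetical; only `uexample_getSquare()` is meant to be visible outside the file, as if a header declared it.

```c++
#include <cstdint>

namespace {

// File-scope data and helpers: not visible to other compilation units.
const int32_t kSquares[] = {0, 1, 4, 9};
const int32_t kSquaresLength = 4;

int32_t lookUpSquare(int32_t index) {
    if (index < 0 || index >= kSquaresLength) {
        return -1;
    }
    return kSquares[index];
}

}  // namespace

// The same effect with static.
static int32_t pinIndex(int32_t index) {
    if (index < 0) {
        return 0;
    }
    if (index >= kSquaresLength) {
        return kSquaresLength - 1;
    }
    return index;
}

// Hypothetical function that a header file would declare and export.
int32_t uexample_getSquare(int32_t index) {
    return lookUpSquare(pinIndex(index));
}
```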
566
567#### Using C Callbacks From C++ Code
568
569z/OS and Windows COM wrappers around ICU need `__cdecl` for callback functions.
570The reason is that C++ can have a different function calling convention from C.
571These callback functions also usually need to be private. So the following code
572
```c++
UBool
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here.
}
```
582
583should be changed to look like the following by adding `U_CDECL_BEGIN`, `static`,
584`U_CALLCONV` and `U_CDECL_END`.
585
```c++
U_CDECL_BEGIN
static UBool U_CALLCONV
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here.
}
U_CDECL_END
```
597
598#### Same Module and Functionality in C and in C++
599
600Determine if two headers are needed. If the same functionality is provided with
601both a C and a C++ API, then there can be two headers, one for each language,
602even if one uses the other. For example, there can be `umsg.h` for C and `msgfmt.h`
603for C++.
604
605Not all functionality has or needs both kinds of API. More and more
606functionality is available only via C APIs to avoid duplication of API,
607documentation, and maintenance. C APIs are perfectly usable from C++ code,
608especially with `UnicodeString` methods that alias or expose C-style string
609buffers.
610
611#### Platform Dependencies
612
Use the platform dependencies that are within the header files that `utypes.h`
includes. They are `platform.h` (which is generated by the configuration
615script from `platform.h.in`) and its more specific cousins like `pwin32.h` for
616Windows, which define basic types, and `putil.h`, which defines platform
617utilities.
618**Important:** Outside of these files, and a small number of implementation
619files that depend on platform differences (like `umutex.c`), **no** ICU source
620code may have **any** `#ifdef` **OperatingSystemName** instructions.
621
622#### Short, Unnested Mutex Blocks
623
Do not make function calls within a mutual-exclusion (mutex) block. This helps
prevent deadlocks from occurring later. There should be as
little code inside a mutex block as possible to minimize the performance
degradation from blocked threads.
Also, it is not guaranteed that mutex blocks are re-entrant; therefore, they
must not be nested.
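
One way to keep mutex blocks short and free of function calls is sketched below. It uses `std::mutex` as a stand-in for ICU's internal mutex facilities, and `computeExpensiveValue()` is a hypothetical placeholder for slow work. The trade-off is that two threads may both compute the value; only the first result is kept.

```c++
#include <cstdint>
#include <mutex>

namespace {

std::mutex gCacheMutex;
int32_t gCachedValue = 0;
bool gCacheValid = false;

// Hypothetical placeholder for slow work that may itself take locks.
int32_t computeExpensiveValue() {
    return 42;
}

}  // namespace

int32_t getCachedValue() {
    {
        // Short mutex block: only reads two variables.
        std::lock_guard<std::mutex> lock(gCacheMutex);
        if (gCacheValid) {
            return gCachedValue;
        }
    }
    // The function call happens outside of any mutex block.
    int32_t value = computeExpensiveValue();
    {
        // Second short block, not nested inside the first one.
        std::lock_guard<std::mutex> lock(gCacheMutex);
        if (!gCacheValid) {
            gCachedValue = value;
            gCacheValid = true;
        }
        value = gCachedValue;
    }
    return value;
}
```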
630
631#### Names of Internal Functions
632
633Internal functions that are not declared static (regardless of inlining) must
634follow the naming conventions for exported functions because many compilers and
635linkers do not distinguish between library exports and intra-library visible
636functions.
637
638#### Which Language for the Implementation
639
640Write implementation code in C++. Use objects very carefully, as always:
641Implicit constructors, assignments etc. can make simple-looking code
642surprisingly slow.
643
644For every C API, make sure that there is at least one call from a pure C file in
645the cintltst test suite.
646
647Background: We used to prefer C or C-style C++ for implementation code because
648we used to have users ask for pure C. However, there was never a large, usable
649subset of ICU that was usable without any C++ dependencies, and C++ can(!) make
650for much shorter, simpler, less error-prone and easier-to-maintain code, for
651example via use of "smart pointers" (`unicode/localpointer.h` and `cmemory.h`).
652
653We still try to expose most functionality via *C APIs* because of the
654difficulties of binary compatible C++ APIs exported from DLLs/shared libraries.
655
656#### No Compiler Warnings
657
ICU must compile without compiler warnings unless such warnings are verified to
be harmless or bogus. Often, a warning on one compiler indicates a breaking
error on another.
661
662#### Enum Values
663
664When casting an integer value to an enum type, the enum type *should* have a
665constant with this integer value, or at least it *must* have a constant whose
666value is at least as large as the integer value being cast, with the same
667signedness. For example, do not cast a -1 to an enum type that only has
668non-negative constants. Some compilers choose the internal representation very
669tightly for the defined enum constants, which may result in the equivalent of a
670`uint8_t` representation for an enum type with only small, non-negative constants.
671Casting a -1 to such a type may result in an actual value of 255. (This has
672happened!)
673
674When casting an enum value to an integer type, make sure that the enum value's
675numeric value is within range of the integer type.
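
A defensive conversion along these lines could look like the following sketch. The enum and the `toMode()` helper are hypothetical; the helper casts only after checking that the integer lies within the range of the defined constants.

```c++
#include <cstdint>

// Hypothetical enum with only small, non-negative constants.
typedef enum UExampleMode {
    UEXAMPLE_MODE_OFF = 0,
    UEXAMPLE_MODE_ON = 1,
    UEXAMPLE_MODE_AUTO = 2
} UExampleMode;

// Converts value to a UExampleMode only if a matching constant exists.
// Returns false, leaving mode untouched, for values such as -1.
bool toMode(int32_t value, UExampleMode &mode) {
    if (value < UEXAMPLE_MODE_OFF || value > UEXAMPLE_MODE_AUTO) {
        return false;
    }
    mode = static_cast<UExampleMode>(value);
    return true;
}
```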
676
677#### Do not check for `this!=NULL`, do not check for `NULL` references
678
679In public APIs, assume `this!=0` and assume that references are not 0. In C code,
680`"this"` is the "service object" pointer, such as `set` in
681`uset_add(USet* set, UChar32 c)` — don't check for `set!=NULL`.
682
683We do usually check all other (non-this) pointers for `NULL`, in those cases when
684`NULL` is not valid. (Many functions allow a `NULL` string or buffer pointer if the
685length or capacity is 0.)
686
687Rationale: `"this"` is not really an argument, and checking it costs a little bit
688of code size and runtime. Other libraries also commonly do not check for valid
689`"this"`, and resulting failures are fairly obvious.
690
691### Memory Usage
692
693#### Dynamically Allocated Memory
694
695ICU4C APIs are designed to allow separate heaps for its libraries vs. the
696application. This is achieved by providing factory methods and matching
697destructors for all allocated objects. The C++ API uses a common base class with
698overridden `new`/`delete` operators and/or forms an equivalent pair with `createXyz()`
699factory methods and the `delete` operator. The C API provides pairs of `open`/`close`
700functions for each service. See the C++ and C guideline sections below for
701details.
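
The C-style pairing can be sketched with a hypothetical `uexample` service. The names are made up, `std::malloc()`/`std::free()` stand in for the library's own allocation functions, and the `UErrorCode` and `U_FAILURE()` definitions are stand-ins so that the sketch compiles without the ICU headers. The point is that the library that allocates the object is also the one that releases it.

```c++
#include <cstdint>
#include <cstdlib>

// Stand-ins; real code gets these from "unicode/utypes.h".
enum UErrorCode { U_ZERO_ERROR = 0, U_MEMORY_ALLOCATION_ERROR = 7 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical service object; callers treat the pointer as opaque.
struct UExample {
    int32_t value;
};

// Allocates on the library's heap.
UExample *uexample_open(int32_t value, UErrorCode *pErrorCode) {
    if (U_FAILURE(*pErrorCode)) {
        return nullptr;
    }
    UExample *ex = static_cast<UExample *>(std::malloc(sizeof(UExample)));
    if (ex == nullptr) {
        *pErrorCode = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    ex->value = value;
    return ex;
}

// Releases on the same heap; the caller never frees the object directly.
void uexample_close(UExample *ex) {
    std::free(ex);
}

int32_t uexample_getValue(const UExample *ex) {
    return ex->value;
}
```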
702
703Exception: Most C++ API functions that return a `StringEnumeration` (by pointer
704which the caller must delete) are named `getXyz()` rather than `createXyz()`
705because `"get"` is much more natural. (These are not factory methods in the sense
706of `NumberFormat::createScientificInstance()`.) For example,
`static StringEnumeration *Collator::getKeywords(UErrorCode &)`. We should document
708clearly in the API comments that the caller must delete the returned
709`StringEnumeration`.
710
711#### Declaring Static Data
712
713All unmodifiable data should be declared `const`. This includes the pointers and
714the data itself. Also if you do not need a pointer to a string, declare the
715string as an array. This reduces the time to load the library and all its
716pointers. This should be done so that the same library data can be shared across
717processes automatically. Here is an example:
718
719```c++
720#define MY_MACRO_DEFINED_STR "macro string"
721const char *myCString = "myCString";
722int16_t myNumbers[] = {1, 2, 3};
723```
724
725This should be changed to the following:
726
727```c++
728static const char MY_MACRO_DEFINED_STR[] = "macro string";
729static const char myCString[] = "myCString";
730static const int16_t myNumbers[] = {1, 2, 3};
731```
732
733#### No Static Initialization
734
735The most common reason to have static initialization is to declare a
736`static const UnicodeString`, for example (see `utypes.h` about invariant characters):
737
738```c++
739static const UnicodeString myStr("myStr", "");
740```
741
742The most portable and most efficient way to declare ASCII text as a Unicode
743string is to do the following instead:
744
745```c++
746static const UChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0}; /* "myStr" */
747```
748
We do not use character literals for Unicode characters and strings because the
execution character set of C/C++ compilers is almost never Unicode and may not
be ASCII-compatible (especially on EBCDIC platforms). Depending on the API where
the string is to be used, a terminating NUL (0) may or may not be required. The
length of the string (number of `UChar`s in the array) can be determined with
`sizeof(myStr)/U_SIZEOF_UCHAR` (subtract 1 for the NUL if present). Always
remember to add a comment at the end of the declaration that says what the
Unicode string contains.
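
The length computation can be sketched as follows. To keep the snippet free of ICU headers, `DemoUChar` is a hypothetical stand-in for `UChar` (assumed here to be a 16-bit code unit type), and `sizeof(DemoUChar)` plays the role of `U_SIZEOF_UCHAR`.

```c++
#include <cstdint>

// Stand-in for ICU's UChar, assumed here to be a 16-bit code unit type.
typedef char16_t DemoUChar;

static const DemoUChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0 }; /* "myStr" */

// Number of code units, not counting the terminating NUL.
static const int32_t myStrLength =
    static_cast<int32_t>(sizeof(myStr) / sizeof(DemoUChar)) - 1;
```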
757
758Static initialization of C++ objects **must not be used** in ICU libraries
759because of the following reasons:
760
7611. It leads to intractable order-of-initialization dependencies.
2. It makes it difficult or impossible to release all of the library's
   resources. See `u_cleanup()`.
7643. It takes time to initialize the library.
7654. Dependency checking is not completely done in C or C++. For instance, if an
766   ICU user creates an ICU object or calls an ICU function statically that
767   depends on static data, it is not guaranteed that the statically declared
768   data is initialized.
5. Certain users like to manage their own memory. They cannot manage ICU's
   memory properly because of item #2.
7716. It is easier to debug code that does not use static initialization.
7727. Memory allocated at static initialization time is not guaranteed to be
773   deallocated with a C++ destructor when the library is unloaded. This is a
774   problem when ICU is unloaded and reloaded into memory and when you are using
775   a heap debugging tool. It would also not work with the `u_cleanup()` function.
7768. Some platforms cannot handle static initialization or static destruction
777   properly. Several compilers have this random bug (even in the year 2001).
778
779ICU users can use the `U_STRING_DECL` and `U_STRING_INIT` macros for C strings. Note
780that on some platforms this will incur a small initialization cost (simple
781conversion). Also, ICU users need to make sure that they properly and
782consistently declare the strings with both macros. See `ustring.h` for details.
783
784### C++ Coding Guidelines
785
786This section describes the C++ specific guidelines or conventions to use.
787
788#### Portable Subset of C++
789
790ICU uses only a portable subset of C++ for maximum portability. Also, it does
791not use features of C++ that are not implemented well in all compilers or are
792cumbersome. In particular, ICU does not use exceptions, or the Standard Template
793Library (STL).
794
We have started to use templates in ICU 4.2 (e.g., `StringByteSink`) and ICU 4.4
(`LocalPointer` and some internal uses). We try to limit templates to where they
provide a lot of benefit (robust code, avoid duplication) with little or no
code bloat.
799
800We continue to not use the Standard Template Library (STL) in ICU library code
801because its design causes a lot of code bloat. More importantly:
802
803* Exceptions: STL classes and algorithms throw exceptions. ICU does not throw
804  exceptions, and ICU code is not exception-safe.
805* Memory management: STL uses default new/delete, or Allocator parameters
806  which create different types; they throw out-of-memory exceptions. ICU
807  memory allocation is customizable and must not throw exceptions.
808* Non-polymorphic: For APIs, STL classes are also problematic because
809  different template specializations create different types. For example, some
810  systems use custom string classes (different allocators, different
811  strategies for buffer sharing vs. copying), and ICU should be able to
812  interface with most of them.
813
814We have started to use compiler-provided Run-Time Type Information (RTTI) in ICU
8154.6. It is now required for building ICU, and encouraged for using ICU where
816RTTI is needed. For example, use `dynamic_cast<DecimalFormat*>` on a
817`NumberFormat` pointer that is usually but not always a `DecimalFormat` instance.
818Do not use `dynamic_cast<>` on a reference, because that throws a `bad_cast`
819exception on failure.
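
The pointer form of `dynamic_cast<>` can be sketched as follows. The classes are hypothetical stand-ins for a `NumberFormat`-like hierarchy, so that the snippet compiles without ICU headers; `groupingSizeOf()` is likewise made up for illustration.

```c++
// Hypothetical stand-ins for a base class and two of its subclasses.
class DemoFormat {
public:
    virtual ~DemoFormat();
};
class DemoDecimalFormat : public DemoFormat {
public:
    int getGroupingSize() const { return 3; }
};
class DemoRuleBasedFormat : public DemoFormat {};

DemoFormat::~DemoFormat() {}

// Returns the grouping size, or -1 if fmt is not a DemoDecimalFormat.
static int groupingSizeOf(const DemoFormat *fmt) {
    // Pointer form: yields nullptr on a type mismatch and never throws.
    const DemoDecimalFormat *df = dynamic_cast<const DemoDecimalFormat *>(fmt);
    if (df == nullptr) {
        return -1;
    }
    return df->getGroupingSize();
}
```

The caller handles the mismatch with an ordinary null check, which fits code that is not exception-safe.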
820
ICU uses a limited form of multiple inheritance equivalent to Java's interface
mechanism: All but one of the base classes must be interface/mixin classes, i.e.,
they must contain only pure virtual member functions. For details see the
'boilerplate' discussion below. This restriction to at most one base class with
non-virtual members eliminates problems with the use and implementation of
multiple inheritance in C++. ICU does not use virtual base classes.
827
828> :point_right: **Note**: Every additional base class, *even an interface/mixin
829class*, adds another vtable pointer to each subclass object, that is, it
830*increases the object/instance size by 8 bytes* on most platforms.
831
832#### Classes and Members
833
834C++ classes and their members do not need a 'U' or any other prefix.
835
836#### Global Operators
837
838Global operators (operators that are not class members) can be problematic for
839library entry point versioning, may confuse users and cannot be easily ported to
840Java (ICU4J). They should be avoided if possible.
841
842~~The issue with library entry point versioning is that on platforms that do not
843support namespaces, users must rename all classes and global functions via
844urename.h. This renaming process is not possible with operators.~~ Starting with
845ICU 49, we require C++ namespace support. However, a global operator can be used
846in ICU4C (when necessary) if its function signature contains an ICU C++ class
847that is versioned. This will result in a mangled linker name that does contain
848the ICU version number via the versioned name of the class parameter. For
849example, ICU4C 2.8 added an operator + for `UnicodeString`, with two `UnicodeString`
850reference parameters.
851
852#### Virtual Destructors
853
854In classes with virtual methods, destructors must be explicitly declared, and
855must be defined (implemented) outside the class definition in a .cpp file.
856
857More precisely:
858
8591. All classes with any virtual members or any bases with any virtual members
860   should have an explicitly declared virtual destructor.
8612. Constructors and destructors should be declared and/or defined prior to
862   *any* other methods, public or private, within the class definition.
8633. All virtual destructors should be defined out-of-line, and in a .cpp file
864   rather than a header file.
865
This is so that the destructors serve as "key functions" so that the compiler
emits the vtable in only and exactly the desired files. It can help make
binaries that use statically linked ICU libraries smaller, because the compiler
and linker can prove more easily that some code is not used.
870
871The Itanium C++ ABI (which is used on all x86 Linux) says: "The virtual table
872for a class is emitted in the same object containing the definition of its key
873function, i.e. the first non-pure virtual function that is not inline at the
874point of class definition. If there is no key function, it is emitted everywhere
875used."
876
(This was first done in ICU 49; see [ticket #8454](https://unicode-org.atlassian.net/browse/ICU-8454).)
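
The three rules can be illustrated with a small sketch. The class names and the split into `demo.h` and `demo.cpp` are hypothetical; both parts are shown in one block only to keep the example self-contained.

```c++
// ---- demo.h (hypothetical header) ----
class DemoService {
public:
    DemoService();
    virtual ~DemoService();   // declared here, defined out-of-line below
    virtual int getValue() const;
};

class DemoSubService : public DemoService {
public:
    explicit DemoSubService(bool *destroyedFlag);
    virtual ~DemoSubService();
    virtual int getValue() const;
private:
    bool *fDestroyedFlag;
};

// ---- demo.cpp (hypothetical implementation file) ----
DemoService::DemoService() {}
// First non-pure virtual function that is not inline: under the rule quoted
// above, this destructor is the key function of DemoService.
DemoService::~DemoService() {}
int DemoService::getValue() const { return 1; }

DemoSubService::DemoSubService(bool *destroyedFlag) : fDestroyedFlag(destroyedFlag) {}
DemoSubService::~DemoSubService() { *fDestroyedFlag = true; }
int DemoSubService::getValue() const { return 2; }
```

Because the destructors are declared before any other virtual member and defined in the implementation file, each vtable would be emitted only in the object file built from `demo.cpp`.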
878
879#### Namespaces
880
881Beginning with ICU version 2.0, ICU uses namespaces. The actual namespace is
882`icu_M_N` with M being the major ICU release number and N being the minor ICU
883release number. For convenience, the namespace `icu` is an alias to the current
884release-specific one. (The actual namespace name is `icu` itself if renaming is
885turned off.)
886
887Starting with ICU 49, we require C++ namespace support.
888
889Class declarations, even forward declarations, must be scoped to the ICU
890namespace. For example:
891
892```c++
893U_NAMESPACE_BEGIN
894
895class Locale;
896
897U_NAMESPACE_END
898
899// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
900extern void fn(icu::UnicodeString&);
901
902// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
903// automatically set by utypes.h
904// but recommended to be not set automatically
905U_NAMESPACE_USE
906Locale loc("fi");
907```
908
`U_NAMESPACE_USE` (expands to `using namespace icu_M_N;` when available) is
automatically done when `utypes.h` is included, so that all ICU classes are
immediately usable. However, we recommend that you turn this off via
`CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"`.
913
914#### Declare Class APIs
915
916Class APIs need to be declared like either of the following:
917
918#### Inline-Implemented Member Functions
919
920Class member functions are usually declared but not inline-implemented in the
921class declaration. A long function implementation in the class declaration makes
922it hard to read the class declaration.
923
924It is ok to inline-implement *trivial* functions in the class declaration.
925Pretty much everyone agrees that inline implementations are ok if they fit on
926the same line as the function signature, even if that means bending the
927single-statement-per-line rule slightly:
928
929```c++
930T *orphan() { T *p=ptr; ptr=NULL; return p; }
931```
932
933Most people also agree that very short multi-line implementations are ok inline
934in the class declaration. Something like the following is probably the maximum:
935
936```c++
937Value *getValue(int index) {
938    if(index>=0 && index<fLimit) {
939        return fArray[index];
940    }
941    return NULL;
942}
943```
944
945If the inline implementation is longer than that, then just declare the function
946inline and put the actual inline implementations after the class declaration in
947the same file. (See `unicode/unistr.h` for many examples.)
948
949If it's significantly longer than that, then it's probably not a good candidate
950for inlining anyway.
951
952#### C++ class layout and 'boilerplate'
953
There are different sets of requirements for different kinds of C++ classes. In
general, all instantiable classes (i.e., all classes except for interface/mixin
classes and ones with only static member functions) inherit the `UMemory` base
class. `UMemory` provides `new`/`delete` operators, which make it possible to
keep the ICU heap separate from the application heap, or to customize ICU's
memory allocation consistently.
960
961> :point_right: **Note**: Public ICU APIs must return or orphan only C++ objects
962that are to be released with `delete`. They must not return allocated simple
963types (including pointers, and arrays of simple types or pointers) that would
964have to be released with a `free()` function call using the ICU library's heap.
965Simple types and pointers must be returned using fill-in parameters (instead of
966allocation), or cached and owned by the returning API.
967
968**Public ICU C++ classes** must inherit either the `UMemory` or the `UObject`
969base class for proper memory management, and implement the following common set
970of 'boilerplate' functions:
971
972* default constructor
973* copy constructor
974* assignment operator
975* operator==
976* operator!=
977
> :point_right: **Note**: Each of the above must either be implemented, be
verified to work with the default implementation defined by the C++ standard
(typically not the case if any pointers are used), or be declared private
without implementation.
982
983* If public subclassing is intended, then the public class must inherit
984  `UObject` and should implement
985  * `clone()`
986* **RTTI:**
987  * If a class is a subclass of a parent (e.g., `Format`) with ICU's "poor
988    man's RTTI" (Run-Time Type Information) mechanism (via
989    `getDynamicClassID()` and `getStaticClassID()`) then add that to the new
990    subclass as well (copy implementations from existing C++ APIs).
991  * If a class is a new, immediate subclass of `UObject` (e.g.,
992    `Normalizer2`), creating a whole new class hierarchy, then declare a
993    *private* `getDynamicClassID()` and define it to return `NULL` (to
994    override the pure virtual version in `UObject`); copy the relevant lines
995    from `normalizer2.h` and `normalizer2.cpp`
996    (`UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION(className)`). Do not add any
997    "poor man's RTTI" at all to subclasses of this class.
998
999**Interface/mixin classes** are equivalent to Java interfaces. They are as much
1000multiple inheritance as ICU uses — they do not decrease performance, and they do
1001not cause problems associated with multiple base classes having data members.
1002Interface/mixin classes contain only pure virtual member functions, and must
1003contain an empty virtual destructor. See for example the `UnicodeMatcher` class.
1004Interface/mixin classes must not inherit any non-interface/mixin class,
1005especially not `UMemory` or `UObject`. Instead, implementation classes must inherit
1006one of these two (or a subclass of them) in addition to the interface/mixin
1007classes they implement. See for example the `UnicodeSet` class.
1008
1009**Static classes** contain only static member functions and are therefore never
1010instantiated. They must not inherit `UMemory` or `UObject`. Instead, they must
1011declare a private default constructor (without any implementation) to prevent
1012instantiation. See for example the `LESwaps` layout engine class.
1013
1014**C++ classes internal to ICU** need not (but may) implement the boilerplate
1015functions as mentioned above. They must inherit at least `UMemory` if they are
1016instantiable.
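
The three kinds of classes can be sketched roughly as follows. All names are hypothetical: `DemoMemory` is an empty stand-in for `UMemory` (which in ICU supplies the `new`/`delete` operators), and the sketch follows the rules for internal classes, so it omits the public-API boilerplate.

```c++
// Stand-in for UMemory: in ICU this base class supplies operator new/delete.
class DemoMemory {};

// Interface/mixin class: only pure virtual functions plus an empty
// virtual destructor; it inherits no other class.
class DemoMatcher {
public:
    virtual ~DemoMatcher();
    virtual bool matches(int c) const = 0;
};
DemoMatcher::~DemoMatcher() {}

// Implementation class: one "real" base class plus the interface.
class DemoRange : public DemoMemory, public DemoMatcher {
public:
    DemoRange(int start, int limit) : fStart(start), fLimit(limit) {}
    virtual ~DemoRange();
    virtual bool matches(int c) const { return fStart <= c && c < fLimit; }
private:
    int fStart, fLimit;
};
DemoRange::~DemoRange() {}

// Static class: only static member functions, never instantiated.
class DemoUtil {
public:
    static int countMatches(const DemoMatcher &m, int start, int limit) {
        int count = 0;
        int c;  // declared outside the loop head, per these guidelines
        for (c = start; c < limit; ++c) {
            if (m.matches(c)) { ++count; }
        }
        return count;
    }
private:
    DemoUtil();  // declared but not implemented, to prevent instantiation
};
```

For example, `DemoUtil::countMatches(DemoRange(10, 20), 0, 100)` counts the ten values in the half-open range.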
1017
1018#### Make Sure The Compiler Uses C++
1019
1020The `__cplusplus` macro being defined ensures that the compiler uses C++. Starting
1021with ICU 49, we use this standard predefined macro.
1022
1023Up until ICU 4.8 we used to define and use `XP_CPLUSPLUS` but that was redundant
1024and did not add any value because it was defined if-and-only-if `__cplusplus` was
1025defined.
1026
1027#### Adoption of Objects
1028
1029Some constructors, factory functions and member functions take pointers to
1030objects that are then adopted. The adopting object contains a pointer to the
1031adoptee and takes over ownership and lifecycle control. Adoption occurs even if
1032an error occurs during the execution of the function, or in the code that adopts
1033the object. The semantics used within ICU are *adopt-on-call* (as opposed to,
1034for example, adopt-on-success):
1035
1036* **General**: A constructor or function that adopts an object does so
1037  in all cases, even if an error occurs and a `UErrorCode` is set. This means
1038  that either the adoptee is deleted immediately or its pointer is stored in
1039  the new object. The former case is most common when the constructor or
1040  factory function is called and the `UErrorCode` already indicates a failure.
1041  In the latter case, the new object must take care of deleting the adoptee
1042  once it is deleted itself regardless of whether or not the constructor was
1043  successful.
1044
1045* **Constructors**: The code that creates the object with the new operator
1046  must check the resulting pointer returned by new, deleting any adoptees if
1047  it is `nullptr` because the constructor was not called. (Typically, a `UErrorCode`
1048  must be set to `U_MEMORY_ALLOCATION_ERROR`.)
1049
1050  **Pitfall**: If you allocate/construct via "`ClassName *p = new ClassName(adoptee);`"
1051  and the memory allocation failed (`p==nullptr`), then the constructor has not
1052  been called, the adoptee has not been adopted, and you are still responsible for
1053  deleting it!
1054
  To simplify the above checking, ICU's `LocalPointer` class includes a
  constructor that both takes ownership and reports an error if the pointer is
  `nullptr`. It is intended to be used with other-class constructors that may
  report a failure via `UErrorCode`, so that callers need to check only for
  `U_FAILURE(errorCode)` and not also separately for `isNull()`.
1060
1061* **Factory functions (createInstance())**: The factory function must set a
1062  `U_MEMORY_ALLOCATION_ERROR` and delete any adoptees if it cannot allocate the
1063  new object. If the construction of the object fails otherwise, then the
1064  factory function must delete it and the factory function must delete its
1065  adoptees. As a result, a factory function always returns either a valid
1066  object and a successful `UErrorCode`, or a nullptr and a failure `UErrorCode`.
1067  A factory function returns a pointer to an object that must be deleted by
1068  the user/owner.
1069
1070Example: (This is a best-practice example. It does not reflect current `Calendar`
1071code.)
1072
1073```c++
1074Calendar*
1075Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
1076    LocalPointer<TimeZone> adoptedZone(zone);
1077    if(U_FAILURE(errorCode)) {
1078        // The adoptedZone destructor deletes the zone.
1079        return nullptr;
1080    }
1081    // since the Locale isn't specified, use the default locale
1082    LocalPointer<Calendar> c(new GregorianCalendar(zone, Locale::getDefault(), errorCode),
1083                             errorCode);    // LocalPointer will set a U_MEMORY_ALLOCATION_ERROR if
1084                                            // new GregorianCalendar() returns nullptr.
1085    if (c.isValid()) {
1086        // c adopted the zone.
1087        adoptedZone.orphan();
1088    }
1089    if (U_FAILURE(errorCode)) {
1090        // If c was constructed, then the c destructor deletes the Calendar,
1091        // and the Calendar destructor deletes the adopted zone.
1092        return nullptr;
1093    }
1094    return c.orphan();
1095}
1096```
1097
1098#### Memory Allocation
1099
1100All ICU C++ class objects directly or indirectly inherit `UMemory` (see
1101'boilerplate' discussion above) which provides `new`/`delete` operators, which in
1102turn call the internal functions in `cmemory.c`. Creating and releasing ICU C++
1103objects with `new`/`delete` automatically uses the ICU allocation functions.
1104
1105> :point_right: **Note**: Remember that (in absence of explicit :: scoping) C++
1106determines which `new`/`delete` operator to use from which type is allocated or
1107deleted, not from the context of where the statement is. Since non-class data
1108types (like `int`) cannot define their own `new`/`delete` operators, C++ always
1109uses the global ones for them by default.
1110
1111When global `new`/`delete` operators are to be used in the application (never inside
1112ICU!), then they should be properly scoped as e.g. `::new`, and the application
1113must ensure that matching `new`/`delete` operators are used. In some cases where
1114such scoping is missing in non-ICU code, it may be simpler to compile ICU
1115without its own `new`/`delete` operators. See `source/common/unicode/uobject.h` for
1116details.
1117
In ICU library code, allocation of non-class data types (simple integer types
**as well as pointers**) must use the functions in `cmemory.h`/`.c` (`uprv_malloc()`,
`uprv_free()`, `uprv_realloc()`). Such memory objects must be released inside ICU,
never by the user; this is achieved either by providing a "close" function for a
service or by not passing ownership of these objects to the user (and
instead filling user-provided buffers or returning constant pointers without
passing ownership).
1125
1126The `cmemory.h`/`.c` functions can be overridden at ICU compile time for custom
1127memory management. By default, `UMemory`'s `new`/`delete` operators are
1128implemented by calling these common functions. Overriding the `cmemory.h`/`.c`
1129functions changes the memory management for both C and C++.
1130
1131C++ objects that were either allocated with new or returned from a `createXYZ()`
1132factory method must be deleted by the user/owner.
1133
1134#### Memory Allocation Failures
1135
1136All memory allocations and object creations should be checked for success. In
1137the event of a failure (a `NULL` returned), a `U_MEMORY_ALLOCATION_ERROR` status
1138should be returned by the ICU function in question. If the allocation failure
1139leaves the ICU service in an invalid state, such that subsequent ICU operations
1140could also fail, the situation should be flagged so that the subsequent
1141operations will fail cleanly. Under no circumstances should a memory allocation
1142failure result in a crash in ICU code, or cause incorrect results rather than a
1143clean error return from an ICU function.
1144
1145Some functions, such as the C++ assignment operator, are unable to return an ICU
1146error status to their caller. In the event of an allocation failure, these
1147functions should mark the object as being in an invalid or bogus state so that
1148subsequent attempts to use the object will fail. Deletion of an invalid object
1149should always succeed.
1150
1151#### Memory Management
1152
1153C++ memory management is error-prone, and memory leaks are hard to avoid, but
1154the following helps a lot.
1155
1156First, if you can stack-allocate an object (for example, a `UnicodeString` or
1157`UnicodeSet`), do so. It is the easiest way to manage object lifetime.
1158
Inside functions, avoid raw pointers to owned objects. Instead, use
[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)`<UnicodeString>`
or `LocalUResourceBundlePointer` etc., which are ICU's "smart pointer"
implementations. This is the "[Resource Acquisition Is Initialization (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
idiom. The "smart pointer" auto-deletes the object when it goes out of scope,
which means that you can just return from the function when an error occurs and
all auto-managed objects are deleted. You do not need to remember to write an
increasing number of "`delete xyz;`" at every function exit point.
1167
1168*In fact, you should almost never need to write "delete" in any function.*
1169
1170* Except in a destructor where you delete all of the objects which the class
1171  instance owns.
1172* Also, in static "cleanup" functions you still need to delete cached objects.
1173
1174When you pass on ownership of an object, for example to return the pointer of a
1175newly built object, or when you call a function which adopts your object, use
1176`LocalPointer`'s `.orphan()`.
1177
1178* Careful: When you return an object or pass it into an adopting factory
1179  method, you can use `.orphan()` directly.
1180* However, when you pass it into an adopting constructor, you need to pass in
1181  the `.getAlias()`, and only if the *allocation* of the new owner succeeded
1182  (you got a non-NULL pointer for that) do you `.orphan()` your `LocalPointer`.
1183* See the `Calendar::createInstance()` example above.
1184* See the `AlphabeticIndex` implementation for live examples. Search for other
1185  uses of `LocalPointer`/`LocalArray`.
1186
Every object must always be deletable/destructible. That is, at a minimum, all
pointers to owned memory must always be either NULL or point to owned objects.
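
The early-return and `.orphan()` pattern can be sketched without ICU headers as follows. `DemoLocalPointer` is a deliberately simplified, hypothetical stand-in for `LocalPointer` with only the members used here; `DemoItem` and `createItem()` are likewise made up for illustration.

```c++
// Simplified stand-in for ICU's LocalPointer<T>; only the members used below.
template<typename T>
class DemoLocalPointer {
public:
    explicit DemoLocalPointer(T *p = nullptr) : ptr(p) {}
    ~DemoLocalPointer() { delete ptr; }
    bool isNull() const { return ptr == nullptr; }
    T *getAlias() const { return ptr; }
    T *operator->() const { return ptr; }
    T *orphan() { T *p = ptr; ptr = nullptr; return p; }
private:
    DemoLocalPointer(const DemoLocalPointer &);             // no copying
    DemoLocalPointer &operator=(const DemoLocalPointer &);  // no copying
    T *ptr;
};

// Hypothetical owned object that counts live instances.
struct DemoItem {
    static int liveCount;
    int value;
    explicit DemoItem(int v) : value(v) { ++liveCount; }
    ~DemoItem() { --liveCount; }
};
int DemoItem::liveCount = 0;

// Builds an item; returns nullptr if the value is rejected.
// Ownership of a non-null result passes to the caller.
static DemoItem *createItem(int value) {
    DemoLocalPointer<DemoItem> item(new DemoItem(value));
    if (item.isNull() || item->value < 0) {
        return nullptr;     // early return: the destructor deletes the item
    }
    return item.orphan();   // success: hand ownership to the caller
}
```

Note that `createItem()` contains no `delete` statement: every exit path is covered by the smart pointer's destructor or by `.orphan()`.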
1189
1190Internally:
1191
1192[cmemory.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/cmemory.h)
1193defines the `LocalMemory` class for chunks of memory of primitive types which
1194will be `uprv_free()`'ed.
1195
1196[cmemory.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/cmemory.h)
1197also defines `MaybeStackArray` and `MaybeStackHeaderAndArray` which automate
1198management of arrays.
1199
1200Use `CharString`
1201([charstr.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/charstr.h))
1202for `char *` strings that you build and modify.
1203
1204#### Global Inline Functions
1205
1206Global functions (non-class member functions) that are declared inline must be
1207made static inline. Some compilers will export symbols that are declared inline
1208but not static.
1209
1210#### No Declarations in the for() Loop Head
1211
Iterations through `for()` loops must not use declarations in the first part of
the loop. There have been two revisions for the scoping of these declarations
and some compilers do not comply with the latest scoping. Declarations of loop
variables should be outside these loops.
1216
1217#### Common or I18N
1218
1219Decide whether or not the module is part of the common or the i18n API
1220collection. Use the appropriate macros. For example, use
1221`U_COMMON_IMPLEMENTATION`, `U_I18N_IMPLEMENTATION`, `U_COMMON_API`, `U_I18N_API`.
1222See `utypes.h`.
1223
1224#### Constructor Failure
1225
If there is a reasonable chance that a constructor fails (for example, if the
1227constructor relies on loading data), then either it must use and set a
1228`UErrorCode` or the class needs to support an `isBogus()`/`setToBogus()` mechanism
1229like `UnicodeString` and `UnicodeSet`, and the constructor needs to set the object
1230to bogus if it fails.
1231
1232#### `UVector`, `UVector32`, or `UVector64`
1233
1234Use `UVector` to store arrays of `void *`; use `UVector32` to store arrays of
1235`int32_t`; use `UVector64` to store arrays of `int64_t`. Historically, `UVector`
1236has stored either `int32_t` or `void *`, but now storing `int32_t` in a `UVector`
1237is deprecated in favor of `UVector32`.
1238
1239### C Coding Guidelines
1240
1241This section describes the C-specific guidelines or conventions to use.
1242
1243#### Declare and define C APIs with both `U_CAPI` and `U_EXPORT2`
1244
1245All C APIs need to be **both declared and defined** using the `U_CAPI` and
1246`U_EXPORT2` qualifiers.
1247
1248```c++
1249U_CAPI int32_t U_EXPORT2
1250u_formatMessage(...);
1251```
1252
1253> :point_right: **Note**: Use `U_CAPI` before and `U_EXPORT2` after the return
1254type of exported C functions. Internal functions that are visible outside a
1255compilation unit need a `U_CFUNC` before the return type.
1256
1257#### Subdivide the Name Space
1258
1259Use prefixes to avoid name collisions. Some of those prefixes contain a 3- (or
1260sometimes 4-) letter module identifier. Very general names like
1261`u_charDirection()` do not have a module identifier in their prefix.
1262
1263* For POSIX replacements, the (all lowercase) POSIX function names start with
1264  "u_": `u_strlen()`.
* For other API functions, the name starts with a 'u', followed by the module
  identifier (if appropriate) and an underscore '_', followed by the
  **mixed-case** function name. For example, use `u_charDirection()`,
  `ubidi_setPara()`.
* For types (struct, enum, union), a "U" is prepended, often as
  "`U<module identifier>`" directly before the type name, without an
  underscore. For example, use `UComparisonResult`.
* For #defined constants and macros, a "U_" is prepended, often as
  "`U<module identifier>_`" with an underscore before the uppercase macro
  name. For example, use `U_ZERO_ERROR`, `U_SUCCESS()`, or `UNORM_NFC`.
1275
1276#### Functions for Constructors and Destructors
1277
1278Functions that roughly compare to constructors and destructors are called
1279`umod_open()` and `umod_close()`. See the following example:
1280
1281```c++
U_CAPI UBiDi * U_EXPORT2
ubidi_open(void);

U_CAPI UBiDi * U_EXPORT2
ubidi_openSized(int32_t maxLength, int32_t maxRunCount, UErrorCode *pErrorCode);

U_CAPI void U_EXPORT2
ubidi_close(UBiDi *pBiDi);
1290```
1291
1292Each successful call to a `umod_open()` returns a pointer to an object that must
1293be released by the user/owner by calling the matching `umod_close()`.
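
The shape of such a pair can be sketched as follows. Everything here is hypothetical (`UDemo`, `udemo_open()`, `udemo_close()`, `DemoErrorCode`); real ICU code would use `UErrorCode`, `U_FAILURE()`, `uprv_malloc()` and `uprv_free()`, and would declare the functions with `U_CAPI` and `U_EXPORT2` rather than `static`.

```c++
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for UErrorCode; values greater than zero are failures.
enum DemoErrorCode { DEMO_ZERO_ERROR = 0, DEMO_MEMORY_ALLOCATION_ERROR = 1 };

// Hypothetical "service object" type.
struct UDemo {
    int32_t capacity;
};

// "Constructor": returns nullptr and sets *pErrorCode on failure.
static UDemo *udemo_open(int32_t capacity, DemoErrorCode *pErrorCode) {
    if (pErrorCode == nullptr || *pErrorCode > DEMO_ZERO_ERROR) {
        return nullptr;  // a previous failure: take no action
    }
    UDemo *demo = static_cast<UDemo *>(std::malloc(sizeof(UDemo)));
    if (demo == nullptr) {
        *pErrorCode = DEMO_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    demo->capacity = capacity;
    return demo;
}

// "Destructor": releases what udemo_open() returned; nullptr is a no-op.
static void udemo_close(UDemo *demo) {
    std::free(demo);
}
```

Because the memory is both allocated and released inside the library, the caller never needs to know which heap it came from.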
1294
1295#### C "Service Object" Types and LocalPointer Equivalents
1296
1297For every C "service object" type (equivalent to C++ class), we want to have a
1298[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)
1299equivalent, so that C++ code calling the C API can use the specific "smart
1300pointer" to implement the "[Resource Acquisition Is Initialization
1301(RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
1302idiom.
1303
1304For example, in `ubidi.h` we define the `UBiDi` "service object" type and also
1305have the following "smart pointer" definition which will call `ubidi_close()` on
1306destruction:
1307
1308```c++
1309// Use config switches like this only after including unicode/utypes.h
1310// or another ICU header.
1311#if U_SHOW_CPLUSPLUS_API
1312
1313U_NAMESPACE_BEGIN
1314
1315/**
1316 * class LocalUBiDiPointer
1317 * "Smart pointer" class, closes a UBiDi via ubidi_close().
1318 * For most methods see the LocalPointerBase base class.
1319 *
1320 * @see LocalPointerBase
1321 * @see LocalPointer
1322 * @stable ICU 4.4
1323 */
1324U_DEFINE_LOCAL_OPEN_POINTER(LocalUBiDiPointer, UBiDi, ubidi_close);
1325
1326U_NAMESPACE_END
1327
1328#endif
1329```
1330
1331#### Inline Implementation Functions
1332
1333Some, but not all, C compilers allow ICU users to declare functions inline
1334(which is a C++ language feature) with various keywords. This has advantages for
1335implementations because inline functions are much safer and more easily debugged
1336than macros.
1337
1338ICU *used to* use a portable `U_INLINE` declaration macro that can be used for
1339inline functions in C. However, this was an unnecessary platform dependency.
1340
1341We have changed all code that used `U_INLINE` to C++ (.cpp) using "inline", and
1342removed the `U_INLINE` definition.
1343
1344If you find yourself constrained by .c, change it to .cpp.
1345
1346All functions that are declared inline, or are small enough that an optimizing
1347compiler might inline them even without the inline declaration, should be
1348defined (implemented) – not just declared – before they are first used. This is
1349to enable as much inlining as possible, and also to prevent compiler warnings
1350for functions that are declared inline but whose definition is not available
1351when they are called.
1352
1353#### C Equivalents for Classes with Multiple Constructors
1354
In cases like `BreakIterator` and `NumberFormat`, instead of having several
different 'open' APIs for each kind of instance, use an enum selector.
1357
1358#### Source File Names
1359
1360Source file names for C begin with a 'u'.
1361
1362#### Memory APIs Inside ICU
1363
1364For memory allocation in C implementation files for ICU, use the functions and
1365macros in `cmemory.h`. When allocated memory is returned from a C API function,
1366there must be a corresponding function (like a `ucnv_close()`) that deallocates
1367that memory.
1368
1369All memory allocations in ICU should be checked for success. In the event of a
1370failure (a `NULL` returned from `uprv_malloc()`), a `U_MEMORY_ALLOCATION_ERROR` status
1371should be returned by the ICU function in question. If the allocation failure
1372leaves the ICU service in an invalid state, such that subsequent ICU operations
1373could also fail, the situation should be flagged so that the subsequent
1374operations will fail cleanly. Under no circumstances should a memory allocation
1375failure result in a crash in ICU code, or cause incorrect results rather than a
1376clean error return from an ICU function.
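
The following sketch illustrates the allocation check described above. So that it compiles without ICU headers, it uses stand-ins: `SketchErrorCode` takes the place of `UErrorCode`, and `std::calloc()` takes the place of the `cmemory.h` functions. The helper name `sketch_allocUnits()` is hypothetical.

```c++
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-ins so that the sketch compiles without ICU headers.
enum SketchErrorCode {
    SKETCH_ZERO_ERROR,
    SKETCH_MEMORY_ALLOCATION_ERROR
};

inline bool sketch_failure(SketchErrorCode code) {
    return code != SKETCH_ZERO_ERROR;
}

// Hypothetical helper: returns a zero-filled buffer of `count` 16-bit units.
// On allocation failure it sets errorCode and returns nullptr.
std::uint16_t *sketch_allocUnits(std::size_t count, SketchErrorCode &errorCode) {
    if (sketch_failure(errorCode)) {
        return nullptr;  // an earlier call already failed; take no action
    }
    std::uint16_t *buffer =
        static_cast<std::uint16_t *>(std::calloc(count, sizeof(std::uint16_t)));
    if (buffer == nullptr) {
        errorCode = SKETCH_MEMORY_ALLOCATION_ERROR;  // report, do not crash
        return nullptr;
    }
    return buffer;
}
```

The caller checks the error code after the call and releases a successfully allocated buffer with `std::free()`.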
1377
1378#### // Comments
1379
1380C++ style // comments may be used in plain C files and in headers that will be
1381included in C files.
1382
1383## Source Code Strings with Unicode Characters
1384
1385### `char *` strings in ICU
1386
| Declared type | Encoding | Example | Used with |
1388| --- | --- | --- | --- |
1389| `char *` | varies with platform | `"Hello"` | Most ICU API functions taking `char *` parameters. Unless otherwise noted, characters are restricted to the "Invariant" set, described below |
1390| `char *` | UTF-8 |  `u8"¡Hola!"` | Only functions that are explicitly documented as expecting UTF-8. No restrictions on the characters used. |
1391| `UChar *` | UTF-16 | `u"¡Hola!"` | All ICU functions with `UChar *` parameters |
| `UChar32` | Code Point value | `U'😁'` | UChar32 single code point constant. |
1393| `wchar_t` | unknown | `L"Hello"` | Not used with ICU. Unknown encoding, unknown size, not portable. |
1394
1395ICU source files are UTF-8 encoded, allowing any Unicode character to appear in
1396Unicode string or character literals, without the need for escaping. But, for
1397clarity, use escapes when plain text would be confusing, e.g. for invisible
1398characters.
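
As a small illustration, assuming a C++11 compiler and a UTF-8 encoded source file, a visible character can be written directly while an invisible one is written as an escape. The names `kGreeting`, `kWithZeroWidthSpace`, and `unitCount()` are made up for this sketch.

```c++
#include <cstddef>

// Visible characters can appear directly in a UTF-8 source file.
static const char16_t kGreeting[] = u"¡Hola!";

// U+200B ZERO WIDTH SPACE is invisible, so it is written as an escape.
static const char16_t kWithZeroWidthSpace[] = u"a\u200Bb";

// Number of UTF-16 code units in a literal, not counting the terminating NUL.
template <std::size_t N>
constexpr std::size_t unitCount(const char16_t (&)[N]) {
    return N - 1;
}
```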
1399
1400For convenience, ICU4C tends to use `char *` strings in places where only
1401"invariant characters" (a portable subset of the 7-bit ASCII repertoire) are
1402used. This allows locale IDs, charset names, resource bundle item keys and
1403similar items to be easily specified as string literals in the source code. The
1404same types of strings are also stored as "invariant character" `char *` strings
1405in the ICU data files.
1406
ICU has hard-coded mapping tables in `source/common/putil.c` to convert invariant
1408characters to and from Unicode without using a full ICU converter. These tables
1409must match the encoding of string literals in the ICU code as well as in the ICU
1410data files.
1411
> :point_right: **Note**: ICU assumes that the invariant characters, at least,
always have the codes that are common on platforms of the same
charset family (ASCII vs. EBCDIC). **ICU has not been tested on platforms where
this is not the case.**
1416
1417Some usage of `char *` strings in ICU assumes the system charset instead of
1418invariant characters. Such strings are only handled with the default converter
1419(See the following section). The system charset is usually a superset of the
1420invariant characters.
1421
The following are the ASCII and EBCDIC byte values (in hexadecimal) for all of
the invariant characters (see also `unicode/utypes.h`):
1424
1425| Character(s) | ASCII | EBCDIC |
1426| --- | --- | --- |
1427| a..i | 61..69 | 81..89 |
1428| j..r | 6A..72 | 91..99 |
1429| s..z | 73..7A | A2..A9 |
1430| A..I | 41..49 | C1..C9 |
1431| J..R | 4A..52 | D1..D9 |
1432| S..Z | 53..5A | E2..E9 |
1433| 0..9 | 30..39 | F0..F9 |
1434| (space) | 20 | 40 |
1435| " | 22 | 7F |
1436| % | 25 | 6C |
1437| & | 26 | 50 |
1438| ' | 27 | 7D |
1439| ( | 28 | 4D |
1440| ) | 29 | 5D |
1441| \* | 2A | 5C |
1442| + | 2B | 4E |
1443| , | 2C | 6B |
1444| - | 2D | 60 |
1445| . | 2E | 4B |
1446| / | 2F | 61 |
1447| : | 3A | 7A |
1448| ; | 3B | 5E |
1449| < | 3C | 4C |
1450| = | 3D | 7E |
1451| > | 3E | 6E |
1452| ? | 3F | 6F |
1453| _ | 5F | 6D |
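
The ASCII column of this table can be turned into a simple membership test. The following is only a sketch for ASCII-family platforms; `sketch_isInvariantAscii()` and `sketch_isInvariantString()` are hypothetical helpers, not ICU APIs.

```c++
// Hypothetical helpers, not ICU APIs. They assume an ASCII-family platform
// and use the ASCII column of the table above.
bool sketch_isInvariantAscii(unsigned char c) {
    return (c >= 0x61 && c <= 0x7A) ||  // a..z
           (c >= 0x41 && c <= 0x5A) ||  // A..Z
           (c >= 0x30 && c <= 0x39) ||  // 0..9
           (c >= 0x25 && c <= 0x2F) ||  // % & ' ( ) * + , - . /
           (c >= 0x3A && c <= 0x3F) ||  // : ; < = > ?
           c == 0x20 || c == 0x22 || c == 0x5F;  // space, double quote, underscore
}

// Returns true if every character of the NUL-terminated string is invariant.
bool sketch_isInvariantString(const char *s) {
    for (; *s != 0; ++s) {
        if (!sketch_isInvariantAscii(static_cast<unsigned char>(*s))) {
            return false;
        }
    }
    return true;
}
```

For example, `"en_US"` and `"ISO-8859-1"` consist only of invariant characters, while `"a@b"` does not, because `@` is not in the table.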
1454
1455### Rules Strings with Unicode Characters
1456
1457In order to include characters in source code strings that are not part of the
1458invariant subset of ASCII, one has to use character escapes. In addition, rules
1459strings for collation, etc. need to follow service-specific syntax, which means
1460that spaces and ASCII punctuation must be quoted using the following rules:
1461
1462* Single quotes delineate literal text: `a'>'b` => `a>b`
1463* Two single quotes, either between or outside of single quoted text, indicate
1464  a literal single quote:
1465  * `a''b` => `a'b`
1466  * `a'>''<'b` => `a>'<b`
* A backslash precedes a single literal character: `a\>b` => `a>b`
1468* Several standard mechanisms are handled by `u_unescape()` and its variants.
1469
1470> :point_right: **Note**: All of these quoting mechanisms are supported by the
1471`RuleBasedTransliterator`. The single quote mechanisms (not backslash, not
1472`u_unescape()`) are supported by the format classes. In its infancy,
1473`ResourceBundle` supported the `\uXXXX` mechanism and nothing else.
1474This quoting method is the current policy. However, there are modules within
1475the ICU services that are being updated and this quoting method might not have
1476been applied to all of the modules.
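
To make the two single-quote rules concrete, here is a minimal sketch that turns a quoted rule fragment into its literal text. It is not how any ICU service parses rules: `sketch_stripRuleQuotes()` is a hypothetical helper, and it deliberately ignores backslash escapes and `u_unescape()` sequences.

```c++
#include <string>

// Hypothetical helper, not an ICU API: applies only the two single-quote
// rules and returns the resulting literal text.
std::string sketch_stripRuleQuotes(const std::string &rule) {
    std::string result;
    for (std::string::size_type i = 0; i < rule.size(); ++i) {
        const char c = rule[i];
        if (c != '\'') {
            result += c;  // ordinary character, inside or outside quotes
        } else if (i + 1 < rule.size() && rule[i + 1] == '\'') {
            result += '\'';  // two single quotes stand for one literal quote
            ++i;             // skip the second quote of the pair
        }
        // Otherwise a lone single quote opens or closes quoted text
        // and is dropped from the output.
    }
    return result;
}
```

With this sketch, `a'>'b` yields `a>b`, `a''b` yields `a'b`, and `a'>''<'b` yields `a>'<b`, matching the examples in the list above.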
1477
1478## Java Coding Conventions Overview
1479
1480The ICU group uses the following coding guidelines to create software using the
1481ICU Java classes and methods.
1482
1483### Code style
1484
1485The standard order for modifier keywords on APIs is:
1486
1487* `public static final synchronized strictfp`
1488* `public abstract`
1489
Do not use wildcard imports, such as "`import java.util.*`". The sort order of
import statements is `java` / `javax` / `org` / `com`. Within each top-level package
category, subpackages and classes are sorted alphabetically. We
recommend that ICU developers use the Eclipse IDE feature \[Source\] - \[Organize
Imports\] (Ctrl+Shift+O) to organize import statements.

All if/else/for/while/do statements use braces, even if the controlled statement is a
single line. This is for clarity and to avoid mistakes due to bad nesting of
control statements, especially during maintenance.
1499
1500Tabs should not be present in source files.
1501
1502Indentation is 4 spaces.
1503
1504Make sure the code is formatted cleanly with regular indentation. Follow Java
1505style code conventions, e.g., don't put multiple statements on a single line,
1506use mixed-case identifiers for classes and methods and upper case for constants,
1507and so on.
1508
The Java source formatting rules described above are included in the Eclipse
project file. We recommend running \[Source\] - \[Format\] (Ctrl+Shift+F) in the
Eclipse IDE to clean up source files if necessary.

Use UTF-8 encoding (without BOM) for Java source files.
1514
1515Javadoc should be complete and correct when code is checked in, to avoid playing
1516catch-up later during the throes of the release. Please javadoc all methods, not
1517just external APIs, since this helps with maintenance.
1518
1519### Code organization
1520
1521Avoid putting more than one top-level class in a single file. Either use
1522separate files or nested classes.
1523
Always define at least one constructor in a public API class. The Java compiler
automatically generates a no-arg constructor when a class has no explicit
constructors. We cannot provide proper API documentation for such default
constructors.
1528
Do not mix test, tool, and runtime code in the same file. If you need some
access to private or package methods or data, provide public accessors for them
and mark them `@internal`. Test code should be placed in the `com.ibm.icu.dev.test`
package, and tools (e.g., code that generates data, source code, or computes
constants) in the `com.ibm.icu.dev.tool` package. Occasionally, for very simple cases,
you can leave a few lines of tool code in the main source and comment it out,
but maintenance is easier if you just note the location of the tools in a comment in the
source and put the actual code elsewhere.
1537
1538Avoid creating new interfaces unless you know you need to mix the interface into
1539two or more classes that have separate inheritance. Interfaces are impossible to
1540modify later in a backwards-compatible way. Abstract classes, on the other hand,
1541can add new methods with default behavior. Use interfaces only if it is required
1542by the architecture, not just for expediency.
1543
1544Current releases of ICU4J (since ICU 63) are restricted to use Java SE 7 APIs
1545and language features.
1546
1547### ICU Packages
1548
1549Public APIs should be placed in `com.ibm.icu.text`, `com.ibm.icu.util`, and
1550`com.ibm.icu.lang`. For historical reasons and for easier migration from JDK
1551classes, there are also APIs in `com.ibm.icu.math` but new APIs should not be
1552added there.
1553
1554APIs used only during development, testing, or tools work should be placed in
1555`com.ibm.icu.dev`.
1556
1557A class or method which is used by public APIs (listed above) but which is not
1558itself public can be placed in different places:
1559
15601. If it is only used by one class, make it private in that class.
15612. If it is only used by one class and its subclasses, make it protected in
1562   that class. In general, also tag it `@internal` unless you are working on a
1563   class that supports user-subclassing (rare).
15643. If it is used by multiple classes in one package, make it package private
1565   (also known as default access) and mark it `@internal`.
4. If it is used by multiple packages, make it public and place the class in
   the `com.ibm.icu.impl` package.
1568
1569### ICU4J API Stability
1570
1571General discussion: See [ICU Design / ICU API compatibility](../icu/design.md#icu-api-compatibility).
1572
1573Occasionally, we “broaden” or “widen” a Java API by making a parameter broader
1574(e.g., `char` (code unit) to `int` (code point), or `String` to `CharSequence`)
1575or a return type narrower (e.g., `Object` to `UnicodeSet`).
1576
1577Such a change is source-compatible but not binary compatible.
1578Before we do this, we need to check with users like Android whether this is ok.
1579For example, in a class that Android exposes via its SDK,
1580Android may need to retain hidden compatibility overloads with the old input types.
1581
1582In addition, we should test with code using both the old and new types,
1583so that if someone has such compatibility overloads they all get exercised.
1584
1585### Error Handling and Exceptions
1586
1587Errors should be indicated by throwing exceptions, not by returning “bogus”
1588values.
1589
1590If an input parameter is in error, then a new
1591`IllegalArgumentException("description")` should be thrown.
1592
1593Exceptions should be caught only when something must be done, for example
1594special cleanup or rethrowing a different exception. If the error “should never
1595occur”, then throw a `new RuntimeException("description")` (rare). In this case,
1596a comment should be added with a justification.
1597
1598Use exception chaining: When an exception is caught and a new one created and
1599thrown (usually with additional information), the original exception should be
1600chained to the new one.
1601
1602A catch expression should not catch Throwable. Catch expressions should specify
1603the most specific subclass of Throwable that applies. If there are two concrete
1604subclasses, both should be specified in separate catch statements.
1605
1606### Binary Data Files
1607
1608ICU4J uses the same binary data files as ICU4C, in the big-endian/ASCII form.
1609The `ICUBinary` class should be used to read them.
1610
1611Some data sources (for example, compressed Jar files) do not allow the use of
1612several `InputStream` and related APIs:
1613
1614* Memory mapping is efficient, but not available for all data sources.
1615* Do not depend on `InputStream.available()`: It does not provide reliable
1616  information for some data sources. Instead, the length of the data needs to
1617  be determined from the data itself.
1618* Do not call `mark()` and `reset()` methods on `InputStream` without wrapping the
1619  `InputStream` object in a new `BufferedInputStream` object. These methods are
1620  not implemented by the `ZipInputStream` class, and their use may result in an
1621  `IOException`.
1622
1623### Compiler Warnings
1624
1625There should be no compiler warnings when building ICU4J. It is recommended to
1626develop using Eclipse, and to fix any problems that are shown in the Eclipse
1627Problems panel (below the main window).
1628
1629When a warning is not avoidable, you should add `@SuppressWarnings` annotations
1630with minimum scope.
1631
1632### Miscellaneous
1633
1634Objects should not be cast to a class in the `sun.*` packages because this would
1635cause a `SecurityException` when run under a `SecurityManager`. The exception needs
1636to be caught and default action taken, instead of propagating the exception.
1637
1638## Adding .c, .cpp and .h files to ICU
1639
1640In order to add compilable files to ICU, add them to the source code control
1641system in the appropriate folder and also to the build environment.
1642
1643To add these files, use the following steps:
1644
16451. Choose one of the ICU libraries:
1646   * The common library provides mostly low-level utilities and basic APIs that
1647     often do not make use of Locales. Examples are APIs that deal with character
1648     properties, the Locale APIs themselves, and ResourceBundle APIs.
1649   * The i18n library provides Locale-dependent and -using APIs, such as for
1650     collation and formatting, that are most useful for internationalized user
1651     input and output.
16522. Put the source code files into the folder `icu/source/library-name`, then add
1653   them to the build system:
1654   * For most platforms, add the expected .o files to
1655     `icu/source/library-name/Makefile.in`, to the OBJECTS variable. Add the
1656     **public** header files to the HEADERS variable.
1657   * For Microsoft Visual C++ 6.0, add all the source code files to
1658     `icu/source/library-name/library-name.dsp`. If you don't have Visual C++, add
1659     the filenames to the project file manually.
16603. Add test code to `icu/source/test/cintltest` for C APIs and to
1661   `icu/source/test/intltest` for C++ APIs.
16624. Make sure that the API functions are called by the test code (100% API
1663   coverage) and that at least 85% of the implementation code is exercised by
1664   the tests (>=85% code coverage).
16655. Create test code for C using the `log_err()`, `log_info()`, and `log_verbose()`
1666   APIs from `cintltst.h` (which uses `ctest.h`) and check it into the appropriate
1667   folder.
16686. In order to get your C test code called, add its top level function and a
1669   descriptive test module path to the test system by calling `addTest()`. The
1670   function that makes the call to `addTest()` ultimately must be called by
1671   `addAllTests()` in `calltest.c`. Groups of tests typically have a common
1672   `addGroup()` function that calls `addTest()` for the test functions in its
1673   group, according to the common part of the test module path.
16747. Add that test code to the build system also. Modify `Makefile.in` and the
1675   appropriate `.dsp` file (For example, the file for the library code).
1676
1677## C Test Suite Notes
1678
The cintltst Test Suite contains all the tests for the International Components
for Unicode C API. These tests may be automatically run by typing `cintltst` or
`cintltst -all` at the command line. The suite depends on the C Test Services
described below.
1683
1684### C Test Services
1685
The purpose of the test services is to enable the writing of tests entirely in
C. The services have been designed to make creating tests or converting old ones
as simple as possible with a minimum of services overhead. A sample test file,
`demo.c`, is included at the end of the [cintltst Usage](#cintltst-usage) section.
For more information regarding C test services, please see the
`icu4c/source/tools/ctestfw` directory.
1691
1692### Writing Test Functions
1693
1694The following shows the possible format of test functions:
1695
1696```c++
1697void some_test()
1698{
1699}
1700```
1701
1702Output from the test is accomplished with three printf-like functions:
1703
1704```c++
1705void log_err ( const char *fmt, ... );
1706void log_info ( const char *fmt, ... );
1707void log_verbose ( const char *fmt, ... );
1708```
1709
* `log_info()` writes informational messages to the console.
* `log_verbose()` writes to the console ONLY if the VERBOSE flag is turned
  on (with the `-v` command line option). This option is useful for
  debugging. By default, the VERBOSE flag is turned OFF.
* `log_err()` can be called when a test failure is detected. The error is
  then logged and the error count is incremented by one.
1716
1717To use the tests, link them into a hierarchical structure. The root of the
1718structure will be allocated by default.
1719
1720```c++
1721TestNode *root = NULL; /* empty */
1722addTest( &root, &some_test, "/test");
1723```
1724
1725Provide `addTest()` with the function pointer for the function that performs the
1726test as well as the absolute 'path' to the test. Paths may be up to 127 chars in
1727length and may be used to group tests.
1728
1729The calls to `addTest` must be placed in a function or a hierarchy of functions
1730(perhaps mirroring the paths). See the existing cintltst for more details.
1731
1732### Running the Tests
1733
1734A subtree may be extracted from another tree of tests for the programmatic
1735running of subtests.
1736
1737```c++
1738TestNode* sub;
1739sub = getTest(root, "/mytests");
1740```
1741
1742And a tree of tests may be run simply by:
1743
1744```c++
1745runTests( root ); /* or 'sub' */
1746```
1747
1748Similarly, `showTests()` lists out the tests. However, it is easier to use the
1749command prompt with the Usage specified below.
1750
1751### Globals
1752
The command line parser resets the error count and prints a summary of the
failed tests. But if `runTests()` is called directly, for instance, this needs to be
managed manually. `ERROR_COUNT` contains the number of times `log_err()` was
called. `runTests()` resets the count to zero before running the tests.
`VERBOSITY` must be set to 1 to display `log_verbose()` data; otherwise, it
should be 0 (the default).
1759
1760### Building cintltst
1761
1762To compile this test suite using Microsoft Visual C++ (MSVC), follow the
1763instructions in [How To Build And Install On Windows](../icu4c/build#how-to-build-and-install-on-windows). This builds the libraries as well as the `cintltst` executable.
1764
1765### Executing cintltst
1766
1767To run the test suite from the command line, change the directories to
1768`icu4c/source/test/cintltst/Debug` for the debug build (or
1769`icu4c/source/test/cintltst/Release` for the release build) and then type `cintltst`.
1770
1771### cintltst Usage
1772
1773Type `cintltst -h` to view its command line parameters.
1774
1775```text
1776### Syntax:
1777### Usage: [ -l ] [ -v ] [ -verbose] [-a] [ -all] [-n]
1778 [-no_err_msg] [ -h] [ /path/to/test ]
1779### -l To get a list of test names
1780### -all To run all the test
1781### -a To run all the test(same as -all)
1782### -verbose To turn ON verbosity
1783### -v To turn ON verbosity(same as -verbose)
1784### -h To print this message
1785### -n To turn OFF printing error messages
1786### -no_err_msg (same as -n)
1787### -[/subtest] To run a subtest
1788### For example to run just the utility tests type: cintltest /tsutil)
1789### To run just the locale test type: cintltst /tsutil/loctst
1790###
```

```c++
/******************** sample ctestfw test ********************
********* Simply link this with libctestfw or ctestfw.dll ****
************************* demo.c *****************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "ctest.h"
1800
1801/**
1802* Some sample dummy tests.
1803* the statics simply show how often the test is called.
1804*/
void mytest()
{
    static int i = 0;
    log_info("I am a test[%d]\n", i++);
}

void mytest_err()
{
    static int i = 0;
    log_err("I am a test containing an error[%d]\n", i++);
    log_err("I am a test containing an error[%d]\n", i++);
}

void mytest_verbose()
{
    /* will only show if verbose is on (-v) */
    log_verbose("I am a verbose test, blabbing about nothing at all.\n");
}
1824
1825/**
1826* Add your tests from this function
1827*/
1828
1829void add_tests( TestNode** root )
1830{
1831    addTest(root, &mytest, "/apple/bravo" );
1832    addTest(root, &mytest, "/a/b/c/d/mytest");
1833    addTest(root, &mytest_err, "/d/e/f/h/junk");
1834    addTest(root, &mytest, "/a/b/c/d/another");
1835    addTest(root, &mytest, "/a/b/c/etest");
1836    addTest(root, &mytest_err, "/a/b/c");
1837    addTest(root, &mytest, "/bertrand/andre/damiba");
1838    addTest(root, &mytest_err, "/bertrand/andre/OJSimpson");
1839    addTest(root, &mytest, "/bertrand/andre/juice/oj");
1840    addTest(root, &mytest, "/bertrand/andre/juice/prune");
1841    addTest(root, &mytest_verbose, "/verbose");
1842
1843}
1844
1845int main(int argc, const char *argv[])
1846{
1847    TestNode *root = NULL;
1848
1849    add_tests(&root); /* address of root ptr- will be filled in */
1850
1851    /* Run the tests. An int is returned suitable for the OS status code.
1852    (0 for success, neg for parameter errors, positive for the # of
1853    failed tests) */
1854    return processArgs( root, argc, argv );
1855}
1856```
1857
1858## C++ IntlTest Test Suite Documentation
1859
1860The IntlTest suite contains all of the tests for the C++ API of International
1861Components for Unicode. These tests may be automatically run by typing `intltest`
1862at the command line. Since the verbose option prints out a considerable amount
1863of information, it is recommended that the output be redirected to a file:
1864`intltest -v > testOutput`.
1865
1866### Building IntlTest
1867
1868To compile this test suite using MSVC, follow the instructions for building the
1869`alCPP` (All C++ interfaces) workspace. This builds the libraries as well as the
1870`intltest` executable.
1871
### Executing IntlTest
1873
1874To run the test suite from the command line, change the directories to
1875`icu4c/source/test/intltest/Debug`, then type: `intltest -v >testOutput`. For the
1876release build, the executable will reside in the
1877`icu4c/source/test/intltest/Release` directory.
1878
### IntlTest Usage
1880
1881Type just `intltest -h` to see the usage:
1882
1883```text
1884### Syntax:
1885### IntlTest [-option1 -option2 ...] [testname1 testname2 ...]
1886### where options are: verbose (v), all (a), noerrormsg (n),
1887### exhaustive (e) and leaks (l).
1888### (Specify either -all (shortcut -a) or a test name).
1889### -all will run all of the tests.
1890###
1891### To get a list of the test names type: intltest LIST
1892### To run just the utility tests type: intltest utility
1893###
1894### Test names can be nested using slashes ("testA/subtest1")
1895### For example to list the utility tests type: intltest utility/LIST
1896### To run just the Locale test type: intltest utility/LocaleTest
1897###
1898### A parameter can be specified for a test by appending '@' and the value
1899### to the testname.
1900```
1901
1902## C: Testing with Fake Time
1903
1904The "Fake Time" capability allows ICU4C to be tested as if the hardware clock is
1905set to a specific time. This section documents how to use this facility.
Note that this facility requires the POSIX `gettimeofday()` function to be
operable.
1908
1909This facility affects all ICU 'current time' calculations, including date,
1910calendar, time zone formats, and relative formats. It doesn't affect any calls
1911directly to the underlying operating system.
1912
19131. Build ICU with the **`U_DEBUG_FAKETIME`** preprocessor macro set. This can
1914   be accomplished with the following line in a file
1915   **icu/source/icudefs.local** :
1916
1917   ```shell
1918   CPPFLAGS+=-DU_DEBUG_FAKETIME
1919   ```
1920
2. Determine the `UDate` value (the time in milliseconds before or after
   midnight, Jan 1, 1970 GMT) that you want to use as the target. For this
   sample we will use the value `28800000`, which is midnight Pacific Standard
   Time, 1/1/1970.
3. Set the environment variable `U_FAKETIME_START=28800000`.
4. Now, the first time ICU checks the current time, it will start at midnight
   1/1/1970 (Pacific time) and roll forward. So, at the end of 10 seconds of
   program runtime, the clock will appear to be at 12:00:10 AM.
5. You can test this by running the utility `icuinfo -m`, which will print out
   the 'Milliseconds since Epoch'.
6. You can also test this by running the cintltst test
   `/tsformat/ccaltst/TestCalendar` in verbose mode, which will print out the
   current time:
1933
1934   ```shell
1935   $ make check ICUINFO_OPTS=-m U_FAKETIME_START=28800000 CINTLTST_OPTS=-v
1936   /tsformat/ccaltst/TestCalendar
1937   U_DEBUG_FAKETIME was set at compile time, so the ICU clock will start at a
1938   preset value
1939   env variable U_FAKETIME_START=28800000 (28800000) for an offset of
1940   -1281957858861 ms from the current time 1281986658861
1941   PASS: The current date and time fetched is Thursday, January 1, 1970 12:00:00
1942   ```
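
The offset printed in the log above is simply the fake start time minus the real current time. The following sketch reproduces that arithmetic; `SketchFakeClock` is a hypothetical type for illustration and is not ICU's implementation.

```c++
#include <cstdint>

// Hypothetical sketch of a clock that starts at a preset value and then
// rolls forward in step with the real clock. Not ICU's implementation.
struct SketchFakeClock {
    std::int64_t offsetMillis;

    // The offset is fixed once, when the clock is first consulted.
    SketchFakeClock(std::int64_t fakeStartMillis, std::int64_t realNowMillis)
        : offsetMillis(fakeStartMillis - realNowMillis) {}

    // Later readings add the same offset to the real time.
    std::int64_t now(std::int64_t realNowMillis) const {
        return realNowMillis + offsetMillis;
    }
};
```

With the values from the log, a fake start of `28800000` and a real time of `1281986658861` give an offset of `-1281957858861` ms. Ten seconds later, the fake clock reads `28810000`.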
1943
1944## C: Threading Tests
1945
Threading tests for ICU4C functions should be placed under utility /
`MultithreadTest`, in the files `intltest/tsmthred.h` and `.cpp`. See the existing
tests in these files for examples.
1949
1950Tests from this location are automatically run under the [Thread
1951Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
1952(TSAN) in the ICU continuous build system. TSAN will reliably detect race
1953conditions that could possibly occur, however improbable that occurrence might
1954be normally.
1955
Data races are one of the most common and hardest-to-debug types of bugs in
concurrent systems. A data race occurs when two threads access the same variable
concurrently and at least one of the accesses is a write. Since C++11, the C++
standard defines a data race as undefined behavior.
1960
1961## Binary Data Formats
1962
1963ICU services rely heavily on data to perform their functions. Such data is
1964available in various more or less structured text file formats, which make it
1965easy to update and maintain. For high runtime performance, most data items are
1966pre-built into binary formats, i.e., they are parsed and processed once and then
1967stored in a format that is used directly during processing.
1968
1969Most of the data items are pre-built into binary files that are then installed
1970on a user's machine. Some data can also be built at runtime but is not
1971persistent. In the latter case, a primary object should be built once and then
1972cloned to avoid the multiple parsing, processing, and building of the same data.
1973
1974Binary data formats for ICU must be portable across platforms that share the
1975same endianness and the same charset family (ASCII vs. EBCDIC). It would be
1976possible to handle data from other platform types, but that would require
1977load-time or even runtime conversion.
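
As a rough sketch of this portability rule, the following compares the platform properties recorded for a data item with those of the host. The struct `SketchPlatformInfo` and both functions are hypothetical; they only illustrate the idea and do not reflect how ICU stores or checks these properties.

```c++
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the platform properties recorded with a data item.
struct SketchPlatformInfo {
    std::uint8_t isBigEndian;     // 1 if the data was built for big-endian use
    std::uint8_t isEbcdicFamily;  // 1 if char data uses the EBCDIC family
};

// Describes the platform that this code is running on.
SketchPlatformInfo sketch_hostInfo() {
    const std::uint16_t probe = 0x0102;
    unsigned char bytes[sizeof(probe)];
    std::memcpy(bytes, &probe, sizeof(probe));

    SketchPlatformInfo info;
    info.isBigEndian = (bytes[0] == 0x01) ? 1 : 0;
    // 'A' is 0x41 in the ASCII family and 0xC1 in the EBCDIC family.
    info.isEbcdicFamily = (static_cast<unsigned char>('A') == 0xC1) ? 1 : 0;
    return info;
}

// Data is directly usable only if both properties match the host.
bool sketch_isUsableWithoutConversion(const SketchPlatformInfo &data) {
    const SketchPlatformInfo host = sketch_hostInfo();
    return data.isBigEndian == host.isBigEndian &&
           data.isEbcdicFamily == host.isEbcdicFamily;
}
```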
1978
1979### Data Types
1980
1981Binary data items are memory-mapped, i.e., they are used as readonly, constant
1982data. Their structures must be portable according to the criteria above and
1983should be efficiently usable at runtime without building additional runtime data
1984structures.
1985
Most native C/C++ data types cannot be used as part of binary data formats
because their sizes are not fixed across compilers. For example, an `int` could be
16, 32, or 64 bits wide, or even some other width. Only types with fixed, known
widths and semantics may be used.
1990
1991Use for example:
1992
1993* `uint8_t`, `uint16_t`, `int32_t` etc.
1994* `UBool`: same as `int8_t`
1995* `UChar`: for 16-bit Unicode strings
1996* `UChar32`: for Unicode code points
1997* `char`: for "invariant characters", see `utypes.h`
1998
1999> :point_right: **Note**: ICU assumes that `char` is an 8-bit byte but makes no
2000assumption about its signedness.
2001
2002**Do not use** for example:
2003
2004* `short`, `int`, `long`, `unsigned int` etc.: undefined widths
2005* `float`, `double`: undefined formats
2006* `bool`: undefined width and signedness
2007* `enum`: undefined width and signedness
2008* `wchar_t`: undefined width, signedness and encoding/charset
2009
2010Each field in a binary/mappable data format must be aligned naturally. This
2011means that a field with a primitive type of size n bytes must be at an n-aligned
2012offset from the start of the data block. `UChar` must be 2-aligned, `int32_t` must
2013be 4-aligned, etc.
2014
2015It is possible to use struct types, but one must make sure that each field is
2016naturally aligned, without possible implicit field padding by the compiler —
2017assuming a reasonable compiler.
2018
2019```c++
2020// bad because i will be preceded by compiler-dependent padding
2021// for proper alignment
2022struct BadExample {
2023    UBool flag;
2024    int32_t i;
2025};
2026
2027// ok with explicitly added padding or generally conscious
2028// sequence of types
2029struct OKExample {
2030    UBool flag;
2031    uint8_t pad[3];
2032    int32_t i;
2033};
2034```
2035
2036Within the binary data, a `struct` type field must be aligned according to its
2037widest member field. The struct `OKExample` must be 4-aligned because it contains
2038an `int32_t` field. Make padding explicit via additional fields, rather than
2039letting the compiler choose optional padding.
2040
Another potential problem with `struct` types, especially in C++, is that some
compilers provide RTTI for all classes and structs, which inserts a `_vtable`
pointer before the first declared field. When using `struct` types with
binary/mappable data in C++, assert somewhere in the code that the `offsetof` of
the first field is 0. For an example see the genpname tool.
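
Such layout assumptions can be checked at compile time. The sketch below, assuming a C++11 compiler, repeats the padded struct from above under a different name and uses `SketchBool` as a stand-in for `UBool` so that it compiles without ICU headers.

```c++
#include <cstddef>
#include <cstdint>

// Stand-in for ICU's UBool so that the sketch compiles without ICU headers.
typedef std::int8_t SketchBool;

struct SketchOKExample {
    SketchBool flag;
    std::uint8_t pad[3];  // explicit padding up to the next 4-aligned offset
    std::int32_t i;
};

// Nothing (such as a hidden pointer) precedes the first declared field.
static_assert(offsetof(SketchOKExample, flag) == 0,
              "first field must be at offset 0");
// The int32_t field is naturally aligned without implicit padding.
static_assert(offsetof(SketchOKExample, i) == 4,
              "i must be at a 4-aligned offset");
// The struct has exactly the size of its declared fields.
static_assert(sizeof(SketchOKExample) == 8,
              "no hidden padding expected");
```

If a compiler lays out the struct differently, the build fails at the assertion instead of misreading the binary data at runtime.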
2046
2047### Versioning
2048
2049ICU data files have a `UDataHeader` structure preceding the actual data. Among
2050other fields, it contains a `formatVersion` field with four parts (one `uint8_t`
2051each). It is best to use only the first (major) or first and second
2052(major/minor) fields in the runtime code to determine binary compatibility,
2053i.e., reject a data item only if its `formatVersion` contains an unrecognized
2054major (or major/minor) version number. The following parts of the version should
2055be used to indicate variations in the format that are backward compatible, or
2056carry other information.
2057
2058For example, the current `uprops.icu` file's `formatVersion` (see the genprops tool
2059and `uchar.c`/`uprops.c`) is set to indicate backward-incompatible changes with the
2060major version number, backward-compatible additions with the minor version
2061number, and shift width constants for the `UTrie` data structure in the third and
2062fourth version numbers (these could change independently of the `uprops.icu`
2063format).
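
A compatibility check along these lines might look as follows. This is a sketch only: `SketchDataInfo`, `kSupportedMajorVersion`, and `sketch_isAcceptable()` are hypothetical names, and the version value is made up.

```c++
#include <cstdint>

// Hypothetical stand-in for the part of the header that carries the
// four-part format version.
struct SketchDataInfo {
    std::uint8_t formatVersion[4];
};

// Made-up value: the major format version that this runtime code understands.
const std::uint8_t kSupportedMajorVersion = 3;

// Accept a data item if its major version is recognized. The remaining three
// parts may vary without affecting binary compatibility.
bool sketch_isAcceptable(const SketchDataInfo &info) {
    return info.formatVersion[0] == kSupportedMajorVersion;
}
```

With this check, a data item with version 3.7.5.2 is accepted just like 3.0.5.2, while 4.0.5.2 is rejected.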
2064
2065## C/C++ Debugging Hints and Tips
2066
2067### Makefile-based platforms
2068
* Use `Makefile.local` files (override of `Makefile`), or `icudefs.local` (at the
  top level, override of `icudefs.mk`) to avoid the need to modify
  change-controlled source files with debugging information.
  * Example: **`CPPFLAGS+=-DUDATA_DEBUG`** in common to enable data
    debugging
  * Example: **`CINTLTST_OPTS=/tscoll`** in the cintltst directory provides
    arguments to the cintltst test upon `make check`, to only run collation
    tests.
2077    * intltest: `INTLTEST_OPTS`
2078    * cintltst: `CINTLTST_OPTS`
2079    * iotest: `IOTEST_OPTS`
2080    * icuinfo: `ICUINFO_OPTS`
2081    * (letest does not have an OPTS variable as of ICU 4.6.)
2082
2083### Windows/Microsoft Visual Studio
2084
The following addition to `autoexp.dat` will cause `UnicodeString` objects to be
visible as strings in the debugger without expanding sub-items:
2087
2088```text
2089;; Copyright (C) 2010 IBM Corporation and Others. All Rights Reserved.
2090;; ICU Additions
2091;; Add to {VISUAL STUDIO} \Common7\Packages\Debugger\autoexp.dat
2092;;   in the [autoexpand] section just before the final [hresult] section.
2093;;
2094;; Need to change 'icu_##' to the current major+minor (so icu_46 for 4.6.1 etc)
2095
2096icu_46::UnicodeString {
2097    preview        (
2098              #if($e.fFlags & 2)   ; stackbuffer
2099               (
2100                  #(
2101                "U= '",
2102                [$e.fUnion.fStackBuffer, su],
2103                "', len=",
2104                [$e.fShortLength, u]
2105                ;[$e.fFields.fArray, su]
2106               )
2107              )
2108              #else
2109               (
2110                  #(
2111                "U* '",
2112                [$e.fUnion.fFields.fArray, su],
2113                "', len=",
2114                [$e.fShortLength, u]
2115                ;[$e.fFields.fArray, su]
2116               )
2117              )
2118            )
2119
2120    stringview    (
2121              #if($e.fFlags & 2)   ; stackbuffer
2122               (
2123                  #(
2124                "U= '",
2125                [$e.fUnion.fStackBuffer, su],
2126                "', len=",
2127                [$e.fShortLength, u]
2128                ;[$e.fFields.fArray, su]
2129               )
2130              )
2131              #else
2132               (
2133                  #(
2134                "U* '",
2135                [$e.fUnion.fFields.fArray, su],
2136                "', len=",
2137                [$e.fShortLength, u]
2138                ;[$e.fFields.fArray, su]
2139               )
2140              )
2141            )
2142
2143}
2144;;;
2145;;; End ICU Additions
2146;;;
2147```
2148