---
layout: default
title: Coding Guidelines
nav_order: 1
parent: Contributors
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Coding Guidelines
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section provides the guidelines for developing C and C++ code, based on the
coding conventions used by ICU programmers in the creation of the ICU library.

## Details about ICU Error Codes

When calling an ICU API function that takes an error code pointer (C) or
reference (C++), a `UErrorCode` variable is passed in. This variable is
allocated by the caller and must pass the test `U_SUCCESS()` before the function
call. Otherwise, the function will return immediately, taking no action.
Normally, an error code variable is initialized to `U_ZERO_ERROR`.

`UErrorCode` is passed around and used this way, instead of using C++ exceptions,
for the following reasons:

* It is useful in the same form for C also
* Some C++ compilers do not support exceptions

> :point_right: **Note**: *This error code mechanism, in fact, works similarly to
> exceptions. If users call several ICU functions in a sequence, then as soon as
> one sets a failure code, the following functions in the sequence return
> immediately without doing anything. This prevents the API functions from
> processing invalid data later in the sequence of function calls and relieves
> the caller from checking the error code after each call. It is somewhat
> similar to how an exception terminates a function block or try block early.*

Functions with a `UErrorCode` parameter will typically check it as the very first
thing, returning immediately in case of failure. An exception to this general
rule occurs with functions that adopt (take ownership of) other objects.
See [Adoption of Objects](#adoption-of-objects) for further information.
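
From the caller's side, this means several calls can share one `UErrorCode` and be checked once at the end. The following self-contained sketch imitates the convention. The `UErrorCode` enum and the `U_SUCCESS()`/`U_FAILURE()` helpers are simplified stand-ins for the real definitions in `unicode/utypes.h`, and `parseDigit()`/`parseTwoDigits()` are hypothetical functions, not ICU API.

```c++
#include <cstdint>

// Simplified stand-ins for definitions in unicode/utypes.h (not the real ones):
// failure codes are > U_ZERO_ERROR; success and warnings are <= U_ZERO_ERROR.
enum UErrorCode {
    U_ZERO_ERROR = 0,
    U_ILLEGAL_ARGUMENT_ERROR = 1
};
inline bool U_SUCCESS(UErrorCode code) { return code <= U_ZERO_ERROR; }
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical ICU-style function: it does nothing if a previous call failed.
int32_t parseDigit(char c, UErrorCode &errorCode) {
    if (U_FAILURE(errorCode)) {
        return 0;
    }
    if (c < '0' || c > '9') {
        errorCode = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    return c - '0';
}

// Hypothetical caller: chains two calls and checks the error code only once.
// s must point to at least two characters.
int32_t parseTwoDigits(const char *s, UErrorCode &errorCode) {
    int32_t tens = parseDigit(s[0], errorCode);
    int32_t ones = parseDigit(s[1], errorCode);  // no-op if the first call failed
    if (U_FAILURE(errorCode)) {
        return -1;
    }
    return tens * 10 + ones;
}
```

For example, `parseTwoDigits("42", errorCode)` returns 42 and leaves `errorCode` at `U_ZERO_ERROR`. With `"x2"`, the first call sets `U_ILLEGAL_ARGUMENT_ERROR`, the second call returns without doing anything, and the caller returns -1.
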
The following code shows the inside of an ICU function implementation:

```c++
U_CAPI const UBiDiLevel * U_EXPORT2
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
    int32_t start, length;

    if(U_FAILURE(*pErrorCode)) {
        return nullptr;
    } else if(pBiDi==nullptr || (length=pBiDi->length)<=0) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return nullptr;
    }

    ...
    return result;
}
```

Note: We have decided that we do not want to test for `pErrorCode==NULL`. Some
existing code does this, but new code should not.

Note: *Callers* (as opposed to implementers) of ICU APIs can simplify their code
by defining and using a subclass of `icu::ErrorCode`. ICU implementers can use the
`IcuTestErrorCode` class in intltest code.

It is not necessary to check for `U_FAILURE()` immediately before calling a
function that takes a `UErrorCode` parameter, because that function is supposed to
check for failure. Exception: If the failure comes from object allocation or
creation, then you probably have a `NULL` object pointer and must not call any
method on that object, not even one with a `UErrorCode` parameter.

### Sample Function with Error Checking

```c++
U_CAPI int32_t U_EXPORT2
uplrules_select(const UPluralRules *uplrules,  // Do not check
                                               // "this"/uplrules vs. NULL.
                double number,
                UChar *keyword, int32_t capacity,
                UErrorCode *status)            // Do not check status!=NULL.
{
    if (U_FAILURE(*status)) {                  // Do check for U_FAILURE()
                                               // before setting *status
        return 0;                              // or calling UErrorCode-less
                                               // select(number).
    }
    if (keyword == NULL ? capacity != 0 : capacity < 0) {
                                               // Standard destination buffer
                                               // checks.
        *status = U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    // C handle to C++ object: unrelated pointer types, so reinterpret_cast.
    UnicodeString result =
        reinterpret_cast<const PluralRules *>(uplrules)->select(number);
    return result.extract(keyword, capacity, *status);
}
```

### New API Functions

If the API function is non-const, then it should have a `UErrorCode` parameter.
(Not the other way around: Some const functions may need a `UErrorCode` as well.)

Default C++ assignment operators and copy constructors should not be used (they
should be declared private and not implemented). Instead, define an
`assign(Class &other, UErrorCode &errorCode)` function. Normal constructors are
fine, and should have a `UErrorCode` parameter.

### Warning Codes

Some `UErrorCode` values do not indicate a failure but an additional informational
return value. Their enum constants have the `_WARNING` suffix and they pass the
`U_SUCCESS()` test.

However, experience has shown that they are problematic: They can get lost
easily because subsequent function calls may set their own "warning" codes or
may reset a `UErrorCode` to `U_ZERO_ERROR`.

The source of the problem is that the `UErrorCode` mechanism is designed to mimic
C++/Java exceptions. It prevents ICU function execution after a failure code is
set, but like exceptions it does not work well for passing non-failure
information.

Therefore, we recommend using warning codes very carefully:

* Try not to rely on any warning codes.
* Use real APIs to get the same information if possible.
  For example, when a string is completely written but cannot be
  NUL-terminated, then `U_STRING_NOT_TERMINATED_WARNING` indicates this, but so
  does the returned destination string length (which will have the same value
  as the destination capacity in this case). Checking the string length is
  safer than checking the warning code. (It is even safer to not rely on
  NUL-terminated strings but to use the length.)
* If warning codes must be used, then it is best to set the `UErrorCode` to
  `U_ZERO_ERROR` immediately before calling the function in question, and to
  check for the expected warning code immediately after the function returns.

Future versions of ICU will not introduce new warning codes, and will provide
real API replacements for all existing warning codes.

### Bogus Objects

Some objects, for example `UnicodeString` and `UnicodeSet`, can become "bogus". This
happens when methods that create or modify the object fail (mostly due to an
out-of-memory condition) but do not take a `UErrorCode` parameter and therefore
cannot report the failure otherwise.

* A bogus object appears as empty.
* A bogus object cannot be modified except with assignment-like functions.
* The bogus state of one object does not transfer to another. For example,
  adding a bogus `UnicodeString` to a `UnicodeSet` does not make the set bogus.
  (It would be hard to make propagation consistent and test it well. Also,
  propagation among bogus states and error codes would be messy.)
* If a bogus object is passed into a function that does have a `UErrorCode`
  parameter, then the function should set the `U_ILLEGAL_ARGUMENT_ERROR` code.

## API Documentation

"API" means any public class, function, or constant.

### API status tag

Aside from documenting an API's functionality, parameters, return values etc. we
also mark every API with whether it is `@draft`, `@stable`, `@deprecated` or
`@internal`. (`@internal` is used when something is not actually supported API
but needs to be physically public anyway.) A new API is usually marked with
`@draft` and the ICU version that introduces it, for example "`@draft ICU 4.8`".
For details of how we mark APIs see the "ICU API compatibility" section of the
[ICU Architectural Design](../design.md) page. In Java, also see existing
`@draft` APIs for complete examples.

Functions that override a base class or interface definition take the API status
of the base class function. For C++, use the `@copydoc base::function()` tag to
copy both the description and the API status from the base function definition.
For Java methods the status tags must be added by hand; use the `{@inheritDoc}`
JavaDoc tag to pick up the rest of the base function documentation.
Documentation should not be manually replicated in overriding functions; it is
too hard to keep multiple copies synchronized.

The policy for the treatment of status tags in overriding functions was
introduced with ICU 64 for C++, and with ICU 59 for Java. Earlier code may
deviate.

### Coding Example

Coding examples help users to understand the usage of each API. Whenever
possible, we encourage embedding a code snippet that illustrates the usage of an
API along with the functional specification.

#### Embedding Coding Examples in ICU4J - JCite

Since ICU4J 49M2, the ICU4J ant build target "doc" utilizes an external tool
called [JCite](https://arrenbrecht.ch/jcite/). The tool allows us to cite a
fragment of existing source code in a JavaDoc comment using a tag. For example,
`{@.jcite com.ibm.icu.samples.util.timezone.BasicTimeZoneExample:---getNextTransitionExample}`
will be replaced with the fragment of code marked by the comment lines
`// ---getNextTransitionExample` in `BasicTimeZoneExample.java` in package
`com.ibm.icu.samples.util.timezone`. When embedding a code snippet using JCite,
we recommend following these guidelines:

* Sample code should be placed in the `<icu4j_root>/samples/src` directory,
  although you can cite any source fragment from source files in
  `<icu4j_root>/demos/src`, `<icu4j_root>/main/core/*/src`, or
  `<icu4j_root>/main/test/*/src`.
* Sample code should use the package name
  `com.ibm.icu.samples.<subpackage>.<facility>`.
  `<subpackage>` corresponds
  to the package of the target ICU API class, that is, one of `lang`, `math`,
  `text`, or `util`. `<facility>` is the name of the facility, which is usually
  the base class of the service. For example, use package
  `com.ibm.icu.samples.text.dateformat` for samples related to ICU's date format
  service, and `com.ibm.icu.samples.util.timezone` for samples related to the
  time zone service.
* Sample code should be as self-contained as possible (use only JDK and ICU
  public APIs if possible). This allows readers to copy and paste a code
  snippet to try it out easily.
* The citing comment should start with three consecutive hyphens followed by a
  lower camel case token, for example, "`// ---compareToExample`".
* Keep in mind that the JCite tag `{@.jcite ...}` is not resolved without JCite.
  Avoid placing a code snippet within a sentence. Instead, place a code snippet
  cited with JCite in a separate paragraph.

#### Embedding Coding Examples in ICU4C

Since ICU4C 49M2, ICU4C docs can also cite a fragment of existing sample or test
code, using the [\\snippet command](http://www.doxygen.nl/manual/commands.html#cmdsnippet)
which was introduced in Doxygen 1.7.5.

Example in `ucnv.h`:

```c++
/**
 * \snippet samples/ucnv/convsamp.cpp ucnv_open
 */
ucnv_open( ... ) ...
```

This cites code in `icu4c/source/samples/ucnv/convsamp.cpp` as follows:

```c++
//! [ucnv_open]
conv = ucnv_open("koi8-r", &status);
//! [ucnv_open]
```

Notice the tag "`ucnv_open`", which must be the same in all three places (in
the header file, and twice in the cited file).

## C and C++ Coding Conventions Overview

The ICU group uses the following coding guidelines to create software using the
ICU C++ classes and methods as well as the ICU C functions.

### C/C++ Hiding Un-@stable APIs

In C/C++, we enclose `@draft` and such APIs with `#ifndef U_HIDE_DRAFT_API` or
similar as appropriate. When a draft API becomes stable, we need to remove the
surrounding `#ifndef`.

Note: The `@system` tag is *in addition to* the
`@draft`/`@stable`/`@deprecated`/`@obsolete` status tag.

Copy/paste the appropriate `#ifndef..#endif` pair from the following:

```c++
#ifndef U_HIDE_DRAFT_API
#endif // U_HIDE_DRAFT_API

#ifndef U_HIDE_DEPRECATED_API
#endif // U_HIDE_DEPRECATED_API

#ifndef U_HIDE_OBSOLETE_API
#endif // U_HIDE_OBSOLETE_API

#ifndef U_HIDE_SYSTEM_API
#endif // U_HIDE_SYSTEM_API

#ifndef U_HIDE_INTERNAL_API
#endif // U_HIDE_INTERNAL_API
```

We `#ifndef` `@draft`/`@deprecated`/... APIs as much as possible, including C
functions, many C++ class methods (see exceptions below), enum constants (see
exceptions below), whole enums, whole classes, etc.

We do not `#ifndef` APIs where that would be problematic:

* struct/class members where that would modify the object layout (non-static
  struct/class fields, virtual methods)
* enum constants where that would modify the numeric values of following
  constants
  * in that case, it is best to use `#ifndef` together with explicitly
    defining the numeric value of the next constant
* C++ class boilerplate (e.g., default/copy constructors), if
  the compiler would auto-create public functions to replace `#ifndef`'ed ones
  * For example, the compiler automatically creates a default constructor if
    the class does not specify any other constructors.
* private class members
* definitions in internal/test/tools header files (that would be pointless;
  they should probably not have API tags in the first place)
* forward or friend declarations
* definitions that are needed for other definitions that would not be
  `#ifndef`'ed (e.g., for public macros or private methods)
* platform macros (mostly in `platform.h`/`umachine.h` & similar) and
  user-configurable settings (mostly in `uconfig.h`)

More handy copy-paste text:

```c++
// Do not enclose the protected default constructor with #ifndef U_HIDE_INTERNAL_API
// or else the compiler will create a public default constructor.

// Do not enclose protected default/copy constructors with #ifndef U_HIDE_INTERNAL_API
// or else the compiler will create public ones.
```

### C and C++ Type and Format Convention Guidelines

The following C and C++ type and format conventions are used to maximize
portability across platforms and to provide consistency in the code:

#### Constants (#define, enum items, const)

Use uppercase letters for constants. For example, use `UBREAKITERATOR_DONE`,
`UBIDI_DEFAULT_LTR`, `ULESS`.

For new enum types (as opposed to new values added to existing types), do not
define enum types in C++ style. Instead, define C-style enums with a `U...` type
prefix and `U_`/`UMODULE_` constants. Define such enum types outside the ICU
namespace and outside any C++ class. Define them in C header files if there are
appropriate ones.

#### Variables and Functions

Use mixed case starting with a lowercase letter for variables and functions.
For example, use `getLength()`.

#### Types (class, struct, enum, union)

Use mixed case starting with an uppercase letter for types. For example, use
class `DateFormatSymbols`.
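
The naming rules above can be seen together in a short sketch. All names here (`UWidgetStyle`, the `UWIDGET_` constants, `WidgetLabel`) are hypothetical and only illustrate the capitalization conventions; the getter/setter pair also follows the function style described in the next subsection.

```c++
#include <cstdint>

// New enum type: C style, "U" type prefix, uppercase UMODULE_ constants,
// defined outside any namespace or class (hypothetical names).
typedef enum UWidgetStyle {
    UWIDGET_PLAIN = 0,
    UWIDGET_BOLD = 1
} UWidgetStyle;

// Other constants are uppercase as well.
static const int32_t UWIDGET_MAX_WIDTH = 80;

// Types: mixed case starting with an uppercase letter.
class WidgetLabel {
public:
    explicit WidgetLabel(int32_t initialWidth) : labelWidth(initialWidth) {}

    // Functions and variables: mixed case starting with a lowercase letter.
    int32_t getWidth() const { return labelWidth; }
    void setWidth(int32_t newWidth) { labelWidth = newWidth; }

private:
    int32_t labelWidth;
};
```
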

#### Function Style

Use the `getProperty()` and `setProperty()` style for function names: the first
word begins with a lowercase letter, and each following word is capitalized and
joined to the previous one without a space. For example,
`UnicodeString getSymbol(ENumberFormatSymbol symbol)`,
`void setSymbol(ENumberFormatSymbol symbol, UnicodeString value)`,
`getLength()`, and `getSomethingAt(index/offset)`.

#### Common Parameter Names

In order to keep function parameter names consistent, the following are
recommendations for names or suffixes (usual "Camel case" applies):

* "start": the index (of the first of several code units) in a string or array
* "limit": the index (of the **first code unit after** a specified range) in a
  string or array (the number of units is (limit-start))
* name the length (for the number of code units in a (range of a) string or
  array) either "length" or "somePrefixLength"
* name the capacity (for the number of code units available in an output
  buffer) either "capacity" or "somePrefixCapacity"

#### Order of Source/Destination Arguments

Many ICU function signatures list source arguments before destination arguments,
as is common in C++ and Java APIs. This is the preferred order for new APIs.
(Example: `ucol_getSortKey(const UCollator *coll, const UChar *source, int32_t sourceLength, uint8_t *result, int32_t resultLength)`)

Some ICU function signatures list destination arguments before source arguments,
as is common in C standard library functions. This should be limited to
functions that closely resemble such C standard library functions or closely
related ICU functions. (Example: `u_strcpy(UChar *dst, const UChar *src)`)

#### Order of Include File Includes

Include system header files (like `<stdio.h>`) first, then ICU headers, then
application-specific ones.
This ensures that ICU headers can use existing
definitions from system headers if both happen to define the same symbols. In
ICU files, all used headers should be explicitly included, even if some of them
already include others.

Within each group, place the headers in alphabetical order.

#### Style for ICU Includes

All ICU headers should be included using `"..."`-style includes (like
`"unicode/utypes.h"` or `"cmemory.h"`) in source files for the ICU library, tools,
and tests.

#### Pointer Conversions

Do not cast pointers to integers or integers to pointers. Also, do not cast
between data pointers and function pointers. Such casts do not work with some
compilers, especially where these types have different sizes. Exceptions are
only possible in platform-specific code where the behavior is known.

Please use C++-style casts, at least for pointers, for example `const_cast`.

* For conversion between related types, for example from a base class to a
  subclass (when you *know* that the object is of that type), use
  `static_cast`. (When you are not sure if the object has the subclass type,
  then use a `dynamic_cast`; see a later section about that.)
* Also use `static_cast`, not `reinterpret_cast`, for conversion from `void *`
  to a specific pointer type. (This is accepted and recommended because there
  is an implicit conversion available for the opposite conversion.) See
  [ICU-9434](https://unicode-org.atlassian.net/browse/ICU-9434) for details.
* For conversion between unrelated types, for example between `char *` and
  `uint8_t *`, or between `Collator *` and `UCollator *`, use a
  `reinterpret_cast`.

#### Returning a Number of Items

To return a number of items, use `countItems()`, **not** `getItemCount()`, even if
there is no need to actually count using that member function.
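
The cast rules and the `countItems()` naming rule can be illustrated together. In this hypothetical sketch, `Shape`, `Square`, and the opaque C handle `UShape` stand in for a C++ class hierarchy and its C API handle (in the way `UCollator *` relates to `Collator *`); none of them are ICU types.

```c++
#include <cstdint>

// Hypothetical classes, for illustration only.
class Shape {
public:
    virtual ~Shape() {}
    virtual int32_t countCorners() const = 0;  // countItems() style, not getCornerCount()
};

class Square : public Shape {
public:
    explicit Square(int32_t length) : sideLength(length) {}
    int32_t countCorners() const override { return 4; }
    int32_t getSideLength() const { return sideLength; }

private:
    int32_t sideLength;
};

// Opaque C API handle type; deliberately never defined.
struct UShape;

// Base class to subclass, when the object is known to be a Square: static_cast.
// (If that were not certain, a dynamic_cast would be needed.)
int32_t sideLengthOf(const Shape *shape) {
    return static_cast<const Square *>(shape)->getSideLength();
}

// From void * to a specific pointer type: static_cast, not reinterpret_cast.
// The void * must have been created from a const Shape *.
int32_t countCornersFromContext(const void *context) {
    return static_cast<const Shape *>(context)->countCorners();
}

// Between unrelated pointer types (C++ object and C handle, or char and
// uint8_t data): reinterpret_cast.
const UShape *toCHandle(const Shape *shape) {
    return reinterpret_cast<const UShape *>(shape);
}
const Shape *fromCHandle(const UShape *handle) {
    return reinterpret_cast<const Shape *>(handle);
}
const uint8_t *asBytes(const char *s) {
    return reinterpret_cast<const uint8_t *>(s);
}
```
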

#### Ranges of Indexes

Specify a range of indexes by having start and limit parameters with names or
suffix conventions that represent the index. A range should contain indexes from
start to limit-1, that is, an interval that is left-closed and right-open. Using
mathematical notation, this is represented as: \[start..limit\[.

#### Functions with Buffers

For functions that take a buffer (pointer) and a length argument with a default
value, make the default -1, meaning that the function determines the length of
the input itself (for text, by calling `u_strlen()`). Any other negative or
undefined value constitutes an error.

#### Primitive Types

Primitive types are defined by the `unicode/utypes.h` file or by the header files
it includes. The most common types are `uint8_t`, `uint16_t`,
`uint32_t`, `int8_t`, `int16_t`, `int32_t`, `char16_t`,
`UChar` (same as `char16_t`), `UChar32` (signed, 32-bit), and `UErrorCode`.

The language built-in type `bool` and constants `true` and `false` may be used
internally, for local variables and parameters of internal functions. The ICU
type `UBool` must be used in public APIs and in the definition of any persistent
data structures. `UBool` is guaranteed to be one byte in size and signed; `bool`
is not. **Exception**: Starting with ICU 70 (2021q4), `operator==()` and
`operator!=()` must return `bool`, not `UBool`, because of a change in C++20;
see [ICU-20973](https://unicode-org.atlassian.net/browse/ICU-20973).

Traditionally, ICU4C has defined its own `FALSE`=0 / `TRUE`=1 macros for use with `UBool`.
Starting with ICU 68 (2020q4), we no longer define these in public header files
(unless `U_DEFINE_FALSE_AND_TRUE`=1),
in order to avoid name collisions with code outside ICU that defines enum
constants and similar with these names.
Starting with ICU 72 (2022q4), we no longer use these anywhere in ICU.
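
Several conventions from this part of the guidelines (half-open \[start..limit\[ ranges, a default length of -1 for buffer arguments, and fixed-width integer types) can be seen together in one small function. The sketch is hypothetical: `countDigits()` and `countDigitsInRange()` are not ICU functions, `char16_t` is used where ICU code would write `UChar`, and returning -1 for an invalid length is only a simplification for this example.

```c++
#include <cstdint>

// Counts ASCII digits (U+0030..U+0039) in the half-open range [start..limit[,
// that is, s[start] through s[limit-1]: limit-start code units are examined.
int32_t countDigitsInRange(const char16_t *s, int32_t start, int32_t limit) {
    int32_t count = 0;
    for (int32_t i = start; i < limit; ++i) {
        if (0x30 <= s[i] && s[i] <= 0x39) {
            ++count;
        }
    }
    return count;
}

// Buffer-plus-length variant: a length of -1 (the default) means that s is
// NUL-terminated and the function determines the length itself.
// Any other negative length is an error, reported here as -1.
int32_t countDigits(const char16_t *s, int32_t length = -1) {
    if (length == -1) {
        length = 0;
        while (s[length] != 0) {
            ++length;
        }
    } else if (length < 0) {
        return -1;
    }
    return countDigitsInRange(s, 0, length);
}
```

For a NUL-terminated `"a12b3"`, `countDigits(text)` examines all five code units and finds 3 digits, while `countDigitsInRange(text, 1, 3)` examines only indexes 1 and 2.
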

The versions of the C and C++ standards that we now require define the type
`bool` and the values `false` and `true`, so we and our users can use these
instead.

As of ICU 70, we are not changing ICU4C API from `UBool` to `bool`, except on
equality operators (see above).
Doing so in C API, or in structs that cross the library boundary,
would break binary compatibility.
Doing so only in other places in C++ could be confusingly inconsistent.
We may revisit this.

Note that the details of type `bool` (e.g., `sizeof`) depend on the compiler and
may differ between C and C++.

#### File Names (.h, .c, .cpp, data files if possible, etc.)

Limit file names to 31 lowercase ASCII characters. (Older versions of Mac OS have
that length limit.)

Exception: The layout engine uses mixed-case file names.

(We have abandoned the 8.3 naming standard although we do not change the names
of old header files.)

#### Language Extensions and Standards

Proprietary features, language extensions, or library functions must not be
used because they will not work with all C or C++ compilers.
In Microsoft Visual C++, go to Project Settings (Alt+F7) -> All Configurations ->
C/C++ -> Customize and check Disable Language Extensions.

Exception: some Microsoft headers will not compile without language extensions
being enabled, which in turn requires that some ICU files be built with language
extensions.

#### Tabs and Indentation

Save files with spaces instead of tab characters (\\x09). The indentation size
is 4.

#### Documentation

Use Javadoc-style in-file documentation created with
[Doxygen](http://www.doxygen.org/).

#### Multiple Statements

Place multiple statements on separate lines. `if()` or loop heads must not be
followed by their bodies on the same line.
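
For example, a hypothetical helper written to these rules puts each statement on its own line, uses curly braces around every `if()` and loop body (as the brace rules in this section also require), and indents by 4 spaces:

```c++
#include <cstdint>

// Not like this:
//     for (int32_t i = 0; i < length; ++i) if (values[i] > 0) sum += values[i];
// But like this (hypothetical helper that sums the positive values):
int32_t sumPositive(const int32_t *values, int32_t length) {
    int32_t sum = 0;
    for (int32_t i = 0; i < length; ++i) {
        if (values[i] > 0) {
            sum += values[i];
        }
    }
    return sum;
}
```
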

#### Placements of `{}` Curly Braces

Place curly braces `{}` in reasonable and consistent locations. Each of us
subscribes to a different philosophy, so follow the existing style of a file
instead of mixing different styles. However, do not write `if()` and loop bodies
without curly braces.

#### `if() {...}` and Loop Bodies

Use curly braces for `if()` and `else` as well as loop bodies, etc., even if there
is only one statement.

#### Function Declarations

Put the return type, together with any import/export and linkage qualifiers, on
one line, and begin the next line with the function name and signature.

Function declarations need to be in the form `U_CAPI` return-type `U_EXPORT2` to
satisfy all the compilers' requirements.

For example, use the following convention:

```c++
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
```

> :point_right: **Note**: The `U_CAPI`/`U_DEPRECATED` and `U_EXPORT2` qualifiers
> are required for both the declaration and the definition of *exported C and
> static C++ functions*. Use `U_CAPI` (or `U_DEPRECATED`) before and `U_EXPORT2`
> after the return type of *exported C and static C++ functions*.
>
> Internal functions that are visible outside a compilation unit need a `U_CFUNC`
> before the return type.
>
> *Non-static C++ class member functions* do *not* get `U_CAPI`/`U_EXPORT2`
> because they are exported and declared together with their class exports.

> :point_right: **Note**: Before ICU 68 (2020q4) we used alternate qualifiers
> like `U_DRAFT`, `U_STABLE` etc. rather than `U_CAPI`,
> but keeping these in sync with API doc tags `@draft` and guard switches like `U_HIDE_DRAFT_API`
> was tedious and error-prone and added no value.
> Since ICU 68 (ICU-9961) we only use `U_CAPI` and `U_DEPRECATED`.
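
A declaration and its definition in this layout are sketched below. So that the sketch compiles without ICU headers, `U_CAPI` and `U_EXPORT2` are replaced by simplified stand-in macros (an assumption of this example; the real ones are platform-dependent and also handle symbol export), and `uexample_addOne()` is a hypothetical function.

```c++
#include <cstdint>

// Simplified stand-ins for ICU's platform-dependent macros, for this sketch
// only; real ICU code gets these from the ICU headers.
#define U_CAPI extern "C"
#define U_EXPORT2

// Declaration, as it would appear in a public header.
U_CAPI int32_t U_EXPORT2
uexample_addOne(int32_t value);

// Definition: the same qualifiers are repeated.
U_CAPI int32_t U_EXPORT2
uexample_addOne(int32_t value) {
    return value + 1;
}
```
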

#### Use Anonymous Namespaces or Static For File Scope

Use anonymous namespaces or `static` for variables, functions, and constants that
are not exported explicitly by a header file. Some platforms are confused if
non-static symbols are not explicitly declared extern. These platforms cannot
build ICU or link to it.

#### Using C Callbacks From C++ Code

z/OS and Windows COM wrappers around ICU need `__cdecl` for callback functions.
The reason is that C++ can have a different function calling convention from C.
These callback functions also usually need to be private. So the following code

```c++
UBool
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here, such as checking pInfo.
    return true;
}
```

should be changed to look like the following by adding `U_CDECL_BEGIN`, `static`,
`U_CALLCONV` and `U_CDECL_END`.

```c++
U_CDECL_BEGIN
static UBool U_CALLCONV
isAcceptable(void * /* context */,
             const char * /* type */, const char * /* name */,
             const UDataInfo *pInfo)
{
    // Do something here, such as checking pInfo.
    return true;
}
U_CDECL_END
```

#### Same Module and Functionality in C and in C++

Determine if two headers are needed. If the same functionality is provided with
both a C and a C++ API, then there can be two headers, one for each language,
even if one uses the other. For example, there can be `umsg.h` for C and `msgfmt.h`
for C++.

Not all functionality has or needs both kinds of API. More and more
functionality is available only via C APIs to avoid duplication of API,
documentation, and maintenance. C APIs are perfectly usable from C++ code,
especially with `UnicodeString` methods that alias or expose C-style string
buffers.

#### Platform Dependencies

Use the platform dependencies that are within the header files that `utypes.h`
includes.
They are `platform.h` (which is generated by the configuration
script from `platform.h.in`) and its more specific cousins like `pwin32.h` for
Windows, which define basic types, and `putil.h`, which defines platform
utilities.

**Important:** Outside of these files, and a small number of implementation
files that depend on platform differences (like `umutex.c`), **no** ICU source
code may have **any** `#ifdef` **OperatingSystemName** instructions.

#### Short, Unnested Mutex Blocks

Do not call functions within mutual-exclusion (mutex) blocks. This helps prevent
deadlocks from occurring later. There should be as little code inside a mutex
block as possible to minimize the performance degradation from blocked threads.
Also, mutex blocks are not guaranteed to be re-entrant; therefore, they must not
be nested.

#### Names of Internal Functions

Internal functions that are not declared static (regardless of inlining) must
follow the naming conventions for exported functions because many compilers and
linkers do not distinguish between library exports and intra-library visible
functions.

#### Which Language for the Implementation

Write implementation code in C++. Use objects very carefully, as always:
Implicit constructors, assignments etc. can make simple-looking code
surprisingly slow.

For every C API, make sure that there is at least one call from a pure C file in
the cintltst test suite.

Background: We used to prefer C or C-style C++ for implementation code because
some users asked for pure C. However, there was never a large subset of ICU
that was usable without any C++ dependencies, and C++ can(!) make for much
shorter, simpler, less error-prone and easier-to-maintain code, for example via
use of "smart pointers" (`unicode/localpointer.h` and `cmemory.h`).
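
As a hedged illustration of the smart-pointer point, the hypothetical `OwnedArray` template below owns a `std::malloc()`ed array and frees it in its destructor, so a function using it needs no `free()` call before each return. It is written in the spirit of `LocalPointer`/`LocalMemory` but is not ICU's implementation; copying is disabled by declaring the copy constructor and assignment operator private and not implementing them, as recommended under New API Functions.

```c++
#include <cstdint>
#include <cstdlib>

// Hypothetical, minimal owner of a malloc()ed array (not ICU's LocalMemory).
template<typename T>
class OwnedArray {
public:
    explicit OwnedArray(int32_t count)
            : ptr(static_cast<T *>(std::malloc(sizeof(T) * static_cast<std::size_t>(count)))) {}
    ~OwnedArray() { std::free(ptr); }

    bool isNull() const { return ptr == nullptr; }
    T &operator[](int32_t i) { return ptr[i]; }

private:
    OwnedArray(const OwnedArray &);             // not implemented: no copies
    OwnedArray &operator=(const OwnedArray &);  // not implemented

    T *ptr;
};

// Every return path releases the buffer automatically.
int32_t sumOfSquares(int32_t n) {
    if (n <= 0) {
        return 0;
    }
    OwnedArray<int32_t> squares(n);
    if (squares.isNull()) {
        return -1;  // out of memory
    }
    for (int32_t i = 0; i < n; ++i) {
        squares[i] = i * i;
    }
    int32_t sum = 0;
    for (int32_t i = 0; i < n; ++i) {
        sum += squares[i];
    }
    return sum;
}
```
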

We still try to expose most functionality via *C APIs* because of the
difficulties of binary-compatible C++ APIs exported from DLLs/shared libraries.

#### No Compiler Warnings

ICU must compile without compiler warnings unless such warnings are verified to
be harmless or bogus. Often a warning on one compiler indicates a breaking
error on another.

#### Enum Values

When casting an integer value to an enum type, the enum type *should* have a
constant with this integer value, or at least it *must* have a constant whose
value is at least as large as the integer value being cast, with the same
signedness. For example, do not cast a -1 to an enum type that only has
non-negative constants. Some compilers choose the internal representation very
tightly for the defined enum constants, which may result in the equivalent of a
`uint8_t` representation for an enum type with only small, non-negative constants.
Casting a -1 to such a type may result in an actual value of 255. (This has
happened!)

When casting an enum value to an integer type, make sure that the enum value's
numeric value is within range of the integer type.

#### Do not check for `this!=NULL`, do not check for `NULL` references

In public APIs, assume `this!=0` and assume that references are not 0. In C code,
`"this"` is the "service object" pointer, such as `set` in
`uset_add(USet* set, UChar32 c)`; do not check for `set!=NULL`.

We do usually check all other (non-this) pointers for `NULL`, in those cases when
`NULL` is not valid. (Many functions allow a `NULL` string or buffer pointer if the
length or capacity is 0.)

Rationale: `"this"` is not really an argument, and checking it costs a little bit
of code size and runtime. Other libraries also commonly do not check for valid
`"this"`, and resulting failures are fairly obvious.
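
Returning to the Enum Values rules above, a conversion function can range-check an integer before casting it, so that an out-of-range value such as -1 is never cast to an enum type whose constants are all small and non-negative. `UExampleColor` and `toExampleColor()` are hypothetical names, and falling back to the first constant is just one possible policy.

```c++
#include <cstdint>

// Hypothetical C-style enum whose constants are all small and non-negative,
// so a compiler may represent it in as little as one byte.
typedef enum UExampleColor {
    UEXAMPLE_RED = 0,
    UEXAMPLE_GREEN = 1,
    UEXAMPLE_BLUE = 2,
    UEXAMPLE_COLOR_COUNT = 3
} UExampleColor;

// Converts an integer (for example, one read from data) to the enum type.
// Only values that match a real constant are cast; anything else maps to a
// valid default instead of producing an out-of-range enum value.
UExampleColor toExampleColor(int32_t value) {
    if (value < 0 || value >= UEXAMPLE_COLOR_COUNT) {
        return UEXAMPLE_RED;
    }
    return static_cast<UExampleColor>(value);
}
```
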

### Memory Usage

#### Dynamically Allocated Memory

ICU4C APIs are designed to allow separate heaps for the ICU libraries vs. the
application. This is achieved by providing factory methods and matching
destructors for all allocated objects. The C++ API uses a common base class with
overridden `new`/`delete` operators and/or forms an equivalent pair with `createXyz()`
factory methods and the `delete` operator. The C API provides pairs of `open`/`close`
functions for each service. See the C++ and C guideline sections below for
details.

Exception: Most C++ API functions that return a `StringEnumeration` (by pointer
which the caller must delete) are named `getXyz()` rather than `createXyz()`
because `"get"` is much more natural. (These are not factory methods in the sense
of `NumberFormat::createScientificInstance()`.) For example,
`static StringEnumeration *Collator::getKeywords(UErrorCode &)`. We should document
clearly in the API comments that the caller must delete the returned
`StringEnumeration`.

#### Declaring Static Data

All unmodifiable data should be declared `const`. This includes the pointers and
the data itself. Also if you do not need a pointer to a string, declare the
string as an array. This reduces the time to load the library and all its
pointers. This should be done so that the same library data can be shared across
processes automatically.
Here is an example:

```c++
#define MY_MACRO_DEFINED_STR "macro string"
const char *myCString = "myCString";
int16_t myNumbers[] = {1, 2, 3};
```

This should be changed to the following:

```c++
static const char MY_MACRO_DEFINED_STR[] = "macro string";
static const char myCString[] = "myCString";
static const int16_t myNumbers[] = {1, 2, 3};
```

#### No Static Initialization

The most common reason to have static initialization is to declare a
`static const UnicodeString`, for example (see `utypes.h` about invariant characters):

```c++
static const UnicodeString myStr("myStr", "");
```

The most portable and most efficient way to declare ASCII text as a Unicode
string is to do the following instead:

```c++
static const UChar myStr[] = { 0x6D, 0x79, 0x53, 0x74, 0x72, 0}; /* "myStr" */
```

We do not use plain `char` character and string literals for Unicode characters
and strings because the execution character set of C/C++ compilers is almost
never Unicode and may not be ASCII-compatible (especially on EBCDIC platforms).
Depending on the API where the string is to be used, a terminating NUL (0) may
or may not be required. The length of the string (number of `UChar`s in the
array) can be determined with `sizeof(myStr)/U_SIZEOF_UCHAR` (subtract 1 for the
NUL if present). Always remember to put in a comment at the end of the
declaration what the Unicode string says.

Static initialization of C++ objects **must not be used** in ICU libraries
because of the following reasons:

1. It leads to intractable order-of-initialization dependencies.
2. It makes it difficult or impossible to release all of the library's
   resources. See `u_cleanup()`.
3. It takes time to initialize the library.
4. Dependency checking is not completely done in C or C++.
   For instance, if an ICU user creates an ICU object or calls an ICU function
   statically that depends on static data, it is not guaranteed that the
   statically declared data is initialized.
5. Certain users like to manage their own memory. They cannot manage ICU's
   memory properly because of item #2.
6. It is easier to debug code that does not use static initialization.
7. Memory allocated at static initialization time is not guaranteed to be
   deallocated with a C++ destructor when the library is unloaded. This is a
   problem when ICU is unloaded and reloaded into memory and when you are using
   a heap debugging tool. It would also not work with the `u_cleanup()` function.
8. Some platforms cannot handle static initialization or static destruction
   properly. Several compilers have this random bug (even in the year 2001).

ICU users can use the `U_STRING_DECL` and `U_STRING_INIT` macros for C strings. Note
that on some platforms this will incur a small initialization cost (simple
conversion). Also, ICU users need to make sure that they properly and
consistently declare the strings with both macros. See `ustring.h` for details.

### C++ Coding Guidelines

This section describes the C++-specific guidelines or conventions to use.

#### Portable Subset of C++

ICU uses only a portable subset of C++ for maximum portability. Also, it does
not use features of C++ that are not implemented well in all compilers or are
cumbersome. In particular, ICU does not use exceptions or the Standard Template
Library (STL).

We have started to use templates in ICU 4.2 (e.g., `StringByteSink`) and ICU 4.4
(`LocalPointer` and some internal uses). We try to limit templates to where they
provide a lot of benefit (robust code, avoid duplication) without much or any
code bloat.

We continue not to use the Standard Template Library (STL) in ICU library code
because its design causes a lot of code bloat. More importantly:

* Exceptions: STL classes and algorithms throw exceptions. ICU does not throw
  exceptions, and ICU code is not exception-safe.
* Memory management: STL uses the default `new`/`delete`, or Allocator parameters
  which create different types; they throw out-of-memory exceptions. ICU
  memory allocation is customizable and must not throw exceptions.
* Non-polymorphic: For APIs, STL classes are also problematic because
  different template specializations create different types. For example, some
  systems use custom string classes (different allocators, different
  strategies for buffer sharing vs. copying), and ICU should be able to
  interface with most of them.

We have started to use compiler-provided Run-Time Type Information (RTTI) in ICU
4.6. It is now required for building ICU, and encouraged for using ICU where
RTTI is needed. For example, use `dynamic_cast<DecimalFormat*>` on a
`NumberFormat` pointer that is usually but not always a `DecimalFormat` instance.
Do not use `dynamic_cast<>` on a reference, because that throws a `std::bad_cast`
exception on failure.

ICU uses a limited form of multiple inheritance equivalent to Java's interface
mechanism: All but one base class must be interface/mixin classes, i.e., they
must contain only pure virtual member functions. For details see the
'boilerplate' discussion below. This restriction to at most one base class with
non-virtual members eliminates problems with the use and implementation of
multiple inheritance in C++. ICU does not use virtual base classes.

> :point_right: **Note**: Every additional base class, *even an interface/mixin
> class*, adds another vtable pointer to each subclass object, that is, it
> *increases the object/instance size by the size of a pointer* (8 bytes on
> most 64-bit platforms).
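The RTTI guidance above can be illustrated with a small, self-contained sketch. `Format` and `DecimalLike` are hypothetical stand-ins for a hierarchy like `NumberFormat`/`DecimalFormat`, not ICU classes. The pointer form of `dynamic_cast` returns `nullptr` when the object is not of the requested type, so the mismatch is handled with an ordinary check instead of a `std::bad_cast` exception.

```c++
// Hypothetical base class; it must be polymorphic (here via the virtual
// destructor) for dynamic_cast to work.
class Format {
public:
    virtual ~Format();
};

// Hypothetical subclass with an extra, subclass-only accessor.
class DecimalLike : public Format {
public:
    ~DecimalLike() override;
    int fractionDigits() const { return 2; }
};

// Destructors defined out of line, as the "Virtual Destructors" rules require.
Format::~Format() {}
DecimalLike::~DecimalLike() {}

// Pointer form of dynamic_cast: yields nullptr on a type mismatch (or for a
// null fmt) instead of throwing std::bad_cast as the reference form would.
static inline int fractionDigitsOrDefault(const Format *fmt, int defaultValue) {
    const DecimalLike *dec = dynamic_cast<const DecimalLike *>(fmt);
    if (dec == nullptr) {
        return defaultValue;
    }
    return dec->fractionDigits();
}
```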

#### Classes and Members

C++ classes and their members do not need a 'U' or any other prefix.

#### Global Operators

Global operators (operators that are not class members) can be problematic for
library entry point versioning, may confuse users and cannot be easily ported to
Java (ICU4J). They should be avoided if possible.

~~The issue with library entry point versioning is that on platforms that do not
support namespaces, users must rename all classes and global functions via
`urename.h`. This renaming process is not possible with operators.~~ Starting with
ICU 49, we require C++ namespace support. However, a global operator can be used
in ICU4C (when necessary) if its function signature contains an ICU C++ class
that is versioned. This will result in a mangled linker name that does contain
the ICU version number via the versioned name of the class parameter. For
example, ICU4C 2.8 added an `operator+` for `UnicodeString`, with two `UnicodeString`
reference parameters.

#### Virtual Destructors

In classes with virtual methods, destructors must be explicitly declared, and
must be defined (implemented) outside the class definition in a .cpp file.

More precisely:

1. All classes with any virtual members or any bases with any virtual members
   should have an explicitly declared virtual destructor.
2. Constructors and destructors should be declared and/or defined prior to
   *any* other methods, public or private, within the class definition.
3. All virtual destructors should be defined out-of-line, and in a .cpp file
   rather than a header file.

This is so that the destructors serve as "key functions", which makes the
compiler emit the vtable in exactly the desired files and nowhere else. It can
help make binaries that use statically linked ICU libraries smaller, because the
compiler and linker can more easily prove that some code is not used.
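As a sketch of the three destructor rules above, the following hypothetical classes (`CharMatcher` and `DigitMatcher` are illustrative names, not ICU APIs) declare their destructors before any other member and define them out of line. In real code the out-of-line definitions would go in a .cpp file; they are shown in one block here only so that the example is self-contained.

```c++
// Hypothetical interface/mixin class: only pure virtual functions plus a
// virtual destructor, which is declared first and defined out of line.
class CharMatcher {
public:
    virtual ~CharMatcher();
    virtual bool matches(char c) const = 0;
};

// Concrete class: constructor and destructor declared before other members;
// the destructor is virtual (via override) and not inline.
class DigitMatcher : public CharMatcher {
public:
    DigitMatcher();
    ~DigitMatcher() override;
    bool matches(char c) const override;
};

// Normally in a .cpp file. Because each class's first non-pure, non-inline
// virtual function is its destructor, that definition anchors the vtable here.
CharMatcher::~CharMatcher() {}

DigitMatcher::DigitMatcher() {}

DigitMatcher::~DigitMatcher() {}

bool DigitMatcher::matches(char c) const {
    return c >= '0' && c <= '9';
}
```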

The Itanium C++ ABI (which is used on all x86 Linux) says: "The virtual table
for a class is emitted in the same object containing the definition of its key
function, i.e. the first non-pure virtual function that is not inline at the
point of class definition. If there is no key function, it is emitted everywhere
used."

(This was first done in ICU 49; see [ticket #8454](https://unicode-org.atlassian.net/browse/ICU-8454).)

#### Namespaces

Beginning with ICU version 2.0, ICU uses namespaces. The actual namespace is
`icu_M_N` with M being the major ICU release number and N being the minor ICU
release number. For convenience, the namespace `icu` is an alias to the current
release-specific one. (The actual namespace name is `icu` itself if renaming is
turned off.)

Starting with ICU 49, we require C++ namespace support.

Class declarations, even forward declarations, must be scoped to the ICU
namespace. For example:

```c++
U_NAMESPACE_BEGIN

class Locale;

U_NAMESPACE_END

// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
extern void fn(icu::UnicodeString&);

// outside U_NAMESPACE_BEGIN..U_NAMESPACE_END
// automatically set by utypes.h
// but recommended to be not set automatically
U_NAMESPACE_USE
Locale loc("fi");
```

`U_NAMESPACE_USE` (expands to `using namespace icu_M_N;` when available) is
automatically done when `utypes.h` is included, so that all ICU classes are
immediately usable. However, we recommend that you turn this off via
`CXXFLAGS="-DU_USING_ICU_NAMESPACE=0"`.

#### Declare Class APIs

Class APIs need to be declared like either of the following:

#### Inline-Implemented Member Functions

Class member functions are usually declared but not inline-implemented in the
class declaration. A long function implementation in the class declaration makes
it hard to read the class declaration.

It is ok to inline-implement *trivial* functions in the class declaration.
Pretty much everyone agrees that inline implementations are ok if they fit on
the same line as the function signature, even if that means bending the
single-statement-per-line rule slightly:

```c++
T *orphan() { T *p=ptr; ptr=NULL; return p; }
```

Most people also agree that very short multi-line implementations are ok inline
in the class declaration. Something like the following is probably the maximum:

```c++
Value *getValue(int index) {
    if(index>=0 && index<fLimit) {
        return fArray[index];
    }
    return NULL;
}
```

If the inline implementation is longer than that, then just declare the function
inline and put the actual inline implementations after the class declaration in
the same file. (See `unicode/unistr.h` for many examples.)

If it's significantly longer than that, then it's probably not a good candidate
for inlining anyway.

#### C++ class layout and 'boilerplate'

There are different sets of requirements for different kinds of C++ classes. In
general, all instantiable classes (i.e., all classes except for interface/mixin
classes and ones with only static member functions) inherit the `UMemory` base
class. `UMemory` provides `new`/`delete` operators, which allow the ICU heap to
be kept separate from the application heap, or ICU's memory allocation to be
customized consistently.

> :point_right: **Note**: Public ICU APIs must return or orphan only C++ objects
> that are to be released with `delete`. They must not return allocated simple
> types (including pointers, and arrays of simple types or pointers) that would
> have to be released with a `free()` function call using the ICU library's heap.
> Simple types and pointers must be returned using fill-in parameters (instead of
> allocation), or cached and owned by the returning API.
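The effect of a `UMemory`-style base class can be sketched in standard C++ as follows. This is not ICU's `UMemory` implementation: `MemoryBase`, `Widget`, `myAlloc()`, `myFree()` and the allocation counter are hypothetical, standing in for ICU's internal allocation functions. Because the class-specific `operator new` is declared `noexcept`, a failed allocation makes the `new` expression yield `nullptr` (without running the constructor) instead of throwing.

```c++
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation hooks standing in for ICU's internal functions;
// they wrap std::malloc()/std::free() and count live allocations.
static int gLiveAllocations = 0;

static void *myAlloc(std::size_t size) {
    void *p = std::malloc(size == 0 ? 1 : size);
    if (p != nullptr) {
        ++gLiveAllocations;
    }
    return p;
}

static void myFree(void *p) {
    if (p != nullptr) {
        --gLiveAllocations;
        std::free(p);
    }
}

// Minimal UMemory-like base: objects of every derived class are allocated and
// released through the hooks above rather than the global operators.
class MemoryBase {
public:
    static void *operator new(std::size_t size) noexcept { return myAlloc(size); }
    static void operator delete(void *p) noexcept { myFree(p); }
};

// Hypothetical instantiable class that inherits the custom operators.
class Widget : public MemoryBase {
public:
    explicit Widget(int v) : value(v) {}
    int value;
};
```

Note that `new int[10]` elsewhere would still use the global operators, as the "Memory Allocation" note further below explains.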

**Public ICU C++ classes** must inherit either the `UMemory` or the `UObject`
base class for proper memory management, and implement the following common set
of 'boilerplate' functions:

* default constructor
* copy constructor
* assignment operator
* `operator==`
* `operator!=`

> :point_right: **Note**: Each of the above must either be implemented, be
> verified to work correctly in its default form according to the C++ standard
> (typically not the case if any pointers are used), or be declared private
> without implementation.

* If public subclassing is intended, then the public class must inherit
  `UObject` and should implement
  * `clone()`
* **RTTI:**
  * If a class is a subclass of a parent (e.g., `Format`) with ICU's "poor
    man's RTTI" (Run-Time Type Information) mechanism (via
    `getDynamicClassID()` and `getStaticClassID()`) then add that to the new
    subclass as well (copy implementations from existing C++ APIs).
  * If a class is a new, immediate subclass of `UObject` (e.g.,
    `Normalizer2`), creating a whole new class hierarchy, then declare a
    *private* `getDynamicClassID()` and define it to return `NULL` (to
    override the pure virtual version in `UObject`); copy the relevant lines
    from `normalizer2.h` and `normalizer2.cpp`
    (`UOBJECT_DEFINE_NO_RTTI_IMPLEMENTATION(className)`). Do not add any
    "poor man's RTTI" at all to subclasses of this class.

**Interface/mixin classes** are equivalent to Java interfaces. They are as much
multiple inheritance as ICU uses; they do not decrease performance, and they do
not cause problems associated with multiple base classes having data members.
Interface/mixin classes contain only pure virtual member functions, and must
contain an empty virtual destructor. See for example the `UnicodeMatcher` class.
Interface/mixin classes must not inherit any non-interface/mixin class,
especially not `UMemory` or `UObject`.
Instead, implementation classes must inherit one of these two (or a subclass of
them) in addition to the interface/mixin classes they implement. See for example
the `UnicodeSet` class.

**Static classes** contain only static member functions and are therefore never
instantiated. They must not inherit `UMemory` or `UObject`. Instead, they must
declare a private default constructor (without any implementation) to prevent
instantiation. See for example the `LESwaps` layout engine class.

**C++ classes internal to ICU** need not (but may) implement the boilerplate
functions as mentioned above. They must inherit at least `UMemory` if they are
instantiable.

#### Make Sure The Compiler Uses C++

The standard predefined `__cplusplus` macro is defined if and only if the code is
being compiled as C++. Starting with ICU 49, we use this macro to check for C++.

Up until ICU 4.8 we used to define and use `XP_CPLUSPLUS`, but that was redundant
and did not add any value because it was defined if-and-only-if `__cplusplus` was
defined.

#### Adoption of Objects

Some constructors, factory functions and member functions take pointers to
objects that are then adopted. The adopting object contains a pointer to the
adoptee and takes over ownership and lifecycle control. Adoption occurs even if
an error occurs during the execution of the function, or in the code that adopts
the object. The semantics used within ICU are *adopt-on-call* (as opposed to,
for example, adopt-on-success):

* **General**: A constructor or function that adopts an object does so
  in all cases, even if an error occurs and a `UErrorCode` is set. This means
  that either the adoptee is deleted immediately or its pointer is stored in
  the new object. The former case is most common when the constructor or
  factory function is called and the `UErrorCode` already indicates a failure.
  In the latter case, the new object must take care of deleting the adoptee
  once it is deleted itself, regardless of whether or not the constructor was
  successful.

* **Constructors**: The code that creates the object with the `new` operator
  must check the resulting pointer returned by `new`, deleting any adoptees if
  it is `nullptr` because the constructor was not called. (Typically, a `UErrorCode`
  must be set to `U_MEMORY_ALLOCATION_ERROR`.)

  **Pitfall**: If you allocate/construct via "`ClassName *p = new ClassName(adoptee);`"
  and the memory allocation failed (`p==nullptr`), then the constructor has not
  been called, the adoptee has not been adopted, and you are still responsible for
  deleting it!

  To simplify the above checking, ICU's `LocalPointer` class includes a
  constructor that both takes ownership and reports an error if the pointer is
  `nullptr`. It is intended to be used with other-class constructors that may
  report a failure via `UErrorCode`, so that callers need to check only for
  `U_FAILURE(errorCode)` and not also separately for `isNull()`.

* **Factory functions (createInstance())**: The factory function must set a
  `U_MEMORY_ALLOCATION_ERROR` and delete any adoptees if it cannot allocate the
  new object. If the construction of the object fails otherwise, then the
  factory function must delete the object and ensure that its adoptees are
  deleted as well. As a result, a factory function always returns either a valid
  object and a successful `UErrorCode`, or a `nullptr` and a failure `UErrorCode`.
  A factory function returns a pointer to an object that must be deleted by
  the user/owner.

Example: (This is a best-practice example. It does not reflect current `Calendar`
code.)

```c++
Calendar*
Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
    LocalPointer<TimeZone> adoptedZone(zone);
    if(U_FAILURE(errorCode)) {
        // The adoptedZone destructor deletes the zone.
        return nullptr;
    }
    // since the Locale isn't specified, use the default locale
    LocalPointer<Calendar> c(new GregorianCalendar(zone, Locale::getDefault(), errorCode),
                             errorCode); // LocalPointer will set a U_MEMORY_ALLOCATION_ERROR if
                                         // new GregorianCalendar() returns nullptr.
    if (c.isValid()) {
        // c adopted the zone.
        adoptedZone.orphan();
    }
    if (U_FAILURE(errorCode)) {
        // If c was constructed, then the c destructor deletes the Calendar,
        // and the Calendar destructor deletes the adopted zone.
        return nullptr;
    }
    return c.orphan();
}
```

#### Memory Allocation

All ICU C++ class objects directly or indirectly inherit `UMemory` (see
'boilerplate' discussion above) which provides `new`/`delete` operators, which in
turn call the internal functions in `cmemory.c`. Creating and releasing ICU C++
objects with `new`/`delete` automatically uses the ICU allocation functions.

> :point_right: **Note**: Remember that (in the absence of explicit `::` scoping)
> C++ determines which `new`/`delete` operator to use from which type is allocated
> or deleted, not from the context of where the statement is. Since non-class data
> types (like `int`) cannot define their own `new`/`delete` operators, C++ always
> uses the global ones for them by default.

When global `new`/`delete` operators are to be used in the application (never inside
ICU!), then they should be properly scoped as e.g. `::new`, and the application
must ensure that matching `new`/`delete` operators are used. In some cases where
such scoping is missing in non-ICU code, it may be simpler to compile ICU
without its own `new`/`delete` operators.
See `source/common/unicode/uobject.h` for details.

In ICU library code, allocation of non-class data types (simple integer types
**as well as pointers**) must use the functions in `cmemory.h`/`.c` (`uprv_malloc()`,
`uprv_free()`, `uprv_realloc()`). Such memory objects must be released inside ICU,
never by the user; this is achieved either by providing a "close" function for a
service or by not passing ownership of these objects to the user (and
instead filling user-provided buffers or returning constant pointers without
passing ownership).

The `cmemory.h`/`.c` functions can be overridden at ICU compile time for custom
memory management. By default, `UMemory`'s `new`/`delete` operators are
implemented by calling these common functions. Overriding the `cmemory.h`/`.c`
functions changes the memory management for both C and C++.

C++ objects that were either allocated with `new` or returned from a `createXYZ()`
factory method must be deleted by the user/owner.

#### Memory Allocation Failures

All memory allocations and object creations should be checked for success. In
the event of a failure (a `NULL` returned), a `U_MEMORY_ALLOCATION_ERROR` status
should be returned by the ICU function in question. If the allocation failure
leaves the ICU service in an invalid state, such that subsequent ICU operations
could also fail, the situation should be flagged so that the subsequent
operations will fail cleanly. Under no circumstances should a memory allocation
failure result in a crash in ICU code, or cause incorrect results rather than a
clean error return from an ICU function.

Some functions, such as the C++ assignment operator, are unable to return an ICU
error status to their caller.
In the event of an allocation failure, these
functions should mark the object as being in an invalid or bogus state so that
subsequent attempts to use the object will fail. Deletion of an invalid object
should always succeed.

#### Memory Management

C++ memory management is error-prone, and memory leaks are hard to avoid, but
the following helps a lot.

First, if you can stack-allocate an object (for example, a `UnicodeString` or
`UnicodeSet`), do so. It is the easiest way to manage object lifetime.

Inside functions, avoid raw pointers to owned objects. Instead, use
[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)`<UnicodeString>`
or `LocalUResourceBundlePointer` etc., which are ICU's "smart pointer"
implementations. This is the "[Resource Acquisition Is Initialization (RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
idiom. The "smart pointer" auto-deletes the object when it goes out of scope,
which means that you can just return from the function when an error occurs and
all auto-managed objects are deleted. You do not need to remember to write an
increasing number of "`delete xyz;`" at every function exit point.

*In fact, you should almost never need to write "delete" in any function.*

* Except in a destructor where you delete all of the objects which the class
  instance owns.
* Also, in static "cleanup" functions you still need to delete cached objects.

When you pass on ownership of an object, for example to return the pointer of a
newly built object, or when you call a function which adopts your object, use
`LocalPointer`'s `.orphan()`.

* Careful: When you return an object or pass it into an adopting factory
  method, you can use `.orphan()` directly.
* However, when you pass it into an adopting constructor, you need to pass in
  the `.getAlias()`, and only if the *allocation* of the new owner succeeded
  (you got a non-NULL pointer for that) do you `.orphan()` your `LocalPointer`.
* See the `Calendar::createInstance()` example above.
* See the `AlphabeticIndex` implementation for live examples. Search for other
  uses of `LocalPointer`/`LocalArray`.

Every object must always be deletable/destructible. That is, at a minimum, all
pointers to owned memory must always be either NULL or point to owned objects.

Internally:

[cmemory.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/cmemory.h)
defines the `LocalMemory` class for chunks of memory of primitive types which
will be `uprv_free()`'ed.

[cmemory.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/cmemory.h)
also defines `MaybeStackArray` and `MaybeStackHeaderAndArray` which automate
management of arrays.

Use `CharString`
([charstr.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/charstr.h))
for `char *` strings that you build and modify.

#### Global Inline Functions

Global functions (non-class member functions) that are declared inline must be
made `static inline`. Some compilers will export symbols that are declared inline
but not static.

#### No Declarations in the for() Loop Head

Iterations through `for()` loops must not use declarations in the first part of
the loop. There have been two revisions for the scoping of these declarations
and some compilers do not comply with the latest scoping. Declarations of loop
variables should be outside these loops.

#### Common or I18N

Decide whether or not the module is part of the common or the i18n API
collection. Use the appropriate macros.
For example, use
`U_COMMON_IMPLEMENTATION`, `U_I18N_IMPLEMENTATION`, `U_COMMON_API`, `U_I18N_API`.
See `utypes.h`.

#### Constructor Failure

If there is a reasonable chance that a constructor fails (for example, if the
constructor relies on loading data), then either it must use and set a
`UErrorCode` or the class needs to support an `isBogus()`/`setToBogus()` mechanism
like `UnicodeString` and `UnicodeSet`, and the constructor needs to set the object
to bogus if it fails.

#### `UVector`, `UVector32`, or `UVector64`

Use `UVector` to store arrays of `void *`; use `UVector32` to store arrays of
`int32_t`; use `UVector64` to store arrays of `int64_t`. Historically, `UVector`
has stored either `int32_t` or `void *`, but now storing `int32_t` in a `UVector`
is deprecated in favor of `UVector32`.

### C Coding Guidelines

This section describes the C-specific guidelines or conventions to use.

#### Declare and define C APIs with both `U_CAPI` and `U_EXPORT2`

All C APIs need to be **both declared and defined** using the `U_CAPI` and
`U_EXPORT2` qualifiers.

```c++
U_CAPI int32_t U_EXPORT2
u_formatMessage(...);
```

> :point_right: **Note**: Use `U_CAPI` before and `U_EXPORT2` after the return
> type of exported C functions. Internal functions that are visible outside a
> compilation unit need a `U_CFUNC` before the return type.

#### Subdivide the Name Space

Use prefixes to avoid name collisions. Some of those prefixes contain a 3- (or
sometimes 4-) letter module identifier. Very general names like
`u_charDirection()` do not have a module identifier in their prefix.

* For POSIX replacements, the (all lowercase) POSIX function names start with
  "u_": `u_strlen()`.
* For other API functions, the **mixed-case** function name is prefixed with a
  'u', the module identifier (if appropriate), and an underscore '_'. For
  example, use `u_charDirection()`, `ubidi_setPara()`.
* For types (struct, enum, union), the typename is prefixed with a "U", often
  "`U<module identifier>`", without an underscore. For example, use
  `UComparisonResult`.
* For `#define`d constants and macros, the uppercase macro name is prefixed with
  "U_", often "`U<module identifier>_`" with an underscore. For example, use
  `U_ZERO_ERROR`, `U_SUCCESS()`, or `UNORM_NFC`.

#### Functions for Constructors and Destructors

Functions that roughly compare to constructors and destructors are called
`umod_open()` and `umod_close()`. See the following example:

```c++
U_CAPI UBiDi * U_EXPORT2
ubidi_open(void);

U_CAPI UBiDi * U_EXPORT2
ubidi_openSized(int32_t maxLength, int32_t maxRunCount, UErrorCode *pErrorCode);

U_CAPI void U_EXPORT2
ubidi_close(UBiDi *pBiDi);
```

Each successful call to a `umod_open()` returns a pointer to an object that must
be released by the user/owner by calling the matching `umod_close()`.

#### C "Service Object" Types and LocalPointer Equivalents

For every C "service object" type (equivalent to C++ class), we want to have a
[LocalPointer](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localpointer_8h.html)
equivalent, so that C++ code calling the C API can use the specific "smart
pointer" to implement the "[Resource Acquisition Is Initialization
(RAII)](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)"
idiom.

For example, in `ubidi.h` we define the `UBiDi` "service object" type and also
have the following "smart pointer" definition which will call `ubidi_close()` on
destruction:

```c++
// Use config switches like this only after including unicode/utypes.h
// or another ICU header.
#if U_SHOW_CPLUSPLUS_API

U_NAMESPACE_BEGIN

/**
 * class LocalUBiDiPointer
 * "Smart pointer" class, closes a UBiDi via ubidi_close().
 * For most methods see the LocalPointerBase base class.
 *
 * @see LocalPointerBase
 * @see LocalPointer
 * @stable ICU 4.4
 */
U_DEFINE_LOCAL_OPEN_POINTER(LocalUBiDiPointer, UBiDi, ubidi_close);

U_NAMESPACE_END

#endif
```

#### Inline Implementation Functions

Some, but not all, C compilers allow ICU users to declare functions inline
(originally a C++ language feature, standardized in C only with C99) with
various keywords. This has advantages for implementations because inline
functions are much safer and more easily debugged than macros.

ICU *used to* use a portable `U_INLINE` declaration macro that could be used for
inline functions in C. However, this was an unnecessary platform dependency.

We have changed all code that used `U_INLINE` to C++ (.cpp) using "inline", and
removed the `U_INLINE` definition.

If you find yourself constrained by .c, change it to .cpp.

All functions that are declared inline, or are small enough that an optimizing
compiler might inline them even without the inline declaration, should be
defined (implemented), not just declared, before they are first used. This is
to enable as much inlining as possible, and also to prevent compiler warnings
for functions that are declared inline but whose definition is not available
when they are called.
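To show what a `LocalUBiDiPointer`-style wrapper does for a C "service object", here is a hand-written sketch. `UCounter`, `ucounter_open()`, `ucounter_increment()`, `ucounter_close()` and `LocalUCounterPointer` are invented for illustration; the real ICU wrappers are generated by `U_DEFINE_LOCAL_OPEN_POINTER` on top of `LocalPointerBase`, whose `isNull()`, `getAlias()` and `orphan()` method names this sketch mirrors. The helper functions are `static inline` and defined before their first use, as recommended above.

```c++
#include <cstdlib>

// Hypothetical C "service object" type.
struct UCounter {
    int value;
};

static int gOpenCounters = 0;  // number of currently open UCounter objects

// umod_open()-style constructor: returns NULL if allocation fails.
static inline UCounter *ucounter_open(int start) {
    UCounter *c = static_cast<UCounter *>(std::malloc(sizeof(UCounter)));
    if (c != nullptr) {
        c->value = start;
        ++gOpenCounters;
    }
    return c;
}

static inline void ucounter_increment(UCounter *c) {
    ++c->value;
}

// umod_close()-style destructor: accepts NULL, like most close functions.
static inline void ucounter_close(UCounter *c) {
    if (c != nullptr) {
        --gOpenCounters;
        std::free(c);
    }
}

// Hand-written RAII wrapper: calls ucounter_close() when it goes out of scope.
class LocalUCounterPointer {
public:
    explicit LocalUCounterPointer(UCounter *p = nullptr) : ptr(p) {}
    ~LocalUCounterPointer() { ucounter_close(ptr); }
    LocalUCounterPointer(const LocalUCounterPointer &) = delete;
    LocalUCounterPointer &operator=(const LocalUCounterPointer &) = delete;

    bool isNull() const { return ptr == nullptr; }
    UCounter *getAlias() const { return ptr; }
    // Gives up ownership without closing, e.g. to return the object.
    UCounter *orphan() {
        UCounter *p = ptr;
        ptr = nullptr;
        return p;
    }

private:
    UCounter *ptr;
};
```

With such a wrapper, an early `return` on error needs no explicit `ucounter_close()` call.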

#### C Equivalents for Classes with Multiple Constructors

In cases like `BreakIterator` and `NumberFormat`, instead of having several
different 'open' APIs for each kind of instance, use an enum selector.

#### Source File Names

Source file names for C begin with a 'u'.

#### Memory APIs Inside ICU

For memory allocation in C implementation files for ICU, use the functions and
macros in `cmemory.h`. When allocated memory is returned from a C API function,
there must be a corresponding function (like `ucnv_close()`) that deallocates
that memory.

All memory allocations in ICU should be checked for success. In the event of a
failure (a `NULL` returned from `uprv_malloc()`), the ICU function in question
should return a `U_MEMORY_ALLOCATION_ERROR` status. If the allocation failure
leaves the ICU service in an invalid state, such that subsequent ICU operations
could also fail, the situation should be flagged so that the subsequent
operations will fail cleanly. Under no circumstances should a memory allocation
failure cause a crash in ICU code, or produce incorrect results instead of a
clean error return from an ICU function.

#### // Comments

C++-style `//` comments may be used in plain C files and in headers that will be
included in C files.

## Source Code Strings with Unicode Characters

### `char *` strings in ICU

| Declared type | Encoding | Example | Used with |
| --- | --- | --- | --- |
| `char *` | varies with platform | `"Hello"` | Most ICU API functions taking `char *` parameters. Unless otherwise noted, characters are restricted to the "invariant" set, described below. |
| `char *` | UTF-8 | `u8"¡Hola!"` | Only functions that are explicitly documented as expecting UTF-8. No restrictions on the characters used. |
| `UChar *` | UTF-16 | `u"¡Hola!"` | All ICU functions with `UChar *` parameters. |
| `UChar32` | Code point value | `U'¡'` | `UChar32` single code point constants. |
| `wchar_t` | unknown | `L"Hello"` | Not used with ICU. Unknown encoding, unknown size, not portable. |

ICU source files are UTF-8 encoded, allowing any Unicode character to appear in
Unicode string or character literals without the need for escaping. For
clarity, however, use escapes where plain text would be confusing, e.g. for
invisible characters.

For convenience, ICU4C tends to use `char *` strings in places where only
"invariant characters" (a portable subset of the 7-bit ASCII repertoire) are
used. This allows locale IDs, charset names, resource bundle item keys and
similar items to be easily specified as string literals in the source code. The
same types of strings are also stored as "invariant character" `char *` strings
in the ICU data files.

ICU has hard-coded mapping tables in `source/common/putil.c` to convert invariant
characters to and from Unicode without using a full ICU converter. These tables
must match the encoding of string literals in the ICU code as well as in the ICU
data files.

> :point_right: **Note**: ICU assumes that at least the invariant characters
> always have the same codes as is common on platforms with the same charset
> family (ASCII vs. EBCDIC). **ICU has not been tested on platforms where this
> is not the case.**

Some usage of `char *` strings in ICU assumes the system charset instead of
invariant characters. Such strings are only handled with the default converter
(see the following section). The system charset is usually a superset of the
invariant characters.
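
To make the idea of invariant characters concrete, here is a minimal sketch of a check that a `char *` string contains only invariant characters (ASCII letters, digits, space, and `"%&'()*+,-./:;<=>?_`). The function name is hypothetical, not an ICU API. The letter-range comparisons assume an ASCII-based execution character set; on EBCDIC platforms the letters are not contiguous, so a table-driven check would be needed there.

```c++
#include <cassert>
#include <cstring>

// Hypothetical helper (not an ICU API): returns true if every byte of the
// NUL-terminated string s is an invariant character. The letter and digit
// range checks assume an ASCII-based execution character set.
bool isInvariantAsciiString(const char *s) {
    assert(s != nullptr);
    // Space and the punctuation characters of the invariant set.
    static const char kInvariantPunctuation[] = " \"%&'()*+,-./:;<=>?_";
    for (; *s != '\0'; ++s) {
        char c = *s;
        bool isLetterOrDigit = ('a' <= c && c <= 'z') ||
                               ('A' <= c && c <= 'Z') ||
                               ('0' <= c && c <= '9');
        // strchr() cannot match the terminating '\0' here, because the
        // loop stops before reaching the end of s.
        if (!isLetterOrDigit && std::strchr(kInvariantPunctuation, c) == nullptr) {
            return false;
        }
    }
    return true;
}
```

With this sketch, a locale ID like `"en_US"` passes the check, while `"Hola!"` does not, because `!` is not an invariant character.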

The following are the ASCII and EBCDIC byte values (in hexadecimal) for all of
the invariant characters (see also `unicode/utypes.h`):

| Character(s) | ASCII (hex) | EBCDIC (hex) |
| --- | --- | --- |
| a..i | 61..69 | 81..89 |
| j..r | 6A..72 | 91..99 |
| s..z | 73..7A | A2..A9 |
| A..I | 41..49 | C1..C9 |
| J..R | 4A..52 | D1..D9 |
| S..Z | 53..5A | E2..E9 |
| 0..9 | 30..39 | F0..F9 |
| (space) | 20 | 40 |
| " | 22 | 7F |
| % | 25 | 6C |
| & | 26 | 50 |
| ' | 27 | 7D |
| ( | 28 | 4D |
| ) | 29 | 5D |
| \* | 2A | 5C |
| + | 2B | 4E |
| , | 2C | 6B |
| - | 2D | 60 |
| . | 2E | 4B |
| / | 2F | 61 |
| : | 3A | 7A |
| ; | 3B | 5E |
| < | 3C | 4C |
| = | 3D | 7E |
| > | 3E | 6E |
| ? | 3F | 6F |
| _ | 5F | 6D |

### Rules Strings with Unicode Characters

To include characters in source code strings that are not part of the
invariant subset of ASCII, use character escapes. In addition, rules strings
for collation and other services must follow service-specific syntax, which
means that spaces and ASCII punctuation must be quoted using the following rules:

* Single quotes delimit literal text: `a'>'b` => `a>b`
* Two single quotes, either between or outside of single-quoted text, indicate
  a literal single quote:
  * `a''b` => `a'b`
  * `a'>''<'b` => `a>'<b`
* A backslash precedes a single literal character: `a\>b` => `a>b`
* Several standard escape mechanisms are handled by `u_unescape()` and its variants.

> :point_right: **Note**: All of these quoting mechanisms are supported by
> `RuleBasedTransliterator`. The single quote mechanisms (not backslash, not
> `u_unescape()`) are supported by the format classes. In its infancy,
> `ResourceBundle` supported the `\uXXXX` mechanism and nothing else.
> This quoting method is the current policy.
> However, some ICU modules are still being updated, and this quoting method
> might not yet have been applied to all of them.

## Java Coding Conventions Overview

The ICU group uses the following coding guidelines when developing the ICU Java
classes and methods.

### Code style

The standard order for modifier keywords on APIs is:

* `public static final synchronized strictfp`
* `public abstract`

Do not use wildcard imports, such as `import java.util.*`. The sort order of
import statements is `java` / `javax` / `org` / `com`. Within each top-level
package category, subpackages and classes are sorted alphabetically. We
recommend that ICU developers use the Eclipse IDE feature \[Source\] - \[Organize Imports\]
(Ctrl+Shift+O) to organize import statements.

All if/else/for/while/do statements use braces, even if the controlled statement
is a single line. This is for clarity and to avoid mistakes due to bad nesting of
control statements, especially during maintenance.

Tabs should not be present in source files.

Indentation is 4 spaces.

Make sure the code is formatted cleanly with regular indentation. Follow Java
style code conventions, e.g., don't put multiple statements on a single line,
use mixed-case identifiers for classes and methods and upper case for constants,
and so on.

The Java source formatting rules described above are included in the Eclipse
project file. It is recommended to run \[Source\] - \[Format\] (Ctrl+Shift+F) in
the Eclipse IDE to clean up source files if necessary.

Use UTF-8 encoding (without BOM) for Java source files.

Javadoc should be complete and correct when code is checked in, to avoid playing
catch-up later during the throes of the release. Please write Javadoc for all
methods, not just external APIs, since this helps with maintenance.

### Code organization

Avoid putting more than one top-level class in a single file. Either use
separate files or nested classes.

Always define at least one constructor in a public API class. The Java compiler
automatically generates a no-arg constructor when a class has no explicit
constructors, and we cannot provide proper API documentation for such default
constructors.

Do not mix test, tool, and runtime code in the same file. If you need some
access to private or package methods or data, provide public accessors for them
and mark them `@internal`. Test code should be placed in the `com.ibm.icu.dev.test`
package, and tools (e.g., code that generates data, source code, or computes
constants) in the `com.ibm.icu.dev.tool` package. Occasionally, for very simple
cases, you can leave a few lines of tool code in the main source and comment it
out, but maintenance is easier if you just comment the location of the tools in
the source and put the actual code elsewhere.

Avoid creating new interfaces unless you know you need to mix the interface into
two or more classes that have separate inheritance. Interfaces are impossible to
modify later in a backward-compatible way. Abstract classes, on the other hand,
can add new methods with default behavior. Use interfaces only if the
architecture requires them, not just for expediency.

Current releases of ICU4J (since ICU 63) are restricted to Java SE 7 APIs
and language features.

### ICU Packages

Public APIs should be placed in `com.ibm.icu.text`, `com.ibm.icu.util`, and
`com.ibm.icu.lang`. For historical reasons and for easier migration from JDK
classes, there are also APIs in `com.ibm.icu.math`, but new APIs should not be
added there.

APIs used only during development, testing, or tools work should be placed in
`com.ibm.icu.dev`.

A class or method that is used by public APIs (listed above) but is not itself
public should be placed as follows:

1. If it is only used by one class, make it private in that class.
2. If it is only used by one class and its subclasses, make it protected in
   that class. In general, also tag it `@internal` unless you are working on a
   class that supports user-subclassing (rare).
3. If it is used by multiple classes in one package, make it package private
   (also known as default access) and mark it `@internal`.
4. If it is used by multiple packages, make it public and place the class in
   the `com.ibm.icu.impl` package.

### ICU4J API Stability

General discussion: see [ICU Design / ICU API compatibility](../icu/design.md#icu-api-compatibility).

Occasionally, we “broaden” or “widen” a Java API by making a parameter type
broader (e.g., `char` (code unit) to `int` (code point), or `String` to
`CharSequence`) or a return type narrower (e.g., `Object` to `UnicodeSet`).

Such a change is source-compatible but not binary-compatible.
Before we do this, we need to check with users like Android whether this is OK.
For example, in a class that Android exposes via its SDK,
Android may need to retain hidden compatibility overloads with the old input types.

In addition, we should test with code using both the old and new types,
so that if someone has such compatibility overloads they all get exercised.

### Error Handling and Exceptions

Errors should be indicated by throwing exceptions, not by returning “bogus”
values.

If an input parameter is in error, then a new
`IllegalArgumentException("description")` should be thrown.

Exceptions should be caught only when something must be done, for example
special cleanup or rethrowing a different exception.
If the error “should never occur”, then throw a new
`RuntimeException("description")` (rare). In this case, add a comment with a
justification.

Use exception chaining: When an exception is caught and a new one created and
thrown (usually with additional information), the original exception should be
chained to the new one.

A catch clause should not catch `Throwable`. Catch clauses should specify the
most specific subclass of `Throwable` that applies. If there are two concrete
subclasses, both should be specified in separate catch clauses.

### Binary Data Files

ICU4J uses the same binary data files as ICU4C, in the big-endian/ASCII form.
The `ICUBinary` class should be used to read them.

Some data sources (for example, compressed JAR files) do not support some
`InputStream` and related APIs:

* Memory mapping is efficient, but not available for all data sources.
* Do not depend on `InputStream.available()`: It does not provide reliable
  information for some data sources. Instead, the length of the data needs to
  be determined from the data itself.
* Do not call the `mark()` and `reset()` methods on an `InputStream` without
  wrapping the `InputStream` object in a new `BufferedInputStream` object. These
  methods are not implemented by the `ZipInputStream` class, and their use may
  result in an `IOException`.

### Compiler Warnings

There should be no compiler warnings when building ICU4J. It is recommended to
develop using Eclipse, and to fix any problems that are shown in the Eclipse
Problems panel (below the main window).

When a warning is not avoidable, add `@SuppressWarnings` annotations with the
minimum possible scope.

### Miscellaneous

Objects should not be cast to a class in the `sun.*` packages because this would
cause a `SecurityException` when run under a `SecurityManager`. The exception
needs to be caught and a default action taken, instead of propagating the
exception.

## Adding .c, .cpp and .h files to ICU

To add compilable files to ICU, add them to the source code control system in
the appropriate folder and also to the build environment.

To add these files, use the following steps:

1. Choose one of the ICU libraries:
   * The common library provides mostly low-level utilities and basic APIs that
     often do not make use of Locales. Examples are APIs that deal with character
     properties, the Locale APIs themselves, and ResourceBundle APIs.
   * The i18n library provides APIs that depend on and use Locales, such as for
     collation and formatting, that are most useful for internationalized user
     input and output.
2. Put the source code files into the folder `icu/source/library-name`, then add
   them to the build system:
   * For most platforms, add the expected .o files to the OBJECTS variable in
     `icu/source/library-name/Makefile.in`. Add the **public** header files to
     the HEADERS variable.
   * For Microsoft Visual C++ 6.0, add all the source code files to
     `icu/source/library-name/library-name.dsp`. If you don't have Visual C++,
     add the filenames to the project file manually.
3. Add test code to `icu/source/test/cintltst` for C APIs and to
   `icu/source/test/intltest` for C++ APIs.
4. Make sure that the API functions are called by the test code (100% API
   coverage) and that at least 85% of the implementation code is exercised by
   the tests (>=85% code coverage).
5. Create test code for C using the `log_err()`, `log_info()`, and `log_verbose()`
   APIs from `cintltst.h` (which uses `ctest.h`) and check it into the appropriate
   folder.
6. To get your C test code called, add its top-level function and a descriptive
   test module path to the test system by calling `addTest()`. The function that
   calls `addTest()` must ultimately be called by `addAllTests()` in `calltest.c`.
   Groups of tests typically have a common `addGroup()` function that calls
   `addTest()` for the test functions in its group, according to the common part
   of the test module path.
7. Add that test code to the build system as well: modify `Makefile.in` and the
   appropriate `.dsp` file, as for the library code.

## C Test Suite Notes

The cintltst test suite contains all the tests for the International Components
for Unicode C API. These tests may be run automatically by typing `cintltst` or
`cintltst -all` at the command line. The suite is built on the C Test Services.

### C Test Services

The purpose of the test services is to enable the writing of tests entirely in
C. The services have been designed to make creating tests or converting old ones
as simple as possible with a minimum of services overhead. A sample test file,
`demo.c`, is included in the cintltst Usage section below. For more information
regarding C test services, please see the `icu4c/source/tools/ctestfw` directory.

### Writing Test Functions

The following shows the possible format of test functions:

```c++
void some_test(void)
{
    /* test body; report failures with log_err() */
}
```

Output from the test is accomplished with three printf-like functions:

```c++
void log_err(const char *fmt, ...);
void log_info(const char *fmt, ...);
void log_verbose(const char *fmt, ...);
```

* `log_info()` writes informational messages to the console.
* `log_verbose()` writes to the console ONLY if the VERBOSE flag is turned
  on (or the `-v` option is given on the command line). This option is useful
  for debugging. By default, the VERBOSE flag is turned OFF.
* `log_err()` should be called when a test failure is detected. The error is
  then logged and the error count is incremented by one.

To use the tests, link them into a hierarchical structure. The root of the
structure is allocated automatically, so start with a `NULL` root pointer:

```c++
TestNode *root = NULL; /* empty; filled in by addTest() */
addTest(&root, &some_test, "/test");
```

Provide `addTest()` with the function pointer for the function that performs the
test as well as the absolute 'path' to the test. Paths may be up to 127
characters in length and may be used to group tests.

The calls to `addTest()` must be placed in a function or a hierarchy of functions
(perhaps mirroring the paths). See the existing cintltst code for more details.

### Running the Tests

A subtree may be extracted from another tree of tests to run a subset of the
tests programmatically:

```c++
TestNode* sub;
sub = getTest(root, "/mytests");
```

A tree of tests may then be run simply by:

```c++
runTests(root); /* or 'sub' */
```

Similarly, `showTests()` lists the tests. However, it is easier to use the
command line options described in the cintltst Usage section below.

### Globals

The command line parser resets the error count and prints a summary of the
failed tests. But if `runTests()` is called directly, for instance, these need
to be managed manually. `ERROR_COUNT` contains the number of times `log_err()`
was called. `runTests()` resets the count to zero before running the tests.
`VERBOSITY` must be 1 to display `log_verbose()` data. Otherwise, `VERBOSITY`
must be set to 0 (default).

### Building cintltst

To compile this test suite using Microsoft Visual C++ (MSVC), follow the
instructions in [How To Build And Install On Windows](../icu4c/build#how-to-build-and-install-on-windows).
This builds the libraries as well as the `cintltst` executable.

### Executing cintltst

To run the test suite from the command line, change to the
`icu4c/source/test/cintltst/Debug` directory for the debug build (or
`icu4c/source/test/cintltst/Release` for the release build) and then type
`cintltst`.

### cintltst Usage

Type `cintltst -h` to view its command line parameters.

```text
### Syntax:
### Usage: [ -l ] [ -v ] [ -verbose] [-a] [ -all] [-n]
 [-no_err_msg] [ -h] [ /path/to/test ]
### -l To get a list of test names
### -all To run all the test
### -a To run all the test(same as -all)
### -verbose To turn ON verbosity
### -v To turn ON verbosity(same as -verbose)
### -h To print this message
### -n To turn OFF printing error messages
### -no_err_msg (same as -n)
### -[/subtest] To run a subtest
### For example to run just the utility tests type: cintltst /tsutil
### To run just the locale test type: cintltst /tsutil/loctst
###
```

The following sample test file, `demo.c`, shows a minimal program built on the
C Test Services. Link it with libctestfw (or ctestfw.dll on Windows).

```c++
/******************** sample ctestfw test ********************
********* Simply link this with libctestfw or ctestfw.dll ****
************************* demo.c *****************************/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "ctest.h"

/**
* Some sample dummy tests.
* The statics simply show how often each test is called.
*/
void mytest(void)
{
    static int i = 0;
    log_info("I am a test[%d]\n", i++);
}

void mytest_err(void)
{
    static int i = 0;
    log_err("I am a test containing an error[%d]\n", i++);
    log_err("I am a test containing an error[%d]\n", i++);
}

void mytest_verbose(void)
{
    /* will only show if verbose is on (-v) */
    log_verbose("I am a verbose test, blabbing about nothing at all.\n");
}

/**
* Add your tests from this function
*/
void add_tests(TestNode** root)
{
    addTest(root, &mytest, "/apple/bravo");
    addTest(root, &mytest, "/a/b/c/d/mytest");
    addTest(root, &mytest_err, "/d/e/f/h/junk");
    addTest(root, &mytest, "/a/b/c/d/another");
    addTest(root, &mytest, "/a/b/c/etest");
    addTest(root, &mytest_err, "/a/b/c");
    addTest(root, &mytest, "/bertrand/andre/damiba");
    addTest(root, &mytest_err, "/bertrand/andre/OJSimpson");
    addTest(root, &mytest, "/bertrand/andre/juice/oj");
    addTest(root, &mytest, "/bertrand/andre/juice/prune");
    addTest(root, &mytest_verbose, "/verbose");
}

int main(int argc, const char *argv[])
{
    TestNode *root = NULL;

    add_tests(&root); /* address of root ptr- will be filled in */

    /* Run the tests. An int is returned suitable for the OS status code.
       (0 for success, neg for parameter errors, positive for the # of
       failed tests) */
    return processArgs(root, argc, argv);
}
```

## C++ IntlTest Test Suite Documentation

The IntlTest suite contains all of the tests for the C++ API of International
Components for Unicode. These tests may be run automatically by typing
`intltest` at the command line. Since the verbose option prints a considerable
amount of information, it is recommended that the output be redirected to a
file: `intltest -v > testOutput`.

### Building IntlTest

To compile this test suite using MSVC, follow the instructions for building the
`alCPP` (All C++ interfaces) workspace. This builds the libraries as well as the
`intltest` executable.

### Executing IntlTest

To run the test suite from the command line, change to the
`icu4c/source/test/intltest/Debug` directory, then type
`intltest -v > testOutput`. For the release build, the executable resides in the
`icu4c/source/test/intltest/Release` directory.

### IntlTest Usage

Type `intltest -h` to see the usage:

```text
### Syntax:
### IntlTest [-option1 -option2 ...] [testname1 testname2 ...]
### where options are: verbose (v), all (a), noerrormsg (n),
### exhaustive (e) and leaks (l).
### (Specify either -all (shortcut -a) or a test name).
### -all will run all of the tests.
###
### To get a list of the test names type: intltest LIST
### To run just the utility tests type: intltest utility
###
### Test names can be nested using slashes ("testA/subtest1")
### For example to list the utility tests type: intltest utility/LIST
### To run just the Locale test type: intltest utility/LocaleTest
###
### A parameter can be specified for a test by appending '@' and the value
### to the testname.
```

## C: Testing with Fake Time

The "Fake Time" capability allows ICU4C to be tested as if the hardware clock
were set to a specific time. This section documents how to use this facility.
Note that this facility requires the POSIX `gettimeofday()` function to be
operable.

This facility affects all ICU "current time" calculations, including date,
calendar, time zone formats, and relative formats. It doesn't affect any calls
made directly to the underlying operating system.

1. Build ICU with the **`U_DEBUG_FAKETIME`** preprocessor macro set.
   This can be accomplished with the following line in the file
   **icu/source/icudefs.local**:

   ```shell
   CPPFLAGS+=-DU_DEBUG_FAKETIME
   ```

2. Determine the `UDate` value (the signed offset in milliseconds from midnight,
   Jan 1, 1970 GMT) that you want to use as the target. For this sample we will
   use the value `28800000` (8 hours), which is midnight, Pacific Standard Time,
   1/1/1970.
3. Set the environment variable `U_FAKETIME_START=28800000`.
4. Now, the first time ICU checks the current time, it will start at midnight
   1/1/1970 (Pacific time) and roll forward. So, after 10 seconds of program
   runtime, the clock will appear to be at 12:00:10 AM.
5. You can test this by running the utility `icuinfo -m`, which prints the
   "Milliseconds since Epoch".
6. You can also test this by running the cintltst test
   `/tsformat/ccaltst/TestCalendar` in verbose mode, which prints the current
   time:

   ```shell
   $ make check ICUINFO_OPTS=-m U_FAKETIME_START=28800000 CINTLTST_OPTS="-v /tsformat/ccaltst/TestCalendar"
   U_DEBUG_FAKETIME was set at compile time, so the ICU clock will start at a
   preset value
   env variable U_FAKETIME_START=28800000 (28800000) for an offset of
   -1281957858861 ms from the current time 1281986658861
   PASS: The current date and time fetched is Thursday, January 1, 1970 12:00:00
   ```

## C: Threading Tests

Threading tests for ICU4C functions should be placed under
utility/`MultithreadTest`, in the files `intltest/tsmthred.h` and
`intltest/tsmthred.cpp`. See the existing tests in these files for examples.

Tests from this location are automatically run under
[ThreadSanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
(TSAN) in the ICU continuous build system.
TSAN reliably detects data races in the code that the tests exercise, however
improbable it might be for such a race to actually occur in normal execution.

Data races are among the most common and hardest-to-debug types of bugs in
concurrent systems. A data race occurs when two threads access the same variable
concurrently and at least one of the accesses is a write. Since C++11, a data
race is undefined behavior.

## Binary Data Formats

ICU services rely heavily on data to perform their functions. Such data is
available in various more or less structured text file formats, which make it
easy to update and maintain. For high runtime performance, most data items are
pre-built into binary formats, i.e., they are parsed and processed once and then
stored in a format that is used directly during processing.

Most of the data items are pre-built into binary files that are then installed
on a user's machine. Some data can also be built at runtime but is not
persistent. In the latter case, a primary object should be built once and then
cloned, to avoid parsing, processing, and building the same data multiple times.

Binary data formats for ICU must be portable across platforms that share the
same endianness and the same charset family (ASCII vs. EBCDIC). It would be
possible to handle data from other platform types, but that would require
load-time or even runtime conversion.

### Data Types

Binary data items are memory-mapped, i.e., they are used as read-only, constant
data. Their structures must be portable according to the criteria above and
should be efficiently usable at runtime without building additional runtime data
structures.

Most native C/C++ data types cannot be used as part of binary data formats
because their sizes are not fixed across compilers. For example, an `int` could
be 16, 32, 64, or even some other number of bits wide.
Only types with exactly known widths and semantics may be used.

Use, for example:

* `uint8_t`, `uint16_t`, `int32_t` etc.
* `UBool`: same as `int8_t`
* `UChar`: for 16-bit Unicode strings
* `UChar32`: for Unicode code points
* `char`: for "invariant characters", see `utypes.h`

> :point_right: **Note**: ICU assumes that `char` is an 8-bit byte but makes no
> assumption about its signedness.

**Do not use**, for example:

* `short`, `int`, `long`, `unsigned int` etc.: implementation-defined widths
* `float`, `double`: implementation-defined formats
* `bool`: implementation-defined width and signedness
* `enum`: implementation-defined width and signedness
* `wchar_t`: implementation-defined width, signedness and encoding/charset

Each field in a binary/mappable data format must be aligned naturally. This
means that a field with a primitive type of size n bytes must be at an n-aligned
offset from the start of the data block. `UChar` must be 2-aligned, `int32_t`
must be 4-aligned, etc.

It is possible to use struct types, but one must make sure that each field is
naturally aligned, so that (assuming a reasonable compiler) the compiler does
not insert implicit padding between fields.

```c++
// Bad: i will be preceded by compiler-dependent padding
// for proper alignment.
struct BadExample {
    UBool flag;
    int32_t i;
};

// OK: padding is added explicitly, or the types are ordered
// so that every field is naturally aligned.
struct OKExample {
    UBool flag;
    uint8_t pad[3];
    int32_t i;
};
```

Within the binary data, a `struct` type field must be aligned according to its
widest member field. The struct `OKExample` must be 4-aligned because it contains
an `int32_t` field. Make padding explicit via additional fields, rather than
letting the compiler choose optional padding.
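
The following is a minimal sketch of checking these layout rules at compile time with C++11 `static_assert`. It assumes a typical compiler on which `int32_t` is 4-aligned; the struct and function names are hypothetical, and `int8_t` stands in for `UBool` so that the example compiles without ICU headers.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mappable header, for illustration only. int8_t stands in for
// UBool so that the example compiles without ICU headers.
struct ExampleDataHeader {
    int8_t flag;        // offset 0
    uint8_t pad[3];     // explicit padding instead of compiler-inserted padding
    int32_t count;      // offset 4: naturally (4-)aligned
    uint16_t version;   // offset 8: naturally (2-)aligned
    uint16_t reserved;  // offset 10: explicitly fills the struct to a multiple of 4
};

// Compile-time checks of the intended layout. If a compiler inserted hidden
// padding or a hidden pointer before the first field, these would fail to compile.
static_assert(offsetof(ExampleDataHeader, flag) == 0, "first field must be at offset 0");
static_assert(offsetof(ExampleDataHeader, count) == 4, "count must be 4-aligned");
static_assert(offsetof(ExampleDataHeader, version) == 8, "version must directly follow count");
static_assert(alignof(ExampleDataHeader) == 4, "struct is aligned like its widest field");
static_assert(sizeof(ExampleDataHeader) == 12, "no implicit trailing padding");

// Reads the count field of a header that is naturally aligned in memory and
// was written with the same endianness as the running platform.
int32_t getExampleCount(const ExampleDataHeader *header) {
    assert(header != nullptr);
    return header->count;
}
```

Because the checks run at compile time, a layout mismatch is caught when the data-loading code is built rather than when mismatched data is read.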

Another potential problem with `struct` types, especially in C++, is that some
compilers provide RTTI for all classes and structs, which inserts a `_vtable`
pointer before the first declared field. When using `struct` types with
binary/mappable data in C++, assert somewhere in the code that `offsetof()` of
the first field is 0. For an example, see the genpname tool.

### Versioning

ICU data files have a `UDataHeader` structure preceding the actual data. Among
other fields, it contains a `formatVersion` field with four parts (one `uint8_t`
each). It is best to use only the first (major) or first and second
(major/minor) fields in the runtime code to determine binary compatibility,
i.e., reject a data item only if its `formatVersion` contains an unrecognized
major (or major/minor) version number. The remaining parts of the version should
be used to indicate variations in the format that are backward compatible, or
to carry other information.

For example, the current `uprops.icu` file's `formatVersion` (see the genprops
tool and `uchar.c`/`uprops.c`) is set to indicate backward-incompatible changes
with the major version number, backward-compatible additions with the minor
version number, and shift width constants for the `UTrie` data structure in the
third and fourth version numbers (these could change independently of the
`uprops.icu` format).

## C/C++ Debugging Hints and Tips

### Makefile-based platforms

* Use `Makefile.local` files (override of `Makefile`), or `icudefs.local` (at the
  top level, override of `icudefs.mk`) to avoid the need to modify
  change-controlled source files with debugging information.
  * Example: **`CPPFLAGS+=-DUDATA_DEBUG`** in the common directory enables data
    debugging.
  * Example: **`CINTLTST_OPTS=/tscoll`** in the cintltst directory passes
    arguments to the cintltst test during `make check`, so that only the
    collation tests are run. The OPTS variables for each test program are:
    * intltest: `INTLTEST_OPTS`
    * cintltst: `CINTLTST_OPTS`
    * iotest: `IOTEST_OPTS`
    * icuinfo: `ICUINFO_OPTS`
    * (letest does not have an OPTS variable as of ICU 4.6.)

### Windows/Microsoft Visual Studio

The following addition to `autoexp.dat` makes **`UnicodeString`** values
visible as strings in the debugger without expanding sub-items:

```text
;; Copyright (C) 2010 IBM Corporation and Others. All Rights Reserved.
;; ICU Additions
;; Add to {VISUAL STUDIO} \Common7\Packages\Debugger\autoexp.dat
;; in the [autoexpand] section just before the final [hresult] section.
;;
;; Need to change 'icu_##' to the current major+minor (so icu_46 for 4.6.1 etc)

icu_46::UnicodeString {
    preview (
        #if($e.fFlags & 2) ; stackbuffer
        (
            #(
                "U= '",
                [$e.fUnion.fStackBuffer, su],
                "', len=",
                [$e.fShortLength, u]
                ;[$e.fFields.fArray, su]
            )
        )
        #else
        (
            #(
                "U* '",
                [$e.fUnion.fFields.fArray, su],
                "', len=",
                [$e.fShortLength, u]
                ;[$e.fFields.fArray, su]
            )
        )
    )

    stringview (
        #if($e.fFlags & 2) ; stackbuffer
        (
            #(
                "U= '",
                [$e.fUnion.fStackBuffer, su],
                "', len=",
                [$e.fShortLength, u]
                ;[$e.fFields.fArray, su]
            )
        )
        #else
        (
            #(
                "U* '",
                [$e.fUnion.fFields.fArray, su],
                "', len=",
                [$e.fShortLength, u]
                ;[$e.fFields.fArray, su]
            )
        )
    )

}
;;;
;;; End ICU Additions
;;;
```