---
layout: default
title: BiDi Algorithm
nav_order: 2
parent: Transforms
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# BiDi Algorithm
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Bidirectional text consists of mainly right-to-left text with some left-to-right
nested segments (such as an Arabic text with some information in English), or
vice versa (such as an English letter with a Hebrew address nested within it.)
The predominant direction is called the global orientation.

Languages involving bidirectional text are used mainly in the Middle East. They
include Arabic, Urdu, Persian, Hebrew, and Yiddish.

In such a language, the general flow of text proceeds horizontally from right to
left, but numbers are written from left to right, the same way as they are
written in English. In addition, if some text (addresses, acronyms, or
quotations) in English or another left-to-right language is embedded, it is also
written from left to right.

*Libraries that perform a bidirectional algorithm and reorder strings
accordingly are sometimes called "Storage Layout Engines". ICU's BiDi (ubidi.h)
and shaping (ushape.h) APIs can be used at the core of such "Storage Layout
Engines".*

## Countries with Languages that Require Bidirectional Scripting

There are over 600 million people whose languages are written right-to-left, including
Persian and Urdu which use the Arabic script with additional characters.

| Language | Countries (examples)                                                 |
|----------|----------------------------------------------------------------------|
| Arabic   | Egypt, Jordan, Morocco, Saudi Arabia, ... Middle East & North Africa |
| Persian  | Iran, Afghanistan                                                    |
| Urdu     | India, Pakistan                                                      |
| Hebrew   | Israel                                                               |
| Yiddish  | Israel, North America, South America, Russia, Europe                 |

This list of languages is far from complete. Other languages with RTL scripts include
Divehi (Maldives), Kurdish (Iraq), Kashmiri (India), Sindhi (Pakistan and India), Uighur (China), and Pashto (Afghanistan), etc.

## Logical Order versus Visual Order

When reading bidirectional text, whenever the eye of the experienced reader
encounters an embedded segment, it "automatically" jumps to the other end of the
segment and reads it in the opposite direction.
The sequence in which the
characters are pronounced is thus a logical sequence which differs from the
visual sequence in which they are presented on the screen or page.

The logical order of bidirectional text is also the order in which it is usually
keyed, and in which it is stored in memory.

Consider the following example, where Arabic or Hebrew letters are represented
by uppercase English letters and English text is represented by lowercase
letters:

    english CIBARA text

The English letter h is visually followed by the Arabic letter C, but logically
h is followed by the rightmost letter A. The next letter, in logical order, will
be R. In other words, the logical and storage order of the same text would be:

    english ARABIC text

Text is stored and processed in logical order to make processing feasible: A
contiguous substring of logical-order text (e.g., from a copy&paste operation)
contains a logically contiguous piece of the text. For example, "ish ARA" is a
logically contiguous piece of the sample text above. By contrast, a contiguous
substring of visual-order text may contain pieces of the text from distant parts
of a paragraph. ("ish" and "CIB" from the sample text above are not logically
adjacent.)
Sorting and searching in text (establishing lexical order among
strings) as well as any other kind of context-sensitive text analysis also rely
on the storage of text in logical order because such processing must match user
expectations.

When text is displayed or printed, it must be "reordered" into visual order with
some parts of the text laid out left-to-right, and other parts laid out
right-to-left. The Unicode standard specifies an algorithm for this
logical-to-visual reordering. It always works on a paragraph as a whole; the
actual positioning of the text on the screen or paper must then take line breaks
into account, based on the output of the bidirectional algorithm. The reordering
output is also used for cursor movement and selection.

Legacy systems frequently stored text in visual order to avoid reordering for
display. When exchanging data with such systems for processing in Unicode, it is
necessary to reorder the data from visual order to logical order and back. Such
not-for-display transformations are sometimes referred to as "storage layout"
transformations.

There are two problems with an "inverse reordering" from visual to logical order:
There may be more than one logical order of text that results in the same
display (logical-to-visual reordering is a many-to-one function), and there is
no standard algorithm for it.
ICU's BiDi API provides a setting for "inverse"
operation that modifies the standard Unicode Bidi algorithm. However, it may not
always produce the expected results. Bidirectional data should be converted to
Unicode and reordered to logical order only once to avoid roundtrip losses. Just
as it is best to never convert to non-Unicode charsets, data should not be
reordered from logical to visual order except for display and printing.

## References

ICU provides an implementation of the Unicode BiDi algorithm, as well as simple
functions to write a reordered version of the string using the generated
meta-data. An "inverse" flag can be set to **approximate** visual-to-logical
reordering. See the ubidi.h header file and the [BiDi API
References](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html).

See [Unicode Standard Annex #9: The Bidirectional
Algorithm](http://www.unicode.org/reports/tr9/).

## Programming Examples in C and C++

See the [BiDi API reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html)
for more information.