---
layout: default
title: BiDi Algorithm
nav_order: 2
parent: Transforms
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# BiDi Algorithm
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Bidirectional text consists mainly of right-to-left text with some nested
left-to-right segments (such as Arabic text with some information in English),
or vice versa (such as an English letter with a Hebrew address nested within
it). The predominant direction is called the global orientation.

Languages involving bidirectional text are used mainly in the Middle East. They
include Arabic, Urdu, Persian, Hebrew, and Yiddish.

In such a language, the general flow of text proceeds horizontally from right to
left, but numbers are written from left to right, the same way as they are
written in English. In addition, if some text (addresses, acronyms, or
quotations) in English or another left-to-right language is embedded, it is also
written from left to right.

*Libraries that implement a bidirectional algorithm and reorder strings
accordingly are sometimes called "Storage Layout Engines". ICU's BiDi (ubidi.h)
and shaping (ushape.h) APIs can be used at the core of such "Storage Layout
Engines".*

## Countries with Languages that Require Bidirectional Scripting

Languages written right-to-left are used by over 600 million people. They
include Persian and Urdu, which use the Arabic script with additional
characters.

| Language | Countries (examples) |
|----------|----------------------|
| Arabic   | Egypt, Jordan, Morocco, Saudi Arabia, and others in the Middle East and North Africa |
| Persian  | Iran, Afghanistan |
| Urdu     | India, Pakistan |
| Hebrew   | Israel |
| Yiddish  | Israel, North America, South America, Russia, Europe |

This list of languages is far from complete. Other languages with right-to-left
(RTL) scripts include Divehi (Maldives), Kurdish (Iraq), Kashmiri (India), Sindhi
(Pakistan and India), Uighur (China), and Pashto (Afghanistan).

## Logical Order versus Visual Order

When reading bidirectional text, whenever the eye of the experienced reader
encounters an embedded segment, it "automatically" jumps to the other end of the
segment and reads it in the opposite direction. The sequence in which the
characters are pronounced is thus a logical sequence which differs from the
visual sequence in which they are presented on the screen or page.

The logical order of bidirectional text is also the order in which it is usually
keyed, and in which it is stored in memory.

Consider the following example, where Arabic or Hebrew letters are represented
by uppercase English letters and English text is represented by lowercase
letters:

    english CIBARA text

The English letter h is visually followed by the Arabic letter C, but logically
h is followed by the rightmost letter A. The next letter, in logical order, will
be R. In other words, the logical and storage order of the same text would be:

    english ARABIC text

Text is stored and processed in logical order to make processing feasible: a
contiguous substring of logical-order text (e.g., from a copy-and-paste
operation) contains a logically contiguous piece of the text. For example,
"ish ARA" is a logically contiguous piece of the sample text above. By contrast,
a contiguous substring of visual-order text may contain pieces of the text from
distant parts of a paragraph. ("ish" and "CIB" from the sample text above are
not logically adjacent.) Sorting and searching in text (establishing lexical
order among strings), as well as any other kind of context-sensitive text
analysis, also rely on the storage of text in logical order because such
processing must match user expectations.

When text is displayed or printed, it must be "reordered" into visual order, with
some parts of the text laid out left-to-right and other parts laid out
right-to-left. The Unicode Standard specifies an algorithm for this
logical-to-visual reordering. It always works on a paragraph as a whole; the
actual positioning of the text on the screen or paper must then take line breaks
into account, based on the output of the bidirectional algorithm. The reordering
output is also used for cursor movement and selection.

Legacy systems frequently stored text in visual order to avoid reordering for
display. When exchanging data with such systems for processing in Unicode, it is
necessary to reorder the data from visual order to logical order and back. Such
not-for-display transformations are sometimes referred to as "storage layout"
transformations.

There are two problems with an "inverse reordering" from visual to logical
order: there may be more than one logical order of text that results in the same
display (logical-to-visual reordering is a many-to-one function), and there is
no standard algorithm for it. ICU's BiDi API provides a setting for an "inverse"
operation that modifies the standard Unicode BiDi algorithm. However, it may not
always produce the expected results. Bidirectional data should be converted to
Unicode and reordered to logical order only once to avoid roundtrip losses. Just
as it is best never to convert Unicode text to non-Unicode charsets, data should
not be reordered from logical to visual order except for display and printing.

## References

ICU provides an implementation of the Unicode BiDi algorithm, as well as simple
functions to write a reordered version of the string using the generated
metadata. An "inverse" flag can be set to **approximate** visual-to-logical
reordering. See the ubidi.h header file and the [BiDi API
References](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html).

See [Unicode Standard Annex #9: The Bidirectional
Algorithm](http://www.unicode.org/reports/tr9/).

## Programming Examples in C and C++

See the [BiDi API reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html)
for more information.