# Ark Bytecode File Format
The Ark bytecode file is the binary file produced by compiling ArkTS, TS, or JS source code. This topic describes the Ark bytecode file format in detail, introducing each part of the file to help you analyze and modify bytecode.


## Constraints
This topic applies only to Ark bytecode whose version number is 11.0.2.0. (The version number is an internal reserved field of the Ark compiler.)


## Data Types of Bytecode File

### Integer

| **Name** | **Description** |
| -------------- | ---------------------------------- |
| `uint8_t` | 8-bit unsigned integer. |
| `uint16_t` | 16-bit unsigned integer in little-endian mode. |
| `uint32_t` | 32-bit unsigned integer in little-endian mode. |
| `uleb128` | LEB128-encoded unsigned integer. |
| `sleb128` | LEB128-encoded signed integer. |


### String
- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `utf16_length` | `uleb128` | The value is `len << 1 \| is_ascii`, where **len** indicates the size of the string encoded in UTF-16, and **is_ascii** indicates whether the string contains only ASCII characters. The value of **is_ascii** is 0 or 1.|
| `data` | `uint8_t[]` | MUTF-8 encoded character sequence ending with **\0**. |


### TaggedValue
- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | -------------------------------------------- |
| `tag` | `uint8_t` | Tag that identifies the kind of data. |
| `data` | `uint8_t[]` | Depending on the tag, **data** has a different type or is empty.|


## TypeDescriptor
**TypeDescriptor** is the format of a class ([Class](#class)) name. A name has the form **L_ClassName;**, which consists of **'L'**, **'_'**, **ClassName**, and **';'**.
Here, **ClassName** is the full name of the class, with each **'.'** in the name replaced by **'/'**.


## Bytecode File Layout
The bytecode file starts with the [Header](#header) structure. All structures in the file can be accessed directly or indirectly from the **Header**. Structures in the bytecode file are referenced either by offset or by index. An offset is a 32-bit value that gives the distance, counted from 0, from the start of the file to the start of the structure. An index is a 16-bit value that gives the position of the structure in an index region. This mechanism is described in [IndexSection](#indexsection).

All multi-byte values in the bytecode file are little-endian.


### Header
- Alignment mode: single-byte alignment.
- Format

| **Name** | **Format**| **Description** |
| ----------------- | -------------- | ------------------------------------------------------------ |
| `magic` | `uint8_t[8]` | Magic number. The value must be **'P' 'A' 'N' 'D' 'A' '\0' '\0' '\0'**. |
| `checksum` | `uint32_t` | **Adler32** checksum of the content of the bytecode file, excluding the magic number and this checksum field.|
| `version` | `uint8_t[4]` | Version number of the bytecode file ([Version](#version)).|
| `file_size` | `uint32_t` | Size of the bytecode file, in bytes. |
| `foreign_off` | `uint32_t` | An offset that points to the external area. The external area contains elements of two types: [ForeignClass](#foreignclass) and [ForeignMethod](#foreignmethod). **foreign_off** points to the first element in the area.|
| `foreign_size` | `uint32_t` | Size of the external area, in bytes. |
| `num_classes` | `uint32_t` | Number of elements in the [ClassIndex](#classindex) structure, that is, the number of [Class](#class) structures defined in the file.|
| `class_idx_off` | `uint32_t` | An offset that points to [ClassIndex](#classindex).|
| `num_lnps` | `uint32_t` | Number of elements in the [LineNumberProgramIndex](#linenumberprogramindex) structure, that is, the number of [line number programs](#line-number-program) defined in the file.|
| `lnp_idx_off` | `uint32_t` | An offset that points to [LineNumberProgramIndex](#linenumberprogramindex).|
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `num_index_regions` | `uint32_t` | Number of elements in the [IndexSection](#indexsection) structure, that is, the number of [IndexHeader](#indexheader) structures in the file.|
| `index_section_off` | `uint32_t` | An offset that points to [IndexSection](#indexsection).|


### Version
The bytecode version number consists of four parts in the format of **major version number.minor version number.feature version number.build version number**.

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ---------------------------------------------------------- |
| Major version number | `uint8_t` | Indicates a bytecode file format change caused by an adjustment of the overall structure. |
| Minor version number | `uint8_t` | Indicates a bytecode file format change caused by a partial structure adjustment or a major feature.|
| Feature version number | `uint8_t` | Indicates a bytecode file format change caused by small- and medium-sized features. |
| Build version number | `uint8_t` | Indicates a bytecode file format change caused by a defect fix. |


### ForeignClass
Describes external classes in the bytecode file.
They are declared in other files and referenced in the current bytecode file.
- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name` | `String` | Name of the external class, which follows the [TypeDescriptor](#typedescriptor) syntax.|


### ForeignMethod
Describes external methods in the bytecode file. They are declared in other files and referenced in the current bytecode file.
- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx` | `uint16_t` | An index pointing to the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex), whose value is an offset pointing to [Class](#class) or [ForeignClass](#foreignclass).|
| `reserved` | `uint16_t` | Reserved field used internally in the Ark bytecode file. |
| `name_off` | `uint32_t` | An offset that points to [string](#string), indicating the method name.|
| `index_data` | `uleb128` | [MethodIndexData](#methodindexdata) data of the method.|

**Note:**<br>
The offset of the **ForeignMethod** determines which **IndexHeader** is used to resolve **class_idx**.


### ClassIndex
The **ClassIndex** structure is used to quickly locate the definition of a **Class** by name.
- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets` | `uint32_t[]` | An array. The value of each element is an offset pointing to a [Class](#class). The elements are sorted by class name, which follows the [TypeDescriptor](#typedescriptor) syntax. The array length is specified by **num_classes** in [Header](#header).|


### Class
In a bytecode file, a class represents either a source code file or a built-in [Annotation](#annotation). When it represents a source code file, the methods of the class correspond to the functions in that file, and the fields of the class correspond to internal information about the file. When it represents a built-in **Annotation**, the class contains no fields or methods. A class declared in the source code is represented in the bytecode file by a method that corresponds to its constructor.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name` | `String` | Class name, which follows the [TypeDescriptor](#typedescriptor) syntax.|
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `access_flags` | `uleb128` | Access flags of the **Class**, which is a combination of [ClassAccessFlag](#classaccessflag) values.|
| `num_fields` | `uleb128` | Number of fields of the **Class**. |
| `num_methods` | `uleb128` | Number of methods of the **Class**. |
| `class_data` | `TaggedValue[]` | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [ClassTag](#classtag) type. Elements in the array are sorted in ascending order by tag (except the **0x00** tag, which comes last).|
| `fields` | `Field[]` | Array of the fields of the **Class**. Each element in the array is of the [Field](#field) type. The array length is specified by **num_fields**.|
| `methods` | `Method[]` | Array of the methods of the **Class**. Each element in the array is of the [Method](#method) type. The array length is specified by **num_methods**.|


### ClassAccessFlag

| **Name**| **Value**| **Description** |
| -------------- | ------------ | ------------------------------------------------------------ |
| `ACC_PUBLIC` | `0x0001` | Default attribute. Every [Class](#class) in Ark bytecode has this flag.|
| `ACC_ANNOTATION` | `0x2000` | Declares the class as the [Annotation](#annotation) type.|


### ClassTag
- Alignment mode: single-byte alignment.
- Format

| **Name**| **Value**| **Quantity**| **Format**| **Description** |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING` | `0x00` | `1` | `none` | The [TaggedValue](#taggedvalue) with this tag is the last item of **class_data**.|
| `SOURCE_LANG` | `0x02` | `0-1` | `uint8_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is 0, indicating that the source code language is ArkTS, TS, or JS.|
| `SOURCE_FILE` | `0x07` | `0-1` | `uint32_t`| The **data** of the [TaggedValue](#taggedvalue) with this tag is an offset that points to [string](#string), indicating the name of the source file.|

**Note:**<br>
**ClassTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **class_data**. The **Quantity** column gives the number of times an element with this tag occurs in the **class_data** of a [Class](#class).


### Field
Describes the fields in the bytecode file.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx` | `uint16_t` | An index pointing to the class to which the field belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value at that position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
| `type_idx` | `uint16_t` | An index pointing to the type of the field. It points to a position in [ClassRegionIndex](#classregionindex). The value at that position is of the [Type](#type) type.|
| `name_off` | `uint32_t` | An offset that points to [string](#string), indicating the name of the field.|
| `reserved` | `uleb128` | Reserved field used internally in the Ark bytecode file. |
| `field_data` | `TaggedValue[]` | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [FieldTag](#fieldtag) type. Elements in the array are sorted in ascending order by tag (except the **0x00** tag, which comes last).|

**Note:**<br>
The offset of the **Field** determines which **IndexHeader** is used to resolve **class_idx** and **type_idx**.


### FieldTag

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Value**| **Quantity**| **Format**| **Description** |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING` | `0x00` | `1` | `none` | The [TaggedValue](#taggedvalue) with this tag is the last item of **field_data**.|
| `INT_VALUE` | `0x01` | `0-1` | `sleb128` | The **data** of the [TaggedValue](#taggedvalue) with this tag is of the **boolean**, **byte**, **char**, **short**, or **int** type.|
| `VALUE` | `0x02` | `0-1` | `uint32_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is of the **FLOAT** or **ID** type in [Value formats](#value-formats).|

**Note:**<br>
**FieldTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **field_data**. The **Quantity** column gives the number of times an element with this tag occurs in the **field_data** of a [Field](#field).
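
The `uleb128` and `sleb128` integers and the tagged-value arrays described above (such as **field_data**) can be decoded with very little code. The following Python sketch is illustrative only: the function names `read_uleb128`, `read_sleb128`, and `read_field_data` and the `FIELD_TAG_*` constants are hypothetical and not part of any Ark tool. It assumes that a `VALUE` element is stored as a 4-byte little-endian integer, as listed in the FieldTag table.

```python
import struct

def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if byte & 0x80 == 0:               # high bit clear: last byte
            return result, pos
        shift += 7

def read_sleb128(buf, pos):
    """Decode a signed LEB128 integer at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:                # sign bit of the final byte
                result -= 1 << shift
            return result, pos

# Tag values taken from the FieldTag table.
FIELD_TAG_NOTHING = 0x00
FIELD_TAG_INT_VALUE = 0x01
FIELD_TAG_VALUE = 0x02

def read_field_data(buf, pos):
    """Walk a field_data array; return ({tag: value}, next_pos)."""
    values = {}
    while True:
        tag = buf[pos]
        pos += 1
        if tag == FIELD_TAG_NOTHING:       # terminator, carries no data
            return values, pos
        if tag == FIELD_TAG_INT_VALUE:
            values[tag], pos = read_sleb128(buf, pos)
        elif tag == FIELD_TAG_VALUE:
            (values[tag],) = struct.unpack_from("<I", buf, pos)
            pos += 4
        else:
            raise ValueError(f"unexpected FieldTag 0x{tag:02x}")
```

The same loop shape applies to **class_data**: only the tag set and the per-tag data formats change.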


### Method
Describes the methods in the bytecode file.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `class_idx` | `uint16_t` | An index pointing to the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value at that position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
| `reserved` | `uint16_t` | Reserved field used internally in the Ark bytecode file. |
| `name_off` | `uint32_t` | An offset that points to [string](#string), indicating the method name.|
| `index_data` | `uleb128` | [MethodIndexData](#methodindexdata) data of the method.|
| `method_data` | `TaggedValue[]` | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [MethodTag](#methodtag) type. Elements in the array are sorted in ascending order by tag (except the **0x00** tag, which comes last).|

**Note:**<br>
The offset of the **Method** determines which **IndexHeader** is used to resolve **class_idx**.


### MethodIndexData
**MethodIndexData** is an unsigned 32-bit integer divided into three parts.

| **Bit**| **Name**| **Format**| **Description** |
| ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| 0 - 15 | `header_index` | `uint16_t` | Points to a position in [IndexSection](#indexsection). The value at this position is an [IndexHeader](#indexheader). You can use the **IndexHeader** to find the offsets of all methods ([Method](#method)), strings ([string](#string)), or literal arrays ([LiteralArray](#literalarray)) referenced by the method.|
| 16 - 23 | `function_kind` | `uint8_t` | Function type of the method ([FunctionKind](#functionkind)).|
| 24 - 31 | `reserved` | `uint8_t` | Reserved field used internally in the Ark bytecode file. |


#### FunctionKind

| **Name** | **Value**| **Description** |
| ------------------------ | ------------ | ---------------- |
| `FUNCTION` | `0x1` | Common function. |
| `NC_FUNCTION` | `0x2` | Common arrow function. |
| `GENERATOR_FUNCTION` | `0x3` | Generator function. |
| `ASYNC_FUNCTION` | `0x4` | Asynchronous function. |
| `ASYNC_GENERATOR_FUNCTION` | `0x5` | Asynchronous generator function.|
| `ASYNC_NC_FUNCTION` | `0x6` | Asynchronous arrow function. |
| `CONCURRENT_FUNCTION` | `0x7` | Concurrent function. |


### MethodTag

| **Name**| **Value**| **Quantity**| **Format**| **Description** |
| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| `NOTHING` | `0x00` | `1` | `none` | The [TaggedValue](#taggedvalue) with this tag is the last item of **method_data**.|
| `CODE` | `0x01` | `0-1` | `uint32_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is an offset pointing to [Code](#code), indicating the code segment of the method.|
| `SOURCE_LANG` | `0x02` | `0-1` | `uint8_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is 0, indicating that the source code language is ArkTS, TS, or JS.|
| `DEBUG_INFO` | `0x05` | `0-1` | `uint32_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is an offset pointing to [DebugInfo](#debuginfo), indicating the debugging information of the method.|
| `ANNOTATION` | `0x06` | `>=0` | `uint32_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is an offset pointing to [Annotation](#annotation), indicating an annotation of the method.|

**Note:**<br>
**MethodTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **method_data**. The **Quantity** column gives the number of times an element with this tag occurs in the **method_data** of a [Method](#method).


### Code

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `num_vregs` | `uleb128` | Number of registers. Registers that store input parameters and default parameters are not counted. |
| `num_args` | `uleb128` | Total number of input parameters and default parameters. |
| `code_size` | `uleb128` | Total size of all instructions, in bytes. |
| `tries_size` | `uleb128` | Length of the **try_blocks** array, that is, the number of [TryBlock](#tryblock) structures. |
| `instructions` | `uint8_t[]` | Array of all instructions. |
| `try_blocks` | `TryBlock[]` | An array. Each element in the array is of the **TryBlock** type.|


### TryBlock

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `start_pc` | `uleb128` | Offset of the first instruction of the **TryBlock** from the start of the **instructions** array of [Code](#code).|
| `length` | `uleb128` | Size of the **TryBlock**, in bytes. |
| `num_catches` | `uleb128` | Number of [CatchBlock](#catchblock) structures associated with the **TryBlock**. The value is 1.|
| `catch_blocks` | `CatchBlock[]` | Array of the **CatchBlock** structures associated with the **TryBlock**. The array contains only one **CatchBlock**, which captures all types of exceptions.|


### CatchBlock

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ----------------------------------------------- |
| `type_idx` | `uleb128` | If the value is 0, the **CatchBlock** captures all types of exceptions.|
| `handler_pc` | `uleb128` | Program counter of the first instruction of the exception handling logic. |
| `code_size` | `uleb128` | Size of the **CatchBlock**, in bytes. |


### Annotation
Describes an annotation structure.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format** | **Description** |
| -------------- | ------------------- | ------------------------------------------------------------ |
| `class_idx` | `uint16_t` | An index pointing to the class to which the **Annotation** belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value at that position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
| `count` | `uint16_t` | Length of the **elements** array. |
| `elements` | `AnnotationElement[]` | An array. Each element of the array is of the [AnnotationElement](#annotationelement) type.|
| `element_types` | `uint8_t[]` | An array. Each element in the array is of the [AnnotationElementTag](#annotationelementtag) type and describes one **AnnotationElement**. The position of each element in the **element_types** array is the same as that of the corresponding **AnnotationElement** in the **elements** array.|

**Note:**<br>
The offset of the **Annotation** determines which **IndexHeader** is used to resolve **class_idx**.
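
The [Code](#code), [TryBlock](#tryblock), and [CatchBlock](#catchblock) tables in this section describe a nested, variable-length layout. The following Python sketch shows one way to read it. It assumes the fields are stored back to back in table order with no padding, which is what single-byte alignment suggests. The names `read_uleb128` and `read_code` and the dictionary keys are hypothetical choices for this illustration.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7

def read_code(buf, pos):
    """Read a Code structure at pos; return (code_dict, next_pos).

    Assumption: fields follow each other in table order without padding.
    """
    num_vregs, pos = read_uleb128(buf, pos)
    num_args, pos = read_uleb128(buf, pos)
    code_size, pos = read_uleb128(buf, pos)
    tries_size, pos = read_uleb128(buf, pos)

    # code_size is the byte length of the instruction stream.
    instructions = bytes(buf[pos:pos + code_size])
    pos += code_size

    try_blocks = []
    for _try in range(tries_size):
        start_pc, pos = read_uleb128(buf, pos)
        length, pos = read_uleb128(buf, pos)
        num_catches, pos = read_uleb128(buf, pos)
        catch_blocks = []
        for _catch in range(num_catches):
            type_idx, pos = read_uleb128(buf, pos)
            handler_pc, pos = read_uleb128(buf, pos)
            catch_size, pos = read_uleb128(buf, pos)
            catch_blocks.append({
                "type_idx": type_idx,        # 0 means "catch everything"
                "handler_pc": handler_pc,
                "code_size": catch_size,
            })
        try_blocks.append({
            "start_pc": start_pc,
            "length": length,
            "catch_blocks": catch_blocks,
        })

    code = {
        "num_vregs": num_vregs,
        "num_args": num_args,
        "instructions": instructions,
        "try_blocks": try_blocks,
    }
    return code, pos
```

The offset passed as `pos` would come from the `CODE` element in the **method_data** of a method.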


### AnnotationElementTag

| **Name**| **Tag**|
| -------------- | --------- |
| `u1` | `'1'` |
| `i8` | `'2'` |
| `u8` | `'3'` |
| `i16` | `'4'` |
| `u16` | `'5'` |
| `i32` | `'6'` |
| `u32` | `'7'` |
| `i64` | `'8'` |
| `u64` | `'9'` |
| `f32` | `'A'` |
| `f64` | `'B'` |
| `string` | `'C'` |
| `method` | `'E'` |
| `annotation` | `'G'` |
| `literalarray` | `'#'` |
| `unknown` | `'0'` |


### AnnotationElement

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `name_off` | `uint32_t` | An offset that points to [string](#string), indicating the name of the annotation element.|
| `value` | `uint32_t` | Value of the annotation element. If the width of the value does not exceed 32 bits, the value itself is stored here. Otherwise, the value stored here is an offset pointing to data in one of the [Value formats](#value-formats).|


### Value formats
Different value types use different encoding formats: INTEGER, LONG, FLOAT, DOUBLE, and ID.

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `INTEGER` | `uint32_t` | Signed 4-byte integer value. |
| `LONG` | `uint64_t` | Signed 8-byte integer value. |
| `FLOAT` | `uint32_t` | 4-byte bit pattern, zero-extended to the right and interpreted as an IEEE 754 32-bit floating-point value.|
| `DOUBLE` | `uint64_t` | 8-byte bit pattern, zero-extended to the right and interpreted as an IEEE 754 64-bit floating-point value.|
| `ID` | `uint32_t` | 4-byte bit pattern, indicating the offset of a structure in the file. |


### LineNumberProgramIndex
The **LineNumberProgramIndex** structure is an array that allows a [line number program](#line-number-program) to be accessed through a more compact index.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets` | `uint32_t[]` | An array in which the value of each element is an offset pointing to a line number program. The array length is specified by **num_lnps** in [Header](#header).|


### DebugInfo
**DebugInfo** contains the mapping between the program counter of a method and the line and column numbers in the source code, as well as information about local variables. The format of the debugging information is derived from section 6.2 of the [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html). A [state machine](#state-machine) interprets the [line number program](#line-number-program) to obtain the mapping and the local variable information. To deduplicate identical line number programs in different methods, all constants referenced in the programs are moved to the [constant pool](#constant-pool).

- Alignment mode: single-byte alignment.
- Format

| **Name** | **Format**| **Description** |
| ----------------------- | -------------- | ------------------------------------------------------------ |
| `line_start` | `uleb128` | Initial value of the line number register of the state machine. |
| `num_parameters` | `uleb128` | Total number of input parameters and default parameters. |
| `parameters` | `uleb128[]` | Array that stores the names of the method input parameters. The array length is **num_parameters**. The value of each element is the offset of a string or 0. If the value is 0, the corresponding parameter does not have a name.|
| `constant_pool_size` | `uleb128` | Size of the constant pool, in bytes. |
| `constant_pool` | `uleb128[]` | Array for storing constant pool data. The array length is **constant_pool_size**. |
| `line_number_program_idx` | `uleb128` | An index that points to a position in [LineNumberProgramIndex](#linenumberprogramindex). The value at this position is an offset pointing to a [line number program](#line-number-program). A line number program has variable length and ends with the **END_SEQUENCE** operation code.|


#### Constant pool
A constant pool is a structure for storing constants in **DebugInfo**. Many methods have similar line number programs, which differ only in variable names, variable types, and file names. To deduplicate such line number programs, all constants referenced in the programs are stored in the constant pool. When interpreting a program, the state machine maintains a pointer into the constant pool. When it interprets an instruction that requires a constant pool parameter, the state machine reads the value at the position that the pointer refers to and then advances the pointer.


#### State machine
The state machine is used to generate [DebugInfo](#debuginfo) information. It contains the following registers.

| **Name** | **Initial Value** | **Description** |
| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `address` | 0 | Program counter (pointing to an instruction of a method), which can only increase monotonically. |
| `line` | Value of the **line_start** attribute of [DebugInfo](#debuginfo)| Unsigned integer, corresponding to the line number in the source code. Lines are numbered from 1, so the register value cannot be less than 1.|
| `column` | 0 | Unsigned integer, corresponding to the column number in the source code. |
| `file` | Value of **SOURCE_FILE** in **class_data** (see [Class](#class)), or 0.| An offset that points to [string](#string), indicating the name of the source file. If there is no file name information, that is, there is no **SOURCE_FILE** tag in [Class](#class), the register value is 0.|
| `source_code` | 0 | An offset that points to [string](#string), indicating the source code of the source file. If there is no source code information, the register value is 0.|
| `constant_pool_ptr` | Address of the first byte of the constant pool in [DebugInfo](#debuginfo)| Pointer to the current constant value. |


#### Line number program
A line number program consists of instructions. Each instruction contains a single-byte operation code and optional parameters. Depending on the operation code, the value of a parameter is either encoded in the instruction (an instruction parameter) or obtained from the constant pool (a constant pool parameter).

| **Operation Code** | **Value**| **Instruction Parameters** | **Constant Pool Parameters** | **Parameter Description**| **Description** |
| ----- | ----- | ------- | ---- | ------- | ------ |
| `END_SEQUENCE` | `0x00` | | | | Marks the end of the line number program. |
| `ADVANCE_PC` | `0x01` | | `uleb128 addr_diff` | **addr_diff**: value to be added to the **address** register.| Adds **addr_diff** to the **address** register so that it points to the next address, without generating a location entry.|
| `ADVANCE_LINE` | `0x02` | | `sleb128 line_diff` | **line_diff**: value to be added to the **line** register.| Adds **line_diff** to the **line** register so that it points to the next line, without generating a location entry.|
| `START_LOCAL` | `0x03` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx` | **register_num**: register that will contain the local variable.<br>**name_idx**: an offset pointing to [string](#string), indicating the name of the variable.<br>**type_idx**: an offset pointing to [string](#string), indicating the variable type.| Introduces a local variable with a name and type at the current address. The number of the register that will contain this variable is encoded in the instruction. A register number of -1 indicates the accumulator register. The values of **name_idx** and **type_idx** may be 0, in which case the corresponding information does not exist.|
| `START_LOCAL_EXTENDED` | `0x04` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx`<br>`uleb128 sig_idx` | **register_num**: register that will contain the local variable.<br>**name_idx**: an offset pointing to [string](#string), indicating the name of the variable.<br>**type_idx**: an offset pointing to [string](#string), indicating the variable type.<br>**sig_idx**: an offset pointing to [string](#string), indicating the signature of the variable.| Introduces a local variable with a name, type, and signature at the current address. The number of the register that will contain this variable is encoded in the instruction. A register number of -1 indicates the accumulator register. The values of **name_idx**, **type_idx**, and **sig_idx** may be 0, in which case the corresponding information does not exist.|
| `END_LOCAL` | `0x05` | `sleb128 register_num` | | **register_num**: register that contains the local variable.| Marks the local variable in the specified register as out of scope at the current address. A register number of -1 indicates the accumulator register.|
| `SET_FILE` | `0x09` | | `uleb128 name_idx` | **name_idx**: an offset pointing to [string](#string), indicating the file name.| Sets the value of the **file** register. The value of **name_idx** may be 0, in which case the corresponding information does not exist.|
| `SET_SOURCE_CODE` | `0x0a` | | `uleb128 source_idx` | **source_idx**: an offset pointing to [string](#string), indicating the source code of the file.| Sets the value of the **source_code** register. The value of **source_idx** may be 0, in which case the corresponding information does not exist.|
| `SET_COLUMN` | `0x0b` | | `uleb128 column_num` | **column_num**: column number to be set. | Sets the value of the **column** register and generates a location entry. |
| Special operation code | `0x0c..0xff` | | | | Advances the **line** and **address** registers and generates a location entry. For details, see the following description.|


For special operation codes, whose values are between **0x0c** and **0xff** (inclusive), the state machine advances the **line** and **address** registers by a small amount and then generates a new location entry. For details, see section 6.2.5.1 "Special Opcodes" in the [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html).

| **No.**| **Operation** | **Description** |
| ----- | -------------------------------------------------- | ------------------------------------------------------------ |
| 1 | `adjusted_opcode = opcode - OPCODE_BASE` | Calculates the adjusted operation code. The value of **OPCODE_BASE** is **0x0c**, which is the first special operation code.|
| 2 | `address += adjusted_opcode / LINE_RANGE` | Increases the value of the **address** register. The value of **LINE_RANGE** is 15, which is used to calculate the change of the line number information.|
| 3 | `line += LINE_BASE + (adjusted_opcode % LINE_RANGE)` | Increases the value of the **line** register. The value of **LINE_BASE** is -4, which is the minimum line number increment. The maximum line number increment is **LINE_BASE + LINE_RANGE - 1**.|
| 4 | | Generates a new location entry. |

**Note:**<br>
The special operation code is calculated using the following formula: **(line_increment - LINE_BASE) + (address_increment * LINE_RANGE) + OPCODE_BASE**.


### IndexSection
Generally, each structure of a bytecode file is referenced by a 32-bit offset: when one structure references another, it records the 32-bit offset of the referenced structure. To reduce the file size, a bytecode file is divided into multiple index regions, and structures within each index region are referenced by a 16-bit index. The **IndexSection** structure describes the collection of index regions.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | --------- |
| `headers` | `IndexHeader[]` | An array. Each element in the array is of the [IndexHeader](#indexheader) type. Elements in the array are sorted by the start offset of their region. The array length is specified by **num_index_regions** in [Header](#header).|


### IndexHeader
Each **IndexHeader** structure describes an index region. Each index region has two types of indexes: indexes pointing to [Type](#type) and indexes pointing to methods, strings, or literal arrays.

- Alignment mode: 4-byte alignment.
- Format

| **Name** | **Format**| **Description** |
| -------------- | -------------- | ---------- |
| `start_off` | `uint32_t` | Offset to the start position of this region. |
| `end_off` | `uint32_t` | Offset to the end position of this region. |
| `class_region_idx_size` | `uint32_t` | Number of elements in the [ClassRegionIndex](#classregionindex) of the region. The maximum value is 65536.|
| `class_region_idx_off` | `uint32_t` | An offset that points to [ClassRegionIndex](#classregionindex).|
| `method_string_literal_region_idx_size` | `uint32_t` | Number of elements in the [MethodStringLiteralRegionIndex](#methodstringliteralregionindex) of the region. The maximum value is 65536.|
| `method_string_literal_region_idx_off` | `uint32_t` | An offset that points to [MethodStringLiteralRegionIndex](#methodstringliteralregionindex).|
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |
| `reserved` | `uint32_t` | Reserved field used internally in the Ark bytecode file. |


### ClassRegionIndex
The **ClassRegionIndex** structure allows you to find the corresponding [Type](#type) through a more compact index.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `types` | `Type[]` | An array. Each element in the array is of the [Type](#type) type. The array length is specified by **class_region_idx_size** in [IndexHeader](#indexheader).|


### Type
A **Type** is a 32-bit value that holds either a basic type code or an offset pointing to a [Class](#class).

Basic types are encoded as follows.

| **Type** | **Code** |
| -------------- | -------------- |
| `u1` | `0x00` |
| `i8` | `0x01` |
| `u8` | `0x02` |
| `i16` | `0x03` |
| `u16` | `0x04` |
| `i32` | `0x05` |
| `u32` | `0x06` |
| `f32` | `0x07` |
| `f64` | `0x08` |
| `i64` | `0x09` |
| `u64` | `0x0a` |
| `any` | `0x0c` |


### MethodStringLiteralRegionIndex
The **MethodStringLiteralRegionIndex** structure allows you to find the corresponding method, string, or literal array through a more compact index.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets` | `uint32_t[]` | An array in which the value of each element is an offset pointing to a method, string, or literal array. The array length is specified by **method_string_literal_region_idx_size** in [IndexHeader](#indexheader).|


### LiteralArray
Describes a literal array in the bytecode file.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description** |
| -------------- | -------------- | ------------------------------------------------------------ |
| `num_literals` | `uint32_t` | Length of the **literals** array. |
| `literals` | `Literal[]` | An array. Each element of the array is of the [Literal](#literal) type.|


### Literal
Describes a literal in the bytecode file. There are four encoding formats, depending on the size of the literal in bytes.

| **Name**| **Format**| **Alignment Type**| **Description**|
| -------------- | -------------- | ------------------ | -------------- |
| ByteOne | `uint8_t` | 1 byte | Single-byte value. |
| ByteTwo | `uint16_t` | 2 bytes | Two-byte value. |
| ByteFour | `uint32_t` | 4 bytes | Four-byte value. |
| ByteEight | `uint64_t` | 8 bytes | Eight-byte value. |
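
As a rough illustration of the four encodings above, the following Python sketch reads one fixed-width literal value from a byte buffer. It relies only on the little-endian byte order that applies to all multi-byte values in the file. The names `LITERAL_WIDTHS` and `read_literal_value`, and the choice to pass the encoding name as an argument, are assumptions made for this example and are not part of the format. The sketch does not decide which encoding a given literal uses, and it does not insert or skip any alignment padding.

```python
from typing import Tuple

# Byte widths of the four literal encodings listed in the table above.
LITERAL_WIDTHS = {"ByteOne": 1, "ByteTwo": 2, "ByteFour": 4, "ByteEight": 8}


def read_literal_value(buf: bytes, offset: int, encoding: str) -> Tuple[int, int]:
    """Read one unsigned little-endian value (hypothetical helper).

    Returns the decoded value and the offset of the first byte after it.
    """
    width = LITERAL_WIDTHS[encoding]
    end = offset + width
    if offset < 0 or end > len(buf):
        raise ValueError("literal value extends past the end of the buffer")
    return int.from_bytes(buf[offset:end], "little"), end


if __name__ == "__main__":
    sample = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])
    value, next_off = read_literal_value(sample, 0, "ByteTwo")
    print(hex(value), next_off)  # 0x1234 2
    value, next_off = read_literal_value(sample, next_off, "ByteFour")
    print(hex(value), next_off)  # 0x12345678 6
```

Returning the next offset along with the value lets a caller walk through consecutive values without tracking widths separately.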