# Ark Bytecode File Format
The Ark bytecode file is the binary output produced by compiling ArkTS/TS/JS source code. This topic describes the Ark bytecode file format in detail, introducing each part of the file to help you analyze and modify it.
3
4
## Constraints
6This topic applies only to Ark bytecode whose version number is 11.0.2.0. (The version number is an internal reserved field of the Ark compiler.)
7
8
## Data Types of Bytecode File
10
### Integer
12
13| **Name**       | **Description**                          |
14| -------------- | ---------------------------------- |
15| `uint8_t`      | 8-bit unsigned integer.                 |
16| `uint16_t`     | 16-bit unsigned integer in little-endian mode.  |
17| `uint32_t`     | 32-bit unsigned integer in little-endian mode.  |
18| `uleb128`      | Leb128-encoded unsigned integer.            |
19| `sleb128`      | Leb128-encoded signed integer.            |
20
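The two LEB128 types in the table above are variable-length encodings: each byte carries 7 payload bits, and the high bit signals that another byte follows. The following is a minimal decoding sketch. The function names `read_uleb128` and `read_sleb128` are illustrative and are not part of the Ark toolchain.

```python
def read_uleb128(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if (byte & 0x80) == 0:  # a clear high bit marks the last byte
            return result, pos


def read_sleb128(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a signed LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            if byte & 0x40:  # sign bit of the last byte is set
                result -= 1 << shift
            return result, pos
```

Both functions return the position of the next unread byte, so calls can be chained while walking a structure field by field.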
21
### String
23- Alignment mode: single-byte alignment.
24- Format
25
| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `utf16_length`   | `uleb128`  | The value is `(len << 1) \| is_ascii`, where **len** indicates the length of the string when encoded in UTF-16, and **is_ascii** indicates whether the string contains only ASCII characters. **is_ascii** can be 0 or 1.|
| `data`           | `uint8_t[]` | MUTF-8 encoded character sequence ending with **\0**. |
30
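As a rough illustration of the String layout, the sketch below splits `utf16_length` into its two parts and reads `data` up to the terminating **\0**. The helper name `read_string` and the returned dictionary keys are hypothetical. Python has no built-in MUTF-8 codec, so the sketch only decodes text when **is_ascii** is 1 and otherwise returns the raw bytes.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def read_string(buf: bytes, pos: int) -> tuple[dict, int]:
    """Read one String structure; return (fields, next position)."""
    header, pos = read_uleb128(buf, pos)
    utf16_len = header >> 1        # upper bits: length in UTF-16
    is_ascii = bool(header & 1)    # lowest bit: ASCII-only flag
    end = buf.index(0, pos)        # data ends with a \0 byte
    raw = buf[pos:end]
    text = raw.decode("ascii") if is_ascii else None  # MUTF-8 left undecoded
    return {"utf16_len": utf16_len, "is_ascii": is_ascii,
            "raw": raw, "text": text}, end + 1
```

For example, the string `abc` has `utf16_length` equal to `(3 << 1) | 1 = 7`, so it is stored as the bytes `07 61 62 63 00`.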
31
### TaggedValue
33- Alignment mode: single-byte alignment.
34- Format
35
36| **Name**| **Format**| **Description**                               |
37| -------------- | -------------- | -------------------------------------------- |
38| `tag`          | `uint8_t`      | Indicates the tag of a data type.                          |
39| `data`         | `uint8_t[]`    | According to different tags, **data** is of different types or is empty.|
40
41
## TypeDescriptor
**TypeDescriptor** is the format of a class ([Class](#class)) name. A name in this format, **L_ClassName;**, consists of **'L'**, **'_'**, **ClassName**, and **';'**, where **ClassName** is the full name of the class with each **'.'** replaced by **'/'**.
44
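The rule above can be sketched as a small helper. The function name `to_type_descriptor` and the sample class name `com.example.Foo` are made up for illustration only.

```python
def to_type_descriptor(class_name: str) -> str:
    """Build a TypeDescriptor: 'L', '_', the class name with '.' -> '/', ';'."""
    return "L_" + class_name.replace(".", "/") + ";"


def looks_like_type_descriptor(name: str) -> bool:
    """Check only the fixed prefix and suffix described in this section."""
    return name.startswith("L_") and name.endswith(";")
```

With these definitions, `to_type_descriptor("com.example.Foo")` yields `L_com/example/Foo;`.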
45
## Bytecode File Layout
The bytecode file starts with the [Header](#header) structure. All structures in the file can be accessed directly or indirectly from the **Header**. Structures in the bytecode file are referenced by either offset or index. An offset is a 32-bit value that indicates the distance, counted from 0, from the start of the file to the start position of the referenced structure. An index is a 16-bit value that indicates the position of the referenced structure in the index area. This mechanism is described in [IndexSection](#indexsection).
48
All multi-byte values in the bytecode file are little-endian.
50
51
### Header
53- Alignment mode: single-byte alignment.
54- Format
55
56| **Name**   | **Format**| **Description**                                              |
57| ----------------- | -------------- | ------------------------------------------------------------ |
58| `magic`             | `uint8_t[8]`     | Value of the magic number must be **'P' 'A' 'N' 'D' 'A' '\0' '\0' '\0'**.   |
59| `checksum`          | `uint32_t`       | **Adler32** checksum of the content in the bytecode file except the magic number and this check field.|
60| `version`           | `uint8_t[4]`     | Version number of the bytecode file ([Version](#version)).|
61| `file_size`         | `uint32_t`       | Size of a bytecode file, in bytes.                            |
62| `foreign_off`       | `uint32_t`       | An offset that points to an external area. The external area contains two types of elements: [ForeignClass](#foreignclass) or [ForeignMethod](#foreignmethod). **foreign_off** points to the first element in the area.|
63| `foreign_size`      | `uint32_t`       | Size of the external area, in bytes.                              |
64| `num_classes`       | `uint32_t`       | Number of elements in the [ClassIndex](#classindex) structure, that is, the number of [Class](#class) defined in the file.|
65| `class_idx_off`     | `uint32_t`       | An offset that points to [ClassIndex](#classindex).|
66| `num_lnps`          | `uint32_t`       | Number of elements in the [LineNumberProgramIndex](#linenumberprogramindex) structure, that is, the number of [Line number program](#line-number-program) defined in the file.|
67| `lnp_idx_off`       | `uint32_t`       | An offset that points to [LineNumberProgramIndex](#linenumberprogramindex).|
68| `reserved`          | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
69| `reserved`          | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
70| `num_index_regions` | `uint32_t`       | Number of elements in the [IndexSection](#indexsection) structure, that is, the number of [IndexHeader](#indexheader) in the file.|
71| `index_section_off` | `uint32_t`       | An offset that points to [IndexSection](#indexsection).|
72
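Because every Header field has a fixed width and single-byte alignment, the table above adds up to a 60-byte record that can be unpacked in one call. The following sketch uses Python's `struct` and `zlib` modules. The function name `parse_header`, the keys `reserved0`, `reserved1`, and `checksum_ok`, and the assumption that the four version bytes are stored in major-to-build order are illustrative choices, not something this topic specifies.

```python
import struct
import zlib

# magic (8 bytes), checksum, version (4 bytes), then eleven uint32_t fields.
HEADER = struct.Struct("<8sI4s11I")
U32_FIELDS = (
    "file_size", "foreign_off", "foreign_size", "num_classes",
    "class_idx_off", "num_lnps", "lnp_idx_off", "reserved0",
    "reserved1", "num_index_regions", "index_section_off",
)


def parse_header(data: bytes) -> dict:
    """Parse the Header at the start of `data` and check magic and checksum."""
    magic, checksum, version, *rest = HEADER.unpack_from(data, 0)
    if magic != b"PANDA\0\0\0":
        raise ValueError("not an Ark bytecode file")
    header = dict(zip(U32_FIELDS, rest))
    header["checksum"] = checksum
    # Assumption: the bytes appear in major.minor.feature.build order.
    header["version"] = ".".join(str(b) for b in version)
    # The checksum covers everything except the magic number and itself.
    header["checksum_ok"] = zlib.adler32(data[12:]) == checksum
    return header
```

A tool that modifies the file would need to recompute `checksum` over the same byte range before writing the result back.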
73
### Version
The bytecode version number consists of four parts in the format of **major version number.minor version number.feature version number.build version number**.
76
77| **Name**| **Format**| **Description**                                            |
78| -------------- | -------------- | ---------------------------------------------------------- |
79| Major version number      | `uint8_t`        | Indicates the bytecode file format change caused by the overall structure adjustment.                |
80| Minor version number      | `uint8_t`        | Indicates the bytecode file format change caused by partial structure adjustment or major feature adjustment.|
81| Feature version number    | `uint8_t`        | Indicates the bytecode file format change caused by small- and medium-sized features.                    |
82| Build version number    | `uint8_t`        | Indicates the bytecode file format change caused by defect rectification.                    |
83
84
### ForeignClass
Describes the external classes referenced in the bytecode file. They are declared in other files and referenced in the current bytecode file.
87- Alignment mode: single-byte alignment.
88- Format
89
90| **Name**| **Format**| **Description**                                              |
91| -------------- | -------------- | ------------------------------------------------------------ |
| `name`           | `String`         | Name of the external class, which follows the [TypeDescriptor](#typedescriptor) syntax.|
93
94
### ForeignMethod
96Describes external methods in bytecode files. They are declared in other files and referenced in the current bytecode file.
97- Alignment mode: single-byte alignment.
98- Format
99
100| **Name**| **Format**| **Description**                                              |
101| -------------- | -------------- | ------------------------------------------------------------ |
102| `class_idx`      | `uint16_t`       | An index pointing to the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex), whose value is an offset pointing to [Class](#class) or [ForeignClass](#foreignclass).|
103| `reserved`       | `uint16_t`       | Reserved field used internally in the Ark bytecode file.              |
104| `name_off`       | `uint32_t`       | An offset that points to [string](#string), indicating the method name.|
105| `index_data`     | `uleb128`        | [MethodIndexData](#methodindexdata) data of the method.|
106
107**Note:**<br>
108With the offset of **ForeignMethod**, the appropriate **IndexHeader** can be found to parse the **class_idx**.
109
110
### ClassIndex
112The **ClassIndex** structure is used to quickly locate the definition of the **Class** by name.
113- Alignment mode: 4-byte alignment.
114- Format
115
116| **Name**| **Format**| **Description**                                              |
117| -------------- | -------------- | ------------------------------------------------------------ |
118| `offsets`        | `uint32_t[]`     | An array. The value of each element in this array is an offset pointing to [Class](#class). Elements in an array are sorted by class name. This name follows the [TypeDescriptor](#typedescriptor) syntax. The array length is specified by **num_classes** in [Header](#header).|
119
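As a hedged sketch of how **ClassIndex** can be walked, the code below reads the `offsets` array and then reads the name that begins each [Class](#class) structure. The function name `read_class_names` is hypothetical, and names are returned as raw bytes because MUTF-8 decoding is out of scope here.

```python
import struct


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def read_class_names(data: bytes, class_idx_off: int, num_classes: int):
    """Return (offsets, raw class names) for every entry in ClassIndex."""
    offsets = struct.unpack_from(f"<{num_classes}I", data, class_idx_off)
    names = []
    for off in offsets:
        _, pos = read_uleb128(data, off)  # skip utf16_length of the name
        end = data.index(0, pos)          # the name data ends with \0
        names.append(data[pos:end])
    return list(offsets), names
```

The values of `class_idx_off` and `num_classes` come from the **Header**. Since the array is sorted by class name, a lookup by name could use binary search instead of the linear walk shown here.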
120
### Class
In a bytecode file, a class represents either a source code file or a built-in [Annotation](#annotation). When it represents a source code file, the methods of the class correspond to the functions in the source code file, and the fields of the class correspond to internal information of the source file. When it represents a built-in **Annotation**, the class contains no fields or methods. A class declared in the source code is represented in the bytecode file as a method corresponding to its constructor.
123
124- Alignment mode: single-byte alignment.
125- Format
126
127| **Name**| **Format**| **Description**                                              |
128| -------------- | -------------- | ------------------------------------------------------------ |
129| `name`           | `String`         | Class name, which follows the [TypeDescriptor](#typedescriptor) syntax.|
130| `reserved`       | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
| `access_flags`   | `uleb128`        | Access flags of **Class**, which are a combination of [ClassAccessFlag](#classaccessflag) values.|
132| `num_fields`     | `uleb128`        | Number of fields of **Class**.                                         |
133| `num_methods`    | `uleb128`        | Number of methods of **Class**.                                         |
134| `class_data`     | `TaggedValue[]`  | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [ClassTag](#classtag) type. Elements in the array are sorted in ascending order based on the tag (except the **0x00** tag).|
135| `fields`         | `Field[]`        | Array of Class fields. Each element in the array is of the [Field](#field) type. The array length is specified by **num_fields**.|
136| `methods`        | `Method[]`       | Array of Class methods. Each element in the array is of the [Method](#method) type. The array length is specified by `num_methods`.|
137
138
### ClassAccessFlag
140
141| **Name**| **Value**| **Description**                                              |
142| -------------- | ------------ | ------------------------------------------------------------ |
143| `ACC_PUBLIC`     | `0x0001`       | Default attribute. [Class](#class) in the Ark bytecode has this tag.|
144| `ACC_ANNOTATION` | `0x2000`       | Declares the class as the [Annotation](#annotation) type.|
145
146
### ClassTag
148- Alignment mode: single-byte alignment.
149- Format
150
151| **Name**| **Value**| **Quantity**| **Format**| **Description**                                              |
152| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
153| `NOTHING`        | `0x00`  | `1`  | `none`    | The [TaggedValue](#taggedvalue) with this tag is the last item of the **class_data**.|
| `SOURCE_LANG`    | `0x02`  | `0-1`  | `uint8_t` | The **data** of [TaggedValue](#taggedvalue) with this tag is 0, indicating that the source code language is ArkTS, TS, or JS.|
155| `SOURCE_FILE`    | `0x07`  | `0-1`  | `uint32_t`| The **data** of [TaggedValue](#taggedvalue) with this tag is an offset that points to [string](#string), indicating the name of the source file.|
156
157**Note:**<br>
**ClassTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **class_data**. The **Quantity** column in the table indicates how many times an element with this tag can occur in the **class_data** of a [Class](#class).
159
160
### Field
162Describes the fields in the bytecode file.
163
164- Alignment mode: single-byte alignment.
165- Format
166
167| **Name**| **Format**| **Description**                                              |
168| -------------- | -------------- | ------------------------------------------------------------ |
169| `class_idx`      | `uint16_t`       | An index pointing to the class to which the field belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value of the position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
170| `type_idx`       | `uint16_t`       | An index that points to the type of the field and points to a position in [ClassRegionIndex](#classregionindex). The value of the position is of the [Type](#type) type.|
171| `name_off`       | `uint32_t`       | An offset that points to [string](#string), indicating the name of the field.|
172| `reserved`       | `uleb128`        | Reserved field used internally in the Ark bytecode file.                          |
173| `field_data`     | `TaggedValue[]`  | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [FieldTag](#fieldtag) type. Elements in the array are sorted in ascending order based on the tag (except the **0x00** tag).|
174
175**Note:**<br>
176Based on the offset of the **Field**, the appropriate **IndexHeader** can be found to parse the **class_idx** and **type_idx**.
177
178
### FieldTag
180
181- Alignment mode: single-byte alignment.
182- Format
183
184| **Name**| **Value**| **Quantity**| **Format**| **Description** |
185| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
186| `NOTHING`        | `0x00`   | `1`   | `none`     | The [TaggedValue](#taggedvalue) with this tag is the last item of the **field_data**.|
| `INT_VALUE`      | `0x01`   | `0-1` | `sleb128`  | The **data** of the [TaggedValue](#taggedvalue) with this tag is of the **boolean**, **byte**, **char**, **short**, or **int** type.|
| `VALUE`          | `0x02`   | `0-1` | `uint32_t` | The **data** of the [TaggedValue](#taggedvalue) with this tag is of the **FLOAT** or **ID** type in [Value formats](#value-formats).|
189
190**Note:**<br>
**FieldTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **field_data**. The **Quantity** column in the table indicates how many times an element with this tag can occur in the **field_data** of a [Field](#field).
192
193
### Method
195Describes methods in bytecode files.
196
197- Alignment mode: single-byte alignment.
198- Format
199
200| **Name**| **Format**| **Description**                                              |
201| -------------- | -------------- | ------------------------------------------------------------ |
202| `class_idx`      | `uint16_t`       | An index pointing to the class to which the method belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value of the position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
203| `reserved`       | `uint16_t`       | Reserved field used internally in the Ark bytecode file.                          |
204| `name_off`       | `uint32_t`       | An offset that points to [string](#string), indicating the method name.|
205| `index_data`     | `uleb128`        | [MethodIndexData](#methodindexdata) data of the method.|
206| `method_data`    | `TaggedValue[]`  | Array with variable length. Each element in the array is of the [TaggedValue](#taggedvalue) type, and the element tag is of the [MethodTag](#methodtag) type. Elements in the array are sorted in ascending order based on the tag (except the **0x00** tag).|
207
208**Note:**<br>
209With the offset of **Method**, the appropriate **IndexHeader** can be found to parse the **class_idx**.
210
211
### MethodIndexData
213**MethodIndexData** is an unsigned 32-bit integer divided into three parts.
214
215| **Bit**| **Name**| **Format**| **Description**                                              |
216| ------------ | -------------- | -------------- | ------------------------------------------------------------ |
| 0 - 15       | `header_index`   | `uint16_t`       | Points to a position in [IndexSection](#indexsection). The value of this position is [IndexHeader](#indexheader). You can use **IndexHeader** to find the offsets of all methods ([Method](#method)), [string](#string), or literal arrays ([LiteralArray](#literalarray)) referenced by the method.|
218| 16 - 23      | `function_kind`  | `uint8_t`        | Function type of a method ([FunctionKind](#functionkind)).|
219| 24 - 31      | `reserved`       | `uint8_t`        | Reserved field used internally in the Ark bytecode file.                          |
220
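The bit layout above can be unpacked with shifts and masks once the `index_data` value has been decoded from its `uleb128` form. This sketch assumes that bit 0 is the least significant bit. The function name `split_method_index_data` is hypothetical, and the name table simply mirrors the [FunctionKind](#functionkind) values listed in this topic.

```python
FUNCTION_KIND_NAMES = {
    0x1: "FUNCTION",
    0x2: "NC_FUNCTION",
    0x3: "GENERATOR_FUNCTION",
    0x4: "ASYNC_FUNCTION",
    0x5: "ASYNC_GENERATOR_FUNCTION",
    0x6: "ASYNC_NC_FUNCTION",
    0x7: "CONCURRENT_FUNCTION",
}


def split_method_index_data(value: int) -> dict:
    """Split a decoded 32-bit MethodIndexData value into its three parts."""
    kind = (value >> 16) & 0xFF
    return {
        "header_index": value & 0xFFFF,    # bits 0-15
        "function_kind": kind,             # bits 16-23
        "function_kind_name": FUNCTION_KIND_NAMES.get(kind, "UNKNOWN"),
        "reserved": (value >> 24) & 0xFF,  # bits 24-31
    }
```

For instance, the value `0x00040003` splits into `header_index` 3 and `function_kind` `0x4`.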
221
#### FunctionKind
223
224| **Name**          | **Value**| **Description**  |
225| ------------------------ | ------------ | ---------------- |
226| `FUNCTION`                 | `0x1`          | Common function.      |
227| `NC_FUNCTION`              | `0x2`          | Common arrow function.  |
228| `GENERATOR_FUNCTION`       | `0x3`          | Generator function.    |
229| `ASYNC_FUNCTION`           | `0x4`          | Asynchronous function.      |
230| `ASYNC_GENERATOR_FUNCTION` | `0x5`          | Asynchronous generator function.|
231| `ASYNC_NC_FUNCTION`        | `0x6`          | Asynchronous arrow function.  |
232| `CONCURRENT_FUNCTION`      | `0x7`          | Concurrent function.      |
233
234
### MethodTag
236
237| **Name**| **Value**| **Quantity**| **Format**| **Description**                                              |
238| -------------- | ------------ | -------------- | -------------- | ------------------------------------------------------------ |
239| `NOTHING`        | `0x00`         | `1`             | `none`           | The [TaggedValue](#taggedvalue) with this tag is the last item of the **method_data**.|
| `CODE`           | `0x01`         | `0-1`            | `uint32_t`       | The **data** of [TaggedValue](#taggedvalue) that has this tag is an offset pointing to [Code](#code), indicating the code segment of the method.|
241| `SOURCE_LANG`    | `0x02`         | `0-1`            | `uint8_t`        | The **data** of [TaggedValue](#taggedvalue) with this tag is 0, indicating that the source code language is ArkTS, TS, or JS.|
242| `DEBUG_INFO`     | `0x05`         | `0-1`            | `uint32_t`       | The **data** of [TaggedValue](#taggedvalue) with this tag is an offset that points to [DebugInfo](#debuginfo) and indicates the debugging information of the method.|
243| `ANNOTATION`     | `0x06`         | `>=0`            | `uint32_t`       | The **data** of [TaggedValue](#taggedvalue) that has this tag is an offset that points to [Annotation](#annotation) and indicates the annotation of the method.|
244
245**Note:**<br>
**MethodTag** is the tag of an element ([TaggedValue](#taggedvalue)) in **method_data**. The **Quantity** column in the table indicates how many times an element with this tag can occur in the **method_data** of a [Method](#method).
247
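A `TaggedValue[]` array such as **method_data** is read by taking one tag byte, reading the payload whose width depends on that tag, and stopping at the `NOTHING` tag. The sketch below applies this to the **MethodTag** table. The function name `read_method_data` and the result keys are hypothetical. The same loop shape would apply to **class_data** and **field_data** with their own tag tables.

```python
import struct

# tag value -> (result key, struct format of the payload)
METHOD_TAGS = {
    0x01: ("code_off", "<I"),
    0x02: ("source_lang", "<B"),
    0x05: ("debug_info_off", "<I"),
    0x06: ("annotation_off", "<I"),
}


def read_method_data(buf: bytes, pos: int = 0) -> tuple[dict, int]:
    """Read method_data up to and including the NOTHING tag."""
    values = {"annotation_offs": []}
    while True:
        tag = buf[pos]
        pos += 1
        if tag == 0x00:  # NOTHING: last item, no payload
            return values, pos
        if tag not in METHOD_TAGS:
            raise ValueError(f"unexpected MethodTag 0x{tag:02x}")
        key, fmt = METHOD_TAGS[tag]
        (value,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        if tag == 0x06:  # ANNOTATION may occur any number of times
            values["annotation_offs"].append(value)
        else:
            values[key] = value
```

The returned position is the first byte after **method_data**, which is where the next structure in the enclosing array begins.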
248
### Code
250
251- Alignment mode: single-byte alignment.
252- Format
253
254| **Name**| **Format**| **Description**                                              |
255| -------------- | -------------- | ------------------------------------------------------------ |
256| `num_vregs`      | `uleb128`        | Number of registers. Registers that store input parameters and default parameters are not counted.        |
257| `num_args`       | `uleb128`        | Total number of input parameters and default parameters.                                    |
258| `code_size`      | `uleb128`        | Total size of all instructions, in bytes.                            |
259| `tries_size`     | `uleb128`        | Length of the **try_blocks** array, that is, the number of [TryBlock](#tryblock).   |
260| `instructions`   | `uint8_t[]`      | Array of all instructions.                                          |
261| `try_blocks`     | `TryBlock[]`     | An array. Each element in the array is of the **TryBlock** type.|
262
263
### TryBlock
265
266- Alignment mode: single-byte alignment.
267- Format
268
269| **Name**| **Format**| **Description**                                              |
270| -------------- | -------------- | ------------------------------------------------------------ |
271| `start_pc`       | `uleb128`        | Offset between the first instruction of the **TryBlock** and the start position of the **instructions** of [Code](#code).|
| `length`         | `uleb128`        | Size of the **TryBlock**, in bytes.                              |
273| `num_catches`    | `uleb128`        | Number of [CatchBlock](#catchblock) associated with **TryBlock**. The value is 1.|
274| `catch_blocks`   | `CatchBlock[]`   | Array of **CatchBlocks** associated with **TryBlock**. The array contains only one **CatchBlock** that can capture all types of exceptions.|
275
276
### CatchBlock
278
279- Alignment mode: single-byte alignment.
280- Format
281
282| **Name**| **Format**| **Description**                                 |
283| -------------- | -------------- | ----------------------------------------------- |
284| `type_idx`       | `uleb128`        | If the value is 0, the **CatchBlock** captures all types of exceptions.|
285| `handler_pc`     | `uleb128`        | Program counter of the first instruction of the exception handling logic.         |
286| `code_size`      | `uleb128`        | Size of the **CatchBlock**, in bytes.             |
287
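The **Code**, **TryBlock**, and **CatchBlock** tables nest inside each other, so they can be read in a single pass. The following sketch follows the field order given in those tables. The names `read_uleb128s` and `read_code` are hypothetical, and the instruction bytes are returned undecoded.

```python
def read_uleb128s(buf: bytes, pos: int, count: int) -> tuple[list, int]:
    """Read `count` consecutive uleb128 values; return (values, next position)."""
    values = []
    for _ in range(count):
        value, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if (byte & 0x80) == 0:
                break
        values.append(value)
    return values, pos


def read_code(buf: bytes, pos: int = 0) -> tuple[dict, int]:
    """Read a Code structure together with its TryBlock and CatchBlock arrays."""
    (num_vregs, num_args, code_size, tries_size), pos = read_uleb128s(buf, pos, 4)
    instructions = buf[pos:pos + code_size]
    pos += code_size
    try_blocks = []
    for _ in range(tries_size):
        (start_pc, length, num_catches), pos = read_uleb128s(buf, pos, 3)
        catch_blocks = []
        for _ in range(num_catches):
            (type_idx, handler_pc, catch_size), pos = read_uleb128s(buf, pos, 3)
            catch_blocks.append({"type_idx": type_idx,
                                 "handler_pc": handler_pc,
                                 "code_size": catch_size})
        try_blocks.append({"start_pc": start_pc, "length": length,
                           "catch_blocks": catch_blocks})
    return {"num_vregs": num_vregs, "num_args": num_args,
            "instructions": instructions, "try_blocks": try_blocks}, pos
```

The starting position would normally be the offset stored under the `CODE` tag in **method_data**.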
288
### Annotation
290Describes an annotation structure.
291
292- Alignment mode: single-byte alignment.
293- Format
294
295| **Name**| **Format**     | **Description**                                              |
296| -------------- | ------------------- | ------------------------------------------------------------ |
297| `class_idx`      | `uint16_t`   | An index pointing to the class to which the **Annotation** belongs. It points to a position in [ClassRegionIndex](#classregionindex). The value of the position is of the [Type](#type) type and is an offset pointing to [Class](#class).|
298| `count`          | `uint16_t`   | Length of the **elements** array.                                        |
| `elements`       | `AnnotationElement[]` | An array. Each element of the array is of the [AnnotationElement](#annotationelement) type.|
| `element_types`  | `uint8_t[]`  | An array. Each element in the array is of the [AnnotationElementTag](#annotationelementtag) type and is used to describe an **AnnotationElement**. The position of each element in the **element_types** array is the same as that of the corresponding **AnnotationElement** in the **elements** array.|
301
302**Note:**<br>
303Based on the **Annotation** offset, an appropriate **IndexHeader** can be found to parse the **class_idx**.
304
305
### AnnotationElementTag
307
308| **Name**| **Tag**|
309| -------------- | --------- |
310| `u1`             | `'1'`   |
311| `i8`             | `'2'`   |
312| `u8`             | `'3'`   |
313| `i16`            | `'4'`   |
314| `u16`            | `'5'`   |
315| `i32`            | `'6'`   |
316| `u32`            | `'7'`   |
317| `i64`            | `'8'`   |
318| `u64`            | `'9'`   |
319| `f32`            | `'A'`   |
320| `f64`            | `'B'`   |
321| `string`         | `'C'`   |
322| `method`         | `'E'`   |
323| `annotation`     | `'G'`   |
324| `literalarray`   | `'#'`   |
325| `unknown`        | `'0'`   |
326
327
### AnnotationElement
329
330- Alignment mode: single-byte alignment.
331- Format
332
333| **Name**| **Format**| **Description**                                              |
334| -------------- | -------------- | ------------------------------------------------------------ |
335| `name_off`       | `uint32_t`       | An offset that points to [string](#string), indicating the name of the annotation element.|
336| `value`          | `uint32_t`       | Value of the annotation element. If the width of the value does not exceed 32 bits, the value itself is stored here. Otherwise, the value stored here is an offset pointing to the [Value formats](#value-formats) format.|
337
338
### Value formats
340Different value types have different value encoding formats, including INTEGER, LONG, FLOAT, DOUBLE, and ID.
341
342| **Name**| **Format**| **Description**                                              |
343| -------------- | -------------- | ------------------------------------------------------------ |
344| `INTEGER`        | `uint32_t`       | Signed 4-byte integer value.                                      |
345| `LONG`           | `uint64_t`       | Signed 8-byte integer value.                                      |
| `FLOAT`          | `uint32_t`       | 4-byte bit pattern, zero-extended to the right, which the system interprets as an IEEE 754 32-bit floating-point value.|
| `DOUBLE`         | `uint64_t`       | 8-byte bit pattern, zero-extended to the right, which the system interprets as an IEEE 754 64-bit floating-point value.|
348| `ID`             | `uint32_t`       | 4-byte mode, indicating the offset of a structure in a file.                  |
349
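The table stores every value as an unsigned integer of the given width, and the value type decides how those bits are interpreted. A small sketch of that reinterpretation, using Python's `struct` module, is shown below. The function names are illustrative only.

```python
import struct


def decode_integer(raw: int) -> int:
    """Interpret a stored uint32_t as a signed 4-byte integer (INTEGER)."""
    return struct.unpack("<i", struct.pack("<I", raw))[0]


def decode_long(raw: int) -> int:
    """Interpret a stored uint64_t as a signed 8-byte integer (LONG)."""
    return struct.unpack("<q", struct.pack("<Q", raw))[0]


def decode_float(raw: int) -> float:
    """Interpret a stored uint32_t as an IEEE 754 32-bit value (FLOAT)."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


def decode_double(raw: int) -> float:
    """Interpret a stored uint64_t as an IEEE 754 64-bit value (DOUBLE)."""
    return struct.unpack("<d", struct.pack("<Q", raw))[0]
```

For example, the stored value `0x3FC00000` decodes to `1.5` as FLOAT, and `0xFFFFFFFF` decodes to `-1` as INTEGER.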
350
### LineNumberProgramIndex
352The **LineNumberProgramIndex** structure is an array that facilitates the use of a more compact index to access the [Line number program](#line-number-program).
353
354- Alignment mode: 4-byte alignment.
355- Format
356
357| **Name**| **Format**| **Description**                                              |
358| -------------- | -------------- | ------------------------------------------------------------ |
359| `offsets`        | `uint32_t[]`     | An array in which the value of each element is an offset pointing to a line number program. The array length is specified by **num_lnps** in [Header](#header).|
360
361
### DebugInfo
**DebugInfo** contains the mapping between the program counter of a method and the line and column numbers in the source code, as well as information about local variables. The format of the debugging information is derived from the [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html) (see section 6.2). The [line number program](#line-number-program) is interpreted according to the execution model of the [state machine](#state-machine) to obtain the mapping and the local variable information. To deduplicate identical line number programs used by different methods, all constants referenced in the programs are moved to the [constant pool](#constant-pool).
364
365- Alignment mode: single-byte alignment.
366- Format
367
368| **Name**         | **Format**| **Description**                                              |
369| ----------------------- | -------------- | ------------------------------------------------------------ |
370| `line_start`              | `uleb128`        | Initial value of the line number register of the state machine.                                |
371| `num_parameters`          | `uleb128`        | Total number of input parameters and default parameters.                                    |
372| `parameters`              | `uleb128[]`      | Array that stores the names of method input parameters. The array length is **num_parameters**. The value of each element is the offset of the string or 0. If the value is 0, the corresponding parameter does not have a name.|
373| `constant_pool_size`      | `uleb128`        | Size of the constant pool, in bytes.                                |
374| `constant_pool`           | `uleb128[]`      | Array for storing constant pool data. The array length is **constant_pool_size**.        |
375| `line_number_program_idx` | `uleb128`        | An index that points to a position in [LineNumberProgramIndex](#linenumberprogramindex). The value of this position is an offset pointing to [Line number program](#line-number-program). The length of Line number program is variable and ends with the **END_SEQUENCE** operation code.|
376
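The **DebugInfo** fields can be read sequentially in the order listed in the table. The sketch below does so under one assumption: because `constant_pool_size` is given in bytes, the constant pool is kept as a raw byte slice rather than decoded up front. The names `read_uleb128` and `read_debug_info` are hypothetical.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def read_debug_info(buf: bytes, pos: int = 0) -> tuple[dict, int]:
    """Read one DebugInfo structure; return (fields, next position)."""
    line_start, pos = read_uleb128(buf, pos)
    num_parameters, pos = read_uleb128(buf, pos)
    parameters = []
    for _ in range(num_parameters):
        name_off, pos = read_uleb128(buf, pos)  # 0 means the parameter is unnamed
        parameters.append(name_off)
    pool_size, pos = read_uleb128(buf, pos)
    constant_pool = buf[pos:pos + pool_size]    # assumption: size is in bytes
    pos += pool_size
    lnp_idx, pos = read_uleb128(buf, pos)
    return {"line_start": line_start, "parameters": parameters,
            "constant_pool": constant_pool,
            "line_number_program_idx": lnp_idx}, pos
```

The resulting `line_number_program_idx` selects an entry in **LineNumberProgramIndex**, and the entry's value is the offset of the program to interpret.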
377
#### Constant pool
A constant pool is a structure for storing constants in **DebugInfo**. Many methods have similar line number programs, which differ only in variable names, variable types, and file names. To deduplicate such line number programs, all constants referenced in the programs are stored in the constant pool. When interpreting a program, the state machine maintains a pointer to the constant pool. When interpreting an instruction that requires a constant pool parameter, the state machine reads the value from the position indicated by the constant pool pointer and then increments the pointer.
380
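The read-then-advance behavior of the constant pool pointer can be modeled as a small cursor object. This is only a sketch: the class name `ConstantPoolCursor` and its methods are hypothetical, and it assumes that each constant is stored in the pool in LEB128 form.

```python
class ConstantPoolCursor:
    """Models the constant pool pointer kept by the state machine."""

    def __init__(self, pool: bytes):
        self.pool = pool
        self.ptr = 0  # starts at the first byte of the constant pool

    def _next_leb128(self, signed: bool) -> int:
        result, shift = 0, 0
        while True:
            byte = self.pool[self.ptr]
            self.ptr += 1  # the pointer advances past every byte consumed
            result |= (byte & 0x7F) << shift
            shift += 7
            if (byte & 0x80) == 0:
                if signed and byte & 0x40:
                    result -= 1 << shift
                return result

    def next_uleb128(self) -> int:
        """Read the next unsigned constant and advance the pointer."""
        return self._next_leb128(signed=False)

    def next_sleb128(self) -> int:
        """Read the next signed constant and advance the pointer."""
        return self._next_leb128(signed=True)
```

Because the program itself holds no constants, two methods can share one line number program and supply different pools, which is what makes the deduplication possible.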
381
#### State machine
383The state machine is used to generate [DebugInfo](#debuginfo) information. It contains the following registers.
384
385| **Name**   | **Initial Value**                                            | **Description**                                              |
386| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
387| `address`           | 0                                                            | Program counter (pointing to an instruction of a method), which can only monotonically increase.            |
| `line`              | Value of the **line_start** attribute of [DebugInfo](#debuginfo)| Unsigned integer, corresponding to the line number in the source code. All lines are numbered from 1. Therefore, the register value cannot be less than 1.|
389| `column`            | 0                                                            | Unsigned integer, corresponding to the column number in the source code.                              |
390| `file`              | Value of **SOURCE_FILE** in **class_data** (see [Class](#class)), or 0.| An offset that points to [string](#string), indicating the name of the source file. If there is no file name information, that is, there is no **SOURCE_FILE** tag in [Class](#class), the register value is 0.|
391| `source_code`       | 0                                                            | An offset that points to [string](#string), indicating the source code of the source file. If there is no source code information, the register value is 0.|
392| `constant_pool_ptr` | Address of the first byte in the constant pool in [DebugInfo](#debuginfo)| Pointer to the current constant value.                                      |
393
394
#### Line number program
396A line number program consists of instructions. Each instruction contains a single-byte operation code and optional parameters. Depending on the operation code, the value of a parameter may be encoded in an instruction (called an instruction parameter) or needs to be obtained from a constant pool (called a constant pool parameter).
397
| **Operation Code** | **Value**| **Instruction Parameters**  | **Constant Pool Parameters**    | **Parameters**| **Description** |
| ----- | ----- | ------- | ---- | ------- | ------ |
| `END_SEQUENCE`         | `0x00`  |       |          |        | Marks the end of the line number program.   |
| `ADVANCE_PC`           | `0x01`  |    | `uleb128 addr_diff`   | **addr_diff**: value to be added to the **address** register value.   | Adds **addr_diff** to the **address** register so that it points to the next address, without generating a location entry.|
| `ADVANCE_LINE`         | `0x02` |     | `sleb128 line_diff`  | **line_diff**: value to be added to the **line** register value.   | Adds **line_diff** to the **line** register so that it points to the next line, without generating a location entry.|
403| `START_LOCAL`          | `0x03` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx`   | **register_num**: register that will contain local variables<br>**name_idx**: an offset pointing to [string](#string), indicating the name of a variable<br>**type_idx**: an offset pointing to [string](#string), indicating the variable type.| Introduces a local variable with a name and type in the current address. The number of the register that will contain this variable is encoded in the instruction. If the register number is -1, it indicates that the register is an accumulator register. The values of **name_idx** and **type_idx** may be 0. If the values are 0, the corresponding information does not exist.|
404| `START_LOCAL_EXTENDED` | `0x04` | `sleb128 register_num` | `uleb128 name_idx`<br>`uleb128 type_idx`<br>`uleb128 sig_idx` | **register_num**: register that will contain local variables.<br>**name_idx**: an offset pointing to [string](#string), indicating the name of a variable.<br>**type_idx**: an offset pointing to [string](#string), indicating the variable type.<br>**sig_idx**: an offset pointing to [string](#string), indicating the signature of the variable.| Introduces a local variable with a name, type, and signature in the current address. The number of the register that will contain this variable is encoded in the instruction. If the register number is -1, it indicates that the register is an accumulator register. The values of **name_idx**, **type_idx**, and **sig_idx** may be 0. If the values are 0, the corresponding information does not exist.|
405| `END_LOCAL`            | `0x05` | `sleb128 register_num` |    | **register_num**: register containing local variables | Marks a local variable in the specified register as out of range at the current address. If the register number is -1, it indicates that the register is an accumulator register.|
406| `SET_FILE`             | `0x09`  |    | `uleb128 name_idx`  | **name_idx**: an offset pointing to [string](#string), indicating the file name| Sets the value of the file register. The value of **name_idx** may be 0. If the value is 0, it indicates that the corresponding information does not exist.|
407| `SET_SOURCE_CODE`      | `0x0a`  |    | `uleb128 source_idx` | **source_idx**: an offset pointing to [string](#string), indicating the source code of the file.| Sets the value of the **source_code** register. The value of **source_idx** may be 0. If the value is 0, it indicates that the corresponding information does not exist.|
408| `SET_COLUMN`           | `0x0b` |    | `uleb128 column_num`   | **column_num**: column number to be set.  | Sets the value of the **column** register and generates a location entry. |
409| Special operation code          | `0x0c..0xff`   |   |  |   | Make the **line** and **address** registers point to the next address and generate a location entry. For details, see the following description.|
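The following Python sketch illustrates how the state machine could execute a few of the instructions above. It is a minimal illustration, not the Ark implementation: the function names are hypothetical, only `END_SEQUENCE`, `ADVANCE_PC`, `ADVANCE_LINE`, and `SET_COLUMN` are handled, and it assumes that the **address** register starts at 0, that the initial **line** value is supplied by the caller, and that the constant pool pointer advances past each value it reads.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:          # high bit clear: last byte
            return result, pos


def read_sleb128(buf, pos):
    """Decode a signed LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:          # sign bit of the last 7-bit group
                result -= 1 << shift
            return result, pos


END_SEQUENCE, ADVANCE_PC, ADVANCE_LINE, SET_COLUMN = 0x00, 0x01, 0x02, 0x0B


def run_line_program(program, constant_pool, start_line=0):
    """Run a subset of the line number program; return location entries."""
    address, line, column = 0, start_line, 0   # assumed initial values
    pc = pool = 0                              # pool plays the role of constant_pool_ptr
    entries = []
    while True:
        opcode = program[pc]
        pc += 1
        if opcode == END_SEQUENCE:
            return entries
        if opcode == ADVANCE_PC:
            diff, pool = read_uleb128(constant_pool, pool)
            address += diff
        elif opcode == ADVANCE_LINE:
            diff, pool = read_sleb128(constant_pool, pool)
            line += diff
        elif opcode == SET_COLUMN:
            column, pool = read_uleb128(constant_pool, pool)
            entries.append((address, line, column))
        else:
            # Local-variable, file, and special operation codes are omitted.
            raise NotImplementedError(hex(opcode))
```

For example, the program bytes `01 02 0b 00` with the constant pool bytes `04 7e ac 02` and `start_line=10` advance the address by 4, move the line back by 2, and emit one location entry with column 300.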


For special operation codes, whose values range from **0x0c** to **0xff** (inclusive), the state machine advances the **line** and **address** registers by a small amount and then generates a new location entry. For details, see section 6.2.5.1 "Special Opcodes" in [DWARF 3.0 Standard](https://dwarfstd.org/dwarf3std.html).

| **No.**| **Operation**                                    | **Description**                                              |
| ----- | -------------------------------------------------- | ------------------------------------------------------------ |
| 1     | `adjusted_opcode = opcode - OPCODE_BASE`            | Calculates the adjusted operation code. The value of **OPCODE_BASE** is **0x0c**, which is the first special operation code.|
| 2     | `address += adjusted_opcode / LINE_RANGE`            | Increases the value of the **address** register, using integer division. The value of **LINE_RANGE** is 15, which is the number of distinct line number increments that a special operation code can express.|
| 3     | `line += LINE_BASE + (adjusted_opcode % LINE_RANGE)` | Increases the value of the **line** register. The value of **LINE_BASE** is -4, which is the minimum line number increment. The maximum line number increment is **LINE_BASE + LINE_RANGE - 1**.|
| 4     |                                                    | Generates a new location entry.                                      |

**Note:**<br>
The special operation code is calculated using the following formula: **(line_increment - LINE_BASE) + (address_increment * LINE_RANGE) + OPCODE_BASE**.
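The arithmetic above can be sketched in Python as follows. The function names `decode_special` and `encode_special` are hypothetical; the constants come from the table above. The encoder returns `None` when the requested increments do not fit into a single special operation code, which is an illustrative design choice rather than documented behavior.

```python
OPCODE_BASE = 0x0C   # first special operation code
LINE_BASE = -4       # minimum line number increment
LINE_RANGE = 15      # number of distinct line number increments


def decode_special(opcode):
    """Return (address_increment, line_increment) for a special opcode."""
    if not OPCODE_BASE <= opcode <= 0xFF:
        raise ValueError("not a special operation code")
    adjusted = opcode - OPCODE_BASE
    return adjusted // LINE_RANGE, LINE_BASE + adjusted % LINE_RANGE


def encode_special(address_increment, line_increment):
    """Return the special opcode, or None if the increments do not fit."""
    if address_increment < 0:
        return None
    if not LINE_BASE <= line_increment < LINE_BASE + LINE_RANGE:
        return None
    opcode = (line_increment - LINE_BASE) + address_increment * LINE_RANGE + OPCODE_BASE
    return opcode if opcode <= 0xFF else None
```

For example, an address increment of 2 and a line increment of 1 give (1 + 4) + 2 * 15 + 12 = 47, that is, operation code `0x2f`; decoding `0x2f` yields the same pair.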


### IndexSection
Generally, each structure in a bytecode file is referenced by a 32-bit offset. When a structure references another structure, the 32-bit offset of the referenced structure must be recorded in the current structure. To reduce the file size, a bytecode file is divided into multiple index regions, and structures within an index region are referenced by 16-bit indexes. The **IndexSection** structure describes the collection of index regions.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description**      |
| -------------- | -------------- | --------- |
| `headers`        | `IndexHeader[]`  | An array. Each element in the array is of the [IndexHeader](#indexheader) type. Elements in the array are sorted by the start offset of the region. The array length is specified by **num_index_regions** in [Header](#header).|
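Because the headers are sorted by start offset, the region that covers a given file offset can be located with a binary search. The sketch below is illustrative only: `find_region` is a hypothetical helper, each header is reduced to a `(start_off, end_off)` pair, and it assumes that regions do not overlap and that the end offset is exclusive, which this topic does not state.

```python
from bisect import bisect_right


def find_region(headers, offset):
    """Return the position of the region containing offset, or None.

    headers: list of (start_off, end_off) pairs sorted by start_off.
    Assumes non-overlapping regions with an exclusive end offset.
    """
    starts = [start for start, _ in headers]
    i = bisect_right(starts, offset) - 1   # last region starting at or before offset
    if i >= 0 and offset < headers[i][1]:
        return i
    return None
```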


### IndexHeader
Each **IndexHeader** structure describes an index region. Each index region has two types of indexes: indexes pointing to [Type](#type) and indexes pointing to methods, strings, or literal arrays.

- Alignment mode: 4-byte alignment.
- Format

| **Name**       | **Format**| **Description**   |
| -------------- | -------------- | ---------- |
| `start_off`                             | `uint32_t`       | Offset of the start position of the region.                                        |
| `end_off`                               | `uint32_t`       | Offset of the end position of the region.                                        |
| `class_region_idx_size`                 | `uint32_t`       | Number of elements in [ClassRegionIndex](#classregionindex) of the region. The maximum value is 65536.|
| `class_region_idx_off`                  | `uint32_t`       | An offset that points to [ClassRegionIndex](#classregionindex).|
| `method_string_literal_region_idx_size` | `uint32_t`       | Number of elements in [MethodStringLiteralRegionIndex](#methodstringliteralregionindex) of the region. The maximum value is 65536.|
| `method_string_literal_region_idx_off`  | `uint32_t`       | An offset that points to [MethodStringLiteralRegionIndex](#methodstringliteralregionindex).|
| `reserved`                              | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
| `reserved`                              | `uint32_t`       | Reserved field used internally in the Ark bytecode file.                          |
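The table lists ten little-endian `uint32_t` fields, so one **IndexHeader** occupies 40 bytes. A minimal Python sketch for reading the header array might look as follows. The names `parse_index_headers` and `reserved_0` to `reserved_3` are hypothetical, and the sketch assumes that the array elements are stored back to back without padding.

```python
import struct
from collections import namedtuple

IndexHeader = namedtuple("IndexHeader", [
    "start_off",
    "end_off",
    "class_region_idx_size",
    "class_region_idx_off",
    "method_string_literal_region_idx_size",
    "method_string_literal_region_idx_off",
    "reserved_0", "reserved_1", "reserved_2", "reserved_3",  # hypothetical names
])

INDEX_HEADER = struct.Struct("<10I")   # ten little-endian uint32_t fields, 40 bytes


def parse_index_headers(buf, offset, num_index_regions):
    """Read num_index_regions consecutive IndexHeader structures from buf."""
    return [
        IndexHeader(*INDEX_HEADER.unpack_from(buf, offset + i * INDEX_HEADER.size))
        for i in range(num_index_regions)
    ]
```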


### ClassRegionIndex
The **ClassRegionIndex** structure is used to find the corresponding [Type](#type) through a more compact index.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `types`          | `Type[]`         | An array. Each element in the array is of the [Type](#type) type. The array length is specified by **class_region_idx_size** in [IndexHeader](#indexheader).|


### Type
A 32-bit value that is either a basic type code or an offset pointing to [Class](#class).

Basic types are encoded as follows.

| **Type**      | **Code**       |
| -------------- | -------------- |
| `u1`           | `0x00`         |
| `i8`           | `0x01`         |
| `u8`           | `0x02`         |
| `i16`          | `0x03`         |
| `u16`          | `0x04`         |
| `i32`          | `0x05`         |
| `u32`          | `0x06`         |
| `f32`          | `0x07`         |
| `f64`          | `0x08`         |
| `i64`          | `0x09`         |
| `u64`          | `0x0a`         |
| `any`          | `0x0c`         |
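A reader of the file has to decide whether a **Type** value is a basic type code or a class offset. The sketch below shows one possible way to do this in Python. It rests on an assumption that this topic does not state: values greater than `0x0c` are treated as offsets pointing to [Class](#class). The value `0x0b` is not listed in the table, so the hypothetical helper `describe_type` rejects it.

```python
BASIC_TYPES = {
    0x00: "u1", 0x01: "i8", 0x02: "u8", 0x03: "i16",
    0x04: "u16", 0x05: "i32", 0x06: "u32", 0x07: "f32",
    0x08: "f64", 0x09: "i64", 0x0A: "u64", 0x0C: "any",
}


def describe_type(value):
    """Classify a 32-bit Type value as a basic type or a class offset."""
    if value in BASIC_TYPES:
        return ("basic", BASIC_TYPES[value])
    if value > 0x0C:
        # Assumption: anything above the basic type codes is a class offset.
        return ("class_offset", value)
    raise ValueError(f"unlisted basic type code: {value:#x}")
```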


### MethodStringLiteralRegionIndex
The **MethodStringLiteralRegionIndex** structure allows you to find the corresponding method, string, or literal array through a more compact index.

- Alignment mode: 4-byte alignment.
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `offsets`      | `uint32_t[]`   | An array in which the value of each element is an offset pointing to a method, string, or literal array. The array length is specified by **method_string_literal_region_idx_size** in [IndexHeader](#indexheader).|
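Resolving a compact index then amounts to one array lookup. The following Python sketch assumes that the index is a zero-based position in the `offsets` array; `resolve_index` is a hypothetical helper, and its `idx_off` and `idx_size` arguments correspond to **method_string_literal_region_idx_off** and **method_string_literal_region_idx_size** in [IndexHeader](#indexheader).

```python
import struct


def resolve_index(buf, idx_off, idx_size, index):
    """Return the 32-bit offset stored at position index of the offsets array."""
    if not 0 <= index < idx_size:
        raise IndexError(f"index {index} is outside the region index")
    (target,) = struct.unpack_from("<I", buf, idx_off + 4 * index)
    return target
```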


### LiteralArray
Describes a literal array in the bytecode file.

- Alignment mode: single-byte alignment.
- Format

| **Name**| **Format**| **Description**                                              |
| -------------- | -------------- | ------------------------------------------------------------ |
| `num_literals`   | `uint32_t`       | Length of the **literals** array.                                        |
| `literals`       | `Literal[]`      | An array. Each element of the array is of the [Literal](#literal) type.|


### Literal
Describes a literal in the bytecode file. There are four encoding formats, depending on the number of bytes the literal occupies.

| **Name**| **Format**| **Alignment Type**| **Description**|
| -------------- | -------------- | ------------------ | -------------- |
| ByteOne        | `uint8_t`        | 1 byte           | Single-byte value.  |
| ByteTwo        | `uint16_t`       | 2 bytes           | Double-byte value.  |
| ByteFour       | `uint32_t`       | 4 bytes           | Four-byte value.  |
| ByteEight      | `uint64_t`       | 8 bytes           | Eight-byte value.  |
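The four widths map directly onto fixed-size integer reads. The sketch below is a minimal Python illustration with a hypothetical helper, `read_literal`. It assumes that all widths, including `uint64_t`, are stored in little-endian order, and it does not check alignment or interpret what a literal means, because how the width of each literal is selected is not covered here.

```python
import struct

# Width in bytes -> struct format (little-endian is assumed for every width).
LITERAL_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def read_literal(buf, offset, width):
    """Read one unsigned literal of the given width; return (value, next_offset)."""
    if width not in LITERAL_FORMATS:
        raise ValueError(f"unsupported literal width: {width}")
    (value,) = struct.unpack_from(LITERAL_FORMATS[width], buf, offset)
    return value, offset + width
```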
520