1b1994897Sopenharmony_ci# Interpreter Design
2b1994897Sopenharmony_ci
3b1994897Sopenharmony_ci## Introduction
4b1994897Sopenharmony_ci
5b1994897Sopenharmony_ciThis document outlines the key design decisions in the interpreter and its companion components:
6b1994897Sopenharmony_cibytecode instruction set, executable file format, etc. Each subsection below consists of following
7b1994897Sopenharmony_ciparts:
8b1994897Sopenharmony_ci
9b1994897Sopenharmony_ci| Title                           | Description                                                |
10b1994897Sopenharmony_ci| ------------------------------- | ---------------------------------------------------------- |
11b1994897Sopenharmony_ci| Requirements                    | Enlists the requirements to the component.                 |
12b1994897Sopenharmony_ci| Key Design Decisions            | Summarizes the key decisions to address the requirements.  |
13b1994897Sopenharmony_ci| Rationale                       | Elaborates on the rationales behind the decisions.         |
14b1994897Sopenharmony_ci| Specification / Implementation  | Provides relevant links.                                   |
15b1994897Sopenharmony_ci
16b1994897Sopenharmony_ciPlease refer to the [glossary](glossary.md) for terminology clarification.
17b1994897Sopenharmony_ci
18b1994897Sopenharmony_ci### Common Requirements
19b1994897Sopenharmony_ci
20b1994897Sopenharmony_ciThis section outlines common requirements that should be considered while designing the interpreter
21b1994897Sopenharmony_ciand all related components of the platform:
22b1994897Sopenharmony_ci
23b1994897Sopenharmony_ci1. The platform should scale from microcontrollers to hi-end mobile phones:
24b1994897Sopenharmony_ci    1. It should fit into 50Kb of ROM.
25b1994897Sopenharmony_ci    1. It should be able to run consuming 64Kb of RAM.
26b1994897Sopenharmony_ci1. Program execution via bytecode interpretation should be enabled on all targets.
27b1994897Sopenharmony_ci1. The platform should support multiple programming languages
28b1994897Sopenharmony_ci
29b1994897Sopenharmony_ci## Bytecode
30b1994897Sopenharmony_ci
31b1994897Sopenharmony_ci### Requirements
32b1994897Sopenharmony_ci
33b1994897Sopenharmony_ci1. Bytecode should allow the interpreter to run no slower than state of the art interpreters.
34b1994897Sopenharmony_ci1. Bytecode should be compact in size to avoid bloating application code.
35b1994897Sopenharmony_ci1. Bytecode description should have a single entry point to simplify maintenance
36b1994897Sopenharmony_ci   across all components of the platform.
37b1994897Sopenharmony_ci
38b1994897Sopenharmony_ci### Key Design Decisions
39b1994897Sopenharmony_ci
40b1994897Sopenharmony_ci1. Bytecode is register-based: all arguments and variables are mapped to virtual registers,
41b1994897Sopenharmony_ci   and most of bytecodes encode virtual registers as operands.
42b1994897Sopenharmony_ci1. There is a dedicated register called accumulator, which is addressed implicitly by some
43b1994897Sopenharmony_ci   bytecodes and shared across all function frames during runtime.
44b1994897Sopenharmony_ci1. Bytecode's instruction set architecture is machine-readable with a dedicated API for
45b1994897Sopenharmony_ci   code and documentation generation.
46b1994897Sopenharmony_ci
47b1994897Sopenharmony_ci### Rationale
48b1994897Sopenharmony_ci
49b1994897Sopenharmony_ciFor the rationale on the bytecode type, please see [here](rationale-for-bytecode.md).
50b1994897Sopenharmony_ci
51b1994897Sopenharmony_ciRationale on the machine-readable instruction set architecture is following. Since bytecode
52b1994897Sopenharmony_ciis the main form of program representation, information about it is needed in many components of
53b1994897Sopenharmony_cithe platform outside the interpreter. Having this crucial information copy-pasted or delivered as
54b1994897Sopenharmony_cia bunch of C/C++ headers is very fragile and error-prone. At the same time,
55b1994897Sopenharmony_ciwith the machine-readable ISA we can re-use this information in many places already now, e.g.:
56b1994897Sopenharmony_ci
57b1994897Sopenharmony_ci* In Panda Assembler's back-end, we automatically generate emission of bytecode in the binary form.
58b1994897Sopenharmony_ci* Converter from bytecode to the compiler's intermediate representation is partially implemented
59b1994897Sopenharmony_ci  with code autogenerated from the ISA.
60b1994897Sopenharmony_ci
61b1994897Sopenharmony_ciBesides, the machine-readable form naturally sets up the framework for self-testing (e.g.
62b1994897Sopenharmony_cidefinitions were defined and used, different parts of instruction description doesn't contradict
63b1994897Sopenharmony_cithemselves, etc.).
64b1994897Sopenharmony_ci
65b1994897Sopenharmony_ci### Specification / Implementation
66b1994897Sopenharmony_ci
67b1994897Sopenharmony_ciPlease find the implementation of the instruction set architecture [here](../isa/isa.yaml).
68b1994897Sopenharmony_ci
69b1994897Sopenharmony_ci## Executable File Format
70b1994897Sopenharmony_ci
71b1994897Sopenharmony_ci### Requirements
72b1994897Sopenharmony_ci
73b1994897Sopenharmony_ci1. All entities in the executable file should be encoded and stored compactly to avoid bloating
74b1994897Sopenharmony_ci   application code.
75b1994897Sopenharmony_ci1. Format should enforce as "flat" layout as possible: referencing metadata from other metadata
76b1994897Sopenharmony_ci   should be kept at minimum to reduce access overhead during runtime.
77b1994897Sopenharmony_ci1. Runtime memory footprint of executable files should be low.
78b1994897Sopenharmony_ci
79b1994897Sopenharmony_ci### Key Design Decisions
80b1994897Sopenharmony_ci
81b1994897Sopenharmony_ci1. The entire code of the application (excluding frameworks and external libraries) fits into a
82b1994897Sopenharmony_ci   single file to gain maximum benefits from deduplicating constant string pools, information
83b1994897Sopenharmony_ci   about types, etc.
84b1994897Sopenharmony_ci1. All metadata entities are split into two groups: local (declared in the current executable file)
85b1994897Sopenharmony_ci   and foreign (declared elsewhere). Local entities can be accessed directly by the offset
86b1994897Sopenharmony_ci   in the offset. Additionally, 4-byte alignment is enforced for the most of data structures for
87b1994897Sopenharmony_ci   more efficient data reads from the Panda binary file. Foreign entities are loaded from their
88b1994897Sopenharmony_ci   respective files on demand.
89b1994897Sopenharmony_ci1. The format uses raw offsets as the main access method to the actual data and doesn't
90b1994897Sopenharmony_ci   require explicitly how structures should be located relative to each other.
91b1994897Sopenharmony_ci
92b1994897Sopenharmony_ci### Rationale
93b1994897Sopenharmony_ci
94b1994897Sopenharmony_ciAccording to our measurements of the most popular 200 Chinese mobile applications,
95b1994897Sopenharmony_ci90% of them do not fit into a single mobile executable already now. Having multiple executables
96b1994897Sopenharmony_cifor the same application introduces extra duplication of metadata and implies extra burden on
97b1994897Sopenharmony_citooling.
98b1994897Sopenharmony_ci
99b1994897Sopenharmony_ciOur aim is to address these issues by having a single file for application code. This, however,
100b1994897Sopenharmony_cimay introduce a new issue: a single file will require larger identifiers, which obviously consume
101b1994897Sopenharmony_cimore space. As a solution, our file format will support variable length of identifiers to
102b1994897Sopenharmony_ciselect the optimal size based on actual code base.
103b1994897Sopenharmony_ci
104b1994897Sopenharmony_ciTo enable even more compact size of resulting binaries, we will compress it with the
105b1994897Sopenharmony_ci`zstd-19` algorithm. According to our research, it decreases file size by 21% and
106b1994897Sopenharmony_cioperating by 9% faster than `gzip`.
107b1994897Sopenharmony_ci
108b1994897Sopenharmony_ci### Specification / Implementation
109b1994897Sopenharmony_ci
110b1994897Sopenharmony_ciPlease find the specification [here](file_format.md).
111b1994897Sopenharmony_ci
112b1994897Sopenharmony_ci## Interpreter
113b1994897Sopenharmony_ci
114b1994897Sopenharmony_ci### Requirements
115b1994897Sopenharmony_ci
116b1994897Sopenharmony_ci1. Interpreter should run no slower than state of the art interpreters.
117b1994897Sopenharmony_ci1. Interpreter should be portable enough to run on targets from IoT devices
118b1994897Sopenharmony_ci   to hi-end mobile phones.
119b1994897Sopenharmony_ci1. Interpreter should not create extra pressure on the host system.
120b1994897Sopenharmony_ci
121b1994897Sopenharmony_ci### Key Design Decisions
122b1994897Sopenharmony_ci
123b1994897Sopenharmony_ci1. Interpreter uses indirect threaded dispatch technique
124b1994897Sopenharmony_ci   (implemented via computed goto) to reduce dispatch overhead.
125b1994897Sopenharmony_ci1. Interpreter does not depend on C++ standard library. All necessary classes, containers, etc.
126b1994897Sopenharmony_ci   are reimplemented by the platform.
127b1994897Sopenharmony_ci1. Interpreter is stackless (from the host stack perspective): Whenever a call from managed code
128b1994897Sopenharmony_ci   to managed code is performed, no new host frame is created for the interpreter itself.
129b1994897Sopenharmony_ci
130b1994897Sopenharmony_ci### Rationale
131b1994897Sopenharmony_ci
132b1994897Sopenharmony_ci1. Interpreters are by nature slower than native code execution. Slowdown can be explained by:
133b1994897Sopenharmony_ci    1. The inevitable extra cost that interpreters pay for decoding and dispatching bytecode
134b1994897Sopenharmony_ci       instructions.
135b1994897Sopenharmony_ci    1. The "heaviness" of language semantics that interpreters should implement.
136b1994897Sopenharmony_ci    1. The "nativeness" of the language implementation to the platform. A language that is
137b1994897Sopenharmony_ci       implemented in another language that runs on the platform may run slower because of
138b1994897Sopenharmony_ci       additional runtime overhead.
139b1994897Sopenharmony_ci1. An ideal target would be 5x-10x slowdown factor (compared to native execution)
140b1994897Sopenharmony_ci   for statically typed languages that run on the platform natively, and 15x-20x for L2 languages.
141b1994897Sopenharmony_ci1. Panda should scale onto a wide range of devices, including IoT devices. Although more and
142b1994897Sopenharmony_ci   more toolchains support C++ compilation for IoT, the standard library is often not present
143b1994897Sopenharmony_ci   on the device. Since static linking with a subset of the library is a pain and may not guarantee
144b1994897Sopenharmony_ci   optimal size of resulting native binary executable files, it is more reasonable to reimplement
145b1994897Sopenharmony_ci   required parts.
146b1994897Sopenharmony_ci1. According to our experiments, a stackless interpreter for a stack-based bytecode (which is by
147b1994897Sopenharmony_ci   nature slower) can even beat sometimes a non-stackless interpreter for a register-based
148b1994897Sopenharmony_ci   bytecode (which is by nature faster).
149b1994897Sopenharmony_ci
150b1994897Sopenharmony_ci### Specification / Implementation
151b1994897Sopenharmony_ci
152b1994897Sopenharmony_ciPlease find the reference implementation [here](../runtime/interpreter).
153b1994897Sopenharmony_ci
154b1994897Sopenharmony_ci## Virtual Stack
155b1994897Sopenharmony_ci
156b1994897Sopenharmony_ci### Requirements
157b1994897Sopenharmony_ci
158b1994897Sopenharmony_ci1. All virtual registers should explicitly and precisely distinct between garbage collectable
159b1994897Sopenharmony_ci   objects and non-garbage collectable primitives to simplify automatic memory management.
160b1994897Sopenharmony_ci1. Virtual registers should be able to hold following types: unsigned and signed integers with
161b1994897Sopenharmony_ci   the size of up to 64 bits, floating point numbers of single and double precision, raw pointers
162b1994897Sopenharmony_ci   to the objects on 32-bit and 64-bit architectures.
163b1994897Sopenharmony_ci1. Virtual stack should abstract limitations possibly imposed by the host stack.
164b1994897Sopenharmony_ci
165b1994897Sopenharmony_ci### Key Design Decisions
166b1994897Sopenharmony_ci
167b1994897Sopenharmony_ci1. All virtual registers are "tagged", meaning that they contain both the payload and additional
168b1994897Sopenharmony_ci   metadata (a "tag"), which allows at least distinguish between garbage collectable and other
169b1994897Sopenharmony_ci   types.
170b1994897Sopenharmony_ci1. The size of a virtual register is not hardcoded by design. Currently it is always 128 bits,
171b1994897Sopenharmony_ci   but this is an implementation detail which is hidden behind the interfaces
172b1994897Sopenharmony_ci   and can be reduced for memory-constrained targets and to 64 bits (using NaN-tagging technique).
173b1994897Sopenharmony_ci1. By default virtual stack is not mapped to a host stack, instead it is allocated on heap using
174b1994897Sopenharmony_ci   platform's memory management facilities. It is important that this behavior is not
175b1994897Sopenharmony_ci   hardcoded by design, it can be reconfigured for platforms where performance may benefit from
176b1994897Sopenharmony_ci   mapping virtual stack directly to the host stack.
177b1994897Sopenharmony_ci
178b1994897Sopenharmony_ci### Rationale
179b1994897Sopenharmony_ci
180b1994897Sopenharmony_ci1. Although tagged virtual registers occupy more memory (especially on 64-bit architectures),
181b1994897Sopenharmony_ci   redundant memory consumption is cheaper than ongoing runtime penalties on garbage collector
182b1994897Sopenharmony_ci   trying to distinguish between objects from non-objects on an "imprecise" stack.
183b1994897Sopenharmony_ci1. Where does 128 comes from? It is `128 = 64 + 64`. The first 64 is either `sizeof(long int)`
184b1994897Sopenharmony_ci   or `sizeof(double)` or `sizeof(void *)` on a 64-bit architecture (i.e. theoretical maximum
185b1994897Sopenharmony_ci   size of the payload we are required to store in a virtual register). The second 64 is for tag
186b1994897Sopenharmony_ci   and padding. A lot of free space is expected to be in the padding area. Probably we may use it
187b1994897Sopenharmony_ci   for memory management or some other needs.
188b1994897Sopenharmony_ci1. Enforcing a virtual stack to mapped to the host stack faces us with some unpleasant constraints
189b1994897Sopenharmony_ci   on IoT devices. On such devices the size of the host stack may be severely limited: as a result,
190b1994897Sopenharmony_ci   managed application have to think about this limitation, which contradicts the idea of
191b1994897Sopenharmony_ci   portability of managed applications. Configurable virtual stack implementation relaxes this
192b1994897Sopenharmony_ci   constraint, which can be relaxed even further with the stackless interpreter (see above).
193b1994897Sopenharmony_ci
194b1994897Sopenharmony_ci### Specification / Implementation
195b1994897Sopenharmony_ci
196b1994897Sopenharmony_ciPlease find the reference implementation [here](../runtime/interpreter/frame.h).
197b1994897Sopenharmony_ci
198b1994897Sopenharmony_ci## Quality Assurance
199b1994897Sopenharmony_ci
200b1994897Sopenharmony_ci### Requirements
201b1994897Sopenharmony_ci
202b1994897Sopenharmony_ci1. Interpreter should allow testing without involving front-ends of concrete languages.
203b1994897Sopenharmony_ci1. Interpreter should allow early testing.
204b1994897Sopenharmony_ci1. Interpreter should allow early benchmarking.
205b1994897Sopenharmony_ci
206b1994897Sopenharmony_ci### Key Design Decisions
207b1994897Sopenharmony_ci
208b1994897Sopenharmony_ci1. A light-weight Panda Assembly language is developed, along with the Panda Assembler tool.
209b1994897Sopenharmony_ci1. A compliance test suite for the Panda Assembly language is created. The core part of the suite
210b1994897Sopenharmony_ci   are small chunks of hand-written Panda Assembly covering corner cases, while the majority of
211b1994897Sopenharmony_ci   cases are covered by automatically generated chunks.
212b1994897Sopenharmony_ci1. A set of benchmarks is ported to Panda Assembly and maintained as a part of the source tree.
213b1994897Sopenharmony_ci
214b1994897Sopenharmony_ci### Rationale
215b1994897Sopenharmony_ci
216b1994897Sopenharmony_ciA general overview of managed assembly languages can be found [here](overview-of-managed-assemblers.md).
217b1994897Sopenharmony_ci
218b1994897Sopenharmony_ciLots of things are being created in parallel, and currently there is no stable front-end for Panda
219b1994897Sopenharmony_cifor any high-level language. However, once they appear, front-end engineer will obviously want a
220b1994897Sopenharmony_cistable interpreter to run their code. At the same time, interpreter developers need some tools
221b1994897Sopenharmony_cibeside too granular unit tests to ensure quality of the interpreter and companion components
222b1994897Sopenharmony_cidescribed above.
223b1994897Sopenharmony_ci
224b1994897Sopenharmony_ci### Specification / Implementation
225b1994897Sopenharmony_ci
226b1994897Sopenharmony_ciPlease find the specification [here](assembly_format.md).
227b1994897Sopenharmony_ci
228b1994897Sopenharmony_ciPlease find the implementation [here](../assembler).
229