1# Interpreter Design
2
3## Introduction
4
5This document outlines the key design decisions in the interpreter and its companion components:
6bytecode instruction set, executable file format, etc. Each subsection below consists of following
7parts:
8
9| Title                           | Description                                                |
10| ------------------------------- | ---------------------------------------------------------- |
11| Requirements                    | Enlists the requirements to the component.                 |
12| Key Design Decisions            | Summarizes the key decisions to address the requirements.  |
13| Rationale                       | Elaborates on the rationales behind the decisions.         |
14| Specification / Implementation  | Provides relevant links.                                   |
15
16Please refer to the [glossary](glossary.md) for terminology clarification.
17
18### Common Requirements
19
20This section outlines common requirements that should be considered while designing the interpreter
21and all related components of the platform:
22
231. The platform should scale from microcontrollers to hi-end mobile phones:
24    1. It should fit into 50Kb of ROM.
25    1. It should be able to run consuming 64Kb of RAM.
261. Program execution via bytecode interpretation should be enabled on all targets.
271. The platform should support multiple programming languages
28
29## Bytecode
30
31### Requirements
32
331. Bytecode should allow the interpreter to run no slower than state of the art interpreters.
341. Bytecode should be compact in size to avoid bloating application code.
351. Bytecode description should have a single entry point to simplify maintenance
36   across all components of the platform.
37
38### Key Design Decisions
39
401. Bytecode is register-based: all arguments and variables are mapped to virtual registers,
41   and most of bytecodes encode virtual registers as operands.
421. There is a dedicated register called accumulator, which is addressed implicitly by some
43   bytecodes and shared across all function frames during runtime.
441. Bytecode's instruction set architecture is machine-readable with a dedicated API for
45   code and documentation generation.
46
47### Rationale
48
49For the rationale on the bytecode type, please see [here](rationale-for-bytecode.md).
50
51Rationale on the machine-readable instruction set architecture is following. Since bytecode
52is the main form of program representation, information about it is needed in many components of
53the platform outside the interpreter. Having this crucial information copy-pasted or delivered as
54a bunch of C/C++ headers is very fragile and error-prone. At the same time,
55with the machine-readable ISA we can re-use this information in many places already now, e.g.:
56
57* In Panda Assembler's back-end, we automatically generate emission of bytecode in the binary form.
58* Converter from bytecode to the compiler's intermediate representation is partially implemented
59  with code autogenerated from the ISA.
60
61Besides, the machine-readable form naturally sets up the framework for self-testing (e.g.
62definitions were defined and used, different parts of instruction description doesn't contradict
63themselves, etc.).
64
65### Specification / Implementation
66
67Please find the implementation of the instruction set architecture [here](../isa/isa.yaml).
68
69## Executable File Format
70
71### Requirements
72
731. All entities in the executable file should be encoded and stored compactly to avoid bloating
74   application code.
751. Format should enforce as "flat" layout as possible: referencing metadata from other metadata
76   should be kept at minimum to reduce access overhead during runtime.
771. Runtime memory footprint of executable files should be low.
78
79### Key Design Decisions
80
811. The entire code of the application (excluding frameworks and external libraries) fits into a
82   single file to gain maximum benefits from deduplicating constant string pools, information
83   about types, etc.
841. All metadata entities are split into two groups: local (declared in the current executable file)
85   and foreign (declared elsewhere). Local entities can be accessed directly by the offset
86   in the offset. Additionally, 4-byte alignment is enforced for the most of data structures for
87   more efficient data reads from the Panda binary file. Foreign entities are loaded from their
88   respective files on demand.
891. The format uses raw offsets as the main access method to the actual data and doesn't
90   require explicitly how structures should be located relative to each other.
91
92### Rationale
93
94According to our measurements of the most popular 200 Chinese mobile applications,
9590% of them do not fit into a single mobile executable already now. Having multiple executables
96for the same application introduces extra duplication of metadata and implies extra burden on
97tooling.
98
99Our aim is to address these issues by having a single file for application code. This, however,
100may introduce a new issue: a single file will require larger identifiers, which obviously consume
101more space. As a solution, our file format will support variable length of identifiers to
102select the optimal size based on actual code base.
103
104To enable even more compact size of resulting binaries, we will compress it with the
105`zstd-19` algorithm. According to our research, it decreases file size by 21% and
106operating by 9% faster than `gzip`.
107
108### Specification / Implementation
109
110Please find the specification [here](file_format.md).
111
112## Interpreter
113
114### Requirements
115
1161. Interpreter should run no slower than state of the art interpreters.
1171. Interpreter should be portable enough to run on targets from IoT devices
118   to hi-end mobile phones.
1191. Interpreter should not create extra pressure on the host system.
120
121### Key Design Decisions
122
1231. Interpreter uses indirect threaded dispatch technique
124   (implemented via computed goto) to reduce dispatch overhead.
1251. Interpreter does not depend on C++ standard library. All necessary classes, containers, etc.
126   are reimplemented by the platform.
1271. Interpreter is stackless (from the host stack perspective): Whenever a call from managed code
128   to managed code is performed, no new host frame is created for the interpreter itself.
129
130### Rationale
131
1321. Interpreters are by nature slower than native code execution. Slowdown can be explained by:
133    1. The inevitable extra cost that interpreters pay for decoding and dispatching bytecode
134       instructions.
135    1. The "heaviness" of language semantics that interpreters should implement.
136    1. The "nativeness" of the language implementation to the platform. A language that is
137       implemented in another language that runs on the platform may run slower because of
138       additional runtime overhead.
1391. An ideal target would be 5x-10x slowdown factor (compared to native execution)
140   for statically typed languages that run on the platform natively, and 15x-20x for L2 languages.
1411. Panda should scale onto a wide range of devices, including IoT devices. Although more and
142   more toolchains support C++ compilation for IoT, the standard library is often not present
143   on the device. Since static linking with a subset of the library is a pain and may not guarantee
144   optimal size of resulting native binary executable files, it is more reasonable to reimplement
145   required parts.
1461. According to our experiments, a stackless interpreter for a stack-based bytecode (which is by
147   nature slower) can even beat sometimes a non-stackless interpreter for a register-based
148   bytecode (which is by nature faster).
149
150### Specification / Implementation
151
152Please find the reference implementation [here](../runtime/interpreter).
153
154## Virtual Stack
155
156### Requirements
157
1581. All virtual registers should explicitly and precisely distinct between garbage collectable
159   objects and non-garbage collectable primitives to simplify automatic memory management.
1601. Virtual registers should be able to hold following types: unsigned and signed integers with
161   the size of up to 64 bits, floating point numbers of single and double precision, raw pointers
162   to the objects on 32-bit and 64-bit architectures.
1631. Virtual stack should abstract limitations possibly imposed by the host stack.
164
165### Key Design Decisions
166
1671. All virtual registers are "tagged", meaning that they contain both the payload and additional
168   metadata (a "tag"), which allows at least distinguish between garbage collectable and other
169   types.
1701. The size of a virtual register is not hardcoded by design. Currently it is always 128 bits,
171   but this is an implementation detail which is hidden behind the interfaces
172   and can be reduced for memory-constrained targets and to 64 bits (using NaN-tagging technique).
1731. By default virtual stack is not mapped to a host stack, instead it is allocated on heap using
174   platform's memory management facilities. It is important that this behavior is not
175   hardcoded by design, it can be reconfigured for platforms where performance may benefit from
176   mapping virtual stack directly to the host stack.
177
178### Rationale
179
1801. Although tagged virtual registers occupy more memory (especially on 64-bit architectures),
181   redundant memory consumption is cheaper than ongoing runtime penalties on garbage collector
182   trying to distinguish between objects from non-objects on an "imprecise" stack.
1831. Where does 128 comes from? It is `128 = 64 + 64`. The first 64 is either `sizeof(long int)`
184   or `sizeof(double)` or `sizeof(void *)` on a 64-bit architecture (i.e. theoretical maximum
185   size of the payload we are required to store in a virtual register). The second 64 is for tag
186   and padding. A lot of free space is expected to be in the padding area. Probably we may use it
187   for memory management or some other needs.
1881. Enforcing a virtual stack to mapped to the host stack faces us with some unpleasant constraints
189   on IoT devices. On such devices the size of the host stack may be severely limited: as a result,
190   managed application have to think about this limitation, which contradicts the idea of
191   portability of managed applications. Configurable virtual stack implementation relaxes this
192   constraint, which can be relaxed even further with the stackless interpreter (see above).
193
194### Specification / Implementation
195
196Please find the reference implementation [here](../runtime/interpreter/frame.h).
197
198## Quality Assurance
199
200### Requirements
201
2021. Interpreter should allow testing without involving front-ends of concrete languages.
2031. Interpreter should allow early testing.
2041. Interpreter should allow early benchmarking.
205
206### Key Design Decisions
207
2081. A light-weight Panda Assembly language is developed, along with the Panda Assembler tool.
2091. A compliance test suite for the Panda Assembly language is created. The core part of the suite
210   are small chunks of hand-written Panda Assembly covering corner cases, while the majority of
211   cases are covered by automatically generated chunks.
2121. A set of benchmarks is ported to Panda Assembly and maintained as a part of the source tree.
213
214### Rationale
215
216A general overview of managed assembly languages can be found [here](overview-of-managed-assemblers.md).
217
218Lots of things are being created in parallel, and currently there is no stable front-end for Panda
219for any high-level language. However, once they appear, front-end engineer will obviously want a
220stable interpreter to run their code. At the same time, interpreter developers need some tools
221beside too granular unit tests to ensure quality of the interpreter and companion components
222described above.
223
224### Specification / Implementation
225
226Please find the specification [here](assembly_format.md).
227
228Please find the implementation [here](../assembler).
229