1b1994897Sopenharmony_ci# Interpreter Design 2b1994897Sopenharmony_ci 3b1994897Sopenharmony_ci## Introduction 4b1994897Sopenharmony_ci 5b1994897Sopenharmony_ciThis document outlines the key design decisions in the interpreter and its companion components: 6b1994897Sopenharmony_cibytecode instruction set, executable file format, etc. Each subsection below consists of following 7b1994897Sopenharmony_ciparts: 8b1994897Sopenharmony_ci 9b1994897Sopenharmony_ci| Title | Description | 10b1994897Sopenharmony_ci| ------------------------------- | ---------------------------------------------------------- | 11b1994897Sopenharmony_ci| Requirements | Enlists the requirements to the component. | 12b1994897Sopenharmony_ci| Key Design Decisions | Summarizes the key decisions to address the requirements. | 13b1994897Sopenharmony_ci| Rationale | Elaborates on the rationales behind the decisions. | 14b1994897Sopenharmony_ci| Specification / Implementation | Provides relevant links. | 15b1994897Sopenharmony_ci 16b1994897Sopenharmony_ciPlease refer to the [glossary](glossary.md) for terminology clarification. 17b1994897Sopenharmony_ci 18b1994897Sopenharmony_ci### Common Requirements 19b1994897Sopenharmony_ci 20b1994897Sopenharmony_ciThis section outlines common requirements that should be considered while designing the interpreter 21b1994897Sopenharmony_ciand all related components of the platform: 22b1994897Sopenharmony_ci 23b1994897Sopenharmony_ci1. The platform should scale from microcontrollers to hi-end mobile phones: 24b1994897Sopenharmony_ci 1. It should fit into 50Kb of ROM. 25b1994897Sopenharmony_ci 1. It should be able to run consuming 64Kb of RAM. 26b1994897Sopenharmony_ci1. Program execution via bytecode interpretation should be enabled on all targets. 27b1994897Sopenharmony_ci1. The platform should support multiple programming languages 28b1994897Sopenharmony_ci 29b1994897Sopenharmony_ci## Bytecode 30b1994897Sopenharmony_ci 31b1994897Sopenharmony_ci### Requirements 32b1994897Sopenharmony_ci 33b1994897Sopenharmony_ci1. Bytecode should allow the interpreter to run no slower than state of the art interpreters. 34b1994897Sopenharmony_ci1. Bytecode should be compact in size to avoid bloating application code. 35b1994897Sopenharmony_ci1. Bytecode description should have a single entry point to simplify maintenance 36b1994897Sopenharmony_ci across all components of the platform. 37b1994897Sopenharmony_ci 38b1994897Sopenharmony_ci### Key Design Decisions 39b1994897Sopenharmony_ci 40b1994897Sopenharmony_ci1. Bytecode is register-based: all arguments and variables are mapped to virtual registers, 41b1994897Sopenharmony_ci and most of bytecodes encode virtual registers as operands. 42b1994897Sopenharmony_ci1. There is a dedicated register called accumulator, which is addressed implicitly by some 43b1994897Sopenharmony_ci bytecodes and shared across all function frames during runtime. 44b1994897Sopenharmony_ci1. Bytecode's instruction set architecture is machine-readable with a dedicated API for 45b1994897Sopenharmony_ci code and documentation generation. 46b1994897Sopenharmony_ci 47b1994897Sopenharmony_ci### Rationale 48b1994897Sopenharmony_ci 49b1994897Sopenharmony_ciFor the rationale on the bytecode type, please see [here](rationale-for-bytecode.md). 50b1994897Sopenharmony_ci 51b1994897Sopenharmony_ciRationale on the machine-readable instruction set architecture is following. Since bytecode 52b1994897Sopenharmony_ciis the main form of program representation, information about it is needed in many components of 53b1994897Sopenharmony_cithe platform outside the interpreter. Having this crucial information copy-pasted or delivered as 54b1994897Sopenharmony_cia bunch of C/C++ headers is very fragile and error-prone. At the same time, 55b1994897Sopenharmony_ciwith the machine-readable ISA we can re-use this information in many places already now, e.g.: 56b1994897Sopenharmony_ci 57b1994897Sopenharmony_ci* In Panda Assembler's back-end, we automatically generate emission of bytecode in the binary form. 58b1994897Sopenharmony_ci* Converter from bytecode to the compiler's intermediate representation is partially implemented 59b1994897Sopenharmony_ci with code autogenerated from the ISA. 60b1994897Sopenharmony_ci 61b1994897Sopenharmony_ciBesides, the machine-readable form naturally sets up the framework for self-testing (e.g. 62b1994897Sopenharmony_cidefinitions were defined and used, different parts of instruction description doesn't contradict 63b1994897Sopenharmony_cithemselves, etc.). 64b1994897Sopenharmony_ci 65b1994897Sopenharmony_ci### Specification / Implementation 66b1994897Sopenharmony_ci 67b1994897Sopenharmony_ciPlease find the implementation of the instruction set architecture [here](../isa/isa.yaml). 68b1994897Sopenharmony_ci 69b1994897Sopenharmony_ci## Executable File Format 70b1994897Sopenharmony_ci 71b1994897Sopenharmony_ci### Requirements 72b1994897Sopenharmony_ci 73b1994897Sopenharmony_ci1. All entities in the executable file should be encoded and stored compactly to avoid bloating 74b1994897Sopenharmony_ci application code. 75b1994897Sopenharmony_ci1. Format should enforce as "flat" layout as possible: referencing metadata from other metadata 76b1994897Sopenharmony_ci should be kept at minimum to reduce access overhead during runtime. 77b1994897Sopenharmony_ci1. Runtime memory footprint of executable files should be low. 78b1994897Sopenharmony_ci 79b1994897Sopenharmony_ci### Key Design Decisions 80b1994897Sopenharmony_ci 81b1994897Sopenharmony_ci1. The entire code of the application (excluding frameworks and external libraries) fits into a 82b1994897Sopenharmony_ci single file to gain maximum benefits from deduplicating constant string pools, information 83b1994897Sopenharmony_ci about types, etc. 84b1994897Sopenharmony_ci1. All metadata entities are split into two groups: local (declared in the current executable file) 85b1994897Sopenharmony_ci and foreign (declared elsewhere). Local entities can be accessed directly by the offset 86b1994897Sopenharmony_ci in the offset. Additionally, 4-byte alignment is enforced for the most of data structures for 87b1994897Sopenharmony_ci more efficient data reads from the Panda binary file. Foreign entities are loaded from their 88b1994897Sopenharmony_ci respective files on demand. 89b1994897Sopenharmony_ci1. The format uses raw offsets as the main access method to the actual data and doesn't 90b1994897Sopenharmony_ci require explicitly how structures should be located relative to each other. 91b1994897Sopenharmony_ci 92b1994897Sopenharmony_ci### Rationale 93b1994897Sopenharmony_ci 94b1994897Sopenharmony_ciAccording to our measurements of the most popular 200 Chinese mobile applications, 95b1994897Sopenharmony_ci90% of them do not fit into a single mobile executable already now. Having multiple executables 96b1994897Sopenharmony_cifor the same application introduces extra duplication of metadata and implies extra burden on 97b1994897Sopenharmony_citooling. 98b1994897Sopenharmony_ci 99b1994897Sopenharmony_ciOur aim is to address these issues by having a single file for application code. This, however, 100b1994897Sopenharmony_cimay introduce a new issue: a single file will require larger identifiers, which obviously consume 101b1994897Sopenharmony_cimore space. As a solution, our file format will support variable length of identifiers to 102b1994897Sopenharmony_ciselect the optimal size based on actual code base. 103b1994897Sopenharmony_ci 104b1994897Sopenharmony_ciTo enable even more compact size of resulting binaries, we will compress it with the 105b1994897Sopenharmony_ci`zstd-19` algorithm. According to our research, it decreases file size by 21% and 106b1994897Sopenharmony_cioperating by 9% faster than `gzip`. 107b1994897Sopenharmony_ci 108b1994897Sopenharmony_ci### Specification / Implementation 109b1994897Sopenharmony_ci 110b1994897Sopenharmony_ciPlease find the specification [here](file_format.md). 111b1994897Sopenharmony_ci 112b1994897Sopenharmony_ci## Interpreter 113b1994897Sopenharmony_ci 114b1994897Sopenharmony_ci### Requirements 115b1994897Sopenharmony_ci 116b1994897Sopenharmony_ci1. Interpreter should run no slower than state of the art interpreters. 117b1994897Sopenharmony_ci1. Interpreter should be portable enough to run on targets from IoT devices 118b1994897Sopenharmony_ci to hi-end mobile phones. 119b1994897Sopenharmony_ci1. Interpreter should not create extra pressure on the host system. 120b1994897Sopenharmony_ci 121b1994897Sopenharmony_ci### Key Design Decisions 122b1994897Sopenharmony_ci 123b1994897Sopenharmony_ci1. Interpreter uses indirect threaded dispatch technique 124b1994897Sopenharmony_ci (implemented via computed goto) to reduce dispatch overhead. 125b1994897Sopenharmony_ci1. Interpreter does not depend on C++ standard library. All necessary classes, containers, etc. 126b1994897Sopenharmony_ci are reimplemented by the platform. 127b1994897Sopenharmony_ci1. Interpreter is stackless (from the host stack perspective): Whenever a call from managed code 128b1994897Sopenharmony_ci to managed code is performed, no new host frame is created for the interpreter itself. 129b1994897Sopenharmony_ci 130b1994897Sopenharmony_ci### Rationale 131b1994897Sopenharmony_ci 132b1994897Sopenharmony_ci1. Interpreters are by nature slower than native code execution. Slowdown can be explained by: 133b1994897Sopenharmony_ci 1. The inevitable extra cost that interpreters pay for decoding and dispatching bytecode 134b1994897Sopenharmony_ci instructions. 135b1994897Sopenharmony_ci 1. The "heaviness" of language semantics that interpreters should implement. 136b1994897Sopenharmony_ci 1. The "nativeness" of the language implementation to the platform. A language that is 137b1994897Sopenharmony_ci implemented in another language that runs on the platform may run slower because of 138b1994897Sopenharmony_ci additional runtime overhead. 139b1994897Sopenharmony_ci1. An ideal target would be 5x-10x slowdown factor (compared to native execution) 140b1994897Sopenharmony_ci for statically typed languages that run on the platform natively, and 15x-20x for L2 languages. 141b1994897Sopenharmony_ci1. Panda should scale onto a wide range of devices, including IoT devices. Although more and 142b1994897Sopenharmony_ci more toolchains support C++ compilation for IoT, the standard library is often not present 143b1994897Sopenharmony_ci on the device. Since static linking with a subset of the library is a pain and may not guarantee 144b1994897Sopenharmony_ci optimal size of resulting native binary executable files, it is more reasonable to reimplement 145b1994897Sopenharmony_ci required parts. 146b1994897Sopenharmony_ci1. According to our experiments, a stackless interpreter for a stack-based bytecode (which is by 147b1994897Sopenharmony_ci nature slower) can even beat sometimes a non-stackless interpreter for a register-based 148b1994897Sopenharmony_ci bytecode (which is by nature faster). 149b1994897Sopenharmony_ci 150b1994897Sopenharmony_ci### Specification / Implementation 151b1994897Sopenharmony_ci 152b1994897Sopenharmony_ciPlease find the reference implementation [here](../runtime/interpreter). 153b1994897Sopenharmony_ci 154b1994897Sopenharmony_ci## Virtual Stack 155b1994897Sopenharmony_ci 156b1994897Sopenharmony_ci### Requirements 157b1994897Sopenharmony_ci 158b1994897Sopenharmony_ci1. All virtual registers should explicitly and precisely distinct between garbage collectable 159b1994897Sopenharmony_ci objects and non-garbage collectable primitives to simplify automatic memory management. 160b1994897Sopenharmony_ci1. Virtual registers should be able to hold following types: unsigned and signed integers with 161b1994897Sopenharmony_ci the size of up to 64 bits, floating point numbers of single and double precision, raw pointers 162b1994897Sopenharmony_ci to the objects on 32-bit and 64-bit architectures. 163b1994897Sopenharmony_ci1. Virtual stack should abstract limitations possibly imposed by the host stack. 164b1994897Sopenharmony_ci 165b1994897Sopenharmony_ci### Key Design Decisions 166b1994897Sopenharmony_ci 167b1994897Sopenharmony_ci1. All virtual registers are "tagged", meaning that they contain both the payload and additional 168b1994897Sopenharmony_ci metadata (a "tag"), which allows at least distinguish between garbage collectable and other 169b1994897Sopenharmony_ci types. 170b1994897Sopenharmony_ci1. The size of a virtual register is not hardcoded by design. Currently it is always 128 bits, 171b1994897Sopenharmony_ci but this is an implementation detail which is hidden behind the interfaces 172b1994897Sopenharmony_ci and can be reduced for memory-constrained targets and to 64 bits (using NaN-tagging technique). 173b1994897Sopenharmony_ci1. By default virtual stack is not mapped to a host stack, instead it is allocated on heap using 174b1994897Sopenharmony_ci platform's memory management facilities. It is important that this behavior is not 175b1994897Sopenharmony_ci hardcoded by design, it can be reconfigured for platforms where performance may benefit from 176b1994897Sopenharmony_ci mapping virtual stack directly to the host stack. 177b1994897Sopenharmony_ci 178b1994897Sopenharmony_ci### Rationale 179b1994897Sopenharmony_ci 180b1994897Sopenharmony_ci1. Although tagged virtual registers occupy more memory (especially on 64-bit architectures), 181b1994897Sopenharmony_ci redundant memory consumption is cheaper than ongoing runtime penalties on garbage collector 182b1994897Sopenharmony_ci trying to distinguish between objects from non-objects on an "imprecise" stack. 183b1994897Sopenharmony_ci1. Where does 128 comes from? It is `128 = 64 + 64`. The first 64 is either `sizeof(long int)` 184b1994897Sopenharmony_ci or `sizeof(double)` or `sizeof(void *)` on a 64-bit architecture (i.e. theoretical maximum 185b1994897Sopenharmony_ci size of the payload we are required to store in a virtual register). The second 64 is for tag 186b1994897Sopenharmony_ci and padding. A lot of free space is expected to be in the padding area. Probably we may use it 187b1994897Sopenharmony_ci for memory management or some other needs. 188b1994897Sopenharmony_ci1. Enforcing a virtual stack to mapped to the host stack faces us with some unpleasant constraints 189b1994897Sopenharmony_ci on IoT devices. On such devices the size of the host stack may be severely limited: as a result, 190b1994897Sopenharmony_ci managed application have to think about this limitation, which contradicts the idea of 191b1994897Sopenharmony_ci portability of managed applications. Configurable virtual stack implementation relaxes this 192b1994897Sopenharmony_ci constraint, which can be relaxed even further with the stackless interpreter (see above). 193b1994897Sopenharmony_ci 194b1994897Sopenharmony_ci### Specification / Implementation 195b1994897Sopenharmony_ci 196b1994897Sopenharmony_ciPlease find the reference implementation [here](../runtime/interpreter/frame.h). 197b1994897Sopenharmony_ci 198b1994897Sopenharmony_ci## Quality Assurance 199b1994897Sopenharmony_ci 200b1994897Sopenharmony_ci### Requirements 201b1994897Sopenharmony_ci 202b1994897Sopenharmony_ci1. Interpreter should allow testing without involving front-ends of concrete languages. 203b1994897Sopenharmony_ci1. Interpreter should allow early testing. 204b1994897Sopenharmony_ci1. Interpreter should allow early benchmarking. 205b1994897Sopenharmony_ci 206b1994897Sopenharmony_ci### Key Design Decisions 207b1994897Sopenharmony_ci 208b1994897Sopenharmony_ci1. A light-weight Panda Assembly language is developed, along with the Panda Assembler tool. 209b1994897Sopenharmony_ci1. A compliance test suite for the Panda Assembly language is created. The core part of the suite 210b1994897Sopenharmony_ci are small chunks of hand-written Panda Assembly covering corner cases, while the majority of 211b1994897Sopenharmony_ci cases are covered by automatically generated chunks. 212b1994897Sopenharmony_ci1. A set of benchmarks is ported to Panda Assembly and maintained as a part of the source tree. 213b1994897Sopenharmony_ci 214b1994897Sopenharmony_ci### Rationale 215b1994897Sopenharmony_ci 216b1994897Sopenharmony_ciA general overview of managed assembly languages can be found [here](overview-of-managed-assemblers.md). 217b1994897Sopenharmony_ci 218b1994897Sopenharmony_ciLots of things are being created in parallel, and currently there is no stable front-end for Panda 219b1994897Sopenharmony_cifor any high-level language. However, once they appear, front-end engineer will obviously want a 220b1994897Sopenharmony_cistable interpreter to run their code. At the same time, interpreter developers need some tools 221b1994897Sopenharmony_cibeside too granular unit tests to ensure quality of the interpreter and companion components 222b1994897Sopenharmony_cidescribed above. 223b1994897Sopenharmony_ci 224b1994897Sopenharmony_ci### Specification / Implementation 225b1994897Sopenharmony_ci 226b1994897Sopenharmony_ciPlease find the specification [here](assembly_format.md). 227b1994897Sopenharmony_ci 228b1994897Sopenharmony_ciPlease find the implementation [here](../assembler). 229