# On-Stack Replacement

### Overview

On-Stack Replacement (OSR) is a technique for switching between different implementations of the same function while
that function is executing.

Here, OSR means the transition from interpreter code to optimized code. The opposite transition, from optimized to
unoptimized code, is called `Deoptimization`.

OSR workflow:
```
                                    +-----------------------+
                                    |                       |
                                    |     Interpreter       |
                                    |                       |
                                    +-----------------------+
    Method::osr_code                            |
    +------------------------+                  |
    | Method Prologue        |                  V
    +------------------------+         +-----------------+
    | mov x10, 0             |         |OsrEntry         |
    | mov d4, 3.14           |         +-----------------+
    |                        |                  |
    |                        |                  +---------------------+
    |        . . .           |                  |                     V
    |                        |                  |            +-------------------+
    | osr_entry_1:           |                  |            |  PrepareOsrEntry  |
+-->|------------------------|                  |            |(fill CFrame from  |
|   |  Loop 2                |                  |            | OsrStateStamp)    |
|   |                        |                  |            +-------------------+
|   |                        |                  |   CFrame          |       ^
|   |------------------------|                  |<------------------+       |
|   |        . . .           |                  |                           |
|   |                        |                  |       OsrStateStamp       |
|   |------------------------|                  |      +-----------------------------------+
|   | Method epilogue        |                  |      |native_pc   : INVALID              |
|   |------------------------|                  |      |bytecode_pc : offsetof osr_entry_1 |
|   | OSR Stub 1:            |<-----------------+      |osr_entry   : osr_code+bytecode_pc |
|   | mov x10, 0             |                         |vregs[]     : vreg1=Slot(2)        |
|   | mov d4, 3.14           |                         |              vreg4=CpuReg(8)      |
+---| jump osr_entry_1       |                         +-----------------------------------+
    +------------------------+
```

### Triggering

OSR and regular compilation share the same hotness counter. The first time the counter overflows, we check
whether the method is already compiled. If it is not, we start compilation in regular mode. Otherwise, we compile
the method in OSR mode.

When compilation is triggered and the OSR compiled code is already set, we begin the On-Stack Replacement procedure.
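
The decision described above can be sketched as a small function. This is only an illustration under stated assumptions, not the runtime's code: the `Method` fields, the `THRESHOLD` value, and the returned action names are all hypothetical, and the sketch does not model how the counter is reset after an overflow.

```python
from dataclasses import dataclass

THRESHOLD = 3  # hypothetical hotness threshold, chosen only for the example


@dataclass
class Method:
    # All fields are illustrative stand-ins for the runtime's method state.
    hotness_counter: int = 0
    has_compiled_code: bool = False  # regular compilation has finished
    has_osr_code: bool = False       # OSR compilation has finished


def on_counter_update(method: Method) -> str:
    """Return the action taken after one hotness-counter increment."""
    method.hotness_counter += 1
    if method.hotness_counter < THRESHOLD:
        return "continue"            # keep interpreting
    if method.has_osr_code:
        return "enter_osr"           # OSR code is set: start replacement
    if not method.has_compiled_code:
        return "compile_regular"     # first overflow: regular mode
    return "compile_osr"             # already compiled: OSR mode
```

The order of the checks mirrors the text: OSR entry is attempted only when OSR code already exists, and OSR compilation is requested only for a method that has already been compiled in regular mode.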

Triggering workflow:

![triggering_scheme](images/osr_trigger.png)

### Compilation

The JIT compiles the whole OSR method the same way it compiles a hot method.

To ensure that every loop in the compiled code can be entered from the interpreter, we need to avoid loop optimizations
on those loops. In OSR methods, a special osr-entry flag is set on the loop-header basic blocks, and some optimizations
have to skip such loops.

There are no restrictions on inlining: methods can be inlined in the usual way and all loop optimizations are
applicable to them, because the loop headers of inlined methods are not marked as osr-entry.

A new pseudo-instruction, `SaveStateOsr`, is introduced. It must be the first instruction in each loop-header basic
block whose osr-entry flag is set.
This instruction holds information about all virtual registers that are live at loop entry.
Codegen creates a special OsrStackMap for each SaveStateOsr instruction. It differs from a regular stackmap in that it
has an `osr entry bytecode offset` field.
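
A minimal sketch of the marking step, assuming a toy IR: the `BasicBlock` class, its fields, and both function names are hypothetical and exist only to illustrate the rules above (loop headers of the OSR method get the flag and a leading `SaveStateOsr`, loop headers that came from inlined methods do not).

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    # Toy stand-in for a compiler basic block; not the compiler's real class.
    name: str
    is_loop_header: bool = False
    from_inlined_method: bool = False
    live_vregs: list = field(default_factory=list)
    osr_entry: bool = False
    insts: list = field(default_factory=list)


def mark_osr_entries(blocks):
    """Flag loop headers of the OSR method and prepend SaveStateOsr to them."""
    for bb in blocks:
        if bb.is_loop_header and not bb.from_inlined_method:
            bb.osr_entry = True
            # SaveStateOsr records the vregs that are live at loop entry.
            bb.insts.insert(0, ("SaveStateOsr", tuple(bb.live_vregs)))


def can_optimize_loop(header: BasicBlock) -> bool:
    """A loop optimization that must skip OSR-entry loops would check this."""
    return not header.osr_entry
```

A pass that restructures loops could call `can_optimize_loop` on the header before transforming it. Which optimizations actually need this check is not stated here.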

### Metainfo

On each OSR entry, we need to restore the execution context.
To do this, we need to know all virtual registers that are live at that point.
For this purpose, a new stackmap and a new opcode were introduced.

The new opcode (OsrSaveState, the `SaveStateOsr` pseudo-instruction described in Compilation) has the same properties
as a regular SaveState, except that codegen handles it differently.
No code is generated in place of OsrSaveState. Instead, a special OsrEntryStub entity is created,
which is used to generate the OSR entry code.

OsrEntryStub does the following:
1. moves all constants to the CPU registers or frame slots by inserting move or store instructions
2. encodes a jump instruction to the head of the loop where the corresponding OsrSaveState is located

The first step is necessary because the Panda compiler can place some constants in CPU registers,
but the constants themselves are not virtual registers and are not stored in the metainfo.
Accordingly, they need to be restored to the CPU registers or frame slots.
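
The two steps of the stub can be illustrated with a toy emitter that produces the text of `OSR Stub 1` from the workflow diagram. The function name, the `(location, value)` encoding of constants, and the textual instruction format are assumptions made for this sketch. The real stub is emitted by codegen as machine code.

```python
def emit_osr_entry_stub(constants, loop_label):
    """Build a textual OSR entry stub.

    constants:  list of (location, value) pairs; a location is either a
                register name such as "x10" or a frame slot such as "slot 3"
                (hypothetical encoding used only in this sketch).
    loop_label: label of the loop head that holds the SaveStateOsr.
    """
    code = []
    # Step 1: rematerialize constants, which are not tracked as vregs.
    for location, value in constants:
        if location.startswith("slot"):
            code.append(f"store [{location}], {value}")
        else:
            code.append(f"mov {location}, {value}")
    # Step 2: jump to the head of the loop.
    code.append(f"jump {loop_label}")
    return code
```

With the constants from the diagram, `emit_osr_entry_stub([("x10", 0), ("d4", 3.14)], "osr_entry_1")` yields the three lines shown under `OSR Stub 1`.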

OSR stackmaps (OsrStateStamp) are needed to restore virtual registers.
Each OsrStateStamp is linked to a specific bytecode offset, which is the offset of the first instruction of the loop.
The stackmap contains all the information needed to convert an IFrame to a CFrame.
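
As an illustration of the IFrame to CFrame conversion, the sketch below copies interpreter vreg values to the locations recorded in a stamp, using the `vregs[]` entries from the workflow diagram (`vreg1=Slot(2)`, `vreg4=CpuReg(8)`). The function name, the dictionary-based frames, and the `(kind, index)` location encoding are hypothetical. The real conversion is done by `PrepareOsrEntry` on the native stack.

```python
def fill_cframe_from_stamp(iframe_vregs, stamp_vregs):
    """Place live interpreter vregs where the compiled code expects them.

    iframe_vregs: dict mapping vreg index -> value in the interpreter frame.
    stamp_vregs:  dict mapping vreg index -> (kind, index), where kind is
                  "Slot" or "CpuReg" (illustrative encoding).
    Returns (slots, cpu_regs): the values to store into CFrame slots and
    into CPU registers.
    """
    slots, cpu_regs = {}, {}
    for vreg, (kind, index) in stamp_vregs.items():
        value = iframe_vregs[vreg]
        if kind == "Slot":
            slots[index] = value
        elif kind == "CpuReg":
            cpu_regs[index] = value
        else:
            raise ValueError(f"unknown location kind: {kind}")
    return slots, cpu_regs
```

Only vregs listed in the stamp are transferred. Anything else in the interpreter frame is dead at the loop entry and is ignored.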

### Frame replacement

Since the Panda interpreter is written in C++, we have no access to its stack. Thus, we can't simply replace the
interpreter frame with a cframe on the stack. When OSR occurs, we call the OSR compiled code, and once it finishes
execution we return `true` to the interpreter. The interpreter, in turn, executes a fake `return` instruction to exit
from the execution procedure.

Pseudocode:
```python
def interpreter_work():
    # Dispatch on the current instruction (other opcodes elided).
    if current_inst.opcode == Return:
        return
    elif current_inst.opcode == Jump:
        # A backward jump closes a loop, so the hotness counter is updated here.
        if target < current_inst.offset:
            if update_hotness(method, current_inst.bytecode_offset):
                # OSR code has finished the method: leave via a fake `return`.
                set_current_inst(Return)
    ...

def update_hotness(method: Method, bytecode_offset: int) -> bool:
    method.hotness_counter += 1
    if method.hotness_counter < threshold:
        return False

    if method.HasOsrCode():
        return osr_entry(method, bytecode_offset)

    ...  # run compilation, see Triggering for more information

    return False

def osr_entry(method: Method, bytecode_offset: int) -> bool:
    stamp = Metainfo.find_stamp(bytecode_offset)
    if not stamp:
        return False

    # Call assembly functions to do OSR magic

    return True
```

Most of the OSR entry is written in assembly language, because the CFrame resides on the native stack.

OSR entry can occur in three different contexts, depending on the kind of the previous frame:
1. **Previous frame is CFrame**

    Before: cframe->c2i->iframe

    After: cframe->cframe'

    The new cframe is created in place of the `c2i` frame, which is simply dropped.

2. **Previous frame is IFrame**

    Before: iframe->iframe

    After: iframe->i2c->cframe'

    The new cframe is created at the current stack position, but an i2c bridge must be inserted before it.

3. **Previous frame is null (current frame is the top frame)**

    Before: iframe

    After: cframe'

c2i - compiled-to-interpreter code bridge

i2c - interpreter-to-compiled code bridge

cframe' - new cframe, converted from iframe
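
The three cases can be summarized as a transformation on a list of frame kinds. This is a purely illustrative model: the function name and the string encoding of frames are hypothetical, and the real replacement is performed by assembly code on the native stack.

```python
def stack_after_osr(stack):
    """Return the frame layout after OSR.

    stack: list of frame kinds ordered from oldest to newest, using the
           names from the legend above ("cframe", "c2i", "iframe"); the
           last element is the interpreter frame being replaced.
    """
    if not stack or stack[-1] != "iframe":
        raise ValueError("OSR replaces the current interpreter frame")
    previous = stack[:-1]
    if not previous:
        # Case 3: the current frame is the top frame.
        return ["cframe'"]
    if previous[-1] == "c2i":
        # Case 1: the c2i bridge is dropped and cframe' takes its place.
        return previous[:-1] + ["cframe'"]
    if previous[-1] == "iframe":
        # Case 2: an i2c bridge is inserted before the new cframe.
        return previous + ["i2c", "cframe'"]
    raise ValueError(f"unexpected previous frame: {previous[-1]}")
```

For example, `stack_after_osr(["cframe", "c2i", "iframe"])` gives `["cframe", "cframe'"]`, matching the Before/After lines of the first case.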