# On-Stack Replacement

### Overview

On-Stack Replacement (OSR) is a technique for switching between different implementations of the same function.

In this document, OSR means the transition from interpreter code to optimized code. The opposite transition, from optimized to unoptimized code, is called `Deoptimization`.

OSR workflow:
```
                                    +-----------------------+
                                    |                       |
                                    |      Interpreter      |
                                    |                       |
                                    +-----------------------+
    Method::osr_code                            |
    +------------------------+                  |
    |    Method Prologue     |                  V
    +------------------------+         +-----------------+
    | mov x10, 0             |         | OsrEntry        |
    | mov d4, 3.14           |         +-----------------+
    |                        |                  |
    |                        |                  +---------------------+
    | . . .                  |                  |                     V
    |                        |                  |           +-------------------+
    | osr_entry_1:           |                  |           | PrepareOsrEntry   |
+-->|------------------------|                  |           | (fill CFrame from |
|   | Loop 2                 |                  |           |  OsrStateStamp)   |
|   |                        |                  |           +-------------------+
|   |                        |                  |            CFrame |       ^
|   |------------------------|                  |<------------------+       |
|   | . . .                  |                  |                           |
|   |                        |                  |             OsrStateStamp |
|   |------------------------|                  |   +-----------------------------------+
|   | Method epilogue        |                  |   |native_pc   : INVALID              |
|   |------------------------|                  |   |bytecode_pc : offsetof osr_entry_1 |
|   | OSR Stub 1:            |<-----------------+   |osr_entry   : osr_code+bytecode_pc |
|   |   mov x10, 0           |                      |vregs[]     : vreg1=Slot(2)        |
|   |   mov d4, 3.14         |                      |              vreg4=CpuReg(8)      |
+---|   jump osr_entry_1     |                      +-----------------------------------+
    +------------------------+
```

### Triggering

OSR and regular compilation share the same hotness counter. The first time the counter overflows, we check whether the method is already compiled. If it is not, we start compilation in regular mode; otherwise, we compile the method in OSR mode.

Once compilation has been triggered and the OSR compiled code is already set, the On-Stack Replacement procedure begins.

Triggering workflow:

### Compilation

The JIT compiles an OSR method as a whole, in the same way it compiles a regular hot method.

To ensure that every loop in the compiled code can be entered from the interpreter, loop optimizations must be avoided. In OSR methods, a special osr-entry flag is set on loop-header basic blocks, and some optimizations have to skip loops with this flag.
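To make the osr-entry rule concrete, here is a minimal sketch assuming a hypothetical IR in which each loop records its header block and nested loops, and the header carries an `osr_entry` flag. `BasicBlock`, `Loop` and `run_loop_optimization` are illustrative names, not the Panda compiler API. The pass still visits nested loops, but leaves any loop whose header is an OSR entry untouched, so the interpreter can jump into it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BasicBlock:
    name: str
    osr_entry: bool = False  # hypothetical flag: set on loop headers of an OSR method


@dataclass
class Loop:
    header: BasicBlock
    inner: List["Loop"] = field(default_factory=list)  # nested loops


def run_loop_optimization(loops: List[Loop], transform: Callable[[Loop], None]) -> List[str]:
    """Apply `transform` to every loop in the tree except OSR-entry loops.

    Returns the header names of the loops that were transformed.
    """
    transformed = []
    worklist = list(loops)
    while worklist:
        loop = worklist.pop()
        worklist.extend(loop.inner)  # nested loops are still visited
        if loop.header.osr_entry:
            # The interpreter may jump into this header, so the loop keeps its shape.
            continue
        transform(loop)
        transformed.append(loop.header.name)
    return transformed


# Usage: an OSR-method loop that contains a loop from an inlined callee
outer = Loop(BasicBlock("method_loop", osr_entry=True))
outer.inner.append(Loop(BasicBlock("inlined_loop")))
print(run_loop_optimization([outer], lambda loop: None))  # ['inlined_loop']
```

Only the method's own loop is skipped; the loop that came from the inlined callee is not flagged and is still optimized.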

There are no restrictions on inlining: methods can be inlined as usual, and all loop optimizations apply to the inlined code, because the loop headers of inlined methods are not marked as osr-entry.

A new pseudo-instruction, SaveStateOsr, is introduced. It must be the first instruction in each loop-header basic block whose osr-entry flag is set. It holds information about all virtual registers that are live on entry to the loop.
Codegen creates a special OsrStackMap for each SaveStateOsr instruction. Unlike a regular stackmap, it has an `osr entry bytecode offset` field.

### Metainfo

On each OSR entry, the execution context must be restored.
To do this, we need to know all virtual registers that are live at that point.
For this purpose, a new stackmap and a new opcode were introduced.

The new opcode, SaveStateOsr, has the same properties as a regular SaveState, except that codegen handles it differently.
No code is generated in place of SaveStateOsr; instead, a special OsrEntryStub entity is created,
which is needed to generate the OSR entry code.

OsrEntryStub does the following:
1. Moves all constants to CPU registers or frame slots by inserting move or store instructions.
2. Encodes a jump instruction to the head of the loop where the corresponding SaveStateOsr is located.

The first step is necessary because the Panda compiler can place some constants in CPU registers,
but the constants themselves are not virtual registers and are not stored in the metainfo.
Therefore, they must be explicitly restored to the CPU registers or frame slots.

OSR stackmaps (OsrStateStamp) are needed to restore virtual registers.
Each OsrStateStamp is linked to a specific bytecode offset: the offset of the first instruction of the loop.
The stackmap contains all the information needed to convert an IFrame to a CFrame.

### Frame replacement

Since the Panda interpreter is written in C++, we do not have access to its stack. Thus, we cannot simply replace the
interpreter frame with a cframe on the stack. When OSR occurs, we call the OSR compiled code, and once it finishes execution,
we return `true` to the interpreter. The interpreter, in turn, executes a fake `return` instruction to exit from the execution
procedure.

Pseudocode:
```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 1000  # hotness threshold (illustrative value)


@dataclass
class Method:
    hotness_counter: int = 0
    # Metainfo of the OSR code: OsrStateStamps keyed by the bytecode offset
    # of each loop's first instruction; None until OSR code is set
    osr_metainfo: Optional[dict] = None

    def has_osr_code(self) -> bool:
        return self.osr_metainfo is not None


def interpreter_work(method: Method, opcode: str, offset: int, target: int = 0) -> str:
    """Handle one instruction; return "Return" when the interpreter must leave the method."""
    if opcode == "Return":
        return "Return"
    if opcode == "Jump" and target < offset:  # backward jump: loop back edge
        # `target` is the loop header, i.e. the offset OsrStateStamps are keyed by
        if update_hotness(method, target):
            # OSR code has already finished the method: execute a fake `return`
            return "Return"
    ...  # regular handling of other instructions
    return "Next"


def update_hotness(method: Method, bytecode_offset: int) -> bool:
    method.hotness_counter += 1
    if method.hotness_counter < THRESHOLD:
        return False

    if method.has_osr_code():
        return osr_entry(method, bytecode_offset)

    ...  # run compilation, see Triggering for more information

    return False


def osr_entry(method: Method, bytecode_offset: int) -> bool:
    stamp = method.osr_metainfo.get(bytecode_offset)
    if stamp is None:
        return False

    ...  # call assembly functions to do the OSR magic (build the CFrame, jump)

    return True
```

Most of the OSR entry is written in assembly language, because the CFrame resides on the native stack.

OSR entry can occur in three different contexts, depending on the kind of the previous frame:
1. **Previous frame is a CFrame**

   Before: cframe->c2i->iframe

   After: cframe->cframe'

   The new cframe is created in place of the `c2i` frame, which is simply dropped.

2. **Previous frame is an IFrame**

   Before: iframe->iframe

   After: iframe->i2c->cframe'

   The new cframe is created at the current stack position, but an i2c bridge must be inserted before it.

3. **Previous frame is null (the current frame is the top frame)**

   Before: iframe

   After: cframe'

c2i - compiled-to-interpreter code bridge

i2c - interpreter-to-compiled code bridge

cframe' - new cframe, converted from the iframe
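The three cases above can be summarized as a transformation on the frame chain. The following is a minimal model, not the runtime's implementation: frames are plain strings ordered from caller to callee, the last one being the interpreter frame that is replaced, and `osr_replace_frames` is a hypothetical helper name. In the runtime itself this work is done mostly in assembly on the native stack.

```python
from typing import List


def osr_replace_frames(stack: List[str]) -> List[str]:
    """Return the frame chain after OSR entry (hypothetical model).

    `stack` lists frame kinds from caller to callee; the last element is the
    interpreter frame ("iframe") that is converted into cframe'.
    """
    if not stack or stack[-1] != "iframe":
        raise ValueError("OSR starts from an interpreter frame")
    callers = stack[:-1]
    if not callers:
        # Case 3: the iframe is the top frame, so cframe' simply takes its place.
        return ["cframe'"]
    if callers[-1] == "c2i":
        # Case 1: compiled caller; cframe' is built in place of the dropped c2i bridge.
        return callers[:-1] + ["cframe'"]
    if callers[-1] == "iframe":
        # Case 2: interpreted caller; an i2c bridge must precede cframe'.
        return callers + ["i2c", "cframe'"]
    raise ValueError(f"unexpected previous frame: {callers[-1]}")


print(osr_replace_frames(["iframe", "iframe"]))  # ['iframe', 'i2c', "cframe'"]
```

Modelling the chain this way makes the asymmetry explicit: only an interpreted caller needs a new bridge, while a compiled caller loses one.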