# PLT Resolvers

The AOT compiler mode is mainly described in [aot.md](../../docs/aot.md); please read it first.

## Brief SlowPath idea description

The JIT/AOT compiler has a `SlowPath` mechanism. It is used for opcodes where a call to the runtime is required conditionally,
but not always.
During code generation the so-called `SlowPath` code is created and placed into a special cold code block at the end of the function.
A unique `SlowPath` blob is generated for each place it is called from, and since it saves registers and sets up the so-called
`BoundaryFrame` for the stack walker, its code is much longer than the few instructions needed for the runtime call itself.

## Code size issue

In AOT mode, such a `SlowPath` could also be used for opcodes like `CallStatic` and `CallVirtual`, and for opcodes related to `Class` resolving,
because the gathered Method or Class pointer can be cached in a slot of the GOT table (in the `.aot_got` section).
The problem is that such a `SlowPath` is actually required only once, when the corresponding `method Id`
or `class Id` is reached for the first time. So, to reduce code size in AOT mode, a trickier solution with PLT Resolvers is used.

## Static Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), three
consecutive slots are reserved in the PLT-GOT table.
`FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is filled while the AOT file is loaded into the runtime and contains the `PLT CallStatic Resolver` address.
`ThirdSlot` stores the `Method pointer` after resolving, but during AOT file loading it is initialized
to the address of `SecondSlot` minus the `GetCompiledEntryPointOffset` value.

During calls, the first parameter is always the callee `Method pointer`, so the trick from the previous paragraph makes the
resolver fully transparent to code generation. Let's look at an `arm64` example (`GetCompiledEntryPointOffset` is 56 = 7 * 8, all function
parameters are already in the proper registers):

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-16: FirstSlot - method Id
-YY-08: SecondSlot - PLT CallStatic Resolver
-YY-00: ThirdSlot - address of (-YY-08-56) <--------------
    . . .                                                 |
; start of entrypoint table                               |
-NN: address of handler 0, NN = N * 8                     |
    . . .                                                 |
-16: address of handler N-1                               |
-08: address of handler N                                 |
========== .text ==========                               |
00:                                                       |
    . . .                                                 |
XX+00: adr x0, #-(YY+XX)  ; Put address of ThirdSlot into x0   ; before resolve   ; after resolve
XX+04: ldr x0, [x0]       ; Load value stored in ThirdSlot     ; (&FirstSlot)-48  ; Method Pointer
XX+08: ldr x30, [x0, #56] ; Load EntryPoint                    ; SecondSlot value ; Executable code
XX+12: blr x30            ; Call                               ; Call Resolver    ; Call Method
    . . .
```

After saving all registers to the stack and generating the `BoundaryFrame`, the `PLT CallStatic Resolver` has the `(&FirstSlot)-48`
value in `x0`, so it can execute `ldr x1, [x0, #48]` to get the `method Id` from `FirstSlot`.
The caller `Method pointer` can be extracted (into `x0`) directly from the caller's CFrame, so,
having these two values in `x0` and `x1`, it just calls `GetCalleeMethod` to gather the `Method pointer`.

Once the `Method pointer` is known, it is stored into `ThirdSlot`, used to load the proper executable address, and passed as the first
parameter of the actual method call. A jump by register value is used instead of a call, so that the callee returns directly into
the calling code, not into the resolver.

## Virtual Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), two consecutive
slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is filled with zero and after resolving it stores the `VTable index` incremented by 1.
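
The `index+1` encoding can be checked with a little arithmetic. The sketch below is illustrative Python, not runtime code: the helper names are hypothetical, and the VTable start offset of 168 and the 8-byte entry size are taken from the `arm64` listing in this section. It shows why zero can serve as the "not resolved yet" marker and why the generated code can use a fixed displacement of 160.

```python
VTABLE_START_OFFSET = 168  # VTable start offset inside Class, as in the arm64 listing
ENTRY_SIZE = 8             # size of one VTable entry in the listing (lsl #3)


def encode_slot(vtable_index):
    """Value stored into SecondSlot after resolving: index + 1, so 0 means 'unresolved'."""
    return vtable_index + 1


def is_resolved(slot_value):
    """A zero slot means the resolver has not run yet."""
    return slot_value != 0


def method_entry_address(class_ptr, slot_value):
    """Address formed by the generated code: Class + (index+1)*8 + 160."""
    displacement = VTABLE_START_OFFSET - ENTRY_SIZE  # 160 compensates for the +1
    return class_ptr + slot_value * ENTRY_SIZE + displacement
```

For any index, the result equals `class_ptr + 168 + index * 8`, so storing `index+1` costs no extra instruction on the fast path.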

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-08: FirstSlot - method Id
-YY-00: SecondSlot, zero or (index+1) <---------------------------
    . . .                                                         |
; start of entrypoint table                                       |
-NN: address of handler 0, NN = N * 8                             |
    . . .                                                         |
-16: address of handler N-1                                       |
-08: address of handler N                                         |
========== .text ==========                                       |
00:                                                               |
    . . .                                                         |
; CallVirtual opcode (register allocator used x5 for Class ptr)   |
XX+00: adr x16, #-(YY+XX)                      ; Put address of SecondSlot into x16
XX+04: ldr w17, [x16]                          ; Load value from SecondSlot
XX+08: cbnz w17, #16                           ; Jump to XX+24 when non-zero
XX+16: ldr x30, [x28, #CALL_VIRTUAL_RESOLVER]  ; Load VirtualCall Resolver address
XX+20: blr x30                                 ; Call Resolver, x16 is like a "parameter" and "return value"
XX+24: ldr w16, [x5, #4]                       ; Get Class pointer into x16
XX+28: add w16, w16, w17, lsl #3               ; x16 = Class+(index+1)*8
XX+32: ldr w16, [x16, #160]                    ; Load Method from VTable (compensating index+1, as VTable start offset is 168)
    . . .                                      ; Check IsAbstract
    . . .                                      ; Save caller-saved registers
    . . .                                      ; Set call parameters
ZZ+00: mov x0, x16                             ; x0 = Method address
ZZ+04: ldr x30, [x0, #56]                      ; Executable code address
ZZ+08: blr x30                                 ; Call
    . . .
```

Unlike `CallStatic`, there is no way to use the default parameter registers to pass values to and from the resolver.
Thus the convention for the `PLT CallVirtual Resolver` is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the `SecondSlot` address, and the same register
also works as the "return value".

`PLT CallVirtual Resolver` loads the `method Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `GetCalleeMethod` entrypoint.
Having the `Method pointer`, it is easy to load the `VTable index` value.
The Resolver returns the `index+1` value in `x16` and, unlike the `PLT CallStatic Resolver`, doesn't call any other function.
Control is returned back into the calling code instead.

## Class and InitClass Resolvers

For each pair of File (input for the `ark_aot` compiler) and `class Id` (`panda_file::File::EntityId`) which needs to be resolved,
three consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `class Id`.
`SecondSlot` and `ThirdSlot` are filled with zeroes and after resolving they both store the `Class pointer`, but have different meanings.
When `SecondSlot` is non-zero, it means that the `Class` is known to be in the `Initialized` state already.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-16: FirstSlot - class Id
-YY-08: SecondSlot, zero or "Initialized Class" pointer <-----------
-YY-00: ThirdSlot, zero or Class pointer                            |
    . . .                                                           |
; start of entrypoint table                                         |
-NN: address of handler 0, NN = N * 8                               |
    . . .                                                           |
-16: address of handler N-1                                         |
-08: address of handler N                                           |
========== .text ==========                                         |
00:                                                                 |
    . . .                                                           |
; Shared resolved slow path for PLT resolver                        |
YY+00: ldr x17, [x28, #CLASS_INIT_RESOLVER]  ; Load InitClass Resolver address
YY+04: br  x17                               ; Jump to resolver, x16 works like a "parameter" and "return value"
    . . .                                                           |
; LoadAndInitClass opcode (w7 register allocated for result)        |
XX+00: adr x16, #-(YY+8+XX)                  ; Put address of SecondSlot into x16
XX+04: ldr w7, [x16]                         ; Load value from SecondSlot
XX+08: cbnz w7, #20                          ; Jump to XX+28 when non-zero
XX+12: bl YY - (XX+08)                       ; Call shared slow path for PLT resolver, x16 works like a "parameter" and "return value"
XX+16: mov w7, w16                           ; Class should be in w7
XX+20: ...                                   ; run next opcode
    . . .
```

For class-related resolvers the convention is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the Slot address, and it is also used as the "return value".
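
The slot protocol of the two class-related resolvers described in this section can be modeled roughly as follows. This is a hedged Python sketch, not the generated code: `Klass`, `new_slots`, and the callback parameters are hypothetical names, the runtime entrypoints are passed in as plain functions, and `None` stands for a zero-filled slot.

```python
from dataclasses import dataclass

INITIALIZED = "Initialized"
INITIALIZING = "Initializing"


@dataclass
class Klass:
    class_id: int
    state: str


def new_slots(class_id):
    """Three consecutive PLT-GOT slots; None models a zero-filled slot."""
    return {"first": class_id, "second": None, "third": None}


def init_class_resolver(slots, initialize_class_by_id):
    """Model of the InitClass resolver: always fills ThirdSlot, SecondSlot only if Initialized."""
    klass = initialize_class_by_id(slots["first"])
    slots["third"] = klass
    if klass.state == INITIALIZED:
        slots["second"] = klass
    return klass  # plays the role of the value returned in x16


def class_resolver(slots, resolve_class):
    """Model of the Class resolver: fills ThirdSlot only."""
    klass = resolve_class(slots["first"])
    slots["third"] = klass
    return klass
```

In this model, code that needs an initialized class keeps taking the slow path until `SecondSlot` is filled, while code that only needs the class pointer can already use `ThirdSlot`.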

`PLT InitClass Resolver` loads the `class Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `InitializeClassById` entrypoint.
It stores the gathered `Class pointer` into `ThirdSlot`, and also into `SecondSlot`, but only under a condition.
The condition is that the `Class` state is `Initialized`, because in some corner cases the `InitializeClassById` entrypoint
can return while the `Class` is still only in the `Initializing` state.

`PLT Class Resolver` receives `x16` addressing `ThirdSlot`, so it loads the `class Id` from `FirstSlot` using address `x16-16`.
A different entrypoint is called here: `ResolveClass`. The gathered `Class pointer` value is stored into `ThirdSlot` only.

Both Resolvers return the `Class pointer` value to the calling code in `x16`.

## Resolver Encoding

As all 4 resolvers have a lot of similar parts, their generation is implemented in one method, `EncodePltHelper`.
Moreover, it is placed in the platform-independent file `code_generator/target/target.cpp`, although there are actually several
differences in what happens for `arm64` and `x86_64`.

The main difference between the two supported platforms is the main temporary register used in the Resolver.
For `arm64` we use the `LR` register (`x30`), and for `x86_64` the third `Encoder` temporary, `r14`, is used.

One more issue is that the first `Encoder` temporary register (`x16` for `arm64` or `r12` for `x86_64`), used as a parameter
in 3 Resolvers (all but `CallStatic`), is caller-saved on `arm64` but callee-saved on `x86`, which leads to some
differences.

Let's briefly discuss all steps which happen consecutively in any Resolver:
* **Save LR and FP register to stack.**
On `arm64` it is just one `stp x29, x30, [sp, #-16]` instruction, while on `x86` the caller return address is already
on the stack, so we load it into a temporary (we need it for `BoundaryFrame`) and push `rbp` to the stack.

* **Create BoundaryFrame.**
It actually copies the `SlowPath` behavior of the usual `BoundaryFrame` class constructor, but with one special trick:
for 3 out of 4 Resolvers (all but `CallStatic`) the "return address" and "previous frame" values which are already on the stack
(see previous step) directly become the upper part of the `BoundaryFrame` stack part.

* **Save caller-saved registers.**
In the `CallStatic` Resolver we prepare a place on the stack and save registers there. In the three other Resolvers caller-saved
registers are saved directly into the appropriate places in the previous CFrame.
The stack pointer is temporarily adjusted manually in this case to allow the `SaveCallerRegisters` function to do its job.
Moreover, for `arm64` we manually add `x16` to the live registers set.

* **Prepare parameters for Runtime Call.**
This step is described above separately in each resolver description.

* **Save callee-saved registers.**
Adjust the stack pointer (for the second time in the `CallStatic` Resolver, and for the only time in the others) and
call `SaveRegisters` two times, for float and scalar registers.

* **Make a Runtime Call.**
This step is done using the `MakeCallAot` function with a properly calculated offset. Resolvers are placed after all functions in
the AOT file, but the distance to the `.aot_got` section can be calculated in the same way as for usual code generation.

* **Load callee-saved registers.**
Reverse what was done two steps above: `LoadRegisters` for float and scalar registers, then adjust the stack pointer back.

* **Restore previous Frame.**
Works similarly to the `BoundaryFrame` class destructor.

* **Process gathered result.**
First, `arm64` non-`CallStatic` Resolvers need to manually restore `x16` from the place it was saved.
On `x86_64` this step is not required, as `r12` is a callee-saved register and is already restored.
The main logic of this step is described above separately in each resolver description.

* **Load caller-saved registers.**
Registers are loaded in the same manner they were saved.
So, in `CallStatic` we have to adjust the stack pointer after loading,
while in the other Resolvers it is temporarily adjusted manually to the previous frame before calling the `LoadCallerRegisters` function.

* **Restore LR and FP.**
Nothing special, symmetric to the very first step.

* **Leave Resolver.**
Jump to the callee Method in the `CallStatic` Resolver, and do a usual "return" in the others.
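
The step list above can be summarized as a small table-building function. This is only an illustrative Python sketch of the ordering and of the per-resolver and per-platform differences named in the list; `resolver_steps` and the step strings are hypothetical and do not correspond to identifiers in `EncodePltHelper`.

```python
KINDS = ("CallStatic", "CallVirtual", "Class", "InitClass")
ARCHS = ("arm64", "x86_64")


def resolver_steps(kind, arch):
    """Return the ordered step names for one resolver kind on one platform."""
    if kind not in KINDS or arch not in ARCHS:
        raise ValueError(f"unsupported combination: {kind}, {arch}")
    is_static = kind == "CallStatic"
    steps = [
        "save LR and FP",
        "create BoundaryFrame",
        # CallStatic uses its own stack area; the others reuse the previous CFrame.
        "save caller-saved registers "
        + ("on own stack area" if is_static else "into previous CFrame"),
        "prepare runtime call parameters",
        "save callee-saved registers",
        "make runtime call",
        "load callee-saved registers",
        "restore previous frame",
    ]
    if arch == "arm64" and not is_static:
        # x16 is caller-saved on arm64, so it must be restored by hand.
        steps.append("restore x16")
    steps += [
        "process gathered result",
        "load caller-saved registers",
        "restore LR and FP",
        "jump to callee method" if is_static else "return to caller",
    ]
    return steps
```

For example, `resolver_steps("CallVirtual", "arm64")` contains the extra `restore x16` step and ends with a return, while `resolver_steps("CallStatic", "arm64")` ends with a jump to the callee.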