# PLT Resolvers

The AOT compiler mode is mainly described in [aot.md](../../docs/aot.md); please read it first.

## Brief SlowPath idea description

The JIT/AOT compiler has a `SlowPath` mechanism. It is used for opcodes that require a runtime call conditionally,
but not always.
During code generation, the so-called `SlowPath` code is created and placed into a special cold code block at the end of the function.
A unique `SlowPath` blob is generated for each place where it is used, and since it saves registers and sets up the so-called
`BoundaryFrame` for the stack walker, its code is much longer than the few instructions that perform the runtime call itself.

## Code size issue

In AOT mode, a `SlowPath` could also be used for opcodes like `CallStatic`, `CallVirtual`, and opcodes related to `Class` resolution,
since the resolved Method or Class pointer can be cached in a slot of the GOT table (in the `.aot_got` section).
The problem is that such a `SlowPath` is actually needed only once, the first time the corresponding `method Id`
or `class Id` is reached. So, to reduce code size in AOT mode, a trickier solution based on PLT Resolvers is used.

## Static Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), three
consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is filled when the AOT file is loaded into the runtime and contains the `PLT CallStatic Resolver` address.
`ThirdSlot` stores the `Method pointer` after resolution, but when the AOT file is loaded it is initialized
to the address of `SecondSlot` minus the `GetCompiledEntryPointOffset` value.
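
The slot arithmetic can be checked with a small C++ sketch. It treats addresses as plain integers and hard-codes the `arm64` value of 56 for `GetCompiledEntryPointOffset`; `StaticCallSlotAddrs`, `SlotsAt`, and `InitialThirdSlotValue` are hypothetical helpers for illustration, not runtime code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: three consecutive 8-byte PLT-GOT slots for one
// (File, method Id) pair, with addresses treated as plain integers.
constexpr uint64_t kSlotSize = 8;
constexpr uint64_t kEntryPointOffset = 56;  // GetCompiledEntryPointOffset in the arm64 example

struct StaticCallSlotAddrs {
    uint64_t first;   // method Id, written when the AOT file is created
    uint64_t second;  // PLT CallStatic Resolver address, written at load time
    uint64_t third;   // Method pointer once resolved
};

constexpr StaticCallSlotAddrs SlotsAt(uint64_t firstSlot) {
    return {firstSlot, firstSlot + kSlotSize, firstSlot + 2 * kSlotSize};
}

// Initial ThirdSlot value: a fake "Method pointer" whose compiled-entry-point
// field (at +56) aliases SecondSlot.
constexpr uint64_t InitialThirdSlotValue(uint64_t firstSlot) {
    return SlotsAt(firstSlot).second - kEntryPointOffset;
}

static_assert(InitialThirdSlotValue(0x1000) + kEntryPointOffset == SlotsAt(0x1000).second);
static_assert(InitialThirdSlotValue(0x1000) == 0x1000 - 48);  // the (&FirstSlot)-48 value
```

Loading the "entry point" of this fake method therefore yields whatever `SecondSlot` holds, which at load time is the resolver address.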

During calls, the first parameter is always the callee `Method pointer`, so the trick from the previous paragraph makes the
resolver fully transparent to code generation. Let's look at an `arm64` example (`GetCompiledEntryPointOffset` is 56 = 7 * 8, and all function
parameters are already in the proper registers):

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-16: FirstSlot - method Id
-YY-08: SecondSlot - PLT CallStatic Resolver
-YY-00: ThirdSlot - address of (-YY-08-56)  <--------------
 . . .                                                    |
; start of entrypoint table                               |
-NN: address of handler 0, NN = N * 8                     |
 . . .                                                    |
-16: address of handler N-1                               |
-08: address of handler N                                 |
========== .text ==========                               |
00:                                                       |
 . . .                                                    |
XX+00: adr x0, #-(YY+XX)   ; Put address of ThirdSlot into x0   ; before resolve   ; after resolve
XX+04: ldr x0, [x0]        ; Load value stored in ThirdSlot     ; (&FirstSlot)-48  ; Method Pointer
XX+08: ldr x30, [x0, #56]  ; Load EntryPoint                    ; SecondSlot value ; Executable code
XX+12: blr x30             ; Call                               ; Call Resolver    ; Call Method
 . . .
```

The `PLT CallStatic Resolver`, after saving all registers to the stack and creating a `BoundaryFrame`, has the `(&FirstSlot)-48`
value in `x0`, so it can execute `ldr x1, [x0, #48]` to get the `method Id` from `FirstSlot`.
The caller `Method pointer` can be extracted (into `x0`) directly from the caller's CFrame, so,
having these two values in `x0` and `x1`, it simply calls `GetCalleeMethod` to obtain the callee `Method pointer`.

Once the `Method pointer` is known, it is stored into `ThirdSlot`, used to load the proper executable address, and passed as the first
parameter of the actual method call. A jump to the address in a register is used instead of a call, so that the callee returns directly
into the compiled code rather than into the resolver.
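
To see why the resolver runs only once per call site, here is a hypothetical C++ simulation. PLT-GOT memory is a byte array, "code addresses" are small integers dispatched by `Execute`, and a `std::map` stands in for the `GetCalleeMethod` entrypoint. All names (`Machine`, `kResolverCode`, `kCalleeCode`, `methodById`) are made up for illustration; the real resolver is generated machine code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical simulation of the CallStatic PLT trick; not runtime code.
// Addresses are offsets into `mem`, and "code addresses" are small integers.
constexpr uint64_t kEntryPointOffset = 56;  // GetCompiledEntryPointOffset in the arm64 example
constexpr uint64_t kResolverCode = 1;       // stands for PLT CallStatic Resolver
constexpr uint64_t kCalleeCode = 2;         // stands for the callee's compiled code

struct Machine {
    std::vector<uint8_t> mem = std::vector<uint8_t>(256, 0);
    std::map<uint64_t, uint64_t> methodById;  // method Id -> Method address (GetCalleeMethod stand-in)
    int resolverRuns = 0;
    int calleeRuns = 0;

    uint64_t Load(uint64_t addr) const {
        uint64_t value;
        std::memcpy(&value, &mem.at(addr), sizeof(value));
        return value;
    }
    void Store(uint64_t addr, uint64_t value) { std::memcpy(&mem.at(addr), &value, sizeof(value)); }

    // "Runs" simulated code; x0 carries the first parameter (a Method pointer).
    void Execute(uint64_t code, uint64_t x0) {
        if (code == kCalleeCode) {
            ++calleeRuns;
            return;
        }
        assert(code == kResolverCode);
        ++resolverRuns;
        uint64_t firstSlot = x0 + 48;                       // x0 == (&FirstSlot)-48
        uint64_t method = methodById.at(Load(firstSlot));   // resolve the method Id
        Store(firstSlot + 16, method);                      // cache it in ThirdSlot
        Execute(Load(method + kEntryPointOffset), method);  // models the jump to the callee
    }

    // The four-instruction call sequence from the arm64 listing.
    void CallSite(uint64_t thirdSlot) {
        uint64_t x0 = Load(thirdSlot);                      // ldr x0, [x0]
        Execute(Load(x0 + kEntryPointOffset), x0);          // ldr x30, [x0, #56]; blr x30
    }
};
```

After the first `CallSite`, `ThirdSlot` holds the real Method address, so the same branch-free sequence reaches the callee directly on every later execution.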

## Virtual Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), two consecutive
slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is initially zero; after resolution it stores the `VTable index` incremented by 1.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-08: FirstSlot - method Id
-YY-00: SecondSlot, zero or (index+1) <---------------------------
 . . .                                                           |
; start of entrypoint table                                      |
-NN: address of handler 0, NN = N * 8                            |
 . . .                                                           |
-16: address of handler N-1                                      |
-08: address of handler N                                        |
========== .text ==========                                      |
00:                                                              |
 . . .                                                           |
; CallVirtual opcode (register allocator used x5 for object ptr) |
XX+00: adr x16, #-(YY+XX)        ; Put address of SecondSlot into x16
XX+04: ldr w17, [x16]            ; Load value from SecondSlot
XX+08: cbnz w17, #16             ; Jump to XX+24 when non-zero
XX+12: ldr x30, [x28, #CALL_VIRTUAL_RESOLVER] ; Load VirtualCall Resolver address
XX+16: blr x30                   ; Call Resolver, x16 is like a "parameter" and "return value"
XX+20: mov w17, w16              ; Resolver returned (index+1) in x16
XX+24: ldr w16, [x5, #4]         ; Get Class pointer into x16
XX+28: add w16, w16, w17, lsl #3 ; x16 = Class+(index+1)*8
XX+32: ldr w16, [x16, #160]      ; Load Method from VTable (compensating index+1, as VTable start offset is 168)
 . . . ; Check IsAbstract
 . . . ; Save caller-saved registers
 . . . ; Set call parameters
ZZ+00: mov x0, x16               ; x0 = Method address
ZZ+04: ldr x30, [x0, #56]        ; Executable code address
ZZ+08: blr x30                   ; Call
 . . .
```
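
Storing `index+1` lets zero mean "not resolved yet" even for VTable index 0, and the `#160` displacement compensates for the extra 1. The hypothetical C++ helpers below (names made up for illustration) check that arithmetic using the 168-byte VTable start offset and 8-byte entries from the `arm64` listing; both values are specific to this example.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the arm64 CallVirtual listing; other targets may differ.
constexpr uint64_t kVTableOffset = 168;
constexpr uint64_t kEntrySize = 8;

// SecondSlot encoding: 0 means "not resolved yet", otherwise VTable index + 1.
constexpr bool IsResolved(uint32_t secondSlot) { return secondSlot != 0; }

// What the generated code computes: Class + (index+1)*8, then a load at +160.
constexpr uint64_t GeneratedEntryAddress(uint64_t klass, uint32_t secondSlot) {
    uint64_t x16 = klass + (uint64_t{secondSlot} << 3);  // add w16, w16, w17, lsl #3
    return x16 + (kVTableOffset - kEntrySize);            // ldr ..., [x16, #160]
}

// The address a direct VTable lookup by index would use.
constexpr uint64_t VTableEntryAddress(uint64_t klass, uint32_t index) {
    return klass + kVTableOffset + uint64_t{index} * kEntrySize;
}

static_assert(GeneratedEntryAddress(0x4000, 0 + 1) == VTableEntryAddress(0x4000, 0));
static_assert(GeneratedEntryAddress(0x4000, 5 + 1) == VTableEntryAddress(0x4000, 5));
```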

Unlike CallStatic, there is no way to use the default parameter registers to pass values into the resolver and receive them back.
Thus, the `PLT CallVirtual Resolver` uses the following convention: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the `SecondSlot` address, and the same register also
serves as the "return value".

The `PLT CallVirtual Resolver` loads the `method Id` from `FirstSlot` using the address `x16-8`,
takes the caller `Method pointer` from the previous frame, and calls the `GetCalleeMethod` entrypoint.
With the `Method pointer` at hand, it is easy to load the `VTable index` value.
The resolver returns the `index+1` value in `x16` and, unlike the `PLT CallStatic Resolver`, does not call any other function:
control is returned back into the compiled code instead.
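
Below is a hypothetical C++ model of the fast path plus the `PLT CallVirtual Resolver`. A pointer to `SecondSlot` plays the role of `x16`, while `MethodInfo` and a `std::map` stand in for the runtime's Method and the `GetCalleeMethod` entrypoint. The model assumes the resolver caches `index+1` in `SecondSlot`, which is what lets later executions skip it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a resolved callee: only its VTable index matters here.
struct MethodInfo {
    uint32_t vtableIndex;
};

struct VirtualResolverModel {
    std::map<uint64_t, MethodInfo> methods;  // method Id -> callee (GetCalleeMethod stand-in)
    int runs = 0;

    // `param` models x16 == &SecondSlot; the result models the value returned in x16.
    uint64_t Resolve(uint64_t *param) {
        ++runs;
        uint64_t methodId = *(param - 1);  // FirstSlot at x16 - 8
        uint64_t encoded = methods.at(methodId).vtableIndex + uint64_t{1};
        *param = encoded;                  // assumed: cache index+1 in SecondSlot
        return encoded;
    }
};

// The CallVirtual fast path: use SecondSlot if non-zero, otherwise call the resolver.
uint64_t LoadEncodedIndex(uint64_t slots[2], VirtualResolverModel &resolver) {
    uint64_t *x16 = &slots[1];       // adr x16, <SecondSlot>
    uint64_t w17 = *x16;             // ldr w17, [x16]
    if (w17 == 0) {                  // cbnz not taken: not resolved yet
        w17 = resolver.Resolve(x16); // blr x30; mov w17, w16
    }
    return w17;
}
```

With a callee at VTable index 0, the first call returns 1 via the resolver and the second reads 1 straight from `SecondSlot`.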

## Class and InitClass Resolvers

For each pair of File (input for the `ark_aot` compiler) and `class Id` (`panda_file::File::EntityId`) that needs to be resolved,
three consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `class Id`.
`SecondSlot` and `ThirdSlot` are initially zero; after resolution they both store the `Class pointer`, but with different meanings.
When `SecondSlot` is non-zero, the `Class` is known to be in the `Initialized` state already.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-16: FirstSlot - class Id
-YY-08: SecondSlot, zero or "Initialized Class" pointer <---------
-YY-00: ThirdSlot, zero or Class pointer                         |
 . . .                                                           |
; start of entrypoint table                                      |
-NN: address of handler 0, NN = N * 8                            |
 . . .                                                           |
-16: address of handler N-1                                      |
-08: address of handler N                                        |
========== .text ==========                                      |
00:                                                              |
 . . .                                                           |
; Shared slow path for PLT resolver                              |
ZZ+00: ldr x17, [x28, #CLASS_INIT_RESOLVER] ; Load InitClass Resolver address
ZZ+04: br  x17                              ; Jump to resolver, x16 works like a "parameter" and "return value"
 . . .                                                           |
; LoadAndInitClass opcode (w7 register allocated for result)     |
XX+00: adr x16, #-(YY+8+XX)      ; Put address of SecondSlot into x16
XX+04: ldr w7, [x16]             ; Load value from SecondSlot
XX+08: cbnz w7, #12              ; Jump to XX+20 when non-zero
XX+12: bl #(ZZ-(XX+12))          ; Call shared slow path for PLT resolver, x16 works like a "parameter" and "return value"
XX+16: mov w7, w16               ; Class should be in w7
XX+20: ... ; run next opcode
 . . .
```
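
The control flow of the `LoadAndInitClass` listing can be modeled in C++ as below. This is a hypothetical sketch: `slots` is a three-element array standing for `FirstSlot`/`SecondSlot`/`ThirdSlot`, and the `InitClassSlowPath` callback stands in for the shared slow path that jumps to the `PLT InitClass Resolver`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical callback type: receives x16 == &SecondSlot, returns the Class pointer in x16.
using InitClassSlowPath = std::function<uint64_t(uint64_t *secondSlot)>;

// slots = {FirstSlot (class Id), SecondSlot ("initialized" Class), ThirdSlot (Class)}.
uint64_t LoadAndInitClass(uint64_t slots[3], const InitClassSlowPath &slowPath) {
    uint64_t *x16 = &slots[1];  // adr x16, <SecondSlot>
    uint64_t w7 = *x16;         // ldr w7, [x16]
    if (w7 == 0) {              // cbnz not taken: class not known to be initialized
        w7 = slowPath(x16);     // bl <shared slow path>; mov w7, w16
    }
    return w7;                  // Class pointer for the next opcode
}
```

Because the shared slow path reaches the resolver with `br` rather than `blr`, `x30` still holds the return address written by `bl`, so the resolver returns straight to `XX+16`; the callback's return value models that.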

For class-related resolvers, the convention is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the slot address, and it is also used as the "return value".

The `PLT InitClass Resolver` loads the `class Id` from `FirstSlot` using the address `x16-8`,
takes the caller `Method pointer` from the previous frame, and calls the `InitializeClassById` entrypoint.
It stores the obtained `Class pointer` into `ThirdSlot`, and does the same for `SecondSlot`, but only conditionally:
the `Class` state must be `Initialized`, because in some corner cases `InitializeClassById` can return
while the `Class` is still only in the `Initializing` state.

The `PLT Class Resolver` receives `x16` pointing to `ThirdSlot`, so it loads the `class Id` from `FirstSlot` using the address `x16-16`.
It calls a different entrypoint, `ResolveClass`, and stores the obtained `Class pointer` value into `ThirdSlot` only.

Both Resolvers return the `Class pointer` value in `x16` back into the compiled code.
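
Both resolvers can be modeled with the hypothetical C++ sketch below. `FakeClass`, `ClassState`, and the `std::map` replace the runtime's Class objects and the `InitializeClassById`/`ResolveClass` entrypoints; only the slot addressing (`x16-8` vs `x16-16`) and the store rules follow the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for runtime Class objects.
enum class ClassState { Initializing, Initialized };

struct FakeClass {
    uint64_t address;
    ClassState state;
};

// Slots are {FirstSlot = class Id, SecondSlot = "initialized" Class, ThirdSlot = Class}.
struct ClassResolvers {
    std::map<uint64_t, FakeClass> classes;  // class Id -> class (entrypoint stand-in)

    // PLT InitClass Resolver: x16 == &SecondSlot, so the class Id is at x16 - 8.
    uint64_t InitClass(uint64_t *x16) {
        const FakeClass &k = classes.at(*(x16 - 1));
        x16[1] = k.address;                       // ThirdSlot: always
        if (k.state == ClassState::Initialized) {
            x16[0] = k.address;                   // SecondSlot: only when fully initialized
        }
        return k.address;                         // returned through x16
    }

    // PLT Class Resolver: x16 == &ThirdSlot, so the class Id is at x16 - 16.
    uint64_t Resolve(uint64_t *x16) {
        const FakeClass &k = classes.at(*(x16 - 2));
        x16[0] = k.address;                       // ThirdSlot only
        return k.address;
    }
};
```

A class caught in the `Initializing` state leaves `SecondSlot` at zero, so the next `LoadAndInitClass` goes through the resolver again instead of trusting a half-initialized class.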

## Resolver Encoding

Since all 4 resolvers have a lot in common, their generation is implemented in one method, `EncodePltHelper`.
Moreover, it is placed in the platform-independent file `code_generator/target/target.cpp`, although there are actually several
differences between what happens on `arm64` and on `x86_64`.

The main difference between the two supported platforms is the main temporary register used in the Resolver:
for `arm64` it is the `LR` register (`x30`), and for `x86_64` it is the third `Encoder` temporary, `r14`.

One more issue is that the first `Encoder` temporary register (`x16` for `arm64` or `r12` for `x86_64`), used as a parameter
in 3 Resolvers (all but CallStatic), is caller-saved on `arm64` but callee-saved on `x86_64`, which leads to some
differences in the generated code.
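
The register roles can be summarized as data. This C++ table is a hypothetical illustration built from the text above, not a structure from `target.cpp`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical summary of register roles in the PLT resolvers.
struct PltResolverRegs {
    std::string_view target;
    std::string_view mainTemp;  // scratch register used throughout the resolver body
    std::string_view paramReg;  // first Encoder temporary: slot address in, result out
    bool paramIsCalleeSaved;    // true if the ABI already preserves it across the runtime call
};

constexpr PltResolverRegs kArm64{"arm64", "x30 (LR)", "x16", false};
constexpr PltResolverRegs kX86_64{"x86_64", "r14", "r12", true};

// A caller-saved parameter register needs extra handling in non-CallStatic
// resolvers: it is added to the live set and reloaded from its save slot.
constexpr bool NeedsManualParamRestore(const PltResolverRegs &regs) {
    return !regs.paramIsCalleeSaved;
}

static_assert(NeedsManualParamRestore(kArm64));
static_assert(!NeedsManualParamRestore(kX86_64));
```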

Let's briefly go through the steps that every Resolver performs, in order:
* **Save LR and FP registers to the stack.**
On `arm64` it is just a single `stp x29, x30, [sp, #-16]!` instruction, while on `x86_64` the caller's return address is already
on the stack, so we load it into a temporary register (it is needed for the `BoundaryFrame`) and push `rbp` to the stack.

* **Create BoundaryFrame.**
This step copies the `SlowPath` behavior of the usual `BoundaryFrame` class constructor, but with one special trick:
for 3 out of 4 Resolvers (all but CallStatic), the "return address" and "previous frame" values already on the stack
(see the previous step) directly become the upper part of the `BoundaryFrame` stack area.

* **Save caller-saved registers.**
In the CallStatic Resolver, we reserve space on the stack and save the registers there. In the three other Resolvers, caller-saved
registers are saved directly into the appropriate places in the previous CFrame.
In this case, the stack pointer is temporarily adjusted by hand to allow the `SaveCallerRegisters` function to do its job.
Moreover, for `arm64` we manually add `x16` to the set of live registers.

* **Prepare parameters for the Runtime Call.**
This step is described separately above for each resolver.

* **Save callee-saved registers.**
Adjust the stack pointer (for the second time in the `CallStatic` Resolver, and for the only time in the others) and
call `SaveRegisters` twice: once for float and once for scalar registers.

* **Make a Runtime Call.**
This step is done using the `MakeCallAot` function with a properly calculated offset. Resolvers are placed after all functions in the
AOT file, but the distance to the `.aot_got` section can be calculated in the same way as in usual code generation.

* **Load callee-saved registers.**
Reverse what was done two steps above: `LoadRegisters` for float and scalar registers, then adjust the stack pointer back.

* **Restore previous Frame.**
Works like the `BoundaryFrame` class destructor.

* **Process the obtained result.**
First, `arm64` non-`CallStatic` Resolvers need to manually restore `x16` from the place where it was saved.
On `x86_64` this step is not required, as `r12` is a callee-saved register and has already been restored.
The main logic of this step is described separately above for each resolver.

* **Load caller-saved registers.**
Registers are loaded in the same manner as they were saved. So, in CallStatic we have to adjust the stack pointer after loading,
while in the other Resolvers it is temporarily adjusted by hand to point to the previous frame before calling the `LoadCallerRegisters` function.

* **Restore LR and FP.**
Nothing special, symmetric to the very first step.

* **Leave Resolver.**
Jump to the callee Method in the `CallStatic` Resolver, and do a usual "return" in the others.
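
As a compact recap of which steps differ between resolver kinds, here is a hypothetical C++ outline that only lists the step sequence. It is not `EncodePltHelper`, and the step strings are descriptive labels rather than real function names.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class PltResolverKind { CallStatic, CallVirtual, Class, InitClass };

// Hypothetical outline of the resolver step order described above; it records
// which variant of each step applies and generates no code.
std::vector<std::string> ResolverSteps(PltResolverKind kind, bool isArm64) {
    const bool isStatic = kind == PltResolverKind::CallStatic;
    std::vector<std::string> steps;
    steps.push_back("save LR and FP");
    steps.push_back(isStatic ? "create BoundaryFrame"
                             : "create BoundaryFrame on top of the pushed return address and FP");
    steps.push_back(isStatic ? "save caller-saved registers to a new stack area"
                             : "save caller-saved registers into the previous CFrame");
    steps.push_back("prepare runtime call parameters");
    steps.push_back("save callee-saved registers");
    steps.push_back("runtime call via MakeCallAot");
    steps.push_back("load callee-saved registers");
    steps.push_back("restore previous frame");
    if (!isStatic && isArm64) {
        steps.push_back("reload x16 from its save slot");  // x16 is caller-saved on arm64
    }
    steps.push_back("process result");
    steps.push_back("load caller-saved registers");
    steps.push_back("restore LR and FP");
    steps.push_back(isStatic ? "jump to callee method" : "return to generated code");
    return steps;
}
```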