1b1994897Sopenharmony_ci# Memory management and object layout. Design.
2b1994897Sopenharmony_ci
3b1994897Sopenharmony_ci## Overview
4b1994897Sopenharmony_ci
5b1994897Sopenharmony_ciPanda Runtime should be scalable onto different devices/OSes. So we need some abstraction level for the OS memory management.
6b1994897Sopenharmony_ciFor now, all targets suppose interaction with the user, so we have some limitations for the STW pause metric.
7b1994897Sopenharmony_ciWe have very limited memory resources for IoT target, so we should maximize efforts on reducing memory overhead(fragmentation and object header size).
8b1994897Sopenharmony_ci
9b1994897Sopenharmony_ciThe main components of Panda memory management and object model:
10b1994897Sopenharmony_ci* [Allocators](#allocators)
11b1994897Sopenharmony_ci* [GC](#gc)
12b1994897Sopenharmony_ci* [Object header](#object-header)
13b1994897Sopenharmony_ci
14b1994897Sopenharmony_ciPanda runtime works/interacts with these memory types:
15b1994897Sopenharmony_ci* internal memory for runtime(ArenaAllocators for JIT, etc)
16b1994897Sopenharmony_ci* application memory(i.e., memory for objects created by application)
17b1994897Sopenharmony_ci* native memory via JNI/FFI
18b1994897Sopenharmony_ci* memory for JITed code
19b1994897Sopenharmony_ci
20b1994897Sopenharmony_ci![High-level design](./images/panda-mm-overview.png "Memory management high-level design")
21b1994897Sopenharmony_ci
22b1994897Sopenharmony_ciThere are several modes for memory management:
23b1994897Sopenharmony_ci- base mode
24b1994897Sopenharmony_ci  - allocators with some average metrics and profile-based configuration(if available)
25b1994897Sopenharmony_ci  - some baseline GC with profile-based configuration(if available)
26b1994897Sopenharmony_ci- performance
27b1994897Sopenharmony_ci  - allocators with low allocation cost
28b1994897Sopenharmony_ci  - low-pause/pauseless GC(for games) or GC with high throughput and acceptable STW pause (for not games)
29b1994897Sopenharmony_ci- power-saving mode
30b1994897Sopenharmony_ci  - energy-efficient allocators(if possible)
31b1994897Sopenharmony_ci  - special thresholds to improve power efficiency, 
32b1994897Sopenharmony_ci
33b1994897Sopenharmony_ciMode are chosen at the startup time (we'll use profile info from cloud for that).
34b1994897Sopenharmony_ci
35b1994897Sopenharmony_ci## Object header
36b1994897Sopenharmony_ci
37b1994897Sopenharmony_ciRationale see [here](memory-management-overview.md).
38b1994897Sopenharmony_ci
39b1994897Sopenharmony_ci### Requirements
40b1994897Sopenharmony_ci
41b1994897Sopenharmony_ci* Support all required features from Runtime
42b1994897Sopenharmony_ci* Similar design for two different platforms - high-end and low-end
43b1994897Sopenharmony_ci* Compact Object Header for low-end target
44b1994897Sopenharmony_ci
45b1994897Sopenharmony_ci### Specification / Implementation
46b1994897Sopenharmony_ci
47b1994897Sopenharmony_ci**Common ObjectHeader methods:**
48b1994897Sopenharmony_ci
49b1994897Sopenharmony_ci* Get/Set Mark or Class word
50b1994897Sopenharmony_ci* Get size of the object header and an object itself
51b1994897Sopenharmony_ci* Get/Generate an object hash
52b1994897Sopenharmony_ci
53b1994897Sopenharmony_ci**Methods, specific for Class word:**
54b1994897Sopenharmony_ci
55b1994897Sopenharmony_ci* Get different object fields
56b1994897Sopenharmony_ci* Return object type
57b1994897Sopenharmony_ci* Verify object
58b1994897Sopenharmony_ci* Is it a subclass or not, is it an array or not, etc.
59b1994897Sopenharmony_ci* Get field address
60b1994897Sopenharmony_ci
61b1994897Sopenharmony_ci**Methods, specific for Mark word:**
62b1994897Sopenharmony_ci
63b1994897Sopenharmony_ci* Object locked/unlocked
64b1994897Sopenharmony_ci* Marked for GC or not
65b1994897Sopenharmony_ci* Monitor functions(get monitor, notify, notify all, wait)
66b1994897Sopenharmony_ci* Forwarded or not
67b1994897Sopenharmony_ci
68b1994897Sopenharmony_ciMark word depends on configuration and can have different sizes and layout. So here all possible configurations:
69b1994897Sopenharmony_ci
70b1994897Sopenharmony_ci128 bits object header for high-end devices(64 bits pointers):
71b1994897Sopenharmony_ci```
72b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------|
73b1994897Sopenharmony_ci|                                   Object Header (128 bits)                           |        State       |
74b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
75b1994897Sopenharmony_ci|                 Mark Word (64 bits)                 |      Class Word (64 bits)      |                    |
76b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
77b1994897Sopenharmony_ci|               nothing:61          | GC:1 | state:00 |     OOP to metadata object     |       Unlock       |
78b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
79b1994897Sopenharmony_ci|    tId:29    |      Lcount:32     | GC:1 | state:00 |     OOP to metadata object     |  Lightweight Lock  |
80b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
81b1994897Sopenharmony_ci|               Monitor:61          | GC:1 | state:01 |     OOP to metadata object     |  Heavyweight Lock  |
82b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
83b1994897Sopenharmony_ci|                Hash:61            | GC:1 | state:10 |     OOP to metadata object     |       Hashed       |
84b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
85b1994897Sopenharmony_ci|           Forwarding address:62          | state:11 |     OOP to metadata object     |         GC         |
86b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
87b1994897Sopenharmony_ci```
88b1994897Sopenharmony_ci64 bits object header for high-end devices(32 bits pointers):
89b1994897Sopenharmony_ci```
90b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------|
91b1994897Sopenharmony_ci|                                   Object Header (64 bits)                            |        State       |
92b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
93b1994897Sopenharmony_ci|                 Mark Word (32 bits)                 |      Class Word (32 bits)      |                    |
94b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
95b1994897Sopenharmony_ci|               nothing:29          | GC:1 | state:00 |     OOP to metadata object     |       Unlock       |
96b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
97b1994897Sopenharmony_ci|    tId:13    |      Lcount:16     | GC:1 | state:00 |     OOP to metadata object     |  Lightweight Lock  |
98b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
99b1994897Sopenharmony_ci|               Monitor:29          | GC:1 | state:01 |     OOP to metadata object     |  Heavyweight Lock  |
100b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
101b1994897Sopenharmony_ci|                Hash:29            | GC:1 | state:10 |     OOP to metadata object     |       Hashed       |
102b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
103b1994897Sopenharmony_ci|           Forwarding address:30          | state:11 |     OOP to metadata object     |         GC         |
104b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
105b1994897Sopenharmony_ci```
106b1994897Sopenharmony_ci
107b1994897Sopenharmony_ciHowever, we can also support such version of the object header(Hash is stored just after the object in memory if object was relocated):
108b1994897Sopenharmony_ci```
109b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------|
110b1994897Sopenharmony_ci|                                   Object Header (64 bits)                            |        State       |
111b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
112b1994897Sopenharmony_ci|                 Mark Word (32 bits)                 |      Class Word (32 bits)      |                    |
113b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
114b1994897Sopenharmony_ci|        nothing:28        | Hash:1 | GC:1 | state:00 |     OOP to metadata object     |       Unlock       |
115b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
116b1994897Sopenharmony_ci|  tId:13  |   LCount:15   | Hash:1 | GC:1 | state:00 |     OOP to metadata object     |  Lightweight Lock  |
117b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
118b1994897Sopenharmony_ci|        Monitor:28        | Hash:1 | GC:1 | state:01 |     OOP to metadata object     |  Heavyweight Lock  |
119b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
120b1994897Sopenharmony_ci|   Forwarding address:28  | Hash:1 | GC:1 | state:11 |     OOP to metadata object     |         GC         |
121b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
122b1994897Sopenharmony_ci```
123b1994897Sopenharmony_ciThis scenario decreases the size of a Monitor instance, and we don't need to save Hash somewhere during Lightweight Lock too.
124b1994897Sopenharmony_ciUnfortunately, it requires extra memory after GC moved the object (where the original hash value will be stored) and also required extra GC work.
125b1994897Sopenharmony_ciBut, this scenario will be useful if we have allocator and GC which decreases such a situation to a minimum.
126b1994897Sopenharmony_ci
127b1994897Sopenharmony_ci32 bits object header for low-end devices:
128b1994897Sopenharmony_ci```
129b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------|
130b1994897Sopenharmony_ci|                                   Object Header (32 bits)                            |        State       |
131b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
132b1994897Sopenharmony_ci|                 Mark Word (16 bits)                 |      Class Word (16 bits)      |                    |
133b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
134b1994897Sopenharmony_ci|               nothing:13          | GC:1 | state:00 |     OOP to metadata object     |       Unlock       |
135b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
136b1994897Sopenharmony_ci|    thread Id:7    | Lock Count:6  | GC:1 | state:00 |     OOP to metadata object     |  Lightweight Lock  |
137b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
138b1994897Sopenharmony_ci|               Monitor:13          | GC:1 | state:01 |     OOP to metadata object     |  Heavyweight Lock  |
139b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
140b1994897Sopenharmony_ci|                Hash:13            | GC:1 | state:10 |     OOP to metadata object     |       Hashed       |
141b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
142b1994897Sopenharmony_ci|         Forwarding address:14            | state:11 |     OOP to metadata object     |         GC         |
143b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------|
144b1994897Sopenharmony_ci```
145b1994897Sopenharmony_ci
146b1994897Sopenharmony_ciStates description:
147b1994897Sopenharmony_ci
148b1994897Sopenharmony_ciUnlock - the object not locked.
149b1994897Sopenharmony_ci
150b1994897Sopenharmony_ciLightweight Lock - object locked by one thread.
151b1994897Sopenharmony_ci
152b1994897Sopenharmony_ciHeavyweight Lock - we have competition for this object(few threads try to lock this object).
153b1994897Sopenharmony_ci
154b1994897Sopenharmony_ciHashed - the object has been hashed, and hash has been stored inside MarkWord.
155b1994897Sopenharmony_ci
156b1994897Sopenharmony_ciGC - the object has been moved by GC. 
157b1994897Sopenharmony_ci
158b1994897Sopenharmony_ci## String and array representation
159b1994897Sopenharmony_ci
160b1994897Sopenharmony_ciArray:
161b1994897Sopenharmony_ci```
162b1994897Sopenharmony_ci+------------------------------------------------+
163b1994897Sopenharmony_ci|             Object Header (64 bits)            |
164b1994897Sopenharmony_ci|------------------------------------------------|
165b1994897Sopenharmony_ci|                Length (32 bits)                |
166b1994897Sopenharmony_ci|------------------------------------------------|
167b1994897Sopenharmony_ci|                  Array payload                 |
168b1994897Sopenharmony_ci+------------------------------------------------+
169b1994897Sopenharmony_ci```
170b1994897Sopenharmony_ciString:
171b1994897Sopenharmony_ci
172b1994897Sopenharmony_ciIf we don't use strings compressing, each string has this structure:
173b1994897Sopenharmony_ci```
174b1994897Sopenharmony_ci+------------------------------------------------+
175b1994897Sopenharmony_ci|             Object Header (64 bits)            |
176b1994897Sopenharmony_ci|------------------------------------------------|
177b1994897Sopenharmony_ci|                Length (32 bits)                |
178b1994897Sopenharmony_ci|------------------------------------------------|
179b1994897Sopenharmony_ci|           String hash value (32 bits)          |
180b1994897Sopenharmony_ci|------------------------------------------------|
181b1994897Sopenharmony_ci|                  String payload                |
182b1994897Sopenharmony_ci+------------------------------------------------+
183b1994897Sopenharmony_ci```
184b1994897Sopenharmony_ciIf we use strings compressing, each string has this structure:
185b1994897Sopenharmony_ci```
186b1994897Sopenharmony_ci+------------------------------------------------+
187b1994897Sopenharmony_ci|             Object Header (64 bits)            |
188b1994897Sopenharmony_ci|------------------------------------------------|
189b1994897Sopenharmony_ci|                Length (31 bits)                |
190b1994897Sopenharmony_ci|------------------------------------------------|
191b1994897Sopenharmony_ci|             Compressed bit (1 bit)             |
192b1994897Sopenharmony_ci|------------------------------------------------|
193b1994897Sopenharmony_ci|           String hash value (32 bits)          |
194b1994897Sopenharmony_ci|------------------------------------------------|
195b1994897Sopenharmony_ci|                  String payload                |
196b1994897Sopenharmony_ci+------------------------------------------------+
197b1994897Sopenharmony_ci```
198b1994897Sopenharmony_ciIf the compressed bit is 1, the string has a compressed payload - 8 bits for each element.
199b1994897Sopenharmony_ci
200b1994897Sopenharmony_ciIf the compressed bit is 0, the string has not been compressed - its payload consists of 16 bits elements.
201b1994897Sopenharmony_ci
202b1994897Sopenharmony_ciOne of the ideas about string representation is to use a hash state inside Mark Word as a container for string hash value (of course we should save object hash somewhere else if it is needed or should use string hash value as the object hash value).
203b1994897Sopenharmony_ci
204b1994897Sopenharmony_ciString:
205b1994897Sopenharmony_ci```
206b1994897Sopenharmony_ci+------------------------------------------------+
207b1994897Sopenharmony_ci| String Hash | GC bit (1 bit) | Status (2 bits) |    <--- Mark Word (32 bits)
208b1994897Sopenharmony_ci|------------------------------------------------|
209b1994897Sopenharmony_ci|              Class Word (32 bits)              |
210b1994897Sopenharmony_ci|------------------------------------------------|
211b1994897Sopenharmony_ci|                Length (32 bits)                |
212b1994897Sopenharmony_ci|------------------------------------------------|
213b1994897Sopenharmony_ci|                  String payload                |
214b1994897Sopenharmony_ci+------------------------------------------------+
215b1994897Sopenharmony_ci```
216b1994897Sopenharmony_ci
217b1994897Sopenharmony_ciSee research [here](./memory-management-overview.md#possible-string-objects-size-reduction).
218b1994897Sopenharmony_ciAbout JS strings and arrays see [here](./memory-management-overview.md#js-strings-and-arrays).
219b1994897Sopenharmony_ci
220b1994897Sopenharmony_ci## Allocators
221b1994897Sopenharmony_ci
222b1994897Sopenharmony_ciRequirements:
223b1994897Sopenharmony_ci- simple and effective allocator for JIT
224b1994897Sopenharmony_ci  - no need to manual cleanup memory
225b1994897Sopenharmony_ci  - efficient all at once deallocation to improve performance
226b1994897Sopenharmony_ci- reasonable fragmentation
227b1994897Sopenharmony_ci- scalable 
228b1994897Sopenharmony_ci- support for pool extension and reduction(i.e., we can add another memory chunk to the allocator, and it can give it back to the global "pool" when it is empty)
229b1994897Sopenharmony_ci- cache awareness
230b1994897Sopenharmony_ci
231b1994897Sopenharmony_ci*(optional) power efficiency
232b1994897Sopenharmony_ci
233b1994897Sopenharmony_ciAll allocators should have these methods:
234b1994897Sopenharmony_ci- method which allocates ```X``` bytes
235b1994897Sopenharmony_ci- method which allocates ```X``` bytes with specified alignment
236b1994897Sopenharmony_ci- method which frees allocated pointed by pointer memory(ArenaAllocator is an exception)
237b1994897Sopenharmony_ci
238b1994897Sopenharmony_ci### Arena Allocator
239b1994897Sopenharmony_ci
240b1994897Sopenharmony_ciIt is a region-based allocator, i.e., all allocated in region/arena objects can be efficiently deallocated all at once.
241b1994897Sopenharmony_ciDeallocation for the specific object doesn't have effect in this allocator.
242b1994897Sopenharmony_ci
243b1994897Sopenharmony_ciJIT flow looks like this:
244b1994897Sopenharmony_ci```
245b1994897Sopenharmony_ciIR -> Optimizations -> Code
246b1994897Sopenharmony_ci```
247b1994897Sopenharmony_ci
248b1994897Sopenharmony_ciAfter code generation, all internal structures of JIT should be deleted.
249b1994897Sopenharmony_ciSo, if we can hold JIT memory usage at some reasonable level - Arena Allocator ideally fits JIT requirements for allocator.
250b1994897Sopenharmony_ci
251b1994897Sopenharmony_ci### Code Allocator
252b1994897Sopenharmony_ci
253b1994897Sopenharmony_ciRequirements:
254b1994897Sopenharmony_ci- should allocate executable memory for JITed code
255b1994897Sopenharmony_ci
256b1994897Sopenharmony_ciThis allocator can be tuned to provide more performance.
257b1994897Sopenharmony_ciFor example, if we have some callgraph info, we can use it and allocate code for connected methods with a minimized potential cache-collision rate.
258b1994897Sopenharmony_ci
259b1994897Sopenharmony_ci### Main allocator
260b1994897Sopenharmony_ci
261b1994897Sopenharmony_ciRequirements:
262b1994897Sopenharmony_ci- acceptable fragmentation
263b1994897Sopenharmony_ci- acceptable allocation cost
264b1994897Sopenharmony_ci- possibility to iterate over the heap
265b1994897Sopenharmony_ci- scalable
266b1994897Sopenharmony_cidesired:
267b1994897Sopenharmony_ci- flexible allocation size list(required to support profile-guided allocation to improve fragmentation and power efficiency)
268b1994897Sopenharmony_ci
269b1994897Sopenharmony_ci#### Implementation details
270b1994897Sopenharmony_ci
271b1994897Sopenharmony_ciEach allocator works over some pool
272b1994897Sopenharmony_ci
273b1994897Sopenharmony_ciSize classes(numbers just informational - they will be tuned after performance analysis):
274b1994897Sopenharmony_ci- small(1b-4Kb)
275b1994897Sopenharmony_ci- large(4Kb - 4Mb)
276b1994897Sopenharmony_ci- humongous(4Mb - Inf)
277b1994897Sopenharmony_ci
278b1994897Sopenharmony_ciSize-segregated algorithm used for small size class to reduce fragmentation. 
279b1994897Sopenharmony_ciSmall objects are joined in "runs"(not individual element for each size, but some "container" with X elements of the same size in it).
280b1994897Sopenharmony_ci```
281b1994897Sopenharmony_ci+--------------------------------------+-----------------+-----------------+-----+-----------------+
282b1994897Sopenharmony_ci| header for run of objects with size X| obj with size X | free mem size X | ... | obj with size X |
283b1994897Sopenharmony_ci+--------------------------------------+-----------------+-----------------+-----+-----------------+
284b1994897Sopenharmony_ci```
285b1994897Sopenharmony_ci
286b1994897Sopenharmony_ciLarge objects are not joined in "runs".
287b1994897Sopenharmony_ci
288b1994897Sopenharmony_ciHumongous objects can be allocated just by proxying requests to the OS(but keep reference to it somewhere) or by using special allocator.
289b1994897Sopenharmony_ci
290b1994897Sopenharmony_ci_Note: below for non-embedded target_
291b1994897Sopenharmony_ci
292b1994897Sopenharmony_ciEach thread maintains a cache for objects(at least for all objects with small size).
293b1994897Sopenharmony_ciThis should reduce overhead because of synchronization tasks.
294b1994897Sopenharmony_ci
295b1994897Sopenharmony_ciLocking policy:
296b1994897Sopenharmony_ci- locks should protect localized/categorized resources(for example one lock for each size in small size class)
297b1994897Sopenharmony_ci- avoid holding locks during memory related system calls(mmap etc.)
298b1994897Sopenharmony_ci
299b1994897Sopenharmony_ci#### Profile-guided allocation
300b1994897Sopenharmony_ci
301b1994897Sopenharmony_ciWe can use profile information about allocation size for improving main allocator metrics.
302b1994897Sopenharmony_ciIf we see a very popular allocation size in profile, we can add it as an explicit segregated size and reduce fragmentation.
303b1994897Sopenharmony_ciTo make it work, allocator should support dynamic size table(or should have possibility choose from statically predefined).
304b1994897Sopenharmony_ci
305b1994897Sopenharmony_ci### Energy efficiency in allocators
306b1994897Sopenharmony_ci
307b1994897Sopenharmony_ciAs shown in this [paper](https://www.cs.york.ac.uk/rts/docs/CODES-EMSOFT-CASES-2006/emsoft/p215.pdf) by changing 
308b1994897Sopenharmony_civarious settings of the allocator, it is possible to get very energy efficient allocator.
309b1994897Sopenharmony_ciThere is no universal approach in this paper, but we can try to mix approach from this paper 
310b1994897Sopenharmony_ciwith our profile-guided approach.
311b1994897Sopenharmony_ci
312b1994897Sopenharmony_ci## Pools and OS interactions
313b1994897Sopenharmony_ci
314b1994897Sopenharmony_ciAll used memory is divided in chunks. Main allocator can extend his pool with these chunks.
315b1994897Sopenharmony_ci
316b1994897Sopenharmony_ciFor the cases when we can get memory shortage we should have some preallocated buffer which allow Runtime to continue to work, while GC trying to free memory.
317b1994897Sopenharmony_ci
318b1994897Sopenharmony_ciNote:
319b1994897Sopenharmony_ciFor the IoT systems without MMU Pools should have non-trivial implementation.
320b1994897Sopenharmony_ci
321b1994897Sopenharmony_ciFor some systems/languages will be implemented context-scoped allocator.
322b1994897Sopenharmony_ciThis allocator works over some arena and after the program will be out of the context - this arena will be returned to the OS.
323b1994897Sopenharmony_ci
324b1994897Sopenharmony_ci## Spaces
325b1994897Sopenharmony_ci
326b1994897Sopenharmony_ci- MemMapSpace, shared between these:
327b1994897Sopenharmony_ci  - Code space (executable)
328b1994897Sopenharmony_ci  - Compiler Internal Space(linked list of arenas)
329b1994897Sopenharmony_ci  - Internal memory space for non-compiler part of runtime (including GC internals)
330b1994897Sopenharmony_ci  - Object space
331b1994897Sopenharmony_ci     - BumpPointerSpace
332b1994897Sopenharmony_ci     - Regular object space
333b1994897Sopenharmony_ci     - Humonguous objects space
334b1994897Sopenharmony_ci     - TLAB space(optional)
335b1994897Sopenharmony_ci     - RegionSpace(optional for some GCs)
336b1994897Sopenharmony_ci     - Non-moving space
337b1994897Sopenharmony_ci- MallocMemSpace
338b1994897Sopenharmony_ci  - Humonguous objects space(optional)
339b1994897Sopenharmony_ci
340b1994897Sopenharmony_ciLogical GC spaces:
341b1994897Sopenharmony_ci- young space (optional for some GCs)
342b1994897Sopenharmony_ci- survivor space (optional)
343b1994897Sopenharmony_ci- tenured space
344b1994897Sopenharmony_ci
345b1994897Sopenharmony_ci## GC
346b1994897Sopenharmony_ci
347b1994897Sopenharmony_ciGarbage collector(GC) automatically recycles memory that it can prove will never be used again.
348b1994897Sopenharmony_ci
349b1994897Sopenharmony_ciGC development will be iterative process.
350b1994897Sopenharmony_ci
351b1994897Sopenharmony_ciCommon requirements:
352b1994897Sopenharmony_ci- precise GC (see [glossary](./glossary.md#memory-management-terms))
353b1994897Sopenharmony_ci- GC should support various [modes](#overview)(performance, power-saving mode, normal mode);
354b1994897Sopenharmony_ci- GC suitable for each mode he shouldn't violate requirements for this mode(see [here](#overview))
355b1994897Sopenharmony_ci
356b1994897Sopenharmony_ciRequirements for Runtime:
357b1994897Sopenharmony_ci- support for precise/exact roots
358b1994897Sopenharmony_ci- GC barriers support by Interpreter and JIT
359b1994897Sopenharmony_ci- safepoints support by Interpreter and JIT
360b1994897Sopenharmony_ci
361b1994897Sopenharmony_ciPanda should support multiple GCs, since different GCs have different advantages(memory usage, throughput) at different benchmarks/applications.
362b1994897Sopenharmony_ciSo we should have possibility to use optimal GC for each application.
363b1994897Sopenharmony_ci
364b1994897Sopenharmony_ci### Epsilon GC
365b1994897Sopenharmony_ci
366b1994897Sopenharmony_ciEpsilon GC does absolutely nothing but makes the impression that Runtime has GC. I.e., it supports all required GC interfaces and can be integrated into Runtime.
367b1994897Sopenharmony_ci
368b1994897Sopenharmony_ciEpsilon GC should be used only for debug and profiling purposes. I.e., we can disable GC and measure in mode "What if we don't have GC".
369b1994897Sopenharmony_ci
370b1994897Sopenharmony_ci### STW GC
371b1994897Sopenharmony_ci
372b1994897Sopenharmony_ciStop-The-World GC.
373b1994897Sopenharmony_ci
374b1994897Sopenharmony_ciNon-generational non-moving GC, during the work all mutator threads should be at safepoint.
375b1994897Sopenharmony_ci
376b1994897Sopenharmony_ci1. Root scan
377b1994897Sopenharmony_ci1. Mark
378b1994897Sopenharmony_ci1. Sweep
379b1994897Sopenharmony_ci
380b1994897Sopenharmony_ci### Concurrent Mark Sweep GC 
381b1994897Sopenharmony_ci
382b1994897Sopenharmony_ciRequirements:
383b1994897Sopenharmony_ci- concurrent
384b1994897Sopenharmony_ci- generational
385b1994897Sopenharmony_ci- low cpu usage (high throughput)
386b1994897Sopenharmony_ci- acceptable STW pause
387b1994897Sopenharmony_ci- (optional) compaction
388b1994897Sopenharmony_ci
389b1994897Sopenharmony_ciWe need to conduct more performance analysis experiments for choosing optimal scheme, but for now let's consider these options:
390b1994897Sopenharmony_ci- generational moving (optionally compacting) GC
391b1994897Sopenharmony_ci- (optional) generational non-moving (optionally compacting) GC
392b1994897Sopenharmony_ci
393b1994897Sopenharmony_ciSpaces(for moving CMS):
394b1994897Sopenharmony_ci```
395b1994897Sopenharmony_ci+------------+------------+----------------------------+
396b1994897Sopenharmony_ci| Eden/young |  Survivor  |        Tenured/old         |
397b1994897Sopenharmony_ci|            | (optional) |                            |
398b1994897Sopenharmony_ci+------------+------------+----------------------------+
399b1994897Sopenharmony_ci```
400b1994897Sopenharmony_ci
401b1994897Sopenharmony_ciSurvivor space is optional and only for high-end targets.
402b1994897Sopenharmony_ciSince one of the metric for this GC - high throughput, the most of the objects in the Eden will live enough to die.
403b1994897Sopenharmony_ciIf we prioritize energy-efficiency metric and the heap sizes at average not gigantic, it seems that we should avoid using survivor spaces.
404b1994897Sopenharmony_ciSo we can support it optionally for experiments. As alternative we can introduce some average age metadata for run of small objects.
405b1994897Sopenharmony_ci
406b1994897Sopenharmony_ciMinor GC(STW):
407b1994897Sopenharmony_ci1. Root scan for young gen, CardTable used for finding roots in old gen
408b1994897Sopenharmony_ci1. Mark eden and move alive objects to the tenured(or survivor)
409b1994897Sopenharmony_ci1. Sweep eden
410b1994897Sopenharmony_ci
411b1994897Sopenharmony_ciNote: we'll use adaptive thresholds for triggering Minor GC for minimizing STW pause
412b1994897Sopenharmony_ciNote #2: we can tune minor GC by trying make concurrent marking and re-mark, but it will require copy of the card table.
413b1994897Sopenharmony_ci
414b1994897Sopenharmony_ciMajor GC
415b1994897Sopenharmony_ci1. Concurrent scan of static roots
416b1994897Sopenharmony_ci1. Initial Mark - root scan(STW #1)
417b1994897Sopenharmony_ci1. Concurrent Marking + Reference processor
418b1994897Sopenharmony_ci1. Remark missed during concurrent marking objects (STW #2)
419b1994897Sopenharmony_ci1. Concurrent Sweep + Finalizers
420b1994897Sopenharmony_ci1. Reset
421b1994897Sopenharmony_ci
422b1994897Sopenharmony_ciReference processor - prevents issues with wrong finalization order.
423b1994897Sopenharmony_ci
424b1994897Sopenharmony_ciNote: If we don't have Survivor spaces we can implement non-moving generational GC. 
425b1994897Sopenharmony_ci
426b1994897Sopenharmony_ci### Region based GC (main)
427b1994897Sopenharmony_ci
428b1994897Sopenharmony_ciRequirements:
429b1994897Sopenharmony_ci- concurrent
430b1994897Sopenharmony_ci- generational
431b1994897Sopenharmony_ci- acceptable stable STW pause
432b1994897Sopenharmony_ci- (optional) compaction
433b1994897Sopenharmony_ci
434b1994897Sopenharmony_ciSince typical heap size for mobile applications is small - this GC can be considered as good choice for production.
435b1994897Sopenharmony_ci
436b1994897Sopenharmony_ciAll heap consists of memory regions with fixed size(it can vary, i.e. size of memory region #K+1 can be different than size of memory region #K).
437b1994897Sopenharmony_ci```
438b1994897Sopenharmony_ci+------------------+------------------+-----+------------------+
439b1994897Sopenharmony_ci| Memory region #1 | Memory region #2 | ... | Memory region #N |
440b1994897Sopenharmony_ci| young            | tenured          | ... | tenured          |
441b1994897Sopenharmony_ci+------------------+------------------+-----+------------------+
442b1994897Sopenharmony_ci```
443b1994897Sopenharmony_ci
444b1994897Sopenharmony_ciRegions types:
445b1994897Sopenharmony_ci- young regions
446b1994897Sopenharmony_ci- tenured regions
447b1994897Sopenharmony_ci- humonguous regions(for humonguous objects)
448b1994897Sopenharmony_ci- empty regions
449b1994897Sopenharmony_ci
450b1994897Sopenharmony_ciIncoming references for each region are tracked via remembered sets:
451b1994897Sopenharmony_ci- old-to-young references
452b1994897Sopenharmony_ci- old-to-old references
453b1994897Sopenharmony_ci
454b1994897Sopenharmony_ciMinor GC(only for young regions - STW):
455b1994897Sopenharmony_ci1. Root scan for young gen, remembered sets used for finding roots in old gen
456b1994897Sopenharmony_ci1. Marking young gen + Reference processor + moving alive objects to the tenured space
457b1994897Sopenharmony_ci1. Sweep + finalizers
458b1994897Sopenharmony_ci
459b1994897Sopenharmony_ciThe size of young space selected to satisfy 
460b1994897Sopenharmony_ci
461b1994897Sopenharmony_ciMixed GC - minor GC + some tenured regions added to the young gen regions after the concurrent marking.
462b1994897Sopenharmony_ciConcurrent marking(triggered when we reach some threshold for tenured generation size):
463b1994897Sopenharmony_ci1. Root scan (STW #1)
464b1994897Sopenharmony_ci1. Concurrent marking + Reference processor
465b1994897Sopenharmony_ci1. Re-mark - finishes marking and update liveness statistics (STW #2)
466b1994897Sopenharmony_ci1. Cleanup - reclaims empty regions and determines if we need mixed collections to reclaim tenured space. Tenured regions selected by using different thresholds.
467b1994897Sopenharmony_ci
468b1994897Sopenharmony_ciNote: RSets optionally can be refined with special threads
469b1994897Sopenharmony_ci
470b1994897Sopenharmony_ci### Low-pause GC (deffered)
471b1994897Sopenharmony_ci
472b1994897Sopenharmony_ciRequirements:
473b1994897Sopenharmony_ci- stable low STW pause/pauseless
474b1994897Sopenharmony_ci- (optional)incremental
475b1994897Sopenharmony_ci- with compaction
476b1994897Sopenharmony_ci
477b1994897Sopenharmony_ciNo explicit minor GC.
478b1994897Sopenharmony_ci
479b1994897Sopenharmony_ciMajor GC
480b1994897Sopenharmony_ci1. Concurrent scan of static roots
481b1994897Sopenharmony_ci1. Initial Mark - root scan(STW #1)
482b1994897Sopenharmony_ci1. Concurrent Marking + Reference processor
483b1994897Sopenharmony_ci1. Concurrent Sweep + Finalizers + Concurrent Copy & Compact
484b1994897Sopenharmony_ci1. Reset
485b1994897Sopenharmony_ci
486b1994897Sopenharmony_ciNote: good choice for the applications with big heap or for applications when it is hard to provide stable low pause with Region based GC.
487b1994897Sopenharmony_ci
488b1994897Sopenharmony_ciNote: compaction is target and mode dependent, so for low-memory devices we can consider [semi-space compaction](./glossary.md#memory-management-terms).
489b1994897Sopenharmony_ciFor straight-forward approach we can consider some support from OS to minimize overlapping of semi-space compaction phases between applications.
490b1994897Sopenharmony_ci
491b1994897Sopenharmony_ci### GC: interaction with Interpreter, JIT and AOT
492b1994897Sopenharmony_ci
493b1994897Sopenharmony_ci#### Safepoints
494b1994897Sopenharmony_ci
495b1994897Sopenharmony_ciPrerequisites:
496b1994897Sopenharmony_ci* one HW register reserved for the pointer to the ExecState(per-thread state), let's call it `RVState`
497b1994897Sopenharmony_ci* ExecState structure has field with address of some page used for safepoints and we knew offset of this field `SPaddrOffset`
498b1994897Sopenharmony_ci
499b1994897Sopenharmony_ciIn general, safepoint will be just presented as some implicit or explicit load from the `[RVState, SPaddrOffset]`.
500b1994897Sopenharmony_ciFor example, it can be something like this: `LDR R12, [RVState, #SPaddrOffset]`
501b1994897Sopenharmony_ci
502b1994897Sopenharmony_ciNote: In some architectures it is make sense to use store instead of load because it requires less registers.
503b1994897Sopenharmony_ci
504b1994897Sopenharmony_ciNote: If it is no MMU available - it is allowed to use explicit condition for safepoint, i.e. something like this(pseudocode):
505b1994897Sopenharmony_ci```
506b1994897Sopenharmony_ciif (SafepointFlag == true) {
507b1994897Sopenharmony_ci    call Runtime::SafepointHandler
508b1994897Sopenharmony_ci}
509b1994897Sopenharmony_ci```
510b1994897Sopenharmony_ci
511b1994897Sopenharmony_ciWhen GC wants to stop the world, it forces it by stopping all threads at the safepoint.
512b1994897Sopenharmony_ciIt protects some predefined safepoint memory page, and it leads to segmentation faults in all execution threads when they do the load from this address.
513b1994897Sopenharmony_ci
514b1994897Sopenharmony_ciSafepoints should be inserted at the beginning of the method and at the head of each loop.
515b1994897Sopenharmony_ci
516b1994897Sopenharmony_ciFor each safepoint, we should have a method that can provide GC with information about objects on the stack.
517b1994897Sopenharmony_ciInterpreter already supports such info in the frames.
518b1994897Sopenharmony_ciBut for JIT/compiler, it looks like we need some generated(by JIT/compiler) method that can get all necessary data for the safepoint.
519b1994897Sopenharmony_ciThis method can actually be just some code without prologue and epilogue.
520b1994897Sopenharmony_ciWe'll jump to its beginning from signal handler, and in the end, we should jump back to the safepoint, so probably we should put it near the original code.
521b1994897Sopenharmony_ci
522b1994897Sopenharmony_ciSo the flow looks like this:
523b1994897Sopenharmony_ci
524b1994897Sopenharmony_ci```
525b1994897Sopenharmony_ci ...
526b1994897Sopenharmony_ci | compiled/jitted code | ------>
527b1994897Sopenharmony_ci | safepoint #X in the code | ---seg fault---> 
528b1994897Sopenharmony_ci | signal handler | ---change return pc explicitly---> 
529b1994897Sopenharmony_ci | method that prepares data about objects on stack for the #X safepoint and waits until STW ends | ---jump via encoded relative branch to safepoint--->
530b1994897Sopenharmony_ci | safepoint #X in the code | ---normal execution--->
531b1994897Sopenharmony_ci | compiled/jitted code | ------>
532b1994897Sopenharmony_ci ...
533b1994897Sopenharmony_ci```
534b1994897Sopenharmony_ci
535b1994897Sopenharmony_ci**Opens**: 
536b1994897Sopenharmony_ci* should we generate method for each safepoint, or all safepoints at once?
537b1994897Sopenharmony_ci
538b1994897Sopenharmony_ci#### GC Barriers
539b1994897Sopenharmony_ci
540b1994897Sopenharmony_ciGC barrier is a block on writing to(write barrier) or reading from(read barrier) certain memory by the application code. GC Barriers used to ensure heap consistency and optimize some of GC flows.  
541b1994897Sopenharmony_ci
542b1994897Sopenharmony_ci##### GC Write Barriers
543b1994897Sopenharmony_ci
544b1994897Sopenharmony_ciHeap inconsistency can happen when GC reclaim alive/reachable object.
545b1994897Sopenharmony_ciI.e. these two conditions should happen to reclaim active/reachable:
546b1994897Sopenharmony_ci1. We store reference to a white object into a black object
547b1994897Sopenharmony_ci1. There are no paths from any gray object to that white object
548b1994897Sopenharmony_ci
549b1994897Sopenharmony_ciBesides addressing of heap inconsistency problem, write barrier can be used for maintaining incoming references for young generation or region.
550b1994897Sopenharmony_ci 
551b1994897Sopenharmony_ciSo we can solve these issues with GC WRB(write barrier). GC WRB can be _pre_(inserted before the store) and _post_(inserted after the store). This barriers used **only** when we store reference to the object to some field of an object.
552b1994897Sopenharmony_ci
553b1994897Sopenharmony_ci_Pre_ barrier usually used to solve issue with lost alive object during concurrent marking. Pseudocode(example):
554b1994897Sopenharmony_ci```c++
555b1994897Sopenharmony_ciif (UNLIKELY(concurrent_marking)) {
556b1994897Sopenharmony_ci    auto pre_val = obj.field;
557b1994897Sopenharmony_ci    if (pre_val != nullptr) {
558b1994897Sopenharmony_ci         store_in_buff_to_mark(pre_val); // call function which stores reference to object stored in the field to process it later
559b1994897Sopenharmony_ci    }
560b1994897Sopenharmony_ci}
561b1994897Sopenharmony_ciobj.field = new_val; // STORE for which barrier generated
562b1994897Sopenharmony_ci```    
563b1994897Sopenharmony_ci
564b1994897Sopenharmony_ci_Post_ barrier can be used to solve issue with tracking references from tenured generation to the young generation(or inter-region references). In this case we always know external roots for the young generation space(or for region). Pseudocode(abstract example, not real one):
565b1994897Sopenharmony_ci```c++
566b1994897Sopenharmony_ciobj.field = new_val; // STORE for which barrier generated 
567b1994897Sopenharmony_ciif ((AddressOf(obj.field) not in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END]) && 
568b1994897Sopenharmony_ci    (AddressOf(new_val) in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END])) {
569b1994897Sopenharmony_ci    update_card(AddressOf(obj.field)); // call function which marks some memory range as containing roots for young generation 
570b1994897Sopenharmony_ci}
571b1994897Sopenharmony_ci``` 
572b1994897Sopenharmony_ciNote: Sometimes we don't check if object and stored reference in different generations. Because we get much less overhead this way.
573b1994897Sopenharmony_ci
574b1994897Sopenharmony_ci##### GC Read Barriers
575b1994897Sopenharmony_ci
576b1994897Sopenharmony_ciRead barriers used during concurrent compaction in some GCs.
577b1994897Sopenharmony_ciFor example we concurrently moving object from one place(`from-space`) to the another(`to-space`).
578b1994897Sopenharmony_ciAt some moment we can have two instance of the one object.
579b1994897Sopenharmony_ciSo we need one of these conditions should stand if we want to keep heap consistent:
580b1994897Sopenharmony_ci1. All writes happen into `to-space` instance of the object, but reads can happen from both `from-space` and `to-space` instances
581b1994897Sopenharmony_ci1. All writes and reads happen into/from `to-space`
582b1994897Sopenharmony_ci
583b1994897Sopenharmony_ci#### GC Barriers integration with Interpreter and compiler
584b1994897Sopenharmony_ci
585b1994897Sopenharmony_ci
586b1994897Sopenharmony_ciFrom Interpreter you could use runtime interface methods:
587b1994897Sopenharmony_ci```c++
588b1994897Sopenharmony_cistatic void PreBarrier(void *obj_field_addr, void *pre_val_addr);
589b1994897Sopenharmony_cistatic void PostBarrier(void *obj_field_addr, void *val_addr);
590b1994897Sopenharmony_ci```
591b1994897Sopenharmony_ciNote: for performance, we can put into ExecState address of conditional flag for conditional barriers with trivial condition (`if (*x) ...`).
592b1994897Sopenharmony_ci
593b1994897Sopenharmony_ciIt is critical to make compiler to encode barriers very optimally. At least fast path should be encoded effectively.
594b1994897Sopenharmony_ciThere are several approaches for that:
595b1994897Sopenharmony_ci 1. To describe barrier use some meta-language or IR which can be interpreted/encoded by all compilers compatible with runtime (it is currently not applicable for the runtime)
596b1994897Sopenharmony_ci 1. (a lot of open questions here, so consider this as an idea) One compiler knows how to encode barrier using runtime interfaces (see next item) and could provide some more compiler-friendly interface to the other compilers to encode GC barriers.
597b1994897Sopenharmony_ci 1. The compiler knows for each barrier type how it should be encoded (see pseudocode in `libpandabase/mem/gc_barrier.h`). And could use the runtime to get all required operands to do this.
598b1994897Sopenharmony_ciLet's consider below encoding of PRE_ barrier:
599b1994897Sopenharmony_ci   - get barrier type via RuntimeInterface: `BarrierType GetPreType() const`
600b1994897Sopenharmony_ci   - for this barrier type get all needed operands provided by Runtime via
601b1994897Sopenharmony_ci     `BarrierOperand GCBarrierSet::GetBarrierOperand(BarrierPosition barrier_position, std::string_view name);`
602b1994897Sopenharmony_ci     (you should use operand/parameters names from pseudocode provided in `enum BarrierType`)
603b1994897Sopenharmony_ci   - encode barrier code using loaded operands and pseudocode from `enum BarrierType`
604b1994897Sopenharmony_ci
605b1994897Sopenharmony_ci## Memory sanitizers support
606b1994897Sopenharmony_ci
607b1994897Sopenharmony_ciPanda Runtime should support [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer).
608b1994897Sopenharmony_ci
609b1994897Sopenharmony_ciOptional: [MSAN](https://github.com/google/sanitizers/wiki/MemorySanitizer)
610b1994897Sopenharmony_ci(Note: not possible to use without custom built toolchain) 
611b1994897Sopenharmony_ci
612b1994897Sopenharmony_ciDesirable, but not easy to support: [HWSAN](https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html)
613