1b1994897Sopenharmony_ci# Memory management and object layout. Design. 2b1994897Sopenharmony_ci 3b1994897Sopenharmony_ci## Overview 4b1994897Sopenharmony_ci 5b1994897Sopenharmony_ciPanda Runtime should be scalable onto different devices/OSes. So we need some abstraction level for the OS memory management. 6b1994897Sopenharmony_ciFor now, all targets suppose interaction with the user, so we have some limitations for the STW pause metric. 7b1994897Sopenharmony_ciWe have very limited memory resources for IoT target, so we should maximize efforts on reducing memory overhead(fragmentation and object header size). 8b1994897Sopenharmony_ci 9b1994897Sopenharmony_ciThe main components of Panda memory management and object model: 10b1994897Sopenharmony_ci* [Allocators](#allocators) 11b1994897Sopenharmony_ci* [GC](#gc) 12b1994897Sopenharmony_ci* [Object header](#object-header) 13b1994897Sopenharmony_ci 14b1994897Sopenharmony_ciPanda runtime works/interacts with these memory types: 15b1994897Sopenharmony_ci* internal memory for runtime(ArenaAllocators for JIT, etc) 16b1994897Sopenharmony_ci* application memory(i.e., memory for objects created by application) 17b1994897Sopenharmony_ci* native memory via JNI/FFI 18b1994897Sopenharmony_ci* memory for JITed code 19b1994897Sopenharmony_ci 20b1994897Sopenharmony_ci 21b1994897Sopenharmony_ci 22b1994897Sopenharmony_ciThere are several modes for memory management: 23b1994897Sopenharmony_ci- base mode 24b1994897Sopenharmony_ci - allocators with some average metrics and profile-based configuration(if available) 25b1994897Sopenharmony_ci - some baseline GC with profile-based configuration(if available) 26b1994897Sopenharmony_ci- performance 27b1994897Sopenharmony_ci - allocators with low allocation cost 28b1994897Sopenharmony_ci - low-pause/pauseless GC(for games) or GC with high throughput and acceptable STW pause (for not games) 29b1994897Sopenharmony_ci- power-saving mode 30b1994897Sopenharmony_ci - energy-efficient allocators(if possible) 31b1994897Sopenharmony_ci - special thresholds to improve power efficiency, 32b1994897Sopenharmony_ci 33b1994897Sopenharmony_ciMode are chosen at the startup time (we'll use profile info from cloud for that). 34b1994897Sopenharmony_ci 35b1994897Sopenharmony_ci## Object header 36b1994897Sopenharmony_ci 37b1994897Sopenharmony_ciRationale see [here](memory-management-overview.md). 38b1994897Sopenharmony_ci 39b1994897Sopenharmony_ci### Requirements 40b1994897Sopenharmony_ci 41b1994897Sopenharmony_ci* Support all required features from Runtime 42b1994897Sopenharmony_ci* Similar design for two different platforms - high-end and low-end 43b1994897Sopenharmony_ci* Compact Object Header for low-end target 44b1994897Sopenharmony_ci 45b1994897Sopenharmony_ci### Specification / Implementation 46b1994897Sopenharmony_ci 47b1994897Sopenharmony_ci**Common ObjectHeader methods:** 48b1994897Sopenharmony_ci 49b1994897Sopenharmony_ci* Get/Set Mark or Class word 50b1994897Sopenharmony_ci* Get size of the object header and an object itself 51b1994897Sopenharmony_ci* Get/Generate an object hash 52b1994897Sopenharmony_ci 53b1994897Sopenharmony_ci**Methods, specific for Class word:** 54b1994897Sopenharmony_ci 55b1994897Sopenharmony_ci* Get different object fields 56b1994897Sopenharmony_ci* Return object type 57b1994897Sopenharmony_ci* Verify object 58b1994897Sopenharmony_ci* Is it a subclass or not, is it an array or not, etc. 59b1994897Sopenharmony_ci* Get field address 60b1994897Sopenharmony_ci 61b1994897Sopenharmony_ci**Methods, specific for Mark word:** 62b1994897Sopenharmony_ci 63b1994897Sopenharmony_ci* Object locked/unlocked 64b1994897Sopenharmony_ci* Marked for GC or not 65b1994897Sopenharmony_ci* Monitor functions(get monitor, notify, notify all, wait) 66b1994897Sopenharmony_ci* Forwarded or not 67b1994897Sopenharmony_ci 68b1994897Sopenharmony_ciMark word depends on configuration and can have different sizes and layout. So here all possible configurations: 69b1994897Sopenharmony_ci 70b1994897Sopenharmony_ci128 bits object header for high-end devices(64 bits pointers): 71b1994897Sopenharmony_ci``` 72b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------| 73b1994897Sopenharmony_ci| Object Header (128 bits) | State | 74b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 75b1994897Sopenharmony_ci| Mark Word (64 bits) | Class Word (64 bits) | | 76b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 77b1994897Sopenharmony_ci| nothing:61 | GC:1 | state:00 | OOP to metadata object | Unlock | 78b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 79b1994897Sopenharmony_ci| tId:29 | Lcount:32 | GC:1 | state:00 | OOP to metadata object | Lightweight Lock | 80b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 81b1994897Sopenharmony_ci| Monitor:61 | GC:1 | state:01 | OOP to metadata object | Heavyweight Lock | 82b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 83b1994897Sopenharmony_ci| Hash:61 | GC:1 | state:10 | OOP to metadata object | Hashed | 84b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 85b1994897Sopenharmony_ci| Forwarding address:62 | state:11 | OOP to metadata object | GC | 86b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 87b1994897Sopenharmony_ci``` 88b1994897Sopenharmony_ci64 bits object header for high-end devices(32 bits pointers): 89b1994897Sopenharmony_ci``` 90b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------| 91b1994897Sopenharmony_ci| Object Header (64 bits) | State | 92b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 93b1994897Sopenharmony_ci| Mark Word (32 bits) | Class Word (32 bits) | | 94b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 95b1994897Sopenharmony_ci| nothing:29 | GC:1 | state:00 | OOP to metadata object | Unlock | 96b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 97b1994897Sopenharmony_ci| tId:13 | Lcount:16 | GC:1 | state:00 | OOP to metadata object | Lightweight Lock | 98b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 99b1994897Sopenharmony_ci| Monitor:29 | GC:1 | state:01 | OOP to metadata object | Heavyweight Lock | 100b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 101b1994897Sopenharmony_ci| Hash:29 | GC:1 | state:10 | OOP to metadata object | Hashed | 102b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 103b1994897Sopenharmony_ci| Forwarding address:30 | state:11 | OOP to metadata object | GC | 104b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 105b1994897Sopenharmony_ci``` 106b1994897Sopenharmony_ci 107b1994897Sopenharmony_ciHowever, we can also support such version of the object header(Hash is stored just after the object in memory if object was relocated): 108b1994897Sopenharmony_ci``` 109b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------| 110b1994897Sopenharmony_ci| Object Header (64 bits) | State | 111b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 112b1994897Sopenharmony_ci| Mark Word (32 bits) | Class Word (32 bits) | | 113b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 114b1994897Sopenharmony_ci| nothing:28 | Hash:1 | GC:1 | state:00 | OOP to metadata object | Unlock | 115b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 116b1994897Sopenharmony_ci| tId:13 | LCount:15 | Hash:1 | GC:1 | state:00 | OOP to metadata object | Lightweight Lock | 117b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 118b1994897Sopenharmony_ci| Monitor:28 | Hash:1 | GC:1 | state:01 | OOP to metadata object | Heavyweight Lock | 119b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 120b1994897Sopenharmony_ci| Forwarding address:28 | Hash:1 | GC:1 | state:11 | OOP to metadata object | GC | 121b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 122b1994897Sopenharmony_ci``` 123b1994897Sopenharmony_ciThis scenario decreases the size of a Monitor instance, and we don't need to save Hash somewhere during Lightweight Lock too. 124b1994897Sopenharmony_ciUnfortunately, it requires extra memory after GC moved the object (where the original hash value will be stored) and also required extra GC work. 125b1994897Sopenharmony_ciBut, this scenario will be useful if we have allocator and GC which decreases such a situation to a minimum. 126b1994897Sopenharmony_ci 127b1994897Sopenharmony_ci32 bits object header for low-end devices: 128b1994897Sopenharmony_ci``` 129b1994897Sopenharmony_ci|--------------------------------------------------------------------------------------|--------------------| 130b1994897Sopenharmony_ci| Object Header (32 bits) | State | 131b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 132b1994897Sopenharmony_ci| Mark Word (16 bits) | Class Word (16 bits) | | 133b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 134b1994897Sopenharmony_ci| nothing:13 | GC:1 | state:00 | OOP to metadata object | Unlock | 135b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 136b1994897Sopenharmony_ci| thread Id:7 | Lock Count:6 | GC:1 | state:00 | OOP to metadata object | Lightweight Lock | 137b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 138b1994897Sopenharmony_ci| Monitor:13 | GC:1 | state:01 | OOP to metadata object | Heavyweight Lock | 139b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 140b1994897Sopenharmony_ci| Hash:13 | GC:1 | state:10 | OOP to metadata object | Hashed | 141b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 142b1994897Sopenharmony_ci| Forwarding address:14 | state:11 | OOP to metadata object | GC | 143b1994897Sopenharmony_ci|-----------------------------------------------------|--------------------------------|--------------------| 144b1994897Sopenharmony_ci``` 145b1994897Sopenharmony_ci 146b1994897Sopenharmony_ciStates description: 147b1994897Sopenharmony_ci 148b1994897Sopenharmony_ciUnlock - the object not locked. 149b1994897Sopenharmony_ci 150b1994897Sopenharmony_ciLightweight Lock - object locked by one thread. 151b1994897Sopenharmony_ci 152b1994897Sopenharmony_ciHeavyweight Lock - we have competition for this object(few threads try to lock this object). 153b1994897Sopenharmony_ci 154b1994897Sopenharmony_ciHashed - the object has been hashed, and hash has been stored inside MarkWord. 155b1994897Sopenharmony_ci 156b1994897Sopenharmony_ciGC - the object has been moved by GC. 157b1994897Sopenharmony_ci 158b1994897Sopenharmony_ci## String and array representation 159b1994897Sopenharmony_ci 160b1994897Sopenharmony_ciArray: 161b1994897Sopenharmony_ci``` 162b1994897Sopenharmony_ci+------------------------------------------------+ 163b1994897Sopenharmony_ci| Object Header (64 bits) | 164b1994897Sopenharmony_ci|------------------------------------------------| 165b1994897Sopenharmony_ci| Length (32 bits) | 166b1994897Sopenharmony_ci|------------------------------------------------| 167b1994897Sopenharmony_ci| Array payload | 168b1994897Sopenharmony_ci+------------------------------------------------+ 169b1994897Sopenharmony_ci``` 170b1994897Sopenharmony_ciString: 171b1994897Sopenharmony_ci 172b1994897Sopenharmony_ciIf we don't use strings compressing, each string has this structure: 173b1994897Sopenharmony_ci``` 174b1994897Sopenharmony_ci+------------------------------------------------+ 175b1994897Sopenharmony_ci| Object Header (64 bits) | 176b1994897Sopenharmony_ci|------------------------------------------------| 177b1994897Sopenharmony_ci| Length (32 bits) | 178b1994897Sopenharmony_ci|------------------------------------------------| 179b1994897Sopenharmony_ci| String hash value (32 bits) | 180b1994897Sopenharmony_ci|------------------------------------------------| 181b1994897Sopenharmony_ci| String payload | 182b1994897Sopenharmony_ci+------------------------------------------------+ 183b1994897Sopenharmony_ci``` 184b1994897Sopenharmony_ciIf we use strings compressing, each string has this structure: 185b1994897Sopenharmony_ci``` 186b1994897Sopenharmony_ci+------------------------------------------------+ 187b1994897Sopenharmony_ci| Object Header (64 bits) | 188b1994897Sopenharmony_ci|------------------------------------------------| 189b1994897Sopenharmony_ci| Length (31 bits) | 190b1994897Sopenharmony_ci|------------------------------------------------| 191b1994897Sopenharmony_ci| Compressed bit (1 bit) | 192b1994897Sopenharmony_ci|------------------------------------------------| 193b1994897Sopenharmony_ci| String hash value (32 bits) | 194b1994897Sopenharmony_ci|------------------------------------------------| 195b1994897Sopenharmony_ci| String payload | 196b1994897Sopenharmony_ci+------------------------------------------------+ 197b1994897Sopenharmony_ci``` 198b1994897Sopenharmony_ciIf the compressed bit is 1, the string has a compressed payload - 8 bits for each element. 199b1994897Sopenharmony_ci 200b1994897Sopenharmony_ciIf the compressed bit is 0, the string has not been compressed - its payload consists of 16 bits elements. 201b1994897Sopenharmony_ci 202b1994897Sopenharmony_ciOne of the ideas about string representation is to use a hash state inside Mark Word as a container for string hash value (of course we should save object hash somewhere else if it is needed or should use string hash value as the object hash value). 203b1994897Sopenharmony_ci 204b1994897Sopenharmony_ciString: 205b1994897Sopenharmony_ci``` 206b1994897Sopenharmony_ci+------------------------------------------------+ 207b1994897Sopenharmony_ci| String Hash | GC bit (1 bit) | Status (2 bits) | <--- Mark Word (32 bits) 208b1994897Sopenharmony_ci|------------------------------------------------| 209b1994897Sopenharmony_ci| Class Word (32 bits) | 210b1994897Sopenharmony_ci|------------------------------------------------| 211b1994897Sopenharmony_ci| Length (32 bits) | 212b1994897Sopenharmony_ci|------------------------------------------------| 213b1994897Sopenharmony_ci| String payload | 214b1994897Sopenharmony_ci+------------------------------------------------+ 215b1994897Sopenharmony_ci``` 216b1994897Sopenharmony_ci 217b1994897Sopenharmony_ciSee research [here](./memory-management-overview.md#possible-string-objects-size-reduction). 218b1994897Sopenharmony_ciAbout JS strings and arrays see [here](./memory-management-overview.md#js-strings-and-arrays). 219b1994897Sopenharmony_ci 220b1994897Sopenharmony_ci## Allocators 221b1994897Sopenharmony_ci 222b1994897Sopenharmony_ciRequirements: 223b1994897Sopenharmony_ci- simple and effective allocator for JIT 224b1994897Sopenharmony_ci - no need to manual cleanup memory 225b1994897Sopenharmony_ci - efficient all at once deallocation to improve performance 226b1994897Sopenharmony_ci- reasonable fragmentation 227b1994897Sopenharmony_ci- scalable 228b1994897Sopenharmony_ci- support for pool extension and reduction(i.e., we can add another memory chunk to the allocator, and it can give it back to the global "pool" when it is empty) 229b1994897Sopenharmony_ci- cache awareness 230b1994897Sopenharmony_ci 231b1994897Sopenharmony_ci*(optional) power efficiency 232b1994897Sopenharmony_ci 233b1994897Sopenharmony_ciAll allocators should have these methods: 234b1994897Sopenharmony_ci- method which allocates ```X``` bytes 235b1994897Sopenharmony_ci- method which allocates ```X``` bytes with specified alignment 236b1994897Sopenharmony_ci- method which frees allocated pointed by pointer memory(ArenaAllocator is an exception) 237b1994897Sopenharmony_ci 238b1994897Sopenharmony_ci### Arena Allocator 239b1994897Sopenharmony_ci 240b1994897Sopenharmony_ciIt is a region-based allocator, i.e., all allocated in region/arena objects can be efficiently deallocated all at once. 241b1994897Sopenharmony_ciDeallocation for the specific object doesn't have effect in this allocator. 242b1994897Sopenharmony_ci 243b1994897Sopenharmony_ciJIT flow looks like this: 244b1994897Sopenharmony_ci``` 245b1994897Sopenharmony_ciIR -> Optimizations -> Code 246b1994897Sopenharmony_ci``` 247b1994897Sopenharmony_ci 248b1994897Sopenharmony_ciAfter code generation, all internal structures of JIT should be deleted. 249b1994897Sopenharmony_ciSo, if we can hold JIT memory usage at some reasonable level - Arena Allocator ideally fits JIT requirements for allocator. 250b1994897Sopenharmony_ci 251b1994897Sopenharmony_ci### Code Allocator 252b1994897Sopenharmony_ci 253b1994897Sopenharmony_ciRequirements: 254b1994897Sopenharmony_ci- should allocate executable memory for JITed code 255b1994897Sopenharmony_ci 256b1994897Sopenharmony_ciThis allocator can be tuned to provide more performance. 257b1994897Sopenharmony_ciFor example, if we have some callgraph info, we can use it and allocate code for connected methods with a minimized potential cache-collision rate. 258b1994897Sopenharmony_ci 259b1994897Sopenharmony_ci### Main allocator 260b1994897Sopenharmony_ci 261b1994897Sopenharmony_ciRequirements: 262b1994897Sopenharmony_ci- acceptable fragmentation 263b1994897Sopenharmony_ci- acceptable allocation cost 264b1994897Sopenharmony_ci- possibility to iterate over the heap 265b1994897Sopenharmony_ci- scalable 266b1994897Sopenharmony_cidesired: 267b1994897Sopenharmony_ci- flexible allocation size list(required to support profile-guided allocation to improve fragmentation and power efficiency) 268b1994897Sopenharmony_ci 269b1994897Sopenharmony_ci#### Implementation details 270b1994897Sopenharmony_ci 271b1994897Sopenharmony_ciEach allocator works over some pool 272b1994897Sopenharmony_ci 273b1994897Sopenharmony_ciSize classes(numbers just informational - they will be tuned after performance analysis): 274b1994897Sopenharmony_ci- small(1b-4Kb) 275b1994897Sopenharmony_ci- large(4Kb - 4Mb) 276b1994897Sopenharmony_ci- humongous(4Mb - Inf) 277b1994897Sopenharmony_ci 278b1994897Sopenharmony_ciSize-segregated algorithm used for small size class to reduce fragmentation. 279b1994897Sopenharmony_ciSmall objects are joined in "runs"(not individual element for each size, but some "container" with X elements of the same size in it). 280b1994897Sopenharmony_ci``` 281b1994897Sopenharmony_ci+--------------------------------------+-----------------+-----------------+-----+-----------------+ 282b1994897Sopenharmony_ci| header for run of objects with size X| obj with size X | free mem size X | ... | obj with size X | 283b1994897Sopenharmony_ci+--------------------------------------+-----------------+-----------------+-----+-----------------+ 284b1994897Sopenharmony_ci``` 285b1994897Sopenharmony_ci 286b1994897Sopenharmony_ciLarge objects are not joined in "runs". 287b1994897Sopenharmony_ci 288b1994897Sopenharmony_ciHumongous objects can be allocated just by proxying requests to the OS(but keep reference to it somewhere) or by using special allocator. 289b1994897Sopenharmony_ci 290b1994897Sopenharmony_ci_Note: below for non-embedded target_ 291b1994897Sopenharmony_ci 292b1994897Sopenharmony_ciEach thread maintains a cache for objects(at least for all objects with small size). 293b1994897Sopenharmony_ciThis should reduce overhead because of synchronization tasks. 294b1994897Sopenharmony_ci 295b1994897Sopenharmony_ciLocking policy: 296b1994897Sopenharmony_ci- locks should protect localized/categorized resources(for example one lock for each size in small size class) 297b1994897Sopenharmony_ci- avoid holding locks during memory related system calls(mmap etc.) 298b1994897Sopenharmony_ci 299b1994897Sopenharmony_ci#### Profile-guided allocation 300b1994897Sopenharmony_ci 301b1994897Sopenharmony_ciWe can use profile information about allocation size for improving main allocator metrics. 302b1994897Sopenharmony_ciIf we see a very popular allocation size in profile, we can add it as an explicit segregated size and reduce fragmentation. 303b1994897Sopenharmony_ciTo make it work, allocator should support dynamic size table(or should have possibility choose from statically predefined). 304b1994897Sopenharmony_ci 305b1994897Sopenharmony_ci### Energy efficiency in allocators 306b1994897Sopenharmony_ci 307b1994897Sopenharmony_ciAs shown in this [paper](https://www.cs.york.ac.uk/rts/docs/CODES-EMSOFT-CASES-2006/emsoft/p215.pdf) by changing 308b1994897Sopenharmony_civarious settings of the allocator, it is possible to get very energy efficient allocator. 309b1994897Sopenharmony_ciThere is no universal approach in this paper, but we can try to mix approach from this paper 310b1994897Sopenharmony_ciwith our profile-guided approach. 311b1994897Sopenharmony_ci 312b1994897Sopenharmony_ci## Pools and OS interactions 313b1994897Sopenharmony_ci 314b1994897Sopenharmony_ciAll used memory is divided in chunks. Main allocator can extend his pool with these chunks. 315b1994897Sopenharmony_ci 316b1994897Sopenharmony_ciFor the cases when we can get memory shortage we should have some preallocated buffer which allow Runtime to continue to work, while GC trying to free memory. 317b1994897Sopenharmony_ci 318b1994897Sopenharmony_ciNote: 319b1994897Sopenharmony_ciFor the IoT systems without MMU Pools should have non-trivial implementation. 320b1994897Sopenharmony_ci 321b1994897Sopenharmony_ciFor some systems/languages will be implemented context-scoped allocator. 322b1994897Sopenharmony_ciThis allocator works over some arena and after the program will be out of the context - this arena will be returned to the OS. 323b1994897Sopenharmony_ci 324b1994897Sopenharmony_ci## Spaces 325b1994897Sopenharmony_ci 326b1994897Sopenharmony_ci- MemMapSpace, shared between these: 327b1994897Sopenharmony_ci - Code space (executable) 328b1994897Sopenharmony_ci - Compiler Internal Space(linked list of arenas) 329b1994897Sopenharmony_ci - Internal memory space for non-compiler part of runtime (including GC internals) 330b1994897Sopenharmony_ci - Object space 331b1994897Sopenharmony_ci - BumpPointerSpace 332b1994897Sopenharmony_ci - Regular object space 333b1994897Sopenharmony_ci - Humonguous objects space 334b1994897Sopenharmony_ci - TLAB space(optional) 335b1994897Sopenharmony_ci - RegionSpace(optional for some GCs) 336b1994897Sopenharmony_ci - Non-moving space 337b1994897Sopenharmony_ci- MallocMemSpace 338b1994897Sopenharmony_ci - Humonguous objects space(optional) 339b1994897Sopenharmony_ci 340b1994897Sopenharmony_ciLogical GC spaces: 341b1994897Sopenharmony_ci- young space (optional for some GCs) 342b1994897Sopenharmony_ci- survivor space (optional) 343b1994897Sopenharmony_ci- tenured space 344b1994897Sopenharmony_ci 345b1994897Sopenharmony_ci## GC 346b1994897Sopenharmony_ci 347b1994897Sopenharmony_ciGarbage collector(GC) automatically recycles memory that it can prove will never be used again. 348b1994897Sopenharmony_ci 349b1994897Sopenharmony_ciGC development will be iterative process. 350b1994897Sopenharmony_ci 351b1994897Sopenharmony_ciCommon requirements: 352b1994897Sopenharmony_ci- precise GC (see [glossary](./glossary.md#memory-management-terms)) 353b1994897Sopenharmony_ci- GC should support various [modes](#overview)(performance, power-saving mode, normal mode); 354b1994897Sopenharmony_ci- GC suitable for each mode he shouldn't violate requirements for this mode(see [here](#overview)) 355b1994897Sopenharmony_ci 356b1994897Sopenharmony_ciRequirements for Runtime: 357b1994897Sopenharmony_ci- support for precise/exact roots 358b1994897Sopenharmony_ci- GC barriers support by Interpreter and JIT 359b1994897Sopenharmony_ci- safepoints support by Interpreter and JIT 360b1994897Sopenharmony_ci 361b1994897Sopenharmony_ciPanda should support multiple GCs, since different GCs have different advantages(memory usage, throughput) at different benchmarks/applications. 362b1994897Sopenharmony_ciSo we should have possibility to use optimal GC for each application. 363b1994897Sopenharmony_ci 364b1994897Sopenharmony_ci### Epsilon GC 365b1994897Sopenharmony_ci 366b1994897Sopenharmony_ciEpsilon GC does absolutely nothing but makes the impression that Runtime has GC. I.e., it supports all required GC interfaces and can be integrated into Runtime. 367b1994897Sopenharmony_ci 368b1994897Sopenharmony_ciEpsilon GC should be used only for debug and profiling purposes. I.e., we can disable GC and measure in mode "What if we don't have GC". 369b1994897Sopenharmony_ci 370b1994897Sopenharmony_ci### STW GC 371b1994897Sopenharmony_ci 372b1994897Sopenharmony_ciStop-The-World GC. 373b1994897Sopenharmony_ci 374b1994897Sopenharmony_ciNon-generational non-moving GC, during the work all mutator threads should be at safepoint. 375b1994897Sopenharmony_ci 376b1994897Sopenharmony_ci1. Root scan 377b1994897Sopenharmony_ci1. Mark 378b1994897Sopenharmony_ci1. Sweep 379b1994897Sopenharmony_ci 380b1994897Sopenharmony_ci### Concurrent Mark Sweep GC 381b1994897Sopenharmony_ci 382b1994897Sopenharmony_ciRequirements: 383b1994897Sopenharmony_ci- concurrent 384b1994897Sopenharmony_ci- generational 385b1994897Sopenharmony_ci- low cpu usage (high throughput) 386b1994897Sopenharmony_ci- acceptable STW pause 387b1994897Sopenharmony_ci- (optional) compaction 388b1994897Sopenharmony_ci 389b1994897Sopenharmony_ciWe need to conduct more performance analysis experiments for choosing optimal scheme, but for now let's consider these options: 390b1994897Sopenharmony_ci- generational moving (optionally compacting) GC 391b1994897Sopenharmony_ci- (optional) generational non-moving (optionally compacting) GC 392b1994897Sopenharmony_ci 393b1994897Sopenharmony_ciSpaces(for moving CMS): 394b1994897Sopenharmony_ci``` 395b1994897Sopenharmony_ci+------------+------------+----------------------------+ 396b1994897Sopenharmony_ci| Eden/young | Survivor | Tenured/old | 397b1994897Sopenharmony_ci| | (optional) | | 398b1994897Sopenharmony_ci+------------+------------+----------------------------+ 399b1994897Sopenharmony_ci``` 400b1994897Sopenharmony_ci 401b1994897Sopenharmony_ciSurvivor space is optional and only for high-end targets. 402b1994897Sopenharmony_ciSince one of the metric for this GC - high throughput, the most of the objects in the Eden will live enough to die. 403b1994897Sopenharmony_ciIf we prioritize energy-efficiency metric and the heap sizes at average not gigantic, it seems that we should avoid using survivor spaces. 404b1994897Sopenharmony_ciSo we can support it optionally for experiments. As alternative we can introduce some average age metadata for run of small objects. 405b1994897Sopenharmony_ci 406b1994897Sopenharmony_ciMinor GC(STW): 407b1994897Sopenharmony_ci1. Root scan for young gen, CardTable used for finding roots in old gen 408b1994897Sopenharmony_ci1. Mark eden and move alive objects to the tenured(or survivor) 409b1994897Sopenharmony_ci1. Sweep eden 410b1994897Sopenharmony_ci 411b1994897Sopenharmony_ciNote: we'll use adaptive thresholds for triggering Minor GC for minimizing STW pause 412b1994897Sopenharmony_ciNote #2: we can tune minor GC by trying make concurrent marking and re-mark, but it will require copy of the card table. 413b1994897Sopenharmony_ci 414b1994897Sopenharmony_ciMajor GC 415b1994897Sopenharmony_ci1. Concurrent scan of static roots 416b1994897Sopenharmony_ci1. Initial Mark - root scan(STW #1) 417b1994897Sopenharmony_ci1. Concurrent Marking + Reference processor 418b1994897Sopenharmony_ci1. Remark missed during concurrent marking objects (STW #2) 419b1994897Sopenharmony_ci1. Concurrent Sweep + Finalizers 420b1994897Sopenharmony_ci1. Reset 421b1994897Sopenharmony_ci 422b1994897Sopenharmony_ciReference processor - prevents issues with wrong finalization order. 423b1994897Sopenharmony_ci 424b1994897Sopenharmony_ciNote: If we don't have Survivor spaces we can implement non-moving generational GC. 425b1994897Sopenharmony_ci 426b1994897Sopenharmony_ci### Region based GC (main) 427b1994897Sopenharmony_ci 428b1994897Sopenharmony_ciRequirements: 429b1994897Sopenharmony_ci- concurrent 430b1994897Sopenharmony_ci- generational 431b1994897Sopenharmony_ci- acceptable stable STW pause 432b1994897Sopenharmony_ci- (optional) compaction 433b1994897Sopenharmony_ci 434b1994897Sopenharmony_ciSince typical heap size for mobile applications is small - this GC can be considered as good choice for production. 435b1994897Sopenharmony_ci 436b1994897Sopenharmony_ciAll heap consists of memory regions with fixed size(it can vary, i.e. size of memory region #K+1 can be different than size of memory region #K). 437b1994897Sopenharmony_ci``` 438b1994897Sopenharmony_ci+------------------+------------------+-----+------------------+ 439b1994897Sopenharmony_ci| Memory region #1 | Memory region #2 | ... | Memory region #N | 440b1994897Sopenharmony_ci| young | tenured | ... | tenured | 441b1994897Sopenharmony_ci+------------------+------------------+-----+------------------+ 442b1994897Sopenharmony_ci``` 443b1994897Sopenharmony_ci 444b1994897Sopenharmony_ciRegions types: 445b1994897Sopenharmony_ci- young regions 446b1994897Sopenharmony_ci- tenured regions 447b1994897Sopenharmony_ci- humonguous regions(for humonguous objects) 448b1994897Sopenharmony_ci- empty regions 449b1994897Sopenharmony_ci 450b1994897Sopenharmony_ciIncoming references for each region are tracked via remembered sets: 451b1994897Sopenharmony_ci- old-to-young references 452b1994897Sopenharmony_ci- old-to-old references 453b1994897Sopenharmony_ci 454b1994897Sopenharmony_ciMinor GC(only for young regions - STW): 455b1994897Sopenharmony_ci1. Root scan for young gen, remembered sets used for finding roots in old gen 456b1994897Sopenharmony_ci1. Marking young gen + Reference processor + moving alive objects to the tenured space 457b1994897Sopenharmony_ci1. Sweep + finalizers 458b1994897Sopenharmony_ci 459b1994897Sopenharmony_ciThe size of young space selected to satisfy 460b1994897Sopenharmony_ci 461b1994897Sopenharmony_ciMixed GC - minor GC + some tenured regions added to the young gen regions after the concurrent marking. 462b1994897Sopenharmony_ciConcurrent marking(triggered when we reach some threshold for tenured generation size): 463b1994897Sopenharmony_ci1. Root scan (STW #1) 464b1994897Sopenharmony_ci1. Concurrent marking + Reference processor 465b1994897Sopenharmony_ci1. Re-mark - finishes marking and update liveness statistics (STW #2) 466b1994897Sopenharmony_ci1. Cleanup - reclaims empty regions and determines if we need mixed collections to reclaim tenured space. Tenured regions selected by using different thresholds. 467b1994897Sopenharmony_ci 468b1994897Sopenharmony_ciNote: RSets optionally can be refined with special threads 469b1994897Sopenharmony_ci 470b1994897Sopenharmony_ci### Low-pause GC (deffered) 471b1994897Sopenharmony_ci 472b1994897Sopenharmony_ciRequirements: 473b1994897Sopenharmony_ci- stable low STW pause/pauseless 474b1994897Sopenharmony_ci- (optional)incremental 475b1994897Sopenharmony_ci- with compaction 476b1994897Sopenharmony_ci 477b1994897Sopenharmony_ciNo explicit minor GC. 478b1994897Sopenharmony_ci 479b1994897Sopenharmony_ciMajor GC 480b1994897Sopenharmony_ci1. Concurrent scan of static roots 481b1994897Sopenharmony_ci1. Initial Mark - root scan(STW #1) 482b1994897Sopenharmony_ci1. Concurrent Marking + Reference processor 483b1994897Sopenharmony_ci1. Concurrent Sweep + Finalizers + Concurrent Copy & Compact 484b1994897Sopenharmony_ci1. Reset 485b1994897Sopenharmony_ci 486b1994897Sopenharmony_ciNote: good choice for the applications with big heap or for applications when it is hard to provide stable low pause with Region based GC. 487b1994897Sopenharmony_ci 488b1994897Sopenharmony_ciNote: compaction is target and mode dependent, so for low-memory devices we can consider [semi-space compaction](./glossary.md#memory-management-terms). 489b1994897Sopenharmony_ciFor straight-forward approach we can consider some support from OS to minimize overlapping of semi-space compaction phases between applications. 490b1994897Sopenharmony_ci 491b1994897Sopenharmony_ci### GC: interaction with Interpreter, JIT and AOT 492b1994897Sopenharmony_ci 493b1994897Sopenharmony_ci#### Safepoints 494b1994897Sopenharmony_ci 495b1994897Sopenharmony_ciPrerequisites: 496b1994897Sopenharmony_ci* one HW register reserved for the pointer to the ExecState(per-thread state), let's call it `RVState` 497b1994897Sopenharmony_ci* ExecState structure has field with address of some page used for safepoints and we knew offset of this field `SPaddrOffset` 498b1994897Sopenharmony_ci 499b1994897Sopenharmony_ciIn general, safepoint will be just presented as some implicit or explicit load from the `[RVState, SPaddrOffset]`. 500b1994897Sopenharmony_ciFor example, it can be something like this: `LDR R12, [RVState, #SPaddrOffset]` 501b1994897Sopenharmony_ci 502b1994897Sopenharmony_ciNote: In some architectures it is make sense to use store instead of load because it requires less registers. 503b1994897Sopenharmony_ci 504b1994897Sopenharmony_ciNote: If it is no MMU available - it is allowed to use explicit condition for safepoint, i.e. something like this(pseudocode): 505b1994897Sopenharmony_ci``` 506b1994897Sopenharmony_ciif (SafepointFlag == true) { 507b1994897Sopenharmony_ci call Runtime::SafepointHandler 508b1994897Sopenharmony_ci} 509b1994897Sopenharmony_ci``` 510b1994897Sopenharmony_ci 511b1994897Sopenharmony_ciWhen GC wants to stop the world, it forces it by stopping all threads at the safepoint. 512b1994897Sopenharmony_ciIt protects some predefined safepoint memory page, and it leads to segmentation faults in all execution threads when they do the load from this address. 513b1994897Sopenharmony_ci 514b1994897Sopenharmony_ciSafepoints should be inserted at the beginning of the method and at the head of each loop. 515b1994897Sopenharmony_ci 516b1994897Sopenharmony_ciFor each safepoint, we should have a method that can provide GC with information about objects on the stack. 517b1994897Sopenharmony_ciInterpreter already supports such info in the frames. 518b1994897Sopenharmony_ciBut for JIT/compiler, it looks like we need some generated(by JIT/compiler) method that can get all necessary data for the safepoint. 519b1994897Sopenharmony_ciThis method can actually be just some code without prologue and epilogue. 520b1994897Sopenharmony_ciWe'll jump to its beginning from signal handler, and in the end, we should jump back to the safepoint, so probably we should put it near the original code. 521b1994897Sopenharmony_ci 522b1994897Sopenharmony_ciSo the flow looks like this: 523b1994897Sopenharmony_ci 524b1994897Sopenharmony_ci``` 525b1994897Sopenharmony_ci ... 526b1994897Sopenharmony_ci | compiled/jitted code | ------> 527b1994897Sopenharmony_ci | safepoint #X in the code | ---seg fault---> 528b1994897Sopenharmony_ci | signal handler | ---change return pc explicitly---> 529b1994897Sopenharmony_ci | method that prepares data about objects on stack for the #X safepoint and waits until STW ends | ---jump via encoded relative branch to safepoint---> 530b1994897Sopenharmony_ci | safepoint #X in the code | ---normal execution---> 531b1994897Sopenharmony_ci | compiled/jitted code | ------> 532b1994897Sopenharmony_ci ... 533b1994897Sopenharmony_ci``` 534b1994897Sopenharmony_ci 535b1994897Sopenharmony_ci**Opens**: 536b1994897Sopenharmony_ci* should we generate method for each safepoint, or all safepoints at once? 537b1994897Sopenharmony_ci 538b1994897Sopenharmony_ci#### GC Barriers 539b1994897Sopenharmony_ci 540b1994897Sopenharmony_ciGC barrier is a block on writing to(write barrier) or reading from(read barrier) certain memory by the application code. GC Barriers used to ensure heap consistency and optimize some of GC flows. 541b1994897Sopenharmony_ci 542b1994897Sopenharmony_ci##### GC Write Barriers 543b1994897Sopenharmony_ci 544b1994897Sopenharmony_ciHeap inconsistency can happen when GC reclaim alive/reachable object. 545b1994897Sopenharmony_ciI.e. these two conditions should happen to reclaim active/reachable: 546b1994897Sopenharmony_ci1. We store reference to a white object into a black object 547b1994897Sopenharmony_ci1. There are no paths from any gray object to that white object 548b1994897Sopenharmony_ci 549b1994897Sopenharmony_ciBesides addressing of heap inconsistency problem, write barrier can be used for maintaining incoming references for young generation or region. 550b1994897Sopenharmony_ci 551b1994897Sopenharmony_ciSo we can solve these issues with GC WRB(write barrier). GC WRB can be _pre_(inserted before the store) and _post_(inserted after the store). This barriers used **only** when we store reference to the object to some field of an object. 552b1994897Sopenharmony_ci 553b1994897Sopenharmony_ci_Pre_ barrier usually used to solve issue with lost alive object during concurrent marking. Pseudocode(example): 554b1994897Sopenharmony_ci```c++ 555b1994897Sopenharmony_ciif (UNLIKELY(concurrent_marking)) { 556b1994897Sopenharmony_ci auto pre_val = obj.field; 557b1994897Sopenharmony_ci if (pre_val != nullptr) { 558b1994897Sopenharmony_ci store_in_buff_to_mark(pre_val); // call function which stores reference to object stored in the field to process it later 559b1994897Sopenharmony_ci } 560b1994897Sopenharmony_ci} 561b1994897Sopenharmony_ciobj.field = new_val; // STORE for which barrier generated 562b1994897Sopenharmony_ci``` 563b1994897Sopenharmony_ci 564b1994897Sopenharmony_ci_Post_ barrier can be used to solve issue with tracking references from tenured generation to the young generation(or inter-region references). In this case we always know external roots for the young generation space(or for region). Pseudocode(abstract example, not real one): 565b1994897Sopenharmony_ci```c++ 566b1994897Sopenharmony_ciobj.field = new_val; // STORE for which barrier generated 567b1994897Sopenharmony_ciif ((AddressOf(obj.field) not in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END]) && 568b1994897Sopenharmony_ci (AddressOf(new_val) in [YOUNG_GENERATION_ADDR_BEG, YOUNG_GENERATION_ADDR_END])) { 569b1994897Sopenharmony_ci update_card(AddressOf(obj.field)); // call function which marks some memory range as containing roots for young generation 570b1994897Sopenharmony_ci} 571b1994897Sopenharmony_ci``` 572b1994897Sopenharmony_ciNote: Sometimes we don't check if object and stored reference in different generations. Because we get much less overhead this way. 573b1994897Sopenharmony_ci 574b1994897Sopenharmony_ci##### GC Read Barriers 575b1994897Sopenharmony_ci 576b1994897Sopenharmony_ciRead barriers used during concurrent compaction in some GCs. 577b1994897Sopenharmony_ciFor example we concurrently moving object from one place(`from-space`) to the another(`to-space`). 578b1994897Sopenharmony_ciAt some moment we can have two instance of the one object. 579b1994897Sopenharmony_ciSo we need one of these conditions should stand if we want to keep heap consistent: 580b1994897Sopenharmony_ci1. All writes happen into `to-space` instance of the object, but reads can happen from both `from-space` and `to-space` instances 581b1994897Sopenharmony_ci1. All writes and reads happen into/from `to-space` 582b1994897Sopenharmony_ci 583b1994897Sopenharmony_ci#### GC Barriers integration with Interpreter and compiler 584b1994897Sopenharmony_ci 585b1994897Sopenharmony_ci 586b1994897Sopenharmony_ciFrom Interpreter you could use runtime interface methods: 587b1994897Sopenharmony_ci```c++ 588b1994897Sopenharmony_cistatic void PreBarrier(void *obj_field_addr, void *pre_val_addr); 589b1994897Sopenharmony_cistatic void PostBarrier(void *obj_field_addr, void *val_addr); 590b1994897Sopenharmony_ci``` 591b1994897Sopenharmony_ciNote: for performance, we can put into ExecState address of conditional flag for conditional barriers with trivial condition (`if (*x) ...`). 592b1994897Sopenharmony_ci 593b1994897Sopenharmony_ciIt is critical to make compiler to encode barriers very optimally. At least fast path should be encoded effectively. 594b1994897Sopenharmony_ciThere are several approaches for that: 595b1994897Sopenharmony_ci 1. To describe barrier use some meta-language or IR which can be interpreted/encoded by all compilers compatible with runtime (it is currently not applicable for the runtime) 596b1994897Sopenharmony_ci 1. (a lot of open questions here, so consider this as an idea) One compiler knows how to encode barrier using runtime interfaces (see next item) and could provide some more compiler-friendly interface to the other compilers to encode GC barriers. 597b1994897Sopenharmony_ci 1. The compiler knows for each barrier type how it should be encoded (see pseudocode in `libpandabase/mem/gc_barrier.h`). And could use the runtime to get all required operands to do this. 598b1994897Sopenharmony_ciLet's consider below encoding of PRE_ barrier: 599b1994897Sopenharmony_ci - get barrier type via RuntimeInterface: `BarrierType GetPreType() const` 600b1994897Sopenharmony_ci - for this barrier type get all needed operands provided by Runtime via 601b1994897Sopenharmony_ci `BarrierOperand GCBarrierSet::GetBarrierOperand(BarrierPosition barrier_position, std::string_view name);` 602b1994897Sopenharmony_ci (you should use operand/parameters names from pseudocode provided in `enum BarrierType`) 603b1994897Sopenharmony_ci - encode barrier code using loaded operands and pseudocode from `enum BarrierType` 604b1994897Sopenharmony_ci 605b1994897Sopenharmony_ci## Memory sanitizers support 606b1994897Sopenharmony_ci 607b1994897Sopenharmony_ciPanda Runtime should support [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer). 608b1994897Sopenharmony_ci 609b1994897Sopenharmony_ciOptional: [MSAN](https://github.com/google/sanitizers/wiki/MemorySanitizer) 610b1994897Sopenharmony_ci(Note: not possible to use without custom built toolchain) 611b1994897Sopenharmony_ci 612b1994897Sopenharmony_ciDesirable, but not easy to support: [HWSAN](https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html) 613