=======================
NUMA Memory Performance
=======================

NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory only nodes. While memory only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes.
The following diagram shows one such example of two compute
nodes with local memory and a memory only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.

To aid applications matching memory targets with their initiators, the
kernel provides symlinks from each to the other.
The following example lists the
relationship for the access class "0" memory initiators and targets::

        # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
        relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

        # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
        relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate that the
nodes' access characteristics share the same performance relative to other
linked initiator nodes. The targets within an initiator's access class,
though, do not necessarily perform the same as each other.

The access class "1" is used to allow differentiation between initiators
that are CPUs and hence suitable for generic task scheduling, and
IO initiators such as GPUs and NICs. Unlike access class 0, only
nodes containing CPUs are considered.

NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics.
If the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

        /sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access class's initiators.

The performance characteristics the kernel exports for the local
initiators are as follows::

        # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
        /sys/devices/system/node/nodeY/access0/initiators/
        |-- read_bandwidth
        |-- read_latency
        |-- write_bandwidth
        `-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

The values reported here correspond to the rated latency and bandwidth
for the platform.

Access class 1 takes the same form but only includes values for CPU to
memory activity.
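
As an illustration of how software might consume these attributes, the
following is a minimal sketch rather than a reference implementation. The
helper names ``read_access_attrs`` and ``local_initiators`` are hypothetical
and not part of any kernel or library interface. The sketch assumes each
attribute file holds a single decimal integer, and it silently skips
attributes that are missing or unparsable, since a platform may not provide
them:

```python
import os

# Attribute file names as listed in the tree output above.
ATTRS = ("read_bandwidth", "read_latency", "write_bandwidth", "write_latency")


def _initiators_dir(node_dir, access_class):
    """Build the accessN/initiators path for a memory target node directory."""
    return os.path.join(node_dir, f"access{access_class}", "initiators")


def read_access_attrs(node_dir, access_class=0):
    """Return {attribute: int} for the attributes that could be read.

    Bandwidth values are in MiB/second and latency values in nanoseconds.
    Assumption: each file contains one decimal integer.
    """
    base = _initiators_dir(node_dir, access_class)
    values = {}
    for name in ATTRS:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Attribute not exported, or not in the assumed format: skip it.
            continue
    return values


def local_initiators(node_dir, access_class=0):
    """Return the sorted names of nodes linked as initiators for this target."""
    try:
        entries = os.listdir(_initiators_dir(node_dir, access_class))
    except OSError:
        return []
    return sorted(e for e in entries if e.startswith("node") and e[4:].isdigit())
```

On a system that exports these attributes, calling
``read_access_attrs("/sys/devices/system/node/nodeY")`` with a real node
number in place of ``Y`` would return whichever of the four values are
available. An empty dictionary means the platform did not report any for
that access class.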

NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller higher performing memory. The
system physical addresses memory initiators are aware of are provided
by the last memory level in the hierarchy. The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering is different than CPU caches where the cache level (e.g.
L1, L2, L3) uses the CPU-side view where each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present.
If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

        /sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

        /sys/devices/system/node/nodeX/memory_side_cache/indexA/
        /sys/devices/system/node/nodeX/memory_side_cache/indexB/
        /sys/devices/system/node/nodeX/memory_side_cache/indexC/

Each cache level's directory provides its attributes.
For example, the
following shows a single cache level and the attributes available for
software to query::

        # tree /sys/devices/system/node/node0/memory_side_cache/
        /sys/devices/system/node/node0/memory_side_cache/
        |-- index1
        |   |-- indexing
        |   |-- line_size
        |   |-- size
        |   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.

See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
- Section 5.2.27