=======================
NUMA Memory Performance
=======================

NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, while other
memory is provided as memory-only nodes. While memory-only nodes do not
provide CPUs, they may still be local to one or more compute nodes relative
to other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory-only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and is given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.

To aid applications matching memory targets with their initiators, the
kernel provides symlinks between them. The following example lists the
relationship for the access class "0" memory initiators and targets::

	# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
	relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

	# symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
	relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
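The same relationship can also be read programmatically. The following is a
minimal sketch, assuming the sysfs layout shown above. The helper name
``linked_nodes`` and its ``root`` parameter are hypothetical and are not part
of any kernel interface; ``root`` exists only so the function can be pointed
at a test directory.

```python
import os
import re


def linked_nodes(node, kind, access_class=0, root="/sys/devices/system/node"):
    """Return the sorted node IDs linked under nodeN/accessC/<kind>/.

    kind is "targets" or "initiators". An empty list is returned when the
    directory does not exist, for example when the platform provides no
    locality information.
    """
    path = os.path.join(root, f"node{node}", f"access{access_class}", kind)
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return []
    ids = []
    for name in entries:
        # Skip non-link entries such as the performance attribute files.
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)
```

For example, ``linked_nodes(0, "targets")`` would list the access class 0
memory targets of node 0 on a system that exports them.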

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate that the
nodes' access characteristics share the same performance relative to other
linked initiator nodes. The targets within an initiator's access class,
though, do not necessarily perform the same as each other.

The access class "1" is used to allow differentiation between initiators
that are CPUs, and hence suitable for generic task scheduling, and
I/O initiators such as GPUs and NICs. Unlike access class 0, only
nodes containing CPUs are considered.

NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

	/sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access class's initiators.

The performance characteristics the kernel provides for the local initiators
are exported as follows::

	# tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
	/sys/devices/system/node/nodeY/access0/initiators/
	|-- read_bandwidth
	|-- read_latency
	|-- write_bandwidth
	`-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

The values reported here correspond to the rated latency and bandwidth
for the platform.

Access class 1 takes the same form but only includes values for CPU to
memory activity.
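As an illustration of how an application might consume these attributes, the
sketch below reads the four files for a node and picks the candidate with the
highest rated read bandwidth. It assumes that each file holds a single decimal
integer. The names ``read_performance`` and ``highest_read_bandwidth`` and the
``root`` parameter are hypothetical. The comparison is only meaningful when
the caller runs on an initiator linked to each candidate node.

```python
import os

ATTRIBUTES = ("read_bandwidth", "read_latency",
              "write_bandwidth", "write_latency")


def read_performance(node, access_class=0, root="/sys/devices/system/node"):
    """Return {attribute: int} for the attributes present under initiators/.

    Bandwidth values are in MiB/second and latency values in nanoseconds.
    Attributes that are missing or unparsable are left out of the result.
    """
    base = os.path.join(root, f"node{node}", f"access{access_class}",
                        "initiators")
    values = {}
    for name in ATTRIBUTES:
        try:
            with open(os.path.join(base, name)) as f:
                # Assumption: one decimal integer per file.
                values[name] = int(f.read().strip(), 10)
        except (FileNotFoundError, ValueError):
            continue
    return values


def highest_read_bandwidth(nodes, root="/sys/devices/system/node"):
    """Return the node with the highest read_bandwidth, or None if unknown."""
    best = None
    for node in nodes:
        bandwidth = read_performance(node, root=root).get("read_bandwidth")
        if bandwidth is not None and (best is None or bandwidth > best[1]):
            best = (node, bandwidth)
    return None if best is None else best[0]
```

Because the reported values are rated figures for the platform, a choice made
this way is a starting point rather than a measurement.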

NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller, higher performing memory. The
system physical addresses that memory initiators are aware of are provided
by the last memory level in the hierarchy. The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering is different from CPU caches, where the cache level (e.g.
L1, L2, L3) uses the CPU-side view in which each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.
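The lookup order and the level numbering can be pictured with a toy model.
This is purely illustrative and says nothing about how any hardware
implements its caches. The function ``lookup`` and its arguments are
hypothetical: each cache level is modelled as a dictionary, and the levels
are passed nearest first, that is, highest level number first.

```python
def lookup(address, cache_levels, far_memory):
    """Return (value, source) for an address in a toy memory hierarchy.

    cache_levels is a list of (level_number, contents) pairs ordered from
    near memory (highest level number) down to level 1. far_memory is the
    last level memory that backs every address.
    """
    for level, contents in cache_levels:
        if address in contents:
            # Hit in this memory-side cache level.
            return contents[address], f"cache level {level}"
    # Missed every cache level, so the access reaches far memory.
    return far_memory[address], "far memory"
```

With two levels, an address held only in level 1 is reported as a level 1
hit after missing level 2, and an address held in neither is served from far
memory.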

An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

	/sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

	/sys/devices/system/node/nodeX/memory_side_cache/indexA/
	/sys/devices/system/node/nodeX/memory_side_cache/indexB/
	/sys/devices/system/node/nodeX/memory_side_cache/indexC/
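A program can distinguish "no cache information" from "cache levels present"
by checking for the directory before enumerating its ``indexN`` entries. The
following is a minimal sketch under that assumption. The helper name
``memory_side_cache_levels`` and its ``root`` parameter are hypothetical.

```python
import os
import re


def memory_side_cache_levels(node, root="/sys/devices/system/node"):
    """Return the sorted cache level numbers registered for a node.

    Returns None when memory_side_cache/ is absent, meaning the system has
    no memory-side cache or the kernel cannot discover it.
    """
    path = os.path.join(root, f"node{node}", "memory_side_cache")
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return None
    levels = []
    for name in entries:
        match = re.fullmatch(r"index(\d+)", name)
        if match:
            levels.append(int(match.group(1)))
    return sorted(levels)
```

Returning ``None`` rather than an empty list keeps the "directory missing"
case separate from a directory that exists but lists no levels.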

Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::

	# tree /sys/devices/system/node/node0/memory_side_cache/
	/sys/devices/system/node/node0/memory_side_cache/
	|-- index1
	|   |-- indexing
	|   |-- line_size
	|   |-- size
	|   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.
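The encodings above can be decoded as follows. This is a sketch, assuming
that each attribute file holds a single decimal integer. The function name
``read_cache_attributes``, the keys of the returned dictionary, and the
``root`` parameter are hypothetical.

```python
import os


def read_cache_attributes(node, level, root="/sys/devices/system/node"):
    """Read and decode the attributes of one memory-side cache level."""
    base = os.path.join(root, f"node{node}", "memory_side_cache",
                        f"index{level}")
    raw = {}
    for name in ("indexing", "line_size", "size", "write_policy"):
        with open(os.path.join(base, name)) as f:
            # Assumption: one decimal integer per file.
            raw[name] = int(f.read().strip(), 10)
    return {
        # 0 means direct-mapped; non-zero means another indexing scheme.
        "direct_mapped": raw["indexing"] == 0,
        "line_size_bytes": raw["line_size"],
        "size_bytes": raw["size"],
        # 0 means write-back; non-zero means write-through.
        "write_back": raw["write_policy"] == 0,
    }
```

The function raises ``FileNotFoundError`` if the level does not exist, so a
caller would normally check that the ``indexN`` directory is present first.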

See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
- Section 5.2.27