===============================
LIBNVDIMM: Non-Volatile Devices
===============================

libnvdimm - kernel / libndctl - userspace helper library

nvdimm@lists.linux.dev

Version 13

.. contents:

    Glossary
    Overview
        Supporting Documents
        Git Trees
    LIBNVDIMM PMEM
        PMEM-REGIONs, Atomic Sectors, and DAX
    Example NVDIMM Platform
    LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
        LIBNDCTL: Context
            libndctl: instantiate a new library context example
        LIBNVDIMM/LIBNDCTL: Bus
            libnvdimm: control class device in /sys/class
            libnvdimm: bus
            libndctl: bus enumeration example
        LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
            libnvdimm: DIMM (NMEM)
            libndctl: DIMM enumeration example
        LIBNVDIMM/LIBNDCTL: Region
            libnvdimm: region
            libndctl: region enumeration example
            Why Not Encode the Region Type into the Region Name?
            How Do I Determine the Major Type of a Region?
        LIBNVDIMM/LIBNDCTL: Namespace
            libnvdimm: namespace
            libndctl: namespace enumeration example
            libndctl: namespace creation example
            Why the Term "namespace"?
        LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
            libnvdimm: btt layout
            libndctl: btt creation example
    Summary LIBNDCTL Diagram


Glossary
========

PMEM:
  A system-physical-address range where writes are persistent.  A
  block device composed of PMEM is capable of DAX.  A PMEM address range
  may span an interleave of several DIMMs.

DPA:
  DIMM Physical Address, a DIMM-relative offset.  With one DIMM in
  the system there would be a 1:1 system-physical-address:DPA association.
  Once more DIMMs are added, a memory controller interleave must be
  decoded to determine the DPA associated with a given
  system-physical-address.

DAX:
  File system extensions to bypass the page cache and block layer to
  mmap persistent memory, from a PMEM block device, directly into a
  process address space.

DSM:
  Device Specific Method: an ACPI method to control a specific
  device - in this case the firmware.

DCR:
  NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
  It defines a vendor-id, device-id, and interface format for a given DIMM.

BTT:
  Block Translation Table: Persistent memory is byte addressable.
  Existing software may have an expectation that the power-fail-atomicity
  of writes is at least one sector, 512 bytes.  The BTT is an indirection
  table with atomic update semantics to front a PMEM block device
  driver and present arbitrary atomic sector sizes.

LABEL:
  Metadata stored on a DIMM device that partitions and identifies
  (persistently names) capacity allocated to different PMEM namespaces.  It
  also indicates whether an address abstraction like a BTT is applied to
  the namespace.  Note that traditional partition tables, GPT/MBR, are
  layered on top of a PMEM namespace, or an address abstraction like BTT
  if present, but partition support is deprecated going forward.


Overview
========

The LIBNVDIMM subsystem provides support for PMEM described by platform
firmware or a device driver.  On ACPI based systems the platform firmware
conveys persistent memory resources via the ACPI NFIT "NVDIMM Firmware
Interface Table" in ACPI 6.  While the LIBNVDIMM subsystem implementation
is generic and supports pre-NFIT platforms, it was guided by the
superset of capabilities needed to support this ACPI 6 definition for
NVDIMM resources.
The original implementation supported the
block-window-aperture capability described in the NFIT, but that support
has since been abandoned and never shipped in a product.

Supporting Documents
--------------------

ACPI 6:
    https://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace:
    https://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example:
    https://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide:
    https://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Git Trees
---------

LIBNVDIMM:
    https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git
LIBNDCTL:
    https://github.com/pmem/ndctl.git


LIBNVDIMM PMEM
==============

Prior to the arrival of the NFIT, non-volatile memory was described to a
system in various ad-hoc ways.  Usually only the bare minimum was
provided, namely, a single system-physical-address range where writes
are expected to be durable after a system power loss.  Now, the NFIT
specification standardizes not only the description of PMEM, but also
platform message-passing entry points for control and configuration.

PMEM (nd_pmem.ko): Drives a system-physical-address range.  This range is
contiguous in system memory and may be interleaved (hardware memory controller
striped) across multiple DIMMs.  When interleaved the platform may optionally
provide details of which DIMMs are participating in the interleave.

It is worth noting that when the labeling capability is detected (an EFI
namespace label index block is found), then no block device is created
by default as userspace needs to do at least one allocation of DPA to
the PMEM range.  In contrast, ND_NAMESPACE_IO ranges, once registered,
can be immediately attached to nd_pmem.  This latter mode is called
label-less or "legacy".

PMEM-REGIONs, Atomic Sectors, and DAX
-------------------------------------

For the cases where an application or filesystem still needs atomic sector
update guarantees it can register a BTT on a PMEM device or partition.
See
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"


Example NVDIMM Platform
=======================

For the remainder of this document the following diagram will be
referenced for any example sysfs layouts::


                                 (a)               (b)           DIMM
              +-------------------+--------+--------+--------+
    +------+  |       pm0.0       |  free  | pm1.0  |  free  |    0
    | imc0 +--+- - - region0- - - +--------+        +--------+
    +--+---+  |       pm0.0       |  free  | pm1.0  |  free  |    1
       |      +-------------------+--------v        v--------+
    +--+---+                               |                 |
    | cpu0 |                                    region1
    +--+---+                               |                 |
       |      +----------------------------^        ^--------+
    +--+---+  |           free             | pm1.0  |  free  |    2
    | imc1 +--+----------------------------|        +--------+
    +------+  |           free             | pm1.0  |  free  |    3
              +----------------------------+--------+--------+

In this platform we have four DIMMs and two memory controllers in one
socket.  Each PMEM interleave set is identified by a region device with
a dynamically assigned id.

    1. The first portions of DIMM0 and DIMM1 are interleaved as REGION0. A
       single PMEM namespace is created in the REGION0-SPA-range that spans most
       of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
       interleaved system-physical-address range is left free for
       another PMEM namespace to be defined.

    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
       system-physical-address range, REGION1, that spans those two DIMMs as
       well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
       named "pm1.0".

    This bus is provided by the kernel under the device
    /sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
    tools/testing/nvdimm is loaded.  This module is a unit test for
    LIBNVDIMM and the acpi_nfit.ko driver.


LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
========================================================

What follows is a description of the LIBNVDIMM sysfs layout and a
corresponding object hierarchy diagram as viewed through the LIBNDCTL
API.  The example sysfs paths and diagrams are relative to the Example
NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
test.

LIBNDCTL: Context
-----------------

Every API call in the LIBNDCTL library requires a context that holds the
logging parameters and other library instance state.
The library is
based on the libabc template:

    https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git

LIBNDCTL: instantiate a new library context example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    struct ndctl_ctx *ctx;

    /* ndctl_new() returns 0 on success and fills in ctx */
    if (ndctl_new(&ctx) == 0)
        return ctx;
    else
        return NULL;

LIBNVDIMM/LIBNDCTL: Bus
-----------------------

A bus has a 1:1 relationship with an NFIT.  The current expectation for
ACPI based systems is that there is only ever one platform-global NFIT.
That said, it is trivial to register multiple NFITs; the specification
does not preclude it.  The infrastructure supports multiple busses and
we use this capability to test multiple NFIT configurations in the unit
test.

LIBNVDIMM: control class device in /sys/class
---------------------------------------------

This character device accepts DSM messages to be passed to a DIMM
identified by its NFIT handle::

    /sys/class/nd/ndctl0
    |-- dev
    |-- device -> ../../../ndbus0
    |-- subsystem -> ../../../../../../../class/nd



LIBNVDIMM: bus
--------------

::

    struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
           struct nvdimm_bus_descriptor *nfit_desc);

::

    /sys/devices/platform/nfit_test.0/ndbus0
    |-- commands
    |-- nd
    |-- nfit
    |-- nmem0
    |-- nmem1
    |-- nmem2
    |-- nmem3
    |-- power
    |-- provider
    |-- region0
    |-- region1
    |-- region2
    |-- region3
    |-- region4
    |-- region5
    |-- uevent
    `-- wait_probe

LIBNDCTL: bus enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Find the
bus handle that describes the bus from Example NVDIMM Platform::

    static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
            const char *provider)
    {
        struct ndctl_bus *bus;

        ndctl_bus_foreach(ctx, bus)
            if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
                return bus;

        return NULL;
    }

    bus = get_bus_by_provider(ctx, "nfit_test.0");


LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
-------------------------------

The DIMM device provides a character device for sending commands to
hardware, and it is a container for LABELs.  If the DIMM is defined by
NFIT then an optional 'nfit' attribute sub-directory is available to add
NFIT-specifics.

Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
describes these devices via "Memory Device to System Physical Address
Range Mapping Structure", and there is no requirement that they actually
be physical DIMMs, so we use a more generic name.

LIBNVDIMM: DIMM (NMEM)
^^^^^^^^^^^^^^^^^^^^^^

::

    struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
            const struct attribute_group **groups, unsigned long flags,
            unsigned long *dsm_mask);

::

    /sys/devices/platform/nfit_test.0/ndbus0
    |-- nmem0
    |   |-- available_slots
    |   |-- commands
    |   |-- dev
    |   |-- devtype
    |   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
    |   |-- modalias
    |   |-- nfit
    |   |   |-- device
    |   |   |-- format
    |   |   |-- handle
    |   |   |-- phys_id
    |   |   |-- rev_id
    |   |   |-- serial
    |   |   `-- vendor
    |   |-- state
    |   |-- subsystem -> ../../../../../bus/nd
    |   `-- uevent
    |-- nmem1
    [..]
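
The 'handle' attribute in the 'nfit' sub-directory above holds the 32-bit
NFIT device handle.  As a rough illustration, the standalone C sketch below
unpacks such a value into its fields, assuming the bit layout listed in the
DIMM enumeration example (DIMM number in bits 3:0, memory channel in 7:4,
memory controller in 11:8, socket in 15:12, node controller in 27:16).  The
names nfit_handle_fields, nfit_handle_decode, nfit_handle_encode, and
nfit_handle_example are hypothetical; they are not part of libndctl or the
kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one member per nfit_handle bit range. */
struct nfit_handle_fields {
    unsigned int node;       /* bits 27:16, node controller ID */
    unsigned int socket;     /* bits 15:12, socket ID */
    unsigned int controller; /* bits 11:8, memory controller ID */
    unsigned int channel;    /* bits 7:4, memory channel number */
    unsigned int dimm;       /* bits 3:0, DIMM number within the channel */
};

/* Unpack a 32-bit handle; the reserved bits 31:28 are ignored. */
static struct nfit_handle_fields nfit_handle_decode(uint32_t handle)
{
    struct nfit_handle_fields f;

    f.node = (handle >> 16) & 0xfff;
    f.socket = (handle >> 12) & 0xf;
    f.controller = (handle >> 8) & 0xf;
    f.channel = (handle >> 4) & 0xf;
    f.dimm = handle & 0xf;
    return f;
}

/* Inverse of the decode; each field is masked to its documented width. */
static uint32_t nfit_handle_encode(const struct nfit_handle_fields *f)
{
    return ((uint32_t)(f->node & 0xfff) << 16)
        | ((uint32_t)(f->socket & 0xf) << 12)
        | ((uint32_t)(f->controller & 0xf) << 8)
        | ((uint32_t)(f->channel & 0xf) << 4)
        | (uint32_t)(f->dimm & 0xf);
}

/* Usage: decode a handle, check one field, and round-trip it. */
static void nfit_handle_example(void)
{
    struct nfit_handle_fields f = nfit_handle_decode(0x00000101);

    assert(f.controller == 1 && f.dimm == 1);
    assert(nfit_handle_encode(&f) == 0x00000101);
}
```

A decoder like this is only a convenience for reading the sysfs value; code
that uses libndctl can keep comparing whole handles as returned by
ndctl_dimm_get_handle().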


LIBNDCTL: DIMM enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note, in this example we are assuming NFIT-defined DIMMs which are
identified by an "nfit_handle", a 32-bit value where:

   - Bit 3:0 DIMM number within the memory channel
   - Bit 7:4 memory channel number
   - Bit 11:8 memory controller ID
   - Bit 15:12 socket ID (within scope of a Node controller if node
     controller is present)
   - Bit 27:16 Node Controller ID
   - Bit 31:28 Reserved

::

    static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
           unsigned int handle)
    {
        struct ndctl_dimm *dimm;

        ndctl_dimm_foreach(bus, dimm)
            if (ndctl_dimm_get_handle(dimm) == handle)
                return dimm;

        return NULL;
    }

    /* n: node controller, s: socket, i: memory controller, c: channel, d: dimm */
    #define DIMM_HANDLE(n, s, i, c, d) \
        ((((n) & 0xfff) << 16) | (((s) & 0xf) << 12) | (((i) & 0xf) << 8) \
         | (((c) & 0xf) << 4) | ((d) & 0xf))

    dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));

LIBNVDIMM/LIBNDCTL: Region
--------------------------

A generic REGION
device is registered for each PMEM interleave-set /
range.  Per the example there are 2 PMEM regions on the "nfit_test.0"
bus.  The primary role of a region is to be a container of "mappings".  A
mapping is a tuple of <DIMM, DPA-start-offset, length>.

LIBNVDIMM provides a built-in driver for REGION devices.  This driver
is responsible for parsing all LABELs, if present, and then emitting NAMESPACE
devices for the nd_pmem driver to consume.

In addition to the generic attributes of "mapping"s, "interleave_ways"
and "size" the REGION device also exports some convenience attributes.
"nstype" indicates the integer type of namespace-device this region
emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
'add' event, "modalias" duplicates the MODALIAS variable stored by udev
at the 'add' event, and finally, the optional "spa_index" is provided in
the case where the region is defined by a SPA.
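
To make the mapping tuple concrete, the following is a minimal sketch of how
an offset into a region's system-physical-address range could be resolved to
a <DIMM, DPA> pair.  It assumes a simplified round-robin interleave with a
fixed line size and equally sized mappings; real platforms describe their
interleave in firmware tables and may decode differently.  All names here
(struct mapping, struct region, struct dpa_result, region_offset_to_dpa) are
hypothetical and are not LIBNVDIMM or libndctl interfaces:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one mapping: <DIMM, DPA-start-offset, length>. */
struct mapping {
    unsigned int dimm;  /* index of the backing DIMM (the X in nmemX) */
    uint64_t dpa_start; /* DIMM-relative start of this mapping */
    uint64_t length;    /* bytes this DIMM contributes to the region */
};

/* Hypothetical model of a region as a container of mappings. */
struct region {
    unsigned int interleave_ways;   /* number of entries in mappings[] */
    uint64_t line_size;             /* assumed interleave granularity */
    const struct mapping *mappings;
};

struct dpa_result {
    unsigned int dimm;
    uint64_t dpa;
};

/* Resolve a region-relative offset under the assumed round-robin scheme. */
static struct dpa_result region_offset_to_dpa(const struct region *r,
        uint64_t offset)
{
    struct dpa_result res;
    uint64_t line, way, row, dimm_offset;

    assert(r->interleave_ways > 0 && r->line_size > 0);

    line = offset / r->line_size;       /* which interleave line */
    way = line % r->interleave_ways;    /* which mapping holds that line */
    row = line / r->interleave_ways;    /* lines already used on that DIMM */
    dimm_offset = row * r->line_size + offset % r->line_size;
    assert(dimm_offset < r->mappings[way].length);

    res.dimm = r->mappings[way].dimm;
    res.dpa = r->mappings[way].dpa_start + dimm_offset;
    return res;
}
```

With two mappings and a 256-byte line, for example, region offset 0 lands on
the first mapping and offset 256 on the second, each at its own
DPA-start-offset.  With a single mapping the function reduces to adding the
DPA-start-offset, which matches the 1:1 association described in the
glossary entry for DPA.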

LIBNVDIMM: region
^^^^^^^^^^^^^^^^^

::

    struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
            struct nd_region_desc *ndr_desc);

::

    /sys/devices/platform/nfit_test.0/ndbus0
    |-- region0
    |   |-- available_size
    |   |-- btt0
    |   |-- btt_seed
    |   |-- devtype
    |   |-- driver -> ../../../../../bus/nd/drivers/nd_region
    |   |-- init_namespaces
    |   |-- mapping0
    |   |-- mapping1
    |   |-- mappings
    |   |-- modalias
    |   |-- namespace0.0
    |   |-- namespace_seed
    |   |-- numa_node
    |   |-- nfit
    |   |   `-- spa_index
    |   |-- nstype
    |   |-- set_cookie
    |   |-- size
    |   |-- subsystem -> ../../../../../bus/nd
    |   `-- uevent
    |-- region1
    [..]

LIBNDCTL: region enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sample region retrieval routines based on NFIT-unique data like
"spa_index" (interleave set id).

::

    static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
            unsigned int spa_index)
    {
        struct ndctl_region *region;

        ndctl_region_foreach(bus, region) {
            if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
                continue;
            if (ndctl_region_get_spa_index(region) == spa_index)
                return region;
        }
        return NULL;
    }


LIBNVDIMM/LIBNDCTL: Namespace
-----------------------------

A REGION, after resolving DPA aliasing and LABEL specified boundaries, surfaces
one or more "namespace" devices.  The arrival of a "namespace" device currently
triggers the nd_pmem driver to load and register a disk/block device.

LIBNVDIMM: namespace
^^^^^^^^^^^^^^^^^^^^

Here is a sample layout from the 2 major types of NAMESPACE where namespace0.0
represents DIMM-info-backed PMEM (note that it has a 'uuid' attribute), and
namespace1.0 represents an anonymous PMEM namespace (note that it has no 'uuid'
attribute because it does not support a LABEL)

::

    /sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
    |-- alt_name
    |-- devtype
    |-- dpa_extents
    |-- force_raw
    |-- modalias
    |-- numa_node
    |-- resource
    |-- size
    |-- subsystem -> ../../../../../../bus/nd
    |-- type
    |-- uevent
    `-- uuid
    /sys/devices/platform/nfit_test.1/ndbus1/region1/namespace1.0
    |-- block
    |   `-- pmem0
    |-- devtype
    |-- driver -> ../../../../../../bus/nd/drivers/pmem
    |-- force_raw
    |-- modalias
    |-- numa_node
    |-- resource
    |-- size
    |-- subsystem -> ../../../../../../bus/nd
    |-- type
    `-- uevent

LIBNDCTL: namespace enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Namespaces are indexed relative to their parent region, example below.
These indexes are mostly static from boot to boot, but the subsystem makes
no guarantees in this regard.  For a static namespace identifier use its
'uuid' attribute.

::

    static struct ndctl_namespace
    *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
    {
        struct ndctl_namespace *ndns;

        ndctl_namespace_foreach(region, ndns)
            if (ndctl_namespace_get_id(ndns) == id)
                return ndns;

        return NULL;
    }

LIBNDCTL: namespace creation example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Idle namespaces are automatically created by the kernel if a given
region has enough available capacity to create a new namespace.
Namespace instantiation involves finding an idle namespace and
configuring it.  For the most part the setting of namespace attributes
can occur in any order; the only constraint is that 'uuid' must be set
before 'size'.
This enables the kernel to track DPA allocations
internally with a static identifier::

    static int configure_namespace(struct ndctl_region *region,
            struct ndctl_namespace *ndns,
            struct namespace_parameters *parameters)
    {
        char devname[50];

        snprintf(devname, sizeof(devname), "namespace%d.%d",
                ndctl_region_get_id(region), parameters->id);

        ndctl_namespace_set_alt_name(ndns, devname);
        /* 'uuid' must be set prior to setting size! */
        ndctl_namespace_set_uuid(ndns, parameters->uuid);
        ndctl_namespace_set_size(ndns, parameters->size);
        /* unlike pmem namespaces, blk namespaces have a sector size */
        if (parameters->lbasize)
            ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
        /* report the result of enabling the namespace to the caller */
        return ndctl_namespace_enable(ndns);
    }


Why the Term "namespace"?
^^^^^^^^^^^^^^^^^^^^^^^^^

    1. Why not "volume" for instance?  "volume" ran the risk of confusing
       ND (libnvdimm subsystem) with a volume manager like device-mapper.

    2. The term originated to describe the sub-devices that can be created
       within an NVME controller (see the nvme specification:
       https://www.nvmexpress.org/specifications/), and NFIT namespaces are
       meant to parallel the capabilities and configurability of
       NVME-namespaces.


LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-------------------------------------------------

A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a
personality driver for a namespace that fronts the entire namespace as an
'address abstraction'.

LIBNVDIMM: btt layout
^^^^^^^^^^^^^^^^^^^^^

Every region will start out with at least one BTT device which is the
seed device.
To activate it set the "namespace", "uuid", and 57662306a36Sopenharmony_ci"sector_size" attributes and then bind the device to the nd_pmem or 57762306a36Sopenharmony_cind_blk driver depending on the region type:: 57862306a36Sopenharmony_ci 57962306a36Sopenharmony_ci /sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/ 58062306a36Sopenharmony_ci |-- namespace 58162306a36Sopenharmony_ci |-- delete 58262306a36Sopenharmony_ci |-- devtype 58362306a36Sopenharmony_ci |-- modalias 58462306a36Sopenharmony_ci |-- numa_node 58562306a36Sopenharmony_ci |-- sector_size 58662306a36Sopenharmony_ci |-- subsystem -> ../../../../../bus/nd 58762306a36Sopenharmony_ci |-- uevent 58862306a36Sopenharmony_ci `-- uuid 58962306a36Sopenharmony_ci 59062306a36Sopenharmony_ciLIBNDCTL: btt creation example 59162306a36Sopenharmony_ci^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 59262306a36Sopenharmony_ci 59362306a36Sopenharmony_ciSimilar to namespaces an idle BTT device is automatically created per 59462306a36Sopenharmony_ciregion. Each time this "seed" btt device is configured and enabled a new 59562306a36Sopenharmony_ciseed is created. Creating a BTT configuration involves two steps of 59662306a36Sopenharmony_cifinding and idle BTT and assigning it to consume a namespace. 
59762306a36Sopenharmony_ci 59862306a36Sopenharmony_ci:: 59962306a36Sopenharmony_ci 60062306a36Sopenharmony_ci static struct ndctl_btt *get_idle_btt(struct ndctl_region *region) 60162306a36Sopenharmony_ci { 60262306a36Sopenharmony_ci struct ndctl_btt *btt; 60362306a36Sopenharmony_ci 60462306a36Sopenharmony_ci ndctl_btt_foreach(region, btt) 60562306a36Sopenharmony_ci if (!ndctl_btt_is_enabled(btt) 60662306a36Sopenharmony_ci && !ndctl_btt_is_configured(btt)) 60762306a36Sopenharmony_ci return btt; 60862306a36Sopenharmony_ci 60962306a36Sopenharmony_ci return NULL; 61062306a36Sopenharmony_ci } 61162306a36Sopenharmony_ci 61262306a36Sopenharmony_ci static int configure_btt(struct ndctl_region *region, 61362306a36Sopenharmony_ci struct btt_parameters *parameters) 61462306a36Sopenharmony_ci { 61562306a36Sopenharmony_ci btt = get_idle_btt(region); 61662306a36Sopenharmony_ci 61762306a36Sopenharmony_ci ndctl_btt_set_uuid(btt, parameters->uuid); 61862306a36Sopenharmony_ci ndctl_btt_set_sector_size(btt, parameters->sector_size); 61962306a36Sopenharmony_ci ndctl_btt_set_namespace(btt, parameters->ndns); 62062306a36Sopenharmony_ci /* turn off raw mode device */ 62162306a36Sopenharmony_ci ndctl_namespace_disable(parameters->ndns); 62262306a36Sopenharmony_ci /* turn on btt access */ 62362306a36Sopenharmony_ci ndctl_btt_enable(btt); 62462306a36Sopenharmony_ci } 62562306a36Sopenharmony_ci 62662306a36Sopenharmony_ciOnce instantiated a new inactive btt seed device will appear underneath 62762306a36Sopenharmony_cithe region. 62862306a36Sopenharmony_ci 62962306a36Sopenharmony_ciOnce a "namespace" is removed from a BTT that instance of the BTT device 63062306a36Sopenharmony_ciwill be deleted or otherwise reset to default values. This deletion is 63162306a36Sopenharmony_cionly at the device model level. In order to destroy a BTT the "info 63262306a36Sopenharmony_ciblock" needs to be destroyed. Note, that to destroy a BTT the media 63362306a36Sopenharmony_cineeds to be written in raw mode. 
By default, the kernel will autodetect 63462306a36Sopenharmony_cithe presence of a BTT and disable raw mode. This autodetect behavior 63562306a36Sopenharmony_cican be suppressed by enabling raw mode for the namespace via the 63662306a36Sopenharmony_cindctl_namespace_set_raw_mode() API. 63762306a36Sopenharmony_ci 63862306a36Sopenharmony_ci 63962306a36Sopenharmony_ciSummary LIBNDCTL Diagram 64062306a36Sopenharmony_ci------------------------ 64162306a36Sopenharmony_ci 64262306a36Sopenharmony_ciFor the given example above, here is the view of the objects as seen by the 64362306a36Sopenharmony_ciLIBNDCTL API:: 64462306a36Sopenharmony_ci 64562306a36Sopenharmony_ci +---+ 64662306a36Sopenharmony_ci |CTX| 64762306a36Sopenharmony_ci +-+-+ 64862306a36Sopenharmony_ci | 64962306a36Sopenharmony_ci +-------+ | 65062306a36Sopenharmony_ci | DIMM0 <-+ | +---------+ +--------------+ +---------------+ 65162306a36Sopenharmony_ci +-------+ | | +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" | 65262306a36Sopenharmony_ci | DIMM1 <-+ +-v--+ | +---------+ +--------------+ +---------------+ 65362306a36Sopenharmony_ci +-------+ +-+BUS0+-| +---------+ +--------------+ +----------------------+ 65462306a36Sopenharmony_ci | DIMM2 <-+ +----+ +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" | BTT1 | 65562306a36Sopenharmony_ci +-------+ | | +---------+ +--------------+ +---------------+------+ 65662306a36Sopenharmony_ci | DIMM3 <-+ 65762306a36Sopenharmony_ci +-------+ 658
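
The namespace objects in the diagram (NAMESPACE0.0, NAMESPACE1.0) use the
same "namespace<region-id>.<namespace-id>" pattern that the
configure_namespace() example formats with snprintf().  As a minimal
sketch, assuming only that naming pattern, the helper below decodes such a
name back into its two ids.  parse_namespace_devname() is a hypothetical
illustration and is not part of libndctl.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of libndctl): split a name of the form
 * "namespace<region-id>.<namespace-id>" into its two numeric ids.
 * Returns 0 on success, -1 if the name does not match the pattern.
 */
int parse_namespace_devname(const char *devname, int *region_id,
        int *namespace_id)
{
    int region, id;
    char trailing;

    /* exactly two conversions: both ids parsed and nothing followed them */
    if (sscanf(devname, "namespace%d.%d%c", &region, &id, &trailing) != 2)
        return -1;
    /* "%d" accepts a sign, so reject negative ids explicitly */
    if (region < 0 || id < 0)
        return -1;

    *region_id = region;
    *namespace_id = id;
    return 0;
}
```

With the diagram above, "namespace1.0" decodes to region id 1 and
namespace id 0, which corresponds to the NAMESPACE1.0 object beneath
REGION1.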