===============================
LIBNVDIMM: Non-Volatile Devices
===============================

libnvdimm - kernel / libndctl - userspace helper library

nvdimm@lists.linux.dev

Version 13

.. contents:

	Glossary
	Overview
	    Supporting Documents
	    Git Trees
	LIBNVDIMM PMEM
	    PMEM-REGIONs, Atomic Sectors, and DAX
	Example NVDIMM Platform
	LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
	    LIBNDCTL: Context
	        libndctl: instantiate a new library context example
	    LIBNVDIMM/LIBNDCTL: Bus
	        libnvdimm: control class device in /sys/class
	        libnvdimm: bus
	        libndctl: bus enumeration example
	    LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
	        libnvdimm: DIMM (NMEM)
	        libndctl: DIMM enumeration example
	    LIBNVDIMM/LIBNDCTL: Region
	        libnvdimm: region
	        libndctl: region enumeration example
	        Why Not Encode the Region Type into the Region Name?
	        How Do I Determine the Major Type of a Region?
	    LIBNVDIMM/LIBNDCTL: Namespace
	        libnvdimm: namespace
	        libndctl: namespace enumeration example
	        libndctl: namespace creation example
	        Why the Term "namespace"?
	    LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
	        libnvdimm: btt layout
	        libndctl: btt creation example
	Summary LIBNDCTL Diagram


Glossary
========

PMEM:
  A system-physical-address range where writes are persistent.  A
  block device composed of PMEM is capable of DAX.  A PMEM address range
  may span an interleave of several DIMMs.

DPA:
  DIMM Physical Address, a DIMM-relative offset.  With one DIMM in
  the system there would be a 1:1 system-physical-address:DPA association.
  Once more DIMMs are added, a memory controller interleave must be
  decoded to determine the DPA associated with a given
  system-physical-address.

DAX:
  File system extensions to bypass the page cache and block layer to
  mmap persistent memory, from a PMEM block device, directly into a
  process address space.

DSM:
  Device Specific Method: an ACPI method to control a specific
  device, in this case the firmware.

DCR:
  NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
  It defines a vendor-id, device-id, and interface format for a given DIMM.

BTT:
  Block Translation Table: Persistent memory is byte addressable.
  Existing software may have an expectation that the power-fail-atomicity
  of writes is at least one sector, 512 bytes.  The BTT is an indirection
  table with atomic update semantics to front a PMEM block device
  driver and present arbitrary atomic sector sizes.

LABEL:
  Metadata stored on a DIMM device that partitions and identifies
  (persistently names) capacity allocated to different PMEM namespaces.  It
  also indicates whether an address abstraction like a BTT is applied to
  the namespace.  Note that traditional partition tables, GPT/MBR, are
  layered on top of a PMEM namespace, or an address abstraction like BTT
  if present, but partition support is deprecated going forward.

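To make the DPA entry concrete: once DIMMs are interleaved, each
system-physical-address has to be decoded back to a (DIMM, DPA) pair.  The
sketch below is illustrative only and assumes a simple round-robin interleave
with a fixed granularity, where every DIMM contributes its share starting at
the same DPA base.  Real platforms describe their interleave in the NFIT, and
the names struct dpa_loc and spa_to_dpa() are hypothetical, not LIBNVDIMM
API::

	#include <stdint.h>

	/* Hypothetical result of decoding one system-physical-address. */
	struct dpa_loc {
		unsigned int dimm;	/* position of the DIMM in the interleave set */
		uint64_t dpa;		/* DIMM-relative offset */
	};

	/*
	 * Hypothetical decoder.  Assumes consecutive 'granularity'-sized
	 * chunks rotate round-robin across 'ways' DIMMs and that each DIMM's
	 * share starts at 'dpa_base'.
	 */
	static struct dpa_loc spa_to_dpa(uint64_t spa, uint64_t spa_base,
			uint64_t dpa_base, unsigned int ways, uint64_t granularity)
	{
		uint64_t offset = spa - spa_base;
		uint64_t chunk = offset / granularity;	/* chunk index in the set */
		struct dpa_loc loc;

		loc.dimm = (unsigned int)(chunk % ways);
		/* each DIMM stores every 'ways'-th chunk, packed back to back */
		loc.dpa = dpa_base + (chunk / ways) * granularity +
			  offset % granularity;
		return loc;
	}

With ways set to 1 the helper reduces to dpa = dpa_base + (spa - spa_base),
the 1:1 association described above for a single DIMM.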

Overview
========

The LIBNVDIMM subsystem provides support for PMEM described by platform
firmware or a device driver. On ACPI-based systems the platform firmware
conveys persistent memory resources via the ACPI NFIT "NVDIMM Firmware
Interface Table" in ACPI 6. While the LIBNVDIMM subsystem implementation
is generic and supports pre-NFIT platforms, it was guided by the
superset of capabilities needed to support this ACPI 6 definition for
NVDIMM resources. The original implementation supported the
block-window-aperture capability described in the NFIT, but that support
has since been abandoned; it never shipped in a product.

Supporting Documents
--------------------

ACPI 6:
	https://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace:
	https://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example:
	https://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide:
	https://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Git Trees
---------

LIBNVDIMM:
	https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git
LIBNDCTL:
	https://github.com/pmem/ndctl.git


LIBNVDIMM PMEM
==============

Prior to the arrival of the NFIT, non-volatile memory was described to a
system in various ad-hoc ways.  Usually only the bare minimum was
provided, namely, a single system-physical-address range where writes
are expected to be durable after a system power loss.  Now, the NFIT
specification standardizes not only the description of PMEM, but also
platform message-passing entry points for control and configuration.

PMEM (nd_pmem.ko): Drives a system-physical-address range.  This range is
contiguous in system memory and may be interleaved (hardware memory controller
striped) across multiple DIMMs.  When interleaved, the platform may optionally
provide details of which DIMMs are participating in the interleave.

It is worth noting that when the labeling capability is detected (an EFI
namespace label index block is found), no block device is created
by default, as userspace needs to do at least one allocation of DPA to
the PMEM range.  In contrast, ND_NAMESPACE_IO ranges, once registered,
can be immediately attached to nd_pmem. This latter mode is called
label-less or "legacy".

PMEM-REGIONs, Atomic Sectors, and DAX
-------------------------------------

For the cases where an application or filesystem still needs atomic sector
update guarantees, it can register a BTT on a PMEM device or partition.  See
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt".

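The idea behind that guarantee can be shown with a deliberately simplified,
in-memory model.  It is not the on-media BTT format from the design document
referenced in the BTT section; it only assumes that a sector write first lands
in a spare block and that a single map-entry update then makes it visible, so
an interruption before that update leaves the old sector contents readable.
All names here (struct toy_btt, toy_btt_write(), and so on) are
hypothetical::

	#include <stdint.h>
	#include <string.h>

	#define TOY_SECTOR	512		/* sector size presented to software */
	#define TOY_NLBA	4		/* sectors exported by the toy device */
	#define TOY_NBLK	(TOY_NLBA + 1)	/* plus one spare block */

	/* Hypothetical toy model, not the real BTT layout. */
	struct toy_btt {
		uint8_t media[TOY_NBLK][TOY_SECTOR];	/* stands in for PMEM */
		uint32_t map[TOY_NLBA];			/* LBA -> media block */
		uint32_t free_blk;			/* block not referenced by map[] */
	};

	static void toy_btt_init(struct toy_btt *b)
	{
		memset(b, 0, sizeof(*b));
		for (uint32_t i = 0; i < TOY_NLBA; i++)
			b->map[i] = i;
		b->free_blk = TOY_NLBA;
	}

	static void toy_btt_write(struct toy_btt *b, uint32_t lba,
			const uint8_t *buf)
	{
		uint32_t new_blk = b->free_blk;
		uint32_t old_blk = b->map[lba];

		/* 1. write out-of-place; the current sector is untouched */
		memcpy(b->media[new_blk], buf, TOY_SECTOR);
		/* 2. one map-entry store publishes the new sector */
		b->map[lba] = new_blk;
		/* 3. the old block becomes the spare for the next write */
		b->free_blk = old_blk;
	}

	static const uint8_t *toy_btt_read(const struct toy_btt *b, uint32_t lba)
	{
		return b->media[b->map[lba]];
	}

A real implementation must also order and flush the data write before the map
update and persist the map itself; the toy model omits that to keep the
indirection step visible.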

Example NVDIMM Platform
=======================

For the remainder of this document the following diagram will be
referenced for any example sysfs layouts::


                               (a)               (b)           DIMM
            +-------------------+--------+--------+--------+
  +------+  |       pm0.0       |  free  | pm1.0  |  free  |    0
  | imc0 +--+- - - region0- - - +--------+        +--------+
  +--+---+  |       pm0.0       |  free  | pm1.0  |  free  |    1
     |      +-------------------+--------v        v--------+
  +--+---+                               |                 |
  | cpu0 |                                     region1
  +--+---+                               |                 |
     |      +----------------------------^        ^--------+
  +--+---+  |           free             | pm1.0  |  free  |    2
  | imc1 +--+----------------------------|        +--------+
  +------+  |           free             | pm1.0  |  free  |    3
            +----------------------------+--------+--------+

In this platform we have four DIMMs and two memory controllers in one
socket.  Each PMEM interleave set is identified by a region device with
a dynamically assigned id.

    1. The first portion of DIMM0 and DIMM1 is interleaved as REGION0. A
       single PMEM namespace is created in the REGION0-SPA-range that spans most
       of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
       interleaved system-physical-address range is left free for
       another PMEM namespace to be defined.

    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
       system-physical-address range, REGION1, that spans those two DIMMs as
       well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
       named "pm1.0".

    This bus is provided by the kernel under the device
    /sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
    tools/testing/nvdimm is loaded. This module is a unit test for
    LIBNVDIMM and the acpi_nfit.ko driver.


LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
========================================================

What follows is a description of the LIBNVDIMM sysfs layout and a
corresponding object hierarchy diagram as viewed through the LIBNDCTL
API.  The example sysfs paths and diagrams are relative to the Example
NVDIMM Platform, which is also the LIBNVDIMM bus used in the LIBNDCTL unit
test.

LIBNDCTL: Context
-----------------

Every API call in the LIBNDCTL library requires a context that holds the
logging parameters and other library instance state.  The library is
based on the libabc template:

	https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git

LIBNDCTL: instantiate a new library context example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

	#include <ndctl/libndctl.h>

	static struct ndctl_ctx *new_ctx(void)
	{
		struct ndctl_ctx *ctx;

		/* 0 means success; release the context with ndctl_unref() */
		if (ndctl_new(&ctx) == 0)
			return ctx;
		return NULL;
	}

LIBNVDIMM/LIBNDCTL: Bus
-----------------------

A bus has a 1:1 relationship with an NFIT.  The current expectation for
ACPI-based systems is that there is only ever one platform-global NFIT.
That said, it is trivial to register multiple NFITs; the specification
does not preclude it.  The infrastructure supports multiple busses and
we use this capability to test multiple NFIT configurations in the unit
test.

LIBNVDIMM: control class device in /sys/class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This character device accepts DSM messages to be passed to a DIMM
identified by its NFIT handle::

	/sys/class/nd/ndctl0
	|-- dev
	|-- device -> ../../../ndbus0
	|-- subsystem -> ../../../../../../../class/nd


LIBNVDIMM: bus
^^^^^^^^^^^^^^

::

	struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
	       struct nvdimm_bus_descriptor *nfit_desc);

::

	/sys/devices/platform/nfit_test.0/ndbus0
	|-- commands
	|-- nd
	|-- nfit
	|-- nmem0
	|-- nmem1
	|-- nmem2
	|-- nmem3
	|-- power
	|-- provider
	|-- region0
	|-- region1
	|-- region2
	|-- region3
	|-- region4
	|-- region5
	|-- uevent
	`-- wait_probe

LIBNDCTL: bus enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Find the bus handle that describes the bus from the Example NVDIMM Platform::

	static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
			const char *provider)
	{
		struct ndctl_bus *bus;

		/* walk every bus known to the context, matching on provider */
		ndctl_bus_foreach(ctx, bus)
			if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
				return bus;

		return NULL;
	}

	bus = get_bus_by_provider(ctx, "nfit_test.0");


LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
-------------------------------

The DIMM device provides a character device for sending commands to
hardware, and it is a container for LABELs.  If the DIMM is defined by
NFIT, then an optional 'nfit' attribute sub-directory is available to add
NFIT-specifics.

Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
describes these devices via "Memory Device to System Physical Address
Range Mapping Structure", and there is no requirement that they actually
be physical DIMMs, so we use a more generic name.

LIBNVDIMM: DIMM (NMEM)
^^^^^^^^^^^^^^^^^^^^^^

::

	struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
			const struct attribute_group **groups, unsigned long flags,
			unsigned long *dsm_mask);

::

	/sys/devices/platform/nfit_test.0/ndbus0
	|-- nmem0
	|   |-- available_slots
	|   |-- commands
	|   |-- dev
	|   |-- devtype
	|   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
	|   |-- modalias
	|   |-- nfit
	|   |   |-- device
	|   |   |-- format
	|   |   |-- handle
	|   |   |-- phys_id
	|   |   |-- rev_id
	|   |   |-- serial
	|   |   `-- vendor
	|   |-- state
	|   |-- subsystem -> ../../../../../bus/nd
	|   `-- uevent
	|-- nmem1
	[..]


LIBNDCTL: DIMM enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example assumes NFIT-defined DIMMs, which are identified by an
"nfit_handle", a 32-bit value where:

   - Bit 3:0 DIMM number within the memory channel
   - Bit 7:4 memory channel number
   - Bit 11:8 memory controller ID
   - Bit 15:12 socket ID (within scope of a Node controller if node
     controller is present)
   - Bit 27:16 Node Controller ID
   - Bit 31:28 Reserved

::

	static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
	       unsigned int handle)
	{
		struct ndctl_dimm *dimm;

		ndctl_dimm_foreach(bus, dimm)
			if (ndctl_dimm_get_handle(dimm) == handle)
				return dimm;

		return NULL;
	}

	/* n: node controller, s: socket, i: memory controller, c: channel, d: DIMM */
	#define DIMM_HANDLE(n, s, i, c, d) \
		((((n) & 0xfff) << 16) | (((s) & 0xf) << 12) | (((i) & 0xf) << 8) \
		 | (((c) & 0xf) << 4) | ((d) & 0xf))

	dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));

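Going the other direction, the bit layout above can be unpacked back into its
fields, for example when printing a DIMM's location.  The sketch below simply
inverts the DIMM_HANDLE() encoding; struct nfit_handle_fields and
nfit_handle_decode() are hypothetical names, not LIBNDCTL API::

	#include <stdint.h>

	/* Hypothetical: fields of an NFIT device handle, per the layout above. */
	struct nfit_handle_fields {
		unsigned int dimm;	/* bits 3:0   DIMM number within the channel */
		unsigned int channel;	/* bits 7:4   memory channel number */
		unsigned int imc;	/* bits 11:8  memory controller ID */
		unsigned int socket;	/* bits 15:12 socket ID */
		unsigned int node;	/* bits 27:16 node controller ID */
	};

	static struct nfit_handle_fields nfit_handle_decode(uint32_t handle)
	{
		struct nfit_handle_fields f;

		f.dimm = handle & 0xf;
		f.channel = (handle >> 4) & 0xf;
		f.imc = (handle >> 8) & 0xf;
		f.socket = (handle >> 12) & 0xf;
		f.node = (handle >> 16) & 0xfff;	/* bits 31:28 are reserved */
		return f;
	}

For field values within range, nfit_handle_decode(DIMM_HANDLE(n, s, i, c, d))
hands back n, s, i, c and d.
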
LIBNVDIMM/LIBNDCTL: Region
--------------------------

A generic REGION device is registered for each PMEM interleave-set /
range. Per the example, there are 2 PMEM regions on the "nfit_test.0"
bus. The primary role of regions is to be a container of "mappings".  A
mapping is a tuple of <DIMM, DPA-start-offset, length>.

LIBNVDIMM provides a built-in driver for REGION devices.  This driver
is responsible for parsing all LABELs, if present, and then emitting NAMESPACE
devices for the nd_pmem driver to consume.

In addition to the generic attributes of "mapping"s, "interleave_ways"
and "size", the REGION device also exports some convenience attributes.
"nstype" indicates the integer type of namespace-device this region
emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
'add' event, "modalias" duplicates the MODALIAS variable stored by udev
at the 'add' event, and finally, the optional "spa_index" is provided in
the case where the region is defined by a SPA.

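The "mappingN" attributes expose those <DIMM, DPA-start-offset, length>
tuples to userspace.  Assuming each one reads as a comma-separated line that
starts with the DIMM device name, the DPA start offset and the length (for
instance "nmem0,0,536870912"; check the exact format, including any trailing
fields, against the running kernel), a parser could look like the following.
struct region_mapping and parse_region_mapping() are hypothetical names, not
LIBNDCTL API::

	#include <stdio.h>
	#include <string.h>

	/* Hypothetical container for one <DIMM, DPA-start-offset, length> tuple. */
	struct region_mapping {
		char dimm[32];			/* DIMM device name, e.g. "nmem0" */
		unsigned long long start;	/* DPA start offset */
		unsigned long long length;	/* bytes contributed to the region */
	};

	/*
	 * Parse the leading "<dimm>,<start>,<length>" fields of a mappingN
	 * attribute (assumed format); trailing fields are ignored.
	 * Returns 0 on success, -1 on a malformed line.
	 */
	static int parse_region_mapping(const char *text, struct region_mapping *m)
	{
		if (sscanf(text, "%31[^,],%llu,%llu", m->dimm, &m->start,
				&m->length) != 3)
			return -1;
		/* the kernel names DIMM devices "nmemX" */
		if (strncmp(m->dimm, "nmem", 4) != 0)
			return -1;
		return 0;
	}
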
LIBNVDIMM: region
^^^^^^^^^^^^^^^^^

::

	struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
			struct nd_region_desc *ndr_desc);

::

	/sys/devices/platform/nfit_test.0/ndbus0
	|-- region0
	|   |-- available_size
	|   |-- btt0
	|   |-- btt_seed
	|   |-- devtype
	|   |-- driver -> ../../../../../bus/nd/drivers/nd_region
	|   |-- init_namespaces
	|   |-- mapping0
	|   |-- mapping1
	|   |-- mappings
	|   |-- modalias
	|   |-- namespace0.0
	|   |-- namespace_seed
	|   |-- numa_node
	|   |-- nfit
	|   |   `-- spa_index
	|   |-- nstype
	|   |-- set_cookie
	|   |-- size
	|   |-- subsystem -> ../../../../../bus/nd
	|   `-- uevent
	|-- region1
	[..]

LIBNDCTL: region enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A sample region retrieval routine based on NFIT-unique data like
"spa_index" (interleave set id).

::

	static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
			unsigned int spa_index)
	{
		struct ndctl_region *region;

		ndctl_region_foreach(bus, region) {
			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
				continue;
			if (ndctl_region_get_spa_index(region) == spa_index)
				return region;
		}
		return NULL;
	}


LIBNVDIMM/LIBNDCTL: Namespace
-----------------------------

A REGION, after resolving DPA aliasing and LABEL-specified boundaries, surfaces
one or more "namespace" devices.  The arrival of a "namespace" device currently
triggers the nd_pmem driver to load and register a disk/block device.

LIBNVDIMM: namespace
^^^^^^^^^^^^^^^^^^^^

Here is a sample layout of the two major types of NAMESPACE, where namespace0.0
represents DIMM-info-backed PMEM (note that it has a 'uuid' attribute), and
namespace1.0 represents an anonymous PMEM namespace (note that it has no 'uuid'
attribute because it does not support a LABEL).

::

	/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
	|-- alt_name
	|-- devtype
	|-- dpa_extents
	|-- force_raw
	|-- modalias
	|-- numa_node
	|-- resource
	|-- size
	|-- subsystem -> ../../../../../../bus/nd
	|-- type
	|-- uevent
	`-- uuid
	/sys/devices/platform/nfit_test.1/ndbus1/region1/namespace1.0
	|-- block
	|   `-- pmem0
	|-- devtype
	|-- driver -> ../../../../../../bus/nd/drivers/pmem
	|-- force_raw
	|-- modalias
	|-- numa_node
	|-- resource
	|-- size
	|-- subsystem -> ../../../../../../bus/nd
	|-- type
	`-- uevent

LIBNDCTL: namespace enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Namespaces are indexed relative to their parent region, as in the example
below.  These indexes are mostly static from boot to boot, but the subsystem
makes no guarantees in this regard.  For a static namespace identifier, use
its 'uuid' attribute.

::

  static struct ndctl_namespace
  *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
  {
          struct ndctl_namespace *ndns;

          ndctl_namespace_foreach(region, ndns)
                  if (ndctl_namespace_get_id(ndns) == id)
                          return ndns;

          return NULL;
  }

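Namespace device names carry the same two numbers: as in the sysfs paths
above (region0/namespace0.0, region1/namespace1.0), a name has the form
"namespace<region>.<id>".  A tool that starts from a device name can split it
back apart; parse_namespace_name() below is a hypothetical helper, not part of
LIBNDCTL::

	#include <stdio.h>

	/*
	 * Hypothetical helper: split "namespace<region>.<id>" into the region
	 * id and the per-region index.  Returns 0 on success, -1 otherwise.
	 */
	static int parse_namespace_name(const char *name, unsigned int *region,
			unsigned int *id)
	{
		int consumed = 0;

		/* %n records how many characters the two numbers consumed */
		if (sscanf(name, "namespace%u.%u%n", region, id, &consumed) != 2)
			return -1;
		/* reject trailing characters such as "namespace0.0x" */
		if (name[consumed] != '\0')
			return -1;
		return 0;
	}
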
LIBNDCTL: namespace creation example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Idle namespaces are automatically created by the kernel if a given
region has enough available capacity to create a new namespace.
Namespace instantiation involves finding an idle namespace and
configuring it.  For the most part, namespace attributes can be set in
any order; the only constraint is that 'uuid' must be set before
'size'.  This enables the kernel to track DPA allocations
internally with a static identifier::

  static int configure_namespace(struct ndctl_region *region,
                  struct ndctl_namespace *ndns,
                  struct namespace_parameters *parameters)
  {
          char devname[50];
          int rc;

          snprintf(devname, sizeof(devname), "namespace%d.%d",
                          ndctl_region_get_id(region), parameters->id);

          rc = ndctl_namespace_set_alt_name(ndns, devname);
          if (rc)
                  return rc;
          /* 'uuid' must be set prior to setting size! */
          rc = ndctl_namespace_set_uuid(ndns, parameters->uuid);
          if (rc)
                  return rc;
          rc = ndctl_namespace_set_size(ndns, parameters->size);
          if (rc)
                  return rc;
          /* a sector size is optional; only set it when one was requested */
          if (parameters->lbasize) {
                  rc = ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
                  if (rc)
                          return rc;
          }
          /* propagate the enable result to the caller */
          return ndctl_namespace_enable(ndns);
  }

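The same ordering rule applies when a namespace is configured through sysfs
directly rather than through LIBNDCTL.  Below is a minimal sketch of that,
assuming the 'alt_name', 'uuid' and 'size' attributes shown in the
namespace0.0 layout above accept plain-text writes.  write_attr() and
configure_namespace_sysfs() are hypothetical helpers; enabling the namespace
afterwards is left to ndctl_namespace_enable() or the ndctl utility::

	#include <stdio.h>

	/* Hypothetical helper: write one value to <dir>/<attr>, 0 on success. */
	static int write_attr(const char *dir, const char *attr, const char *val)
	{
		char path[256];
		FILE *f;
		int n, rc = 0;

		n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
		if (n < 0 || n >= (int)sizeof(path))
			return -1;
		f = fopen(path, "w");
		if (!f)
			return -1;
		if (fputs(val, f) == EOF)
			rc = -1;
		/* stdio buffers the value, so a rejected store() surfaces here */
		if (fclose(f) != 0)
			rc = -1;
		return rc;
	}

	/* Hypothetical helper: configure a namespace, 'uuid' before 'size'. */
	static int configure_namespace_sysfs(const char *ns_dir,
			const char *alt_name, const char *uuid, unsigned long long size)
	{
		char size_str[32];

		snprintf(size_str, sizeof(size_str), "%llu", size);
		if (write_attr(ns_dir, "alt_name", alt_name))
			return -1;
		/* 'uuid' must be set prior to setting size */
		if (write_attr(ns_dir, "uuid", uuid))
			return -1;
		return write_attr(ns_dir, "size", size_str);
	}

A caller would pass a namespace directory such as
/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0 together with a
UUID string and the desired size in bytes.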
55062306a36Sopenharmony_ci
55162306a36Sopenharmony_ciWhy the Term "namespace"?
55262306a36Sopenharmony_ci^^^^^^^^^^^^^^^^^^^^^^^^^
55362306a36Sopenharmony_ci
55462306a36Sopenharmony_ci    1. Why not "volume" for instance?  "volume" ran the risk of confusing
55562306a36Sopenharmony_ci       ND (libnvdimm subsystem) to a volume manager like device-mapper.
55662306a36Sopenharmony_ci
55762306a36Sopenharmony_ci    2. The term originated to describe the sub-devices that can be created
55862306a36Sopenharmony_ci       within a NVME controller (see the nvme specification:
55962306a36Sopenharmony_ci       https://www.nvmexpress.org/specifications/), and NFIT namespaces are
56062306a36Sopenharmony_ci       meant to parallel the capabilities and configurability of
56162306a36Sopenharmony_ci       NVME-namespaces.
56262306a36Sopenharmony_ci
56362306a36Sopenharmony_ci
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-------------------------------------------------

A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a
personality driver that fronts an entire namespace as an 'address
abstraction', a sector remapping layer that makes sector writes
power-fail atomic.
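
To make the "address abstraction" concrete, the toy model below shows
the core indirection idea: a map translates each LBA seen by the upper
layer to an internal block, and every write lands in a spare block
before the map entry is switched, so the block currently in use is never
overwritten in place.  This is a simplified in-memory sketch, not the
BTT on-media format: it assumes a single arena and a single free block,
and omits the flog, info blocks, lanes, and crash recovery described in
the design document.  All names (struct toy_btt, toy_btt_write(), etc.)
are hypothetical::

	#include <stdint.h>
	#include <string.h>

	#define TOY_NLBA  4               /* external LBAs seen by the upper layer */
	#define TOY_NBLK  (TOY_NLBA + 1)  /* internal blocks: one extra as spare */
	#define TOY_BSIZE 8               /* tiny block size, for illustration */

	struct toy_btt {
		uint32_t map[TOY_NLBA];   /* external LBA -> internal block */
		uint32_t free_blk;        /* internal block not referenced by map */
		uint8_t media[TOY_NBLK][TOY_BSIZE];
	};

	static void toy_btt_init(struct toy_btt *b)
	{
		memset(b, 0, sizeof(*b));
		for (uint32_t i = 0; i < TOY_NLBA; i++)
			b->map[i] = i;   /* identity mapping to start */
		b->free_blk = TOY_NLBA;
	}

	/* lba must be < TOY_NLBA; no bounds checking in this toy */
	static void toy_btt_read(const struct toy_btt *b, uint32_t lba,
			uint8_t *buf)
	{
		memcpy(buf, b->media[b->map[lba]], TOY_BSIZE);
	}

	/*
	 * Copy the new data into the spare block, then switch the map entry
	 * and recycle the old block as the next spare.  The block the map
	 * currently points at is never overwritten in place.  (A real BTT
	 * also records the switch in its flog so it survives power loss.)
	 */
	static void toy_btt_write(struct toy_btt *b, uint32_t lba,
			const uint8_t *buf)
	{
		uint32_t old = b->map[lba];

		memcpy(b->media[b->free_blk], buf, TOY_BSIZE);
		b->map[lba] = b->free_blk;
		b->free_blk = old;
	}

Because an interrupted data copy can only touch the spare block, a
reader sees either the old or the new contents of a sector, never a mix.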

LIBNVDIMM: btt layout
^^^^^^^^^^^^^^^^^^^^^

Every region starts out with at least one BTT device, the seed device.
To activate it, set the "namespace", "uuid", and "sector_size"
attributes and then bind the device to the nd_pmem or nd_blk driver,
depending on the region type::

	/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
	|-- namespace
	|-- delete
	|-- devtype
	|-- modalias
	|-- numa_node
	|-- sector_size
	|-- subsystem -> ../../../../../bus/nd
	|-- uevent
	`-- uuid
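
The same activation can be sketched at the sysfs level.  The helper
below reads the region's "btt_seed" attribute to find the current seed,
writes the three attributes listed above into it, and then writes the
seed's name to the driver's "bind" file.  This is a sketch under
assumptions: it assumes a "btt_seed" attribute in the region directory,
a driver-core "bind" file such as /sys/bus/nd/drivers/nd_pmem/bind, and
a target namespace that is idle (not bound to a driver).  The function
names are hypothetical and error handling is minimal::

	#include <stdio.h>
	#include <string.h>

	/* Write val to <dir>/<attr>.  Returns 0 on success, -1 on failure. */
	static int write_attr(const char *dir, const char *attr, const char *val)
	{
		char path[4096];
		FILE *f;
		int n, rc;

		n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;
		f = fopen(path, "w");
		if (!f)
			return -1;
		rc = (fputs(val, f) == EOF) ? -1 : 0;
		/* stdio buffers the write; a rejected value shows up at fclose() */
		if (fclose(f) != 0)
			rc = -1;
		return rc;
	}

	/* Read <region_dir>/btt_seed (e.g. "btt0\n") into buf without the newline. */
	static int read_btt_seed(const char *region_dir, char *buf, size_t len)
	{
		char path[4096];
		FILE *f;
		int n;

		n = snprintf(path, sizeof(path), "%s/btt_seed", region_dir);
		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;
		f = fopen(path, "r");
		if (!f)
			return -1;
		if (!fgets(buf, (int)len, f)) {
			fclose(f);
			return -1;
		}
		fclose(f);
		buf[strcspn(buf, "\n")] = '\0';
		return 0;
	}

	/*
	 * Configure the region's current BTT seed and bind it to a driver.
	 * driver_dir is assumed to be e.g. /sys/bus/nd/drivers/nd_pmem.
	 */
	static int activate_btt_seed(const char *region_dir, const char *driver_dir,
			const char *ns_name, const char *uuid,
			const char *sector_size)
	{
		char seed[64], btt_dir[4096];
		int n;

		if (read_btt_seed(region_dir, seed, sizeof(seed)))
			return -1;
		n = snprintf(btt_dir, sizeof(btt_dir), "%s/%s", region_dir, seed);
		if (n < 0 || (size_t)n >= sizeof(btt_dir))
			return -1;
		if (write_attr(btt_dir, "namespace", ns_name) ||
		    write_attr(btt_dir, "uuid", uuid) ||
		    write_attr(btt_dir, "sector_size", sector_size))
			return -1;
		/* writing the device name to "bind" asks the driver to probe it */
		return write_attr(driver_dir, "bind", seed);
	}

Writing one attribute per open/close keeps each kernel store call
separate, so a rejected value is reported against the attribute that
caused it.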

LIBNDCTL: btt creation example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As with namespaces, an idle BTT device is automatically created per
region.  Each time this "seed" BTT device is configured and enabled, a
new seed is created.  Creating a BTT configuration involves two steps:
finding an idle BTT and assigning it to consume a namespace.

::

	#include <errno.h>
	#include <ndctl/libndctl.h>

	static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
	{
		struct ndctl_btt *btt;

		ndctl_btt_foreach(region, btt)
			if (!ndctl_btt_is_enabled(btt)
					&& !ndctl_btt_is_configured(btt))
				return btt;

		return NULL;
	}

	/* struct btt_parameters (uuid, sector_size, ndns) is caller-defined */
	static int configure_btt(struct ndctl_region *region,
			struct btt_parameters *parameters)
	{
		struct ndctl_btt *btt = get_idle_btt(region);
		int rc;

		if (!btt)
			return -ENXIO;

		rc = ndctl_btt_set_uuid(btt, parameters->uuid);
		if (!rc)
			rc = ndctl_btt_set_sector_size(btt, parameters->sector_size);
		if (!rc)
			rc = ndctl_btt_set_namespace(btt, parameters->ndns);
		if (rc)
			return rc;
		/* turn off raw mode device */
		rc = ndctl_namespace_disable(parameters->ndns);
		if (rc)
			return rc;
		/* turn on btt access */
		return ndctl_btt_enable(btt);
	}

Once a BTT is instantiated, a new inactive BTT seed device will appear
underneath the region.

Once a "namespace" is removed from a BTT, that instance of the BTT
device will be deleted or otherwise reset to default values.  This
deletion is only at the device model level.  To destroy a BTT, its
"info block" needs to be destroyed.  Note that destroying a BTT requires
writing the media in raw mode.  By default, the kernel will autodetect
the presence of a BTT and disable raw mode.  This autodetect behavior
can be suppressed by enabling raw mode for the namespace via the
ndctl_namespace_set_raw_mode() API.
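
Destroying a BTT therefore comes down to invalidating its info block
while the namespace is accessed in raw mode.  The sketch below checks
for, and clears, the primary info block signature in an in-memory copy
of the start of a namespace.  The 4 KiB offset and the 16-byte
"BTT_ARENA_INFO" signature are assumptions taken from the kernel's BTT
code under drivers/nvdimm/ and should be checked against your kernel.
The function names are hypothetical, and a real tool would also need to
consider the backup info block kept at the end of each arena::

	#include <stdbool.h>
	#include <stddef.h>
	#include <string.h>

	/* Assumptions mirroring the kernel's BTT code; verify for your kernel. */
	#define TOY_BTT_INFO_OFF 4096              /* primary info block offset */
	#define TOY_BTT_SIG_LEN  16                /* signature field size */
	#define TOY_BTT_SIG      "BTT_ARENA_INFO"  /* NUL-padded to 16 bytes */

	/* buf holds raw bytes read from the start of the namespace. */
	static bool has_btt_info(const unsigned char *buf, size_t len)
	{
		static const char sig[TOY_BTT_SIG_LEN] = TOY_BTT_SIG;

		if (len < TOY_BTT_INFO_OFF + TOY_BTT_SIG_LEN)
			return false;
		return memcmp(buf + TOY_BTT_INFO_OFF, sig, TOY_BTT_SIG_LEN) == 0;
	}

	/*
	 * Zero the primary signature so the info block no longer matches.
	 * Returns 0 if a signature was found and cleared, -1 otherwise.
	 * The caller is responsible for writing buf back in raw mode.
	 */
	static int clear_btt_info(unsigned char *buf, size_t len)
	{
		if (!has_btt_info(buf, len))
			return -1;
		memset(buf + TOY_BTT_INFO_OFF, 0, TOY_BTT_SIG_LEN);
		return 0;
	}

In practice the buffer would be read from, and written back to, the
namespace's block device after raw mode has been enabled with
ndctl_namespace_set_raw_mode().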


Summary LIBNDCTL Diagram
------------------------

For the example above, here is the view of the objects as seen by the
LIBNDCTL API::

              +---+
              |CTX|
              +-+-+
                |
  +-------+     |
  | DIMM0 <-+   |      +---------+   +--------------+  +---------------+
  +-------+ |   |    +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
  | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
  +-------+ +-+BUS0+-| +---------+   +--------------+  +----------------------+
  | DIMM2 <-+ +----+ +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" | BTT1 |
  +-------+ |        | +---------+   +--------------+  +---------------+------+
  | DIMM3 <-+
  +-------+