.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available so that subsystems can
  prepare to handle it. The page allocator cannot yet allocate from the
  new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory, but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that was being offlined.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.
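
The ordering rule can be illustrated with a small userspace model. This is
a sketch, not kernel code: ``struct model_nb``, ``model_register()``,
``model_call_chain()`` and ``model_record()`` are hypothetical stand-ins
invented for this example, and the only behaviour they assume is the one
stated above (higher priority values run first).

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct notifier_block. */
struct model_nb {
	int (*notifier_call)(struct model_nb *self, unsigned long action,
			     void *arg);
	struct model_nb *next;
	int priority;
};

/* Insert nb so that the list stays sorted by descending priority. */
static void model_register(struct model_nb **head, struct model_nb *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Call every callback in list order and return how many were called. */
static int model_call_chain(struct model_nb *head, unsigned long action,
			    void *arg)
{
	int calls = 0;

	for (; head; head = head->next, calls++)
		head->notifier_call(head, action, arg);
	return calls;
}

/* Log used by the example callback to record the order of invocation. */
struct model_log {
	int n;
	int prio[8];
};

/* Example callback: appends its own priority to the log passed via arg. */
static int model_record(struct model_nb *self, unsigned long action,
			void *arg)
{
	struct model_log *log = arg;

	(void)action;
	if (log->n < 8)
		log->prio[log->n++] = self->priority;
	return 0;
}
```

Registering three blocks with priorities 0, 10 and 5 (in that order) and
then calling ``model_call_chain()`` leaves 10, 5, 0 in the log, regardless
of the registration order.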

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid;
	};

- start_pfn is the first page frame number (PFN) of the memory being
  onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
- status_change_nid_normal is set to the node id when the node's
  N_NORMAL_MEMORY bit in the nodemask is (or will be) set or cleared. If this
  is -1, the nodemask status does not change.
- status_change_nid is set to the node id when the node's N_MEMORY bit in the
  nodemask is (or will be) set or cleared, that is, when a new (memoryless)
  node gains memory by onlining or a node loses all of its memory. If this is
  -1, the nodemask status does not change.

  If status_change_nid* >= 0, the callback should create/discard structures
  for the node if necessary.
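
As an illustration of how a callback might interpret these fields, the
following userspace sketch mirrors the structure shown above under a
different name. ``struct model_memory_notify``, the helper functions and
``MODEL_PAGE_SHIFT`` are assumptions made for this example only; in
particular, the 4 KiB page size is an assumption, and real kernel code
would use the architecture's own PFN conversion helpers.

```c
#include <stdbool.h>

/* Assumption for this example only: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

/* Userspace mirror of the fields documented above. */
struct model_memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid;
};

/* Physical address of the first byte in the affected range. */
static unsigned long long model_range_start(const struct model_memory_notify *mn)
{
	return (unsigned long long)mn->start_pfn << MODEL_PAGE_SHIFT;
}

/* Physical address one past the last byte in the affected range. */
static unsigned long long model_range_end(const struct model_memory_notify *mn)
{
	return (unsigned long long)(mn->start_pfn + mn->nr_pages)
		<< MODEL_PAGE_SHIFT;
}

/*
 * True when the event changes whether the node has any memory at all,
 * i.e. when per-node structures may need to be created or discarded.
 */
static bool model_node_state_changes(const struct model_memory_notify *mn)
{
	return mn->status_change_nid >= 0;
}
```

With ``start_pfn = 0x100000`` and ``nr_pages = 0x8000``, the range under
the 4 KiB assumption is ``[0x100000000, 0x108000000)``, which is 128 MiB.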

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used in response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
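
The effect of the return values can be sketched with a simplified userspace
model of an onlining attempt. Everything prefixed with ``model_`` or
``MODEL_`` is hypothetical and exists only in this example. The numeric
values are intended to mirror those in ``include/linux/notifier.h`` but
should be treated as illustrative, and the real kernel flow converts the
return value to an errno rather than comparing it directly.

```c
#include <stddef.h>

#define MODEL_NOTIFY_DONE	0x0000
#define MODEL_NOTIFY_OK		0x0001
#define MODEL_NOTIFY_STOP_MASK	0x8000
#define MODEL_NOTIFY_BAD	(MODEL_NOTIFY_STOP_MASK | 0x0002)
#define MODEL_NOTIFY_STOP	(MODEL_NOTIFY_STOP_MASK | MODEL_NOTIFY_OK)

enum model_action { MODEL_GOING_ONLINE, MODEL_CANCEL_ONLINE, MODEL_ONLINE };

typedef int (*model_cb)(enum model_action action, void *arg);

/* Walk the chain in order; stop once a callback sets the stop bit. */
static int model_notify(const model_cb *chain, size_t n,
			enum model_action action, void *arg)
{
	int ret = MODEL_NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = chain[i](action, arg);
		if (ret & MODEL_NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Simplified onlining: a veto triggers the matching CANCEL event. */
static int model_online(const model_cb *chain, size_t n, void *arg)
{
	if (model_notify(chain, n, MODEL_GOING_ONLINE, arg) == MODEL_NOTIFY_BAD) {
		model_notify(chain, n, MODEL_CANCEL_ONLINE, arg);
		return -1;
	}
	model_notify(chain, n, MODEL_ONLINE, arg);
	return 0;
}

/* Counters filled in by model_count() below. */
struct model_stats {
	int going, cancel, online;
};

/* Example callback: counts each action it sees and lets processing continue. */
static int model_count(enum model_action action, void *arg)
{
	struct model_stats *s = arg;

	if (action == MODEL_GOING_ONLINE)
		s->going++;
	else if (action == MODEL_CANCEL_ONLINE)
		s->cancel++;
	else
		s->online++;
	return MODEL_NOTIFY_OK;
}

/* Example callback: vetoes every onlining attempt. */
static int model_veto(enum model_action action, void *arg)
{
	(void)arg;
	return action == MODEL_GOING_ONLINE ? MODEL_NOTIFY_BAD : MODEL_NOTIFY_DONE;
}
```

If ``model_veto`` precedes ``model_count`` in the chain, ``model_count``
never sees the GOING_ONLINE event because the stop bit ends the walk, but
it still receives the CANCEL_ONLINE event that follows. A callback must
therefore not assume that it has seen the event being cancelled.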

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).
In particular, holding device_hotplug_lock avoids a possible lock inversion
when memory is added and user space tries to online that memory faster than
expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
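
The reasoning above can be checked with a toy lock-order model. This is a
userspace sketch under stated assumptions, not how the kernel or lockdep
works: the ``MODEL_*`` constants, ``struct model_edge``,
``model_record_path()`` and ``model_can_deadlock()`` are all hypothetical.
The model treats two opposite acquisition orders as harmless when both
happen under a common outer lock, which is the role device_hotplug_lock
plays here.

```c
#include <stdbool.h>
#include <stddef.h>

/* One bit per lock so that sets of held locks fit in a bitmask. */
enum {
	MODEL_DEVICE_HOTPLUG_LOCK = 1 << 0,
	MODEL_DEVICE_LOCK	  = 1 << 1,
	MODEL_MEM_HOTPLUG_LOCK	  = 1 << 2,
};

/* "second" was taken while "first" was held; guards = other locks held. */
struct model_edge {
	unsigned int first, second, guards;
};

/*
 * Record all nested acquisitions of one code path, given as an ordered
 * array of n locks. "out" must have room for n * (n - 1) / 2 edges.
 */
static size_t model_record_path(const unsigned int *locks, size_t n,
				struct model_edge *out)
{
	unsigned int held = 0;
	size_t count = 0, i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < i; j++) {
			out[count].first = locks[j];
			out[count].second = locks[i];
			out[count].guards = held & ~locks[j];
			count++;
		}
		held |= locks[i];
	}
	return count;
}

/*
 * True if some edge of path a is reversed in path b and no common outer
 * lock prevents the two paths from running concurrently.
 */
static bool model_can_deadlock(const struct model_edge *a, size_t na,
			       const struct model_edge *b, size_t nb)
{
	size_t i, j;

	for (i = 0; i < na; i++)
		for (j = 0; j < nb; j++)
			if (a[i].first == b[j].second &&
			    a[i].second == b[j].first &&
			    !(a[i].guards & b[j].guards))
				return true;
	return false;
}
```

Feeding the model the two orders listed above (device_lock then
mem_hotplug_lock, and the reverse) reports a possible deadlock. Prefixing
both paths with ``MODEL_DEVICE_HOTPLUG_LOCK`` makes the report go away,
because the two paths can no longer interleave.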

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems()/put_online_mems()
implementation, so code accessing memory can protect itself against that
memory vanishing.