.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available, so that subsystems can
  prepare to handle the memory. The page allocator is still unable to
  allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid_high;
		int status_change_nid;
	};

- start_pfn is the first page frame number of the memory being
  onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
- status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the
  nodemask is (will be) set/cleared. If this is -1, the nodemask status is
  not changed.
- status_change_nid_high is set to the node id when N_HIGH_MEMORY of the
  nodemask is (will be) set/cleared. If this is -1, the nodemask status is
  not changed.
- status_change_nid is set to the node id when N_MEMORY of the nodemask is
  (will be) set/cleared. This means that a new (memoryless) node gets new
  memory by onlining, or that a node loses all of its memory. If this is -1,
  the nodemask status is not changed.

  If status_change_nid* >= 0, the callback should create/discard structures
  for the node if necessary.
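As a rough illustration of how a callback might use these fields, the
following user-space sketch reacts to a node gaining or losing memory. It is
not kernel code: the structure is re-declared locally so that the example
compiles on its own, the event and ``NOTIFY_OK`` values are placeholders rather
than the ones from the kernel headers, and ``example_mem_callback()``,
``node_has_state`` and ``EXAMPLE_MAX_NODES`` are hypothetical names::

	#include <stdio.h>

	/* Placeholder values for this sketch only; the real definitions
	 * live in include/linux/memory.h and include/linux/notifier.h. */
	enum {
		MEM_GOING_ONLINE = 1, MEM_CANCEL_ONLINE, MEM_ONLINE,
		MEM_GOING_OFFLINE, MEM_CANCEL_OFFLINE, MEM_OFFLINE
	};
	#define NOTIFY_OK 1

	struct notifier_block;	/* opaque in this sketch */

	/* Local copy of the structure shown above. */
	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid_high;
		int status_change_nid;
	};

	/* Hypothetical per-node bookkeeping kept by some subsystem. */
	#define EXAMPLE_MAX_NODES 4
	static int node_has_state[EXAMPLE_MAX_NODES];

	static int example_mem_callback(struct notifier_block *self,
					unsigned long action, void *arg)
	{
		struct memory_notify *mn = arg;
		int nid = mn->status_change_nid;

		(void)self;
		printf("action %lu: pfn %lu, %lu pages\n",
		       action, mn->start_pfn, mn->nr_pages);

		/* -1 means the N_MEMORY state of no node changes. */
		if (nid < 0 || nid >= EXAMPLE_MAX_NODES)
			return NOTIFY_OK;

		switch (action) {
		case MEM_GOING_ONLINE:
			node_has_state[nid] = 1;  /* node gets its first memory */
			break;
		case MEM_CANCEL_ONLINE:
		case MEM_OFFLINE:
			node_has_state[nid] = 0;  /* undo, or node lost all memory */
			break;
		}
		return NOTIFY_OK;
	}

The sketch keys its bookkeeping on ``status_change_nid`` only. A subsystem that
cares about N_NORMAL_MEMORY or N_HIGH_MEMORY would presumably check the
corresponding field in the same way.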

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
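The interaction between priorities and return values can be modelled in user
space as follows. This is only a sketch of the behaviour described above, not
the kernel's notifier implementation: ``struct example_notifier``,
``example_register()``, ``example_call_chain()`` and the two callbacks are
hypothetical, and the ``NOTIFY_*`` values are assumptions that should be
checked against ``include/linux/notifier.h``::

	#include <stddef.h>

	/* Assumed values; verify against include/linux/notifier.h. */
	#define NOTIFY_DONE		0x0000
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
	#define NOTIFY_STOP		(NOTIFY_STOP_MASK | NOTIFY_OK)

	/* Hypothetical, simplified stand-in for a notifier block. */
	struct example_notifier {
		int (*call)(unsigned long action, void *arg);
		int priority;
		struct example_notifier *next;
	};

	/* Keep the list sorted so higher priorities are called first. */
	static void example_register(struct example_notifier **head,
				     struct example_notifier *n)
	{
		while (*head && (*head)->priority >= n->priority)
			head = &(*head)->next;
		n->next = *head;
		*head = n;
	}

	/* Walk the list until a callback returns a value with the stop bit. */
	static int example_call_chain(struct example_notifier *head,
				      unsigned long action, void *arg)
	{
		int ret = NOTIFY_DONE;

		for (; head; head = head->next) {
			ret = head->call(action, arg);
			if (ret & NOTIFY_STOP_MASK)
				break;
		}
		return ret;
	}

	static int example_calls;	/* number of callbacks that ran */

	static int example_accept(unsigned long action, void *arg)
	{
		(void)action;
		(void)arg;
		example_calls++;
		return NOTIFY_OK;
	}

	static int example_veto(unsigned long action, void *arg)
	{
		(void)action;
		(void)arg;
		example_calls++;
		return NOTIFY_BAD;
	}

With ``example_veto()`` registered at priority 10 and ``example_accept()`` at
priority 0, only the veto callback runs and the walk returns NOTIFY_BAD, which
the caller would treat as a request to cancel the operation.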

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)

In particular, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
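The two acquisition orders listed above can be made concrete with a small
single-threaded bookkeeping model. It takes no real locks and is not how the
kernel detects such problems. It only records which lock was taken while
another was held and flags the opposite order when it appears. All ``example_*``
names and the enum are hypothetical::

	enum example_lock { DEVICE_LOCK, MEM_HOTPLUG_LOCK, EXAMPLE_NR_LOCKS };

	static int held[EXAMPLE_NR_LOCKS];
	/* seen[a][b] != 0: some path took b while already holding a. */
	static int seen[EXAMPLE_NR_LOCKS][EXAMPLE_NR_LOCKS];

	/* Record an acquisition; return 1 if it inverts an earlier order. */
	static int example_acquire(enum example_lock l)
	{
		int inverted = 0;
		int h;

		for (h = 0; h < EXAMPLE_NR_LOCKS; h++) {
			if (!held[h])
				continue;
			if (seen[l][h])
				inverted = 1;	/* earlier path: l, then h */
			seen[h][l] = 1;
		}
		held[l] = 1;
		return inverted;
	}

	static void example_release(enum example_lock l)
	{
		held[l] = 0;
	}

	/* Order described above for device_online(). */
	static int example_online_path(void)
	{
		int inv = example_acquire(DEVICE_LOCK);

		inv |= example_acquire(MEM_HOTPLUG_LOCK);
		example_release(MEM_HOTPLUG_LOCK);
		example_release(DEVICE_LOCK);
		return inv;
	}

	/* Order described above for add_memory_resource(). */
	static int example_add_path(void)
	{
		int inv = example_acquire(MEM_HOTPLUG_LOCK);

		inv |= example_acquire(DEVICE_LOCK);
		example_release(DEVICE_LOCK);
		example_release(MEM_HOTPLUG_LOCK);
		return inv;
	}

Running ``example_online_path()`` and then ``example_add_path()`` makes the
second call report the inversion. Holding device_hotplug_lock around both
paths does not change either order, but it keeps the two paths from running
concurrently, which is how the inversion is avoided.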

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.