.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available, so that subsystems can
  prepare to handle the memory. The page allocator is still unable to
  allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory, but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

  struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int status_change_nid_normal;
    int status_change_nid;
  };

- start_pfn is the first page frame number of the memory being
  onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
- status_change_nid_normal is set to the node id when the N_NORMAL_MEMORY bit
  of the nodemask is (or will be) set or cleared. If it is -1, the nodemask
  status is not changed.
- status_change_nid is set to the node id when the N_MEMORY bit of the
  nodemask is (or will be) set or cleared. This happens when a new
  (memoryless) node gets memory by onlining, or when a node loses all of its
  memory. If it is -1, the nodemask status is not changed.

  If status_change_nid* >= 0, the callback should create/discard structures
  for the node if necessary.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used in response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)

In particular, holding device_hotplug_lock avoids a possible lock inversion
that arises when memory is being added and user space tries to online that
memory faster than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems()/put_online_mems()
implementation, so code accessing memory can protect itself against that
memory vanishing.
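
As an illustration of the callback contract described in the "Memory hotplug
event notifier" section, the following is a minimal sketch of a callback that
reacts to a few event types. It is written so that it compiles in user space:
the event constants, the return codes, ``struct notifier_block`` and
``struct memory_notify`` are local stand-ins for the kernel definitions (their
numeric values here are illustrative only), and every ``example_*`` name is
hypothetical. It is not taken from any in-tree user of the notifier.

```c
#include <stddef.h>

/*
 * User-space stand-ins so the sketch compiles outside the kernel. Kernel code
 * would get these from include/linux/memory.h and include/linux/notifier.h;
 * the numeric values below are illustrative, not the kernel's.
 */
enum { MEM_GOING_ONLINE = 1, MEM_CANCEL_ONLINE, MEM_ONLINE,
       MEM_GOING_OFFLINE, MEM_CANCEL_OFFLINE, MEM_OFFLINE };
enum { NOTIFY_DONE = 0, NOTIFY_OK = 1, NOTIFY_BAD = 2 };

struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long action, void *arg);
	int priority;
};

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid;
};

/* Hypothetical subsystem state: per-node flag and one page we cannot free. */
#define EXAMPLE_MAX_NODES 4
static int example_node_ready[EXAMPLE_MAX_NODES];
static unsigned long example_pinned_pfn = 0x1000;

static int example_mem_callback(struct notifier_block *self,
				unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int nid = mn->status_change_nid;

	(void)self;	/* unused in this sketch */

	switch (action) {
	case MEM_GOING_ONLINE:
		/* A memoryless node is about to gain memory: create state. */
		if (nid >= 0 && nid < EXAMPLE_MAX_NODES)
			example_node_ready[nid] = 1;
		return NOTIFY_OK;
	case MEM_CANCEL_ONLINE:
	case MEM_OFFLINE:
		/* Onlining failed, or the node lost all memory: discard it. */
		if (nid >= 0 && nid < EXAMPLE_MAX_NODES)
			example_node_ready[nid] = 0;
		return NOTIFY_OK;
	case MEM_GOING_OFFLINE:
		/* Refuse offlining while the range holds the pinned page. */
		if (example_pinned_pfn >= mn->start_pfn &&
		    example_pinned_pfn < mn->start_pfn + mn->nr_pages)
			return NOTIFY_BAD;
		return NOTIFY_OK;
	default:
		/* Events this subsystem does not care about. */
		return NOTIFY_DONE;
	}
}
```

In kernel code, such a callback would presumably be registered with
``hotplug_memory_notifier(example_mem_callback, priority)`` as described above,
with the priority chosen relative to the other subscribers on the chain.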