.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available, so that subsystems can
  prepare to handle it. The page allocator is still unable to allocate from
  the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

  struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int status_change_nid_normal;
    int status_change_nid_high;
    int status_change_nid;
  };

- start_pfn is the first page frame number of the memory being
  onlined/offlined.
- nr_pages is the number of pages being onlined/offlined.
- status_change_nid_normal is set to the node id when N_NORMAL_MEMORY in the
  nodemask is (will be) set/cleared. If this is -1, the nodemask status is
  not changed.
- status_change_nid_high is set to the node id when N_HIGH_MEMORY in the
  nodemask is (will be) set/cleared. If this is -1, the nodemask status is
  not changed.
- status_change_nid is set to the node id when N_MEMORY in the nodemask is
  (will be) set/cleared. This means that a new (memoryless) node gets new
  memory by onlining, or that a node loses all of its memory. If this is -1,
  the nodemask status is not changed.

  If status_change_nid* >= 0, the callback should create/discard structures
  for the node if necessary.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs).
  This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).

In particular, holding device_hotplug_lock avoids a possible lock inversion
when memory is added and user space tries to online that memory faster than
expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).
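
The lock inversion described above, and the way an outer lock avoids it, can
be illustrated with a small user-space model. This is only a sketch and not
kernel code: the lock ids, ``struct task`` and every function name below are
hypothetical, and the "scheduler" simply alternates between two simulated
tasks. It assumes that one path takes the locks in the order of device_online()
and the other in the order of add_memory_resource().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative lock ids; they only mirror the names used in the text. */
enum { DEVICE_HOTPLUG_LOCK, DEVICE_LOCK, MEM_HOTPLUG_LOCK, NR_LOCKS };

/* A simulated task that acquires a fixed sequence of locks. */
struct task {
	const int *order;	/* locks to take, in order */
	size_t len;		/* number of entries in order */
	size_t pc;		/* index of the next lock to take */
	bool done;		/* true once all locks were taken and released */
};

/* Advance a task by one step. Returns true if it made progress. */
static bool step(struct task *t, int id, int owner[NR_LOCKS])
{
	if (t->done)
		return false;
	if (t->pc == t->len) {
		/* Critical section finished: drop every lock this task holds. */
		for (int l = 0; l < NR_LOCKS; l++)
			if (owner[l] == id)
				owner[l] = -1;
		t->done = true;
		return true;
	}
	int want = t->order[t->pc];
	if (owner[want] != -1)
		return false;	/* lock is held by the other task: blocked */
	owner[want] = id;
	t->pc++;
	return true;
}

/* Run both tasks in strict alternation; true means neither can progress. */
static bool deadlocks(const int *a, size_t alen, const int *b, size_t blen)
{
	int owner[NR_LOCKS] = { -1, -1, -1 };
	struct task t[2] = { { a, alen, 0, false }, { b, blen, 0, false } };

	while (!(t[0].done && t[1].done)) {
		bool p0 = step(&t[0], 0, owner);
		bool p1 = step(&t[1], 1, owner);
		if (!p0 && !p1)
			return true;
	}
	return false;
}

/* Inner locks only: the two paths use opposite orders. */
bool inversion_without_outer_lock(void)
{
	static const int online[] = { DEVICE_LOCK, MEM_HOTPLUG_LOCK };
	static const int add[] = { MEM_HOTPLUG_LOCK, DEVICE_LOCK };

	return deadlocks(online, 2, add, 2);
}

/* Same inner orders, but both paths take the outer lock first. */
bool inversion_with_outer_lock(void)
{
	static const int online[] = {
		DEVICE_HOTPLUG_LOCK, DEVICE_LOCK, MEM_HOTPLUG_LOCK };
	static const int add[] = {
		DEVICE_HOTPLUG_LOCK, MEM_HOTPLUG_LOCK, DEVICE_LOCK };

	return deadlocks(online, 3, add, 3);
}
```

In this model the first scenario gets stuck with each task holding the lock
the other one wants, while the second one completes because the outer lock
lets only one path touch the inner locks at a time.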

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect itself against that
memory vanishing.
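
The read/write split can be pictured with a single-threaded toy model. This is
a sketch under stated assumptions and not the kernel implementation: ``struct
model_hotplug_lock`` and all ``model_*`` functions are hypothetical names, and
where the real primitives would sleep until the lock is available, the model
returns false instead so that the behaviour can be checked deterministically.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mem_hotplug_lock: many readers or one writer. */
struct model_hotplug_lock {
	int readers;	/* tasks currently between get and put */
	bool writer;	/* true while a hotplug operation is running */
};

/* Models get_online_mems(): refused while a hotplug operation runs. */
static bool model_get_online_mems(struct model_hotplug_lock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

/* Models put_online_mems(): ends one read-side section. */
static void model_put_online_mems(struct model_hotplug_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Models taking the lock in write mode: needs exclusive access. */
static bool model_hotplug_begin(struct model_hotplug_lock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

/* Models dropping the lock from write mode. */
static void model_hotplug_done(struct model_hotplug_lock *l)
{
	assert(l->writer);
	l->writer = false;
}

/* Returns true if readers and hotplug exclude each other as expected. */
bool model_readers_exclude_hotplug(void)
{
	struct model_hotplug_lock l = { 0, false };
	bool ok = true;

	ok = ok && model_get_online_mems(&l);	/* first reader enters */
	ok = ok && model_get_online_mems(&l);	/* readers may overlap */
	ok = ok && !model_hotplug_begin(&l);	/* hotplug has to wait */
	model_put_online_mems(&l);
	model_put_online_mems(&l);
	ok = ok && model_hotplug_begin(&l);	/* no readers left: proceed */
	ok = ok && !model_get_online_mems(&l);	/* readers wait for hotplug */
	model_hotplug_done(&l);
	ok = ok && model_get_online_mems(&l);	/* readers are admitted again */
	model_put_online_mems(&l);
	return ok;
}
```

The point of the model is that read-side sections can overlap freely and are
cheap, while a hotplug operation only starts once no reader is inside such a
section.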