.. _swap_numa:

===========================================
Automatically bind swap device to NUMA node
===========================================

If the system has more than one swap device and the swap devices have node
information, this information can be used in get_swap_pages() to decide which
swap device to use, for better performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which the devices
are used. Automatic binding requires no changes to the priority settings of
the swap devices. For example, on a 2 node machine, assume that two swap
devices are going to be swapped on: swapA, attached to node 0, and swapB,
attached to node 1. Simply swap them on::

	# swapon /dev/swapA
	# swapon /dev/swapB

Node 0 will then use the two swap devices in the order swapA, then swapB, and
node 1 will use them in the order swapB, then swapA. Note that the order in
which they are swapped on doesn't matter.

A more complex example uses a 4 node machine. Assume 6 swap devices are going
to be swapped on: swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2 and swapF is attached to node 3.
They are swapped on in the same way as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Node 0 will then use them in the order::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used round robin before any other swap device.

Node 1 will use them in the order::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used round robin before any other swap
device.

Node 3 will use them in the order::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE

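The orderings above can be reproduced with a small model. The following Python
sketch is not kernel code: the function name ``swap_order`` and the dictionary
layout are illustrative assumptions, and it assumes that no explicit priorities
were set. It places the devices attached to the requesting node first, as one
round robin group, and keeps the remaining devices in the order in which they
were swapped on.

```python
def swap_order(devices, node):
    """Return the order in which `node` uses the swap devices.

    `devices` maps a device name to the node it is attached to, in
    swapon order.  The result is a list of groups; devices inside one
    group are used round robin.  Assumes no explicit priorities were set.
    """
    local = [name for name, dev_node in devices.items() if dev_node == node]
    remote = [[name] for name, dev_node in devices.items() if dev_node != node]
    return ([local] if local else []) + remote


# The 4 node example from this document (hypothetical device names).
devices = {
    "swapA": 0, "swapB": 0, "swapC": 1,
    "swapD": 2, "swapE": 2, "swapF": 3,
}

for node in range(4):
    groups = swap_order(devices, node)
    print(f"node {node}:", " -> ".join("/".join(group) for group in groups))
```

Running the sketch prints one line per node in the same ``->`` notation used
above, with ``/`` joining the devices that share a round robin group.
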

Implementation details
======================

Without this feature, a single priority based list, swap_avail_list, is used
to decide which swap device to use, and if multiple swap devices share the
same priority, they are used round robin. This feature replaces the single
global swap_avail_list with a per-NUMA-node list, i.e. each NUMA node sees its
own priority based list of available swap devices. A swap device's priority
can be promoted on its matching node's swap_avail_list.

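A minimal Python model of one node's list may help here. It is a sketch, not
the kernel's plist code: the class name ``NodeAvailList`` and its methods are
hypothetical, and a plain Python list stands in for the plist. The list is
kept ordered by priority, and a device that has just been picked is moved
behind the other devices of the same priority, which yields round robin use.

```python
class NodeAvailList:
    """Priority ordered list of available swap devices for one node."""

    def __init__(self):
        self._entries = []  # (priority, name) pairs, highest priority first

    def add(self, name, priority):
        # Insert behind every entry whose priority is >= the new one, so
        # devices with equal priority keep their insertion order.
        index = 0
        while index < len(self._entries) and self._entries[index][0] >= priority:
            index += 1
        self._entries.insert(index, (priority, name))

    def pick(self):
        # Take the first device and requeue it behind its equal priority
        # peers, so the next call picks a different device if one exists.
        priority, name = self._entries.pop(0)
        self.add(name, priority)
        return name


# Node 0 with two promoted local devices and one lower priority device.
node0 = NodeAvailList()
for name, priority in [("swapA", -1), ("swapB", -1), ("swapC", -4)]:
    node0.add(name, priority)

print([node0.pick() for _ in range(4)])
```

The model leaves out what happens when a device fills up, so swapC is never
reached here; it only shows the alternation between devices of equal priority.
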
A swap device's priority used to be set as follows: the user can set a value
>= 0, or the system picks one, starting from -1 and going downwards. The
priority value in the swap_avail_list is the negated value of the swap
device's priority, because a plist is sorted from low to high. The new policy
doesn't change the semantics for priorities >= 0. The automatically picked
priorities, which previously started from -1 and went downwards, now start
from -2 and go downwards, and -1 is reserved as the promoted value. So if
multiple swap devices are attached to the same node, they will all be promoted
to priority -1 on that node's plist and will be used round robin before any
other swap devices.

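The numbering rules can be written down as a short Python sketch. It models
the policy described above rather than reproducing the kernel source; the
names ``assign_priorities`` and ``plist_key`` are hypothetical. It assigns
automatic priorities from -2 downwards, leaves user priorities untouched, and
computes the negated key that each node's list would sort on, using -1 for a
device with an automatic priority on its own node.

```python
PROMOTED_PRIO = -1  # reserved for a device on its matching node


def assign_priorities(swapons):
    """Map each device name to (priority, node).

    `swapons` is a list of (name, node, user_prio) in swapon order;
    user_prio is None when the system should pick the priority.
    """
    next_auto = -2
    result = {}
    for name, node, user_prio in swapons:
        if user_prio is not None:
            result[name] = (user_prio, node)
        else:
            result[name] = (next_auto, node)
            next_auto -= 1
    return result


def plist_key(prio, dev_node, list_node):
    """Sort key on list_node's list; lower keys are used first."""
    if prio < 0 and dev_node == list_node:
        prio = PROMOTED_PRIO
    return -prio


# Hypothetical setup: swapC is given a user priority of 5.
devs = assign_priorities([("swapA", 0, None), ("swapB", 1, None), ("swapC", 1, 5)])
for node in (0, 1):
    keys = {name: plist_key(prio, dev_node, node)
            for name, (prio, dev_node) in devs.items()}
    print(node, sorted(keys, key=keys.get), keys)
```

In this model a device with a user priority keeps the same key on every node,
which matches the statement that the semantics for priorities >= 0 are
unchanged.
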