===========================================
Automatically bind swap device to NUMA node
===========================================

If the system has more than one swap device and the swap devices carry node
information, the kernel can use that information in get_swap_pages() to decide
which swap device to use, for better performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which the devices
are used. Automatic binding requires no changes to the priority settings of
the swap devices. For example, on a 2-node machine, assume two swap devices,
swapA and swapB, are going to be swapped on, with swapA attached to node 0 and
swapB attached to node 1. Simply swap them on::

	# swapon /dev/swapA
	# swapon /dev/swapB

Node 0 will then use the two swap devices in the order swapA, then swapB, and
node 1 will use them in the order swapB, then swapA. Note that the order in
which they are swapped on doesn't matter.

A more complex example, on a 4-node machine: assume six swap devices are going
to be swapped on. swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2, and swapF is attached to
node 3. They are swapped on in the same way as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Node 0 will then use them in the order::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used in round-robin mode before any other swap device.

Node 1 will use them in the order::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used in round-robin mode before any other
swap device.

Node 3 will use them in the order::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE

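The orderings above can be reproduced with a small model. The following Python
sketch is only an illustration, not kernel code. It assumes that every device
uses a system-assigned priority, handed out from -2 downwards in swapon order,
and that a device is promoted to -1 on the node it is attached to, as described
under "Implementation details". The names ``node_order`` and ``devices`` are
made up for this example.

.. code-block:: python

	def node_order(devices, node):
	    """Return the swap devices as seen from `node`, grouped by priority.

	    `devices` is a list of (name, attached_node) pairs in swapon order.
	    Devices in the same inner list share a priority (round robin).
	    """
	    prios = {}
	    for index, (name, attached_node) in enumerate(devices):
	        # Assumed system-assigned priorities: -2, -3, ... in swapon order.
	        prio = -2 - index
	        # A device attached to this node is promoted to -1.
	        if attached_node == node:
	            prio = -1
	        prios[name] = prio

	    groups = {}
	    for name, prio in prios.items():
	        groups.setdefault(prio, []).append(name)
	    # Highest priority first.
	    return [groups[prio] for prio in sorted(groups, reverse=True)]


	# The six devices of the 4-node example, in swapon order.
	devices = [("swapA", 0), ("swapB", 0), ("swapC", 1),
	           ("swapD", 2), ("swapE", 2), ("swapF", 3)]

	for node in range(4):
	    print(node, node_order(devices, node))

Each inner list is one priority level. For node 0 the first level is
``['swapA', 'swapB']``, which corresponds to the round-robin pair in the
example above.
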

Implementation details
======================

The current code uses a priority-based list, swap_avail_list, to decide
which swap device to use; if multiple swap devices share the same priority,
they are used round robin. This change replaces the single global
swap_avail_list with a per-NUMA-node list, i.e. each NUMA node sees its own
priority-based list of available swap devices. A swap device's priority can
be promoted on its matching node's swap_avail_list.

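The round-robin behaviour on a per-node list can be pictured as follows. This
is a simplified Python sketch, not the kernel's plist code: ``avail_lists``,
``pick`` and the ``(key, name)`` entries are hypothetical names, and each
node's list is assumed to be kept sorted by key, lowest key first. After a
device is picked, it is moved behind the other entries with the same key, so
devices of equal priority take turns.

.. code-block:: python

	def pick(avail_list):
	    """Pick the head device and requeue it behind its equal-key peers."""
	    key, name = avail_list.pop(0)
	    # Find the end of the run of entries that share the head's key.
	    pos = 0
	    while pos < len(avail_list) and avail_list[pos][0] == key:
	        pos += 1
	    avail_list.insert(pos, (key, name))
	    return name


	# One list per node; a lower key means a higher priority (assumed keys).
	# What happens when a device fills up is not modelled here.
	avail_lists = {
	    0: [(1, "swapA"), (1, "swapB"), (4, "swapC")],
	    1: [(1, "swapC"), (2, "swapA"), (3, "swapB")],
	}

	picks_node0 = [pick(avail_lists[0]) for _ in range(4)]
	picks_node1 = [pick(avail_lists[1]) for _ in range(3)]
	print(picks_node0)
	print(picks_node1)

In this model node 0 alternates between swapA and swapB, while node 1 keeps
picking swapC because no other device shares its key.
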
A swap device's priority is currently set as follows: the user can set a
value >= 0, or the system will pick one, starting from -1 and going downwards.
The priority value in the swap_avail_list is the negated value of the swap
device's priority, because a plist is sorted from low to high. The new policy
doesn't change the semantics for priorities >= 0. The system-assigned values,
which previously started from -1 and went downwards, now start from -2 and go
downwards, and -1 is reserved as the promoted value. So if multiple swap
devices are attached to the same node, they will all be promoted to
priority -1 on that node's plist and will be used round robin before any
other swap devices.

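As a worked illustration of the rules above, the Python sketch below assigns
priorities and computes the per-node list keys. It is a model built from the
description in this document, not a copy of the kernel code; ``SwapModel``,
``swapon``, ``key_on_node`` and ``order`` are hypothetical names. It assumes
that promotion applies only to devices with a system-assigned (negative)
priority, since the semantics of priorities >= 0 are unchanged.

.. code-block:: python

	class SwapModel:
	    def __init__(self):
	        # -1 is reserved for promotion, so the first system-assigned
	        # priority handed out is -2.
	        self.least_priority = -1
	        self.devices = {}  # name -> (priority, attached node)

	    def swapon(self, name, node, prio=None):
	        """Register a device; prio=None lets the model pick a priority."""
	        if prio is None:
	            self.least_priority -= 1
	            prio = self.least_priority
	        self.devices[name] = (prio, node)

	    def key_on_node(self, name, node):
	        """Sort key on `node`'s list: the negated priority, lowest first."""
	        prio, attached_node = self.devices[name]
	        if prio < 0 and attached_node == node:
	            prio = -1  # promoted on the matching node (assumption above)
	        return -prio

	    def order(self, node):
	        """Device names as `node` would try them, best key first."""
	        return sorted(self.devices,
	                      key=lambda name: self.key_on_node(name, node))


	model = SwapModel()
	model.swapon("swapA", node=0)
	model.swapon("swapB", node=1)
	model.swapon("swapU", node=1, prio=5)  # user-set priority

	for node in (0, 1):
	    print(node, [(n, model.key_on_node(n, node)) for n in model.order(node)])

In this model swapU, with its user-set priority of 5, comes first on both
nodes, while swapA and swapB trade places depending on the node.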