==============
Page migration
==============

Page migration allows moving the physical location of pages between
nodes in a NUMA system while the process is running. The virtual
addresses that the process sees do not change; only the physical
location of those pages is rearranged.

Also see Documentation/mm/hmm.rst for migrating pages to or from device
private memory.

The main intent of page migration is to reduce the latency of memory accesses
by moving pages closer to the processor on which the process accessing that
memory is running.

Page migration allows a process to manually relocate the node on which its
pages are located through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL flags while
setting a new memory policy via mbind(). The pages of a process can also be
relocated from another process using the migrate_pages() system call, which
takes two sets of nodes and moves the pages of a process that are located on
the source nodes to the destination nodes.
Page migration functions are provided by Andi Kleen's numactl package
(a version later than 0.9.3 is required; get it from
https://github.com/numactl/numactl.git). numactl provides libnuma,
which offers an interface for page migration similar to its other NUMA
functionality. Running ``cat /proc/<pid>/numa_maps`` allows an easy review
of where the pages of a process are located. See also the numa_maps
documentation in the proc(5) man page.
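
As a rough illustration of reading that file, the C sketch below totals the
per-node page counts that numa_maps reports as ``N<node>=<pages>`` tokens on
each line. The function name numa_maps_node_pages(), the MAX_NODES limit and
the sample line used to check it are hypothetical; the sketch only parses a
line that has already been read and is not part of numactl or libnuma.

```c
#include <stdlib.h>

#define MAX_NODES 64	/* assumed upper bound on node IDs for this sketch */

/*
 * Add the "N<node>=<pages>" tokens of one numa_maps line to counts[].
 * Other tokens (policy, file=, anon=, kernelpagesize_kB=, ...) are
 * skipped. Returns the number of per-node tokens recognized.
 */
static int numa_maps_node_pages(const char *line,
				unsigned long counts[MAX_NODES])
{
	const char *p = line;
	int found = 0;

	while (*p) {
		/* A per-node token is a word starting with 'N' and a digit. */
		if (*p == 'N' && (p == line || p[-1] == ' ') &&
		    p[1] >= '0' && p[1] <= '9') {
			char *end;
			long node = strtol(p + 1, &end, 10);

			if (*end == '=' && node < MAX_NODES) {
				counts[node] += strtoul(end + 1, &end, 10);
				found++;
				p = end;	/* continue after the page count */
				continue;
			}
		}
		p++;
	}
	return found;
}
```

Summing the counts over every line of ``/proc/<pid>/numa_maps`` gives an
approximate per-node footprint of the process, which can be compared before
and after a migration.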

Manual migration is useful if, for example, the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
nearer to the new processor. The kernel itself only provides
manual page migration support. Automatic page migration may be implemented
through user space processes that move pages. A special system call,
move_pages(), allows moving individual pages within a process.
For example, a NUMA profiler may obtain a log showing frequent off-node
accesses and may use the result to move pages to more advantageous
locations.
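
A user space profiler of that kind could, for example, pick for each sampled
page the node whose CPUs accessed it most often. The sketch below assumes a
hypothetical per-page, per-node access-count table (how it is collected is out
of scope) and only fills the ``nodes`` array in the layout expected by
move_pages(2), ``long move_pages(int pid, unsigned long count, void **pages,
const int *nodes, int *status, int flags)`` as declared in libnuma's
<numaif.h>; the system call itself is left out, and NR_NODES and
pick_target_nodes() are illustrative names.

```c
#include <stddef.h>

#define NR_NODES 4	/* assumed number of NUMA nodes for this sketch */

/*
 * accesses[i][n]: hypothetical profiler count of accesses to page i from
 * CPUs on node n. cur_node[i] (in [0, NR_NODES)) is where page i lives now.
 *
 * Fill nodes[i] with the node that accessed page i strictly more often than
 * any other; on a tie the page stays on cur_node[i]. The result has the
 * shape of move_pages(2)'s "nodes" argument. Returns how many pages move.
 */
static size_t pick_target_nodes(size_t count,
				const unsigned long accesses[][NR_NODES],
				const int *cur_node, int *nodes)
{
	size_t moves = 0;

	for (size_t i = 0; i < count; i++) {
		int best = cur_node[i];	/* prefer staying put */

		for (int n = 0; n < NR_NODES; n++)
			if (accesses[i][n] > accesses[i][best])
				best = n;
		nodes[i] = best;
		if (best != cur_node[i])
			moves++;
	}
	return moves;
}
```

Keeping a page in place on ties avoids bouncing pages between equally busy
nodes; a real profiler would likely also require a minimum access count
before moving anything.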

Larger installations usually partition the system using cpusets into
sections of nodes. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset (see
:ref:`CPUSETS <cpusets>`).
Cpusets allow the automation of process locality. If a task is moved to
a new cpuset, all its pages are moved with it so that the performance of
the process does not drop dramatically. The pages of processes in a cpuset
are also moved if the allowed memory nodes of the cpuset are changed.

For all migration techniques, page migration preserves the relative location
of pages within a group of nodes, so a particular memory allocation pattern
is kept even after a process is migrated. This is necessary to preserve
memory latencies: processes will run with similar performance after
migration.
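
One way to preserve relative placement is to remap nodes by position: the
k-th node of the old set goes to the (k mod w)-th node of the new set, where w
is the size of the new set, and nodes outside the old set map to themselves.
These are the semantics documented for the kernel's node_remap() helper in
include/linux/nodemask.h. The stand-alone C sketch below reimplements that
idea on a plain ``unsigned long`` mask for illustration (hypothetical names,
at most one word of nodes); it does not claim to be the exact code path that
every migration technique uses.

```c
/* Number of set bits in mask. */
static int popcount_ul(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Bit number of the k-th (0-based) set bit in mask, or -1 if none. */
static int nth_set_bit(unsigned long mask, int k)
{
	for (int bit = 0; bit < (int)(8 * sizeof(mask)); bit++)
		if (mask & (1UL << bit)) {
			if (k == 0)
				return bit;
			k--;
		}
	return -1;
}

/*
 * Map node from old_mask to new_mask by position. A node that is not in
 * old_mask, or an empty new_mask, leaves the node unchanged.
 */
static int remap_node(int node, unsigned long old_mask, unsigned long new_mask)
{
	int w = popcount_ul(new_mask);
	int k;

	if (node < 0 || node >= (int)(8 * sizeof(old_mask)) ||
	    !(old_mask & (1UL << node)) || w == 0)
		return node;
	/* k = position of node among the nodes in old_mask */
	k = popcount_ul(old_mask & ((1UL << node) - 1));
	return nth_set_bit(new_mask, k % w);
}
```

For example, moving a process from nodes {0,1} to nodes {2,3} sends pages on
node 0 to node 2 and pages on node 1 to node 3, so pages that were on
different nodes stay on different nodes.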

Page migration occurs in several steps. This document first gives a
high-level description for those trying to use migrate_pages() from the
kernel (for userspace usage, see Andi Kleen's numactl package mentioned
above) and then a low-level description of how the details work.

In-kernel use of migrate_pages()
================================

1. Remove pages from the LRU.

   Lists of pages to be migrated are generated by scanning over
   pages and moving them into lists. This is done by
   calling isolate_lru_page().
   Calling isolate_lru_page() increases the references to the page
   so that it cannot vanish while the page migration occurs.
   It also prevents the swapper or other scans from encountering
   the page.

2. Provide a function of type new_folio_t that can be passed to
   migrate_pages(). This function should figure out how to allocate
   the correct new folio given the old folio.

3. Call migrate_pages(), which attempts to do the migration. It calls
   the allocation function to obtain the new folio for each folio that
   is considered for moving.
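
The three steps can be modelled in user space to show how the pieces fit
together. In this sketch, struct model_folio, model_isolate(),
alloc_on_node() and model_migrate_pages() are hypothetical stand-ins: they
mirror the shape of the isolation step, of the new_folio_t callback (which in
recent kernels receives the old folio and an opaque ``private`` value) and of
migrate_pages(), but the real prototypes take more arguments and should be
checked in include/linux/migrate.h for the kernel version at hand.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user space stand-in for struct folio. */
struct model_folio {
	int node;                  /* node holding the data */
	bool on_lru;               /* still on an LRU list? */
	int refcount;
	struct model_folio *next;  /* link in a private list */
};

/* Same shape as new_folio_t: old folio + opaque value -> new folio. */
typedef struct model_folio *model_new_folio_t(struct model_folio *src,
					      unsigned long private);

/* Step 1: take the folio off the LRU and pin it with an extra reference. */
static bool model_isolate(struct model_folio *f, struct model_folio **list)
{
	if (!f->on_lru)
		return false;	/* already isolated by someone else */
	f->on_lru = false;
	f->refcount++;
	f->next = *list;
	*list = f;
	return true;
}

/* Step 2: allocation callback placing the new folio on node "private". */
static struct model_folio *alloc_on_node(struct model_folio *src,
					 unsigned long private)
{
	struct model_folio *dst = calloc(1, sizeof(*dst));

	(void)src;	/* a real callback may look at the old folio's size */
	if (dst) {
		dst->node = (int)private;
		dst->refcount = 1;
	}
	return dst;
}

/* Step 3: one callback per folio; new folios are collected on *done. */
static int model_migrate_pages(struct model_folio **list,
			       model_new_folio_t *get_new,
			       unsigned long private,
			       struct model_folio **done)
{
	int moved = 0;

	while (*list) {
		struct model_folio *src = *list;
		struct model_folio *dst;

		*list = src->next;
		dst = get_new(src, private);
		src->refcount--;	/* drop the isolation reference */
		if (!dst) {
			src->on_lru = true;	/* give the old folio back */
			continue;
		}
		/* Copying contents and redirecting mappings is elided; the
		 * old folio would now be freed by the kernel. */
		dst->on_lru = true;
		dst->next = *done;
		*done = dst;
		moved++;
	}
	return moved;
}
```

In this model, passing the target node through ``private`` keeps the
placement decision in the caller; the migration loop only asks the callback
for a destination.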

How migrate_pages() works
=========================

migrate_pages() does several passes over its list of pages. A page is moved
only if all references to it are removable at the time. The page has
already been removed from the LRU via isolate_lru_page(), and its refcount
has been increased so that the page cannot be freed while page migration
occurs.
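
The "all references are removable" condition comes down to comparing the
refcount with the number of references migration expects once the page table
mappings have been replaced. The C sketch below is a simplified model under
stated assumptions: one reference for the isolation, one per base page if the
folio is in the page cache or swap cache, and one more for attached private
data. The struct and function names are hypothetical, and the kernel's own
accounting may differ between versions.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a folio's reference state. */
struct ref_state {
	int refcount;      /* current reference count */
	int mapcount;      /* page table mappings still present */
	int nr_pages;      /* base pages in the folio */
	bool has_mapping;  /* in the page cache or swap cache (assumed) */
	bool has_private;  /* private data attached, e.g. buffers (assumed) */
};

/* References that migration itself accounts for (simplified). */
static int expected_refs(const struct ref_state *s)
{
	int refs = 1;			/* the isolation reference */

	if (!s->has_mapping)
		return refs;
	refs += s->nr_pages;		/* one per base page in the cache */
	if (s->has_private)
		refs++;
	return refs;
}

/* True if nothing else holds the folio, so it may be moved now. */
static bool can_migrate_now(const struct ref_state *s)
{
	return s->mapcount == 0 && s->refcount == expected_refs(s);
}
```

Any extra reference, for example one held by a driver that pinned the page,
makes the counts disagree, and migration backs out instead of waiting.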

Steps:

1. Lock the page to be migrated.

2. Ensure that writeback is complete.

3. Lock the new page that we want to move to. It is locked so that accesses to
   this (not yet up-to-date) page immediately block while the move is in progress.

4. All the page table references to the page are converted to migration
   entries. This decreases the mapcount of the page. If the resulting
   mapcount is not zero, then we do not migrate the page. All user space
   processes that attempt to access the page will now wait on the page lock
   or wait for the migration page table entry to be removed.

5. The i_pages lock is taken. This will cause all processes trying
   to access the page via the mapping to block on the spinlock.

6. The refcount of the page is examined and we back out if unexpected
   references remain. Otherwise, we know that we are the only one
   referencing this page.

7. The radix tree is checked and if it does not contain the pointer to this
   page then we back out because someone else modified the radix tree.

8. The new page is prepped with some settings from the old page so that
   accesses to the new page will discover a page with the correct settings.

9. The radix tree is changed to point to the new page.

10. The reference count of the old page is dropped because the address space
    reference is gone. A reference to the new page is established because
    the new page is referenced by the address space.

11. The i_pages lock is dropped. With that, lookups in the mapping
    become possible again. Processes will move from spinning on the lock
    to sleeping on the locked new page.

12. The page contents are copied to the new page.

13. The remaining page flags are copied to the new page.

14. The old page flags are cleared to indicate that the page does
    not provide any information anymore.

15. Queued-up writeback on the new page is triggered.

16. If migration entries were inserted into the page table, then replace them
    with real ptes. Doing so will enable access for user space processes not
    already waiting for the page lock.

17. The page locks on the old and new pages are dropped.
    Processes waiting on the page lock will redo their page faults
    and will reach the new page.

18. The new page is moved to the LRU and can be scanned by the swapper,
    etc. again.
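
The ordering of these steps can be traced in a small single-threaded model.
The C sketch below uses hypothetical struct mpage and struct mspace types for
one page cache page and one mapping slot. Locks are plain booleans, writeback
is assumed to be complete, unmapping always succeeds, and flag handling is
collapsed into a single copy. It shows where the function backs out (steps 6
and 7) and how references and mappings move from the old page to the new one,
not how the kernel implements them.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one page cache page (not struct page). */
struct mpage {
	bool locked;
	int refcount;      /* page cache + isolation + one per live PTE */
	int mapcount;      /* live page table mappings */
	int mig_entries;   /* PTEs currently replaced by migration entries */
	unsigned long flags;
	char data[16];
};

/* Hypothetical model of one slot of an address_space's i_pages. */
struct mspace {
	bool i_pages_locked;
	struct mpage *slot;
};

/* Returns 0 if old was replaced by new, -1 if migration backed out. */
static int model_move_page(struct mspace *as, struct mpage *old,
			   struct mpage *new)
{
	old->locked = true;                  /* 1: lock old page */
	/* 2: writeback is assumed to be complete already */
	new->locked = true;                  /* 3: lock new page */

	old->mig_entries = old->mapcount;    /* 4: PTEs -> migration entries */
	old->refcount -= old->mapcount;      /*    each PTE held a reference */
	old->mapcount = 0;                   /*    (always succeeds here) */

	as->i_pages_locked = true;           /* 5: take i_pages lock */
	if (old->refcount != 2 ||            /* 6: unexpected references */
	    as->slot != old) {               /* 7: slot changed under us */
		as->i_pages_locked = false;
		old->mapcount = old->mig_entries;  /* back out: restore PTEs */
		old->refcount += old->mig_entries;
		old->mig_entries = 0;
		new->locked = false;
		old->locked = false;
		return -1;
	}
	new->flags = old->flags;             /* 8, 13: copy flags (merged) */
	as->slot = new;                      /* 9: slot points to new page */
	old->refcount--;                     /* 10: cache ref moves from old */
	new->refcount++;                     /*     to new */
	as->i_pages_locked = false;          /* 11: drop i_pages lock */
	memcpy(new->data, old->data, sizeof(new->data)); /* 12: copy data */
	old->flags = 0;                      /* 14: clear old flags */
	/* 15: writeback on the new page is not modelled */
	new->mapcount = old->mig_entries;    /* 16: migration entries -> PTEs */
	new->refcount += old->mig_entries;
	old->mig_entries = 0;
	new->locked = false;                 /* 17: unlock both pages */
	old->locked = false;
	return 0;                            /* 18: caller puts new on LRU */
}
```

Because the contents are copied only after the slot already points at the new
page (steps 9 to 12), the new page must stay locked during the copy, which is
why step 3 locks it before anything else is changed.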

Non-LRU page migration
======================

Although migration was originally aimed at reducing the latency of memory
accesses for NUMA, compaction also uses migration to create high-order
pages. For compaction purposes, it is also useful to be able to move
non-LRU pages, such as zsmalloc and virtio-balloon pages.

If a driver wants to make its pages movable, it should define a struct
movable_operations. It then needs to call __SetPageMovable() on each
page that it may be able to move. This uses the ``page->mapping`` field,
so this field is not available for the driver to use for other purposes.
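
The following user space sketch shows the kind of callbacks a driver
supplies. In kernels of this era, struct movable_operations in
include/linux/migrate.h has isolate_page(), migrate_page() and putback_page()
members. The fake_page type, the simplified signatures and the drv_*
functions below are hypothetical; they only illustrate the division of work:
isolate the page, copy the driver's data and update the driver's own pointers
on migration, or take the page back if migration gives up.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for struct page; the real callbacks take struct page * and
 * extra mode arguments. */
struct fake_page {
	bool isolated;
	char payload[32];
};

struct fake_movable_ops {
	bool (*isolate_page)(struct fake_page *page);
	int (*migrate_page)(struct fake_page *dst, struct fake_page *src);
	void (*putback_page)(struct fake_page *page);
};

/* Hypothetical driver state: where the driver's object currently lives. */
static struct fake_page *drv_current;

static bool drv_isolate(struct fake_page *page)
{
	if (page->isolated)
		return false;		/* already being migrated */
	page->isolated = true;
	return true;
}

static int drv_migrate(struct fake_page *dst, struct fake_page *src)
{
	memcpy(dst->payload, src->payload, sizeof(dst->payload));
	if (drv_current == src)
		drv_current = dst;	/* driver now uses the new page */
	src->isolated = false;
	return 0;			/* success in this model */
}

static void drv_putback(struct fake_page *page)
{
	page->isolated = false;		/* migration gave up; keep using it */
}

static const struct fake_movable_ops drv_ops = {
	.isolate_page = drv_isolate,
	.migrate_page = drv_migrate,
	.putback_page = drv_putback,
};
```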

Monitoring Migration
====================

The following events (counters) can be used to monitor page migration.

1. PGMIGRATE_SUCCESS: Normal page migration success. Each count means that a
   page was migrated. If the page was a non-THP and non-hugetlb page, then
   this counter is increased by one. If the page was a THP or hugetlb, then
   this counter is increased by the number of THP or hugetlb subpages.
   For example, migration of a single 2MB THP that has 4KB-size base pages
   (subpages) will cause this counter to increase by 512.

2. PGMIGRATE_FAIL: Normal page migration failure. Same counting rules as for
   PGMIGRATE_SUCCESS, above: this will be increased by the number of subpages,
   if it was a THP or hugetlb.

3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.

4. THP_MIGRATION_FAIL: A THP could neither be migrated nor be split.

5. THP_MIGRATION_SPLIT: A THP was migrated, but not as such: first, the THP had
   to be split. After splitting, a migration retry was used for its sub-pages.
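
These events are exported as counters in ``/proc/vmstat`` under lower-case
names such as pgmigrate_success, pgmigrate_fail, thp_migration_success,
thp_migration_fail and thp_migration_split; which ones are present depends on
the kernel configuration. The C sketch below looks up one counter in a buffer
that already holds the file's contents; the function name vmstat_get() is
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value of counter "name" in a /proc/vmstat style buffer
 * ("<name> <value>" per line), or -1 if the counter is not present.
 */
static long long vmstat_get(const char *text, const char *name)
{
	size_t len = strlen(name);
	const char *p = text;

	while (p && *p) {
		/* Match the whole name at the start of a line. */
		if (strncmp(p, name, len) == 0 && p[len] == ' ') {
			long long v;

			if (sscanf(p + len, "%lld", &v) == 1)
				return v;
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}
```

Taking two snapshots and subtracting the values gives the number of events
during an interval, which is usually more useful than the totals since boot.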

THP_MIGRATION_* events also update the appropriate PGMIGRATE_SUCCESS or
PGMIGRATE_FAIL events. For example, a THP migration failure will cause both
THP_MIGRATION_FAIL and PGMIGRATE_FAIL to increase.
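
The counting rules above can be written down as a small accounting function.
The sketch below is a model, not kernel code: the struct, the enum and
account_thp() are hypothetical, and the split case assumes that, after
splitting, each subpage counts towards PGMIGRATE_SUCCESS or PGMIGRATE_FAIL on
its own. It also reproduces the arithmetic of the 2MB THP example:
2MB / 4KB = 512 subpages.

```c
/* Hypothetical event totals mirroring the counter names above. */
struct mig_events {
	unsigned long pgmigrate_success, pgmigrate_fail;
	unsigned long thp_migration_success, thp_migration_fail;
	unsigned long thp_migration_split;
};

enum thp_outcome { THP_MOVED, THP_FAILED, THP_SPLIT_THEN_RETRIED };

/* Base pages in a huge page, e.g. 2MB / 4KB = 512. */
static unsigned long subpages(unsigned long thp_bytes, unsigned long base_bytes)
{
	return thp_bytes / base_bytes;
}

/*
 * Account one THP of nr_subpages base pages. For a split THP, split_ok
 * (<= nr_subpages) is how many subpages were then migrated successfully.
 */
static void account_thp(struct mig_events *ev, enum thp_outcome o,
			unsigned long nr_subpages, unsigned long split_ok)
{
	switch (o) {
	case THP_MOVED:
		ev->thp_migration_success++;
		ev->pgmigrate_success += nr_subpages;
		break;
	case THP_FAILED:
		ev->thp_migration_fail++;
		ev->pgmigrate_fail += nr_subpages;
		break;
	case THP_SPLIT_THEN_RETRIED:
		ev->thp_migration_split++;
		/* assumed: subpages are counted individually after the split */
		ev->pgmigrate_success += split_ok;
		ev->pgmigrate_fail += nr_subpages - split_ok;
		break;
	}
}
```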

Christoph Lameter, May 8, 2006.
Minchan Kim, Mar 28, 2016.

.. kernel-doc:: include/linux/migrate.h