.. SPDX-License-Identifier: GPL-2.0

==============
5-level paging
==============

Overview
========
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address
space and 64 TiB of physical address space. We are already bumping into
this limit: some vendors offer servers with 64 TiB of memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging. It is a straightforward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

QEMU 2.9 and later support 5-level paging.

Virtual memory layout for 5-level paging is described in
Documentation/arch/x86/x86_64/mm.rst


Enabling 5-level paging
=======================
CONFIG_X86_5LEVEL=y enables the feature.

A kernel with CONFIG_X86_5LEVEL=y is still able to boot on 4-level hardware.
In this case the additional page table level -- p4d -- will be folded at
runtime.
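
The extra p4d level is what widens virtual addresses: each page table level
resolves 9 address bits on top of the 12-bit offset within a 4 KiB page, so
4 levels give 48 bits and 5 levels give 57 bits. The sketch below only
illustrates that arithmetic. The helper names ``va_bits()`` and
``space_tib()`` are hypothetical and are not kernel interfaces.

```c
#include <stdint.h>

#define PAGE_SHIFT_BITS 12 /* offset within a 4 KiB page */
#define BITS_PER_LEVEL   9 /* 512 entries per page table */

/* Hypothetical helper: virtual address width for a number of paging levels. */
static unsigned int va_bits(unsigned int levels)
{
	return PAGE_SHIFT_BITS + BITS_PER_LEVEL * levels;
}

/*
 * Hypothetical helper: size in TiB of an address space that is 'bits' wide.
 * Only meaningful for 40 <= bits < 104, since 1 TiB is 2^40 bytes.
 */
static uint64_t space_tib(unsigned int bits)
{
	return 1ULL << (bits - 40);
}
```

With these helpers, ``space_tib(va_bits(4))`` is 256 (TiB) and
``space_tib(va_bits(5))`` is 131072 (TiB), which is 128 PiB. The same
function applied to the physical widths of 46 and 52 bits gives 64 TiB and
4096 TiB (4 PiB), matching the limits quoted in the overview.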

User-space and large virtual address space
==========================================
On x86, 5-level paging enables 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. These encoded bits collide with valid pointers under 5-level
paging and lead to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47 bits.

If the hint address is set above 47-bit, but MAP_FIXED is not specified, we
try to look for an unmapped area at the specified address. If it's already
occupied, we look for an unmapped area in the *full* address space, rather
than in the 47-bit window.

A high hint address only affects the allocation in question, not
any future mmap()s.

Specifying a high hint address on an older kernel or on a machine without
5-level paging support is safe. The hint will be ignored and the kernel will
fall back to allocation from the 47-bit address space.
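
The opt-in described above can be sketched from userspace as follows. This
is a minimal illustration, assuming a 64-bit Linux process. The helper names
``map_with_high_hint()`` and ``above_47bit()`` are hypothetical, and the hint
value ``1 << 48`` is just one arbitrary address above the 47-bit boundary.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: does addr lie above the default 47-bit window? */
static int above_47bit(const void *addr)
{
	return (uintptr_t)addr >= ((uintptr_t)1 << 47);
}

/*
 * Hypothetical helper: request an anonymous mapping with a hint address
 * above 47 bits and without MAP_FIXED. The hint value is an assumption
 * chosen for illustration; any page-aligned address above the boundary
 * expresses the same request.
 */
static void *map_with_high_hint(size_t len)
{
	void *hint = (void *)((uintptr_t)1 << 48);

	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A caller would check the result against ``MAP_FAILED`` and could then pass
the returned pointer to ``above_47bit()`` to see where the mapping landed.
On a kernel or machine without 5-level paging the call is still expected to
succeed, with the mapping placed inside the 47-bit window.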

This approach makes it easy to make an application's memory allocator aware
of the large address space without manually tracking allocated virtual
address space.

One important case we need to handle here is interaction with MPX.
MPX (without MAWA extension) cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled if we already have a VMA above
the boundary, and forbid creating such VMAs once MPX is enabled.