.. SPDX-License-Identifier: GPL-2.0
.. iommu:

=====================================
IOMMU Userspace API
=====================================

IOMMU UAPI is used for virtualization cases where communications are
needed between physical and virtual IOMMU drivers. For baremetal
usage, the IOMMU is a system device which does not need to communicate
with userspace directly.

The primary use cases are guest Shared Virtual Address (SVA) and
guest IO virtual address (IOVA), wherein the vIOMMU implementation
relies on the physical IOMMU and for this reason requires interactions
with the host driver.

.. contents:: :local:

Functionalities
===============
Communications of user and kernel involve both directions. The
supported user-kernel APIs are as follows:

1. Bind/Unbind guest PASID (e.g. Intel VT-d)
2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
3. Invalidate IOMMU caches upon guest requests
4. Report errors to the guest and serve page requests

Requirements
============
The IOMMU UAPIs are generic and extensible to meet the following
requirements:

1. Emulated and para-virtualised vIOMMUs
2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
3. Extensions to the UAPI shall not break existing userspace

Interfaces
==========
Although the data structures defined in IOMMU UAPI are self-contained,
there are no user API functions introduced. Instead, IOMMU UAPI is
designed to work with existing user driver frameworks such as VFIO.

Extension Rules & Precautions
-----------------------------
When IOMMU UAPI gets extended, the data structures can *only* be
modified in two ways:

1. Adding new fields by re-purposing the padding[] field. No size change.
2. Adding new union members at the end. May increase the structure sizes.

No new fields can be added *after* the variable sized union in that it
will break backward compatibility when offset moves. A new flag must
be introduced whenever a change affects the structure using either
method. The IOMMU driver processes the data based on flags which
ensures backward compatibility.

Version field is only reserved for the unlikely event of UAPI upgrade
at its entirety.

It's *always* the caller's responsibility to indicate the size of the
structure passed by setting argsz appropriately.
Though at the same time, argsz is user provided data which is not
trusted.
The argsz field allows the user app to indicate how much data
it is providing; it's still the kernel's responsibility to validate
whether it's correct and sufficient for the requested operation.

Compatibility Checking
----------------------
When IOMMU UAPI extension results in some structure size increase,
IOMMU UAPI code shall handle the following cases:

1. User and kernel have an exact size match
2. An older user with older kernel header (smaller UAPI size) running on a
   newer kernel (larger UAPI size)
3. A newer user with newer kernel header (larger UAPI size) running
   on an older kernel.
4. A malicious/misbehaving user passing illegal/invalid size but within
   range. The data may contain garbage.

Feature Checking
----------------
While launching a guest with vIOMMU, it is strongly advised to check
the compatibility upfront, as some subsequent errors happening during
vIOMMU operation, such as cache invalidation failures, cannot be nicely
escalated to the guest due to IOMMU specifications. This can lead to
catastrophic failures for the users.

User applications such as QEMU are expected to import kernel UAPI
headers. Backward compatibility is supported per feature flags.
For example, an older QEMU (with older kernel header) can run on a newer
kernel.
Newer QEMU (with new kernel header) may refuse to initialize
on an older kernel if new feature flags are not supported by the older
kernel. Simply recompiling existing code with newer kernel header should
not be an issue in that only existing flags are used.

IOMMU vendor driver should report the below features to IOMMU UAPI
consumers (e.g. via VFIO).

1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
2. IOMMU_NESTING_FEAT_BIND_PGTBL
3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
4. IOMMU_NESTING_FEAT_CACHE_INVLD
5. IOMMU_NESTING_FEAT_PAGE_REQUEST

Taking VFIO as an example, upon request from VFIO userspace (e.g. QEMU),
VFIO kernel code shall query IOMMU vendor driver for the support of
the above features. Query result can then be reported back to the
userspace caller. Details can be found in
Documentation/driver-api/vfio.rst.


Data Passing Example with VFIO
------------------------------
As the ubiquitous userspace driver framework, VFIO is already IOMMU
aware and shares many key concepts such as device model, group, and
protection domain. Other user driver frameworks can also be extended
to support IOMMU UAPI but it is outside the scope of this document.

In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
transport, capability checking, security, and life cycle management of
process address space ID (PASID).

VFIO layer conveys the data structures down to the IOMMU driver. It
follows the pattern below::

   struct {
        __u32 argsz;
        __u32 flags;
        __u8  data[];
   };

Here data[] contains the IOMMU UAPI data structures. VFIO has the
freedom to bundle the data as well as parse data size based on its own flags.

In order to determine the size and feature set of the user data, argsz
and flags (or the equivalent) are also embedded in the IOMMU UAPI data
structures.

A "__u32 argsz" field is *always* at the beginning of each structure.

For example::

   struct iommu_cache_invalidate_info {
        __u32   argsz;
        #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
        __u32   version;
        /* IOMMU paging structure cache */
        #define IOMMU_CACHE_INV_TYPE_IOTLB      (1 << 0) /* IOMMU IOTLB */
        #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB  (1 << 1) /* Device IOTLB */
        #define IOMMU_CACHE_INV_TYPE_PASID      (1 << 2) /* PASID cache */
        #define IOMMU_CACHE_INV_TYPE_NR         (3)
        __u8    cache;
        __u8    granularity;
        __u8    padding[6];
        union {
                struct iommu_inv_pasid_info pasid_info;
                struct iommu_inv_addr_info addr_info;
        } granu;
   };

VFIO is responsible for checking its own argsz and flags. It then
invokes appropriate IOMMU UAPI functions. The user pointers are passed
to the IOMMU layer for further processing. The responsibilities are
divided as follows:

- Generic IOMMU layer checks argsz range based on UAPI data in the
  current kernel version.

- Generic IOMMU layer checks content of the UAPI data for non-zero
  reserved bits in flags, padding fields, and unsupported version.
  This is to ensure not breaking userspace in the future when these
  fields or flags are used.

- Vendor IOMMU driver checks argsz based on vendor flags. UAPI data
  is consumed based on flags. Vendor driver has access to
  unadulterated argsz value in case of vendor specific future
  extensions. Currently, it does not perform the copy_from_user()
  itself. A __user pointer can be provided in some future scenarios
  where there's vendor data outside of the structure definition.

IOMMU code treats UAPI data in two categories:

- structure contains vendor data
  (Example: iommu_uapi_cache_invalidate())

- structure contains only generic data
  (Example: iommu_uapi_sva_bind_gpasid())



Sharing UAPI with in-kernel users
---------------------------------
For UAPIs that are shared with in-kernel users, a wrapper function is
provided to distinguish the callers. For example,

Userspace caller ::

  int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
                                   struct device *dev,
                                   void __user *udata)

In-kernel caller ::

  int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
                              struct device *dev, ioasid_t ioasid);
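The wrapper pattern and the argsz, version and padding checks described in
this document can be sketched in plain C as follows. This is a minimal
userspace sketch under stated assumptions, not the kernel's implementation:
struct demo_uapi_data, demo_op(), demo_uapi_op(), demo_last_value and the
DEMO_* constants are hypothetical names invented for illustration, and
memcpy() stands in for copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UAPI structure following the layout rules in this document. */
struct demo_uapi_data {
	uint32_t argsz;       /* always first: size the caller claims to pass */
	uint32_t version;
	uint8_t  type;        /* selects the union member in use */
	uint8_t  padding[7];  /* reserved, must be zero until re-purposed */
	union {
		uint64_t small;
		uint64_t large[2];
	} u;                  /* variable-sized union, always last */
};

#define DEMO_VERSION_1   1u
#define DEMO_TYPE_SMALL  0
#define DEMO_TYPE_LARGE  1
/* Everything before the union is mandatory for every caller. */
#define DEMO_MINSZ       offsetof(struct demo_uapi_data, u)

uint64_t demo_last_value; /* records what the trusted path consumed */

/* In-kernel style entry point: the data is trusted and already validated. */
int demo_op(const struct demo_uapi_data *data)
{
	if (data->type == DEMO_TYPE_SMALL)
		demo_last_value = data->u.small;
	else
		demo_last_value = data->u.large[0];
	return 0;
}

/* Userspace style entry point: validate untrusted input, then call demo_op(). */
int demo_uapi_op(const void *user_buf)
{
	struct demo_uapi_data data;
	size_t need, i;

	memset(&data, 0, sizeof(data));
	/* Stand-in for copy_from_user() of the mandatory header only. */
	memcpy(&data, user_buf, DEMO_MINSZ);

	if (data.argsz < DEMO_MINSZ || data.version != DEMO_VERSION_1)
		return -EINVAL;
	for (i = 0; i < sizeof(data.padding); i++)
		if (data.padding[i])
			return -EINVAL; /* reserved bytes must stay zero */

	switch (data.type) {
	case DEMO_TYPE_SMALL:
		need = DEMO_MINSZ + sizeof(data.u.small);
		break;
	case DEMO_TYPE_LARGE:
		need = DEMO_MINSZ + sizeof(data.u.large);
		break;
	default:
		return -EINVAL;
	}
	if (data.argsz < need)
		return -EINVAL; /* caller claims less than the type requires */

	/* A larger argsz from a newer caller is tolerated; only 'need' bytes are read. */
	memcpy(&data.u, (const char *)user_buf + DEMO_MINSZ, need - DEMO_MINSZ);
	return demo_op(&data);
}
```

In this sketch the caller's buffer is assumed to really hold the bytes that
argsz claims; in the kernel, a short buffer would instead surface as a fault
from copy_from_user(). An in-kernel user would call demo_op() directly with
data it already trusts, while a userspace request would go through
demo_uapi_op().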