.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE    POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host, and the guest OS should support the XIVE
native exploitation interrupt mode. If it does not, the guest should
run using the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment.
  The most important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) made of
  a pair of even/odd pages, which provides commands to manage the
  source: for instance, to trigger it, to EOI it, or to turn it off.

  3. Device pass-through

  When a device is passed through into the guest, the source
  interrupts come from a different HW controller (PHB4), and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them.
  The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through, or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

     Attributes:
       1.1 KVM_DEV_XIVE_RESET (write only)
       Resets the interrupt controller configuration for sources and event
       queues. To be used by kexec and kdump.

       Errors: none

       1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
       Syncs all the sources and queues and marks the EQ pages dirty, to
       make sure that a consistent memory state is captured when
       migrating the VM.

       Errors: none

       1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
       The kvm_device_attr.addr points to a __u32 value which is the number of
       interrupt server numbers (i.e., the highest possible vCPU id plus one).

       Errors:

         =======  ==========================================
         -EINVAL  Value greater than KVM_MAX_VCPU_ID.
         -EFAULT  Invalid user pointer for attr->addr.
         -EBUSY   A vCPU is already connected to the device.
         =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

     Attributes:
       Interrupt source number  (64-bit)

     The kvm_device_attr.addr points to a __u64 value::

       bits:     | 63   ....  2 |   1   |   0
       values:   |    unused    | level | type

     - type:  0:MSI 1:LSI
     - level: assertion level in case of an LSI.

     Errors:

       =======  ==========================================
       -E2BIG   Interrupt source number is out of range
       -ENOMEM  Could not create a new source block
       -EFAULT  Invalid user pointer for attr->addr.
       -ENXIO   Could not allocate underlying HW interrupt
       =======  ==========================================

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

     Attributes:
       Interrupt source number  (64-bit)

     The kvm_device_attr.addr points to a __u64 value::

       bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
       values:   |    eisn       | mask |  server | priority

     - priority: 0-7 interrupt priority level
     - server: CPU number chosen to handle the interrupt
     - mask: mask flag (unused)
     - eisn: Effective Interrupt Source Number

     Errors:

       =======  =======================================================
       -ENOENT  Unknown source number
       -EINVAL  Not initialized source number
       -EINVAL  Invalid priority
       -EINVAL  Invalid CPU number.
       -EFAULT  Invalid user pointer for attr->addr.
       -ENXIO   CPU event queues not configured or configuration of the
                underlying HW interrupt failed
       -EBUSY   No CPU available to serve interrupt
       =======  =======================================================

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

     Attributes:
       EQ descriptor identifier (64-bit)

     The EQ descriptor identifier is a tuple (server, priority)::

       bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
       values:   |    unused     |  server | priority

     The kvm_device_attr.addr points to::

       struct kvm_ppc_xive_eq {
               __u32 flags;
               __u32 qshift;
               __u64 qaddr;
               __u32 qtoggle;
               __u32 qindex;
               __u8  pad[40];
       };

     - flags: queue flags

       KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
         forces notification without using the coalescing mechanism
         provided by the XIVE END ESBs.
     - qshift: queue size (power of 2)
     - qaddr: real address of queue
     - qtoggle: current queue toggle bit
     - qindex: current queue index
     - pad: reserved for future use

     Errors:

       =======  =========================================
       -ENOENT  Invalid CPU number
       -EINVAL  Invalid priority
       -EINVAL  Invalid flags
       -EINVAL  Invalid queue size
       -EINVAL  Invalid queue address
       -EFAULT  Invalid user pointer for attr->addr.
       -EIO     Configuration of the underlying HW failed
       =======  =========================================

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

     Attributes:
       Interrupt source number  (64-bit)

     Errors:

       =======  =============================
       -ENOENT  Unknown source number
       -EINVAL  Not initialized source number
       =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
     flush any in-flight event notification and to stabilize the EQs. At
     this stage, the EQ pages are marked dirty to make sure they are
     transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
     and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting.
  3. Restore the thread interrupt contexts.
  4. Restore the source states.
  5. Let the vCPU run.
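The __u64 values described under KVM_DEV_XIVE_GRP_SOURCE and KVM_DEV_XIVE_GRP_SOURCE_CONFIG above can be packed with a few shifts. This is a minimal sketch, assuming only the bit layouts documented here. The helper names xive_source_val() and xive_source_config_val() are hypothetical and not part of the KVM uAPI. The mask bit (32) is documented as unused, so it is left clear.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of the KVM uAPI) that build the __u64
 * values kvm_device_attr.addr points to, following the documented
 * bit layouts.
 */

/* KVM_DEV_XIVE_GRP_SOURCE: bit 0 = type (0:MSI, 1:LSI), bit 1 = level. */
static uint64_t xive_source_val(int is_lsi, int asserted)
{
	uint64_t val = 0;

	if (is_lsi) {
		val |= UINT64_C(1) << 0;	/* type: LSI */
		if (asserted)
			val |= UINT64_C(1) << 1;	/* level: only meaningful for an LSI */
	}
	return val;
}

/*
 * KVM_DEV_XIVE_GRP_SOURCE_CONFIG:
 *   bits  2..0  priority (0-7)
 *   bits 31..3  server (CPU number chosen to handle the interrupt)
 *   bit  32     mask flag, documented as unused: left clear here
 *   bits 63..33 eisn (Effective Interrupt Source Number)
 */
static uint64_t xive_source_config_val(uint32_t server, uint8_t priority,
				       uint32_t eisn)
{
	assert(priority <= 7);			/* 3-bit field */
	assert(server < (UINT32_C(1) << 29));	/* 29-bit field */
	assert(eisn < (UINT32_C(1) << 31));	/* 31-bit field */

	return (uint64_t)priority |
	       ((uint64_t)server << 3) |
	       ((uint64_t)eisn << 33);
}
```

In a VMM, the resulting value would typically be stored in a __u64 whose address is placed in kvm_device_attr.addr, with the interrupt source number as the attribute, before issuing KVM_SET_DEVICE_ATTR on the device fd. The asserts only reject values that would overflow their documented fields; range checks against the VM configuration are left to the kernel.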
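The EQ descriptor identifier and struct kvm_ppc_xive_eq described under KVM_DEV_XIVE_GRP_EQ_CONFIG above can be mirrored in plain C. This is a sketch under stated assumptions: struct xive_eq_mirror and the xive_eq_id*() helpers are hypothetical names, using <stdint.h> types so the code builds on non-POWER hosts, and a real VMM would include the kernel uAPI header instead. The static assertions check only that the documented field order yields a 64-byte layout with no implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the documented struct kvm_ppc_xive_eq, using
 * <stdint.h> types in place of __u32/__u64/__u8.
 */
struct xive_eq_mirror {
	uint32_t flags;		/* queue flags, e.g. KVM_XIVE_EQ_ALWAYS_NOTIFY */
	uint32_t qshift;	/* queue size as a power of 2 */
	uint64_t qaddr;		/* real address of the queue */
	uint32_t qtoggle;	/* current queue toggle bit */
	uint32_t qindex;	/* current queue index */
	uint8_t  pad[40];	/* reserved for future use */
};

/* The documented field order leaves no implicit padding: 64 bytes total. */
_Static_assert(offsetof(struct xive_eq_mirror, qaddr) == 8, "qaddr offset");
_Static_assert(sizeof(struct xive_eq_mirror) == 64, "struct size");

/* EQ descriptor identifier: server in bits 31..3, priority in bits 2..0. */
static uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
	assert(priority <= 7);			/* 3-bit field */
	assert(server < (UINT32_C(1) << 29));	/* 29-bit field */
	return ((uint64_t)server << 3) | priority;
}

/* Inverse helpers, e.g. to log which (server, priority) a call targeted. */
static uint32_t xive_eq_id_server(uint64_t id)
{
	return (uint32_t)((id >> 3) & 0x1fffffff);
}

static uint8_t xive_eq_id_priority(uint64_t id)
{
	return (uint8_t)(id & 0x7);
}
```

Because the group is read-write, the same identifier serves both directions: with KVM_SET_DEVICE_ATTR the structure is an input, and with KVM_GET_DEVICE_ATTR it is filled in, which is how the EQ configuration would be captured during migration.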
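The KVM_REG_PPC_VP_STATE layout described under VCPU state above splits naturally into two 32-bit TIMA words. This minimal sketch assumes the 128-bit value is held as two 64-bit halves, lo for bits 63..0 and hi for bits 127..64. That representation, the struct vp_state name, and the helper names are hypothetical, and how the halves map onto the KVM_GET_ONE_REG buffer is not specified in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical view of the 128-bit KVM_REG_PPC_VP_STATE value as two
 * 64-bit halves. The mapping onto the ONE_REG buffer is an assumption.
 */
struct vp_state {
	uint64_t lo;	/* bits 63..0:   TIMA word0 (63..32), TIMA word1 (31..0) */
	uint64_t hi;	/* bits 127..64: unused */
};

/* TIMA word0 occupies the upper 32 bits of the low half. */
static uint32_t vp_state_word0(const struct vp_state *s)
{
	assert(s != NULL);
	return (uint32_t)(s->lo >> 32);
}

/* TIMA word1 occupies the lower 32 bits of the low half. */
static uint32_t vp_state_word1(const struct vp_state *s)
{
	assert(s != NULL);
	return (uint32_t)(s->lo & 0xffffffffu);
}

/* Build a value for restore; the unused upper 64 bits are left zero. */
static struct vp_state vp_state_pack(uint32_t word0, uint32_t word1)
{
	struct vp_state s = {
		.lo = ((uint64_t)word0 << 32) | word1,
		.hi = 0,
	};
	return s;
}
```

Keeping the raw words, rather than decoding individual TIMA registers, matches the intent stated above: the state is captured mainly to carry the cached IPB across migration, with the rest kept for debug reporting.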