.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB), a pair
  of even/odd pages which provide commands to manage the source, for
  instance to trigger it, to EOI it, or to turn it off.

  3. Device pass-through

  When a device is passed through into the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    ensures that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., the highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_ID.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================
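
The value layout for KVM_DEV_XIVE_GRP_SOURCE described above can be
illustrated with a small helper. This is only a sketch: the EX_* names
and the function are hypothetical and are not taken from the kernel
headers; only the bit positions come from the layout above.

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the documented layout. */
#define EX_SRC_TYPE_LSI	(UINT64_C(1) << 0)	/* bit 0: 0 = MSI, 1 = LSI */
#define EX_SRC_LEVEL	(UINT64_C(1) << 1)	/* bit 1: LSI assertion level */

/*
 * Build the 64-bit value that kvm_device_attr.addr points to.
 * The level bit is only meaningful for an LSI, so it is left clear
 * for an MSI.
 */
static uint64_t ex_source_value(int is_lsi, int asserted)
{
	uint64_t val = 0;

	if (is_lsi) {
		val |= EX_SRC_TYPE_LSI;
		if (asserted)
			val |= EX_SRC_LEVEL;
	}
	return val;
}
```

The interrupt source number itself is not part of this value; it is
the attribute, carried in kvm_device_attr.attr.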

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
	     underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================
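
As an illustration of the source configuration layout above, the
following sketch packs and unpacks the four fields. The EX_* macros
and function names are hypothetical; the shifts and field widths are
derived only from the bit ranges listed above.

```c
#include <stdint.h>

/* Hypothetical names; shifts and widths follow the documented layout. */
#define EX_PRIORITY_SHIFT	0
#define EX_PRIORITY_MASK	UINT64_C(0x7)		/* bits 2..0   */
#define EX_SERVER_SHIFT		3
#define EX_SERVER_MASK		UINT64_C(0x1fffffff)	/* bits 31..3  */
#define EX_MASKED_SHIFT		32			/* bit 32      */
#define EX_EISN_SHIFT		33
#define EX_EISN_MASK		UINT64_C(0x7fffffff)	/* bits 63..33 */

/* Pack the fields into the 64-bit value passed via kvm_device_attr.addr. */
static uint64_t ex_source_config(uint32_t eisn, int masked,
				 uint32_t server, unsigned int priority)
{
	return (((uint64_t)eisn & EX_EISN_MASK) << EX_EISN_SHIFT) |
	       ((uint64_t)(masked ? 1 : 0) << EX_MASKED_SHIFT) |
	       (((uint64_t)server & EX_SERVER_MASK) << EX_SERVER_SHIFT) |
	       (((uint64_t)priority & EX_PRIORITY_MASK) << EX_PRIORITY_SHIFT);
}

/* Field extractors, e.g. for inspecting a saved configuration. */
static unsigned int ex_config_priority(uint64_t val)
{
	return (unsigned int)((val >> EX_PRIORITY_SHIFT) & EX_PRIORITY_MASK);
}

static uint32_t ex_config_server(uint64_t val)
{
	return (uint32_t)((val >> EX_SERVER_SHIFT) & EX_SERVER_MASK);
}

static int ex_config_masked(uint64_t val)
{
	return (int)((val >> EX_MASKED_SHIFT) & 1);
}

static uint32_t ex_config_eisn(uint64_t val)
{
	return (uint32_t)((val >> EX_EISN_SHIFT) & EX_EISN_MASK);
}
```

The helpers silently truncate out-of-range inputs to the field width.
A real caller would more likely reject a priority above 7 before
issuing the ioctl, since the device reports -EINVAL for it.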

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================
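
A minimal sketch of how the EQ descriptor identifier and the EQ
structure above might be filled in. The struct below is a local mirror
of the documented layout so that the example builds without the
PowerPC uapi headers; the ex_* names are hypothetical, and the value
used for the always-notify flag is an assumption that should be taken
from the real KVM_XIVE_EQ_ALWAYS_NOTIFY definition.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_ppc_xive_eq as documented above. */
struct ex_xive_eq {
	uint32_t flags;
	uint32_t qshift;
	uint64_t qaddr;
	uint32_t qtoggle;
	uint32_t qindex;
	uint8_t  pad[40];
};

/* Assumed value; use KVM_XIVE_EQ_ALWAYS_NOTIFY from the uapi header. */
#define EX_EQ_ALWAYS_NOTIFY	0x1u

/* EQ identifier: server in bits 31..3, priority in bits 2..0. */
static uint64_t ex_eq_id(uint32_t server, unsigned int priority)
{
	return (((uint64_t)server & UINT64_C(0x1fffffff)) << 3) |
	       ((uint64_t)priority & UINT64_C(0x7));
}

/* Fill an EQ descriptor; pad is zeroed since it is reserved. */
static void ex_eq_init(struct ex_xive_eq *eq, uint64_t qaddr,
		       uint32_t qshift, uint32_t qtoggle, uint32_t qindex)
{
	memset(eq, 0, sizeof(*eq));
	eq->flags = EX_EQ_ALWAYS_NOTIFY;	/* documented as required */
	eq->qshift = qshift;			/* queue size is 1 << qshift */
	eq->qaddr = qaddr;
	eq->qtoggle = qtoggle;
	eq->qindex = qindex;
}
```

Since the group is read-write, the same structure would be used both
to configure a queue and to read back qtoggle and qindex when saving
state.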

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |
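
The register layout above can be sketched as a pair of 64-bit values.
The helpers below are hypothetical and assume that element 0 of the
pair holds bits 63..0 and element 1 holds the unused bits 127..64.

```c
#include <stdint.h>

/* Pack the two TIMA words; the upper 64 bits are unused and zeroed. */
static void ex_vp_state_pack(uint64_t state[2], uint32_t word0,
			     uint32_t word1)
{
	state[0] = ((uint64_t)word0 << 32) | word1;
	state[1] = 0;
}

/* TIMA word0 lives in bits 63..32 of the first 64-bit value. */
static uint32_t ex_vp_state_word0(const uint64_t state[2])
{
	return (uint32_t)(state[0] >> 32);
}

/* TIMA word1 lives in bits 31..0 of the first 64-bit value. */
static uint32_t ex_vp_state_word1(const uint64_t state[2])
{
	return (uint32_t)(state[0] & UINT64_C(0xffffffff));
}
```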

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
  and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
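
The ordering constraints of the save and restore sequences above can
be sketched as two step tables driven by a callback. Everything here
(the ex_* names, the callback signature, the logging helper) is
hypothetical scaffolding for illustration; a real VMM would issue the
corresponding device attribute and register ioctls in each step.

```c
#include <stddef.h>

enum ex_step {
	EX_MASK_SOURCES,	/* save 1: set PQ=01 on every source */
	EX_EQ_SYNC,		/* save 2: KVM_DEV_XIVE_EQ_SYNC */
	EX_CAPTURE_STATE,	/* save 3: targeting, EQs, thread contexts */
	EX_RESTORE_EQ,		/* restore 1: targeting depends on the EQs */
	EX_RESTORE_TARGETING,	/* restore 2 */
	EX_RESTORE_THREAD_CTX,	/* restore 3 */
	EX_RESTORE_SOURCES,	/* restore 4 */
	EX_RUN_VCPUS,		/* restore 5 */
};

static const enum ex_step ex_save_order[] = {
	EX_MASK_SOURCES, EX_EQ_SYNC, EX_CAPTURE_STATE,
};

static const enum ex_step ex_restore_order[] = {
	EX_RESTORE_EQ, EX_RESTORE_TARGETING, EX_RESTORE_THREAD_CTX,
	EX_RESTORE_SOURCES, EX_RUN_VCPUS,
};

typedef int (*ex_step_fn)(enum ex_step step, void *ctx);

/* Run the steps in order and stop at the first failure. */
static int ex_run_sequence(const enum ex_step *steps, size_t n,
			   ex_step_fn fn, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(steps[i], ctx);

		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Dry-run callback: records each step instead of touching the device. */
struct ex_step_log {
	enum ex_step steps[8];
	size_t count;
};

static int ex_log_step(enum ex_step step, void *ctx)
{
	struct ex_step_log *log = ctx;

	if (log->count >= sizeof(log->steps) / sizeof(log->steps[0]))
		return -1;
	log->steps[log->count++] = step;
	return 0;
}
```

Keeping the order in a table rather than in open-coded calls makes the
dependency between EQ configuration and targeting explicit and easy to
check in a dry run.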