.. SPDX-License-Identifier: GPL-2.0-only

=============
 QAIC driver
=============

The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
accelerator products.

Interrupts
==========

While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
mechanism, it is still possible for an IRQ storm to occur. A storm can happen
if the workload is particularly quick and the host is responsive. If the host
can drain the response FIFO as quickly as the device can insert elements into
it, then the device will frequently transition the response FIFO from empty to
non-empty and generate MSIs at a rate matching the speed at which the workload
processes inputs. The lprnet (license plate reader network) workload is known
to trigger this condition, and can generate in excess of 100k MSIs per second.
It has been observed that most systems cannot tolerate this for long and will
crash due to some form of watchdog, triggered by the overhead of the interrupt
controller interrupting the host CPU.

To mitigate this issue, the QAIC driver implements specific IRQ handling. When
QAIC receives an IRQ, it disables that line. This prevents the interrupt
controller from interrupting the CPU. Then QAIC drains the FIFO. Once the FIFO
is drained, QAIC implements a "last chance" polling algorithm where QAIC will
sleep for a time to see if the workload will generate more activity. The IRQ
line remains disabled during this time. If no activity is detected, QAIC exits
polling mode and re-enables the IRQ line.

This mitigation in QAIC is very effective. The same lprnet use case that
generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
IRQs over 5 minutes, while keeping the host system stable and maintaining the
same workload throughput (within run-to-run noise).
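
The disable, drain, poll, and re-enable sequence can be modeled in a few lines
of C. This is a userspace simulation, not the driver's code: struct fake_dbc,
drain_fifo(), last_chance_sleep(), and handle_irq() are hypothetical names, the
sleep is replaced by a hook that lets the simulated device add responses, and
the sketch assumes that activity seen during the sleep is drained before
polling again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one response FIFO and its IRQ line. */
struct fake_dbc {
	unsigned int fifo_count;     /* responses waiting in the FIFO */
	bool irq_enabled;            /* state of the MSI line */
	unsigned int processed;      /* responses handled by the host */
	const unsigned int *arrivals; /* responses added during each sleep */
	size_t n_arrivals;
	size_t next_arrival;
};

/* Stand-in for sleeping: the simulated device may add more responses. */
static void last_chance_sleep(struct fake_dbc *dbc)
{
	if (dbc->next_arrival < dbc->n_arrivals)
		dbc->fifo_count += dbc->arrivals[dbc->next_arrival++];
}

/* Consume every response currently in the FIFO. */
static void drain_fifo(struct fake_dbc *dbc)
{
	dbc->processed += dbc->fifo_count;
	dbc->fifo_count = 0;
}

/* Sketch of the handling: disable the line, drain, poll, re-enable. */
static void handle_irq(struct fake_dbc *dbc)
{
	assert(dbc->irq_enabled);
	dbc->irq_enabled = false;        /* no further MSIs reach the CPU */
	for (;;) {
		drain_fifo(dbc);
		last_chance_sleep(dbc);  /* wait to see if more work arrives */
		if (dbc->fifo_count == 0)
			break;           /* no activity: leave polling mode */
	}
	dbc->irq_enabled = true;         /* re-enable the IRQ line */
}
```

Starting from 2 queued responses, with 3 and then 5 more arriving during
successive sleeps, handle_irq() processes all 10 responses from a single IRQ
and re-enables the line only once the FIFO stays empty across a sleep.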


Neural Network Control (NNC) Protocol
=====================================

The implementation of NNC is split between the KMD (QAIC) and the User Mode
Driver (UMD). In general, QAIC understands how to encode and decode the NNC
wire protocol, and handles the elements of the protocol that require kernel
space knowledge to process (for example, mapping host memory to device IOVAs).
QAIC understands the structure of a message and all of the transactions. QAIC
does not understand commands (the payload of a passthrough transaction).

QAIC handles and enforces the required little endianness and 64-bit alignment,
to the degree that it can. Since QAIC does not know the contents of a
passthrough transaction, it relies on the UMD to satisfy these requirements
within the passthrough payload.
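
As an illustration of these byte-level requirements, the sketch below writes
header fields in little-endian order regardless of host endianness and
zero-pads each transaction so the next one starts on a 64-bit boundary. The
header layout (a 32-bit type followed by a 32-bit total length) and the names
put_le32(), align8(), and append_transaction() are assumptions for
illustration; the passthrough payload is copied verbatim, since its internal
layout is the UMD's responsibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a length up to the next multiple of 8 bytes (64-bit alignment). */
static size_t align8(size_t len)
{
	return (len + 7) & ~(size_t)7;
}

/* Store a 32-bit value in little-endian byte order on any host. */
static void put_le32(uint8_t *dst, uint32_t val)
{
	dst[0] = (uint8_t)val;
	dst[1] = (uint8_t)(val >> 8);
	dst[2] = (uint8_t)(val >> 16);
	dst[3] = (uint8_t)(val >> 24);
}

/*
 * Append a hypothetical transaction: an 8-byte header (32-bit type, 32-bit
 * total length, both little-endian), an opaque payload, then zero padding so
 * the next transaction starts 64-bit aligned. buf must hold at least
 * align8(8 + payload_len) bytes. Returns bytes written, padding included.
 */
static size_t append_transaction(uint8_t *buf, uint32_t type,
				 const void *payload, size_t payload_len)
{
	size_t total = align8(8 + payload_len);

	assert(buf != NULL && (payload != NULL || payload_len == 0));
	memset(buf, 0, total);                   /* padding ends up zero */
	put_le32(buf, type);
	put_le32(buf + 4, (uint32_t)total);
	if (payload_len)
		memcpy(buf + 8, payload, payload_len); /* copied as-is */
	return total;
}
```

A 3-byte payload therefore occupies a 16-byte transaction: 8 header bytes, 3
payload bytes, and 5 bytes of zero padding.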

The terminate transaction is of particular use to QAIC. QAIC is not aware of
the resources that are loaded onto a device, since the majority of that
activity occurs within NNC commands. As a result, QAIC does not have the means
to roll back userspace activity. To ensure that a userspace client's resources
are fully released in the case of a process crash or a bug, QAIC uses the
terminate transaction to let QSM know when a user has gone away and that the
user's resources can be released.

QSM can report the version of the NNC protocol it supports. This is in the
form of a major number and a minor number.

Major number updates indicate changes to the NNC protocol that impact the
message format or transactions (these impact QAIC).

Minor number updates indicate changes to the NNC protocol that impact the
commands (these do not impact QAIC).
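
The versioning rule can be expressed as a small classification function. This
is a hypothetical sketch: struct nnc_version, classify_change(),
qaic_can_use_qsm(), and the HYP_DRIVER_NNC_MAJOR value are invented for
illustration, and the sketch assumes QAIC requires an exact major match while
leaving minor differences to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical major version this driver build understands. */
#define HYP_DRIVER_NNC_MAJOR 4

struct nnc_version {
	uint16_t major; /* message format or transaction changes */
	uint16_t minor; /* command changes only */
};

enum nnc_impact {
	NNC_NO_CHANGE,     /* identical versions */
	NNC_COMMANDS_ONLY, /* minor differs: only command handling affected */
	NNC_AFFECTS_QAIC,  /* major differs: QAIC encode/decode affected */
};

/* Classify what a version difference between a and b affects. */
static enum nnc_impact classify_change(struct nnc_version a,
				       struct nnc_version b)
{
	if (a.major != b.major)
		return NNC_AFFECTS_QAIC;
	if (a.minor != b.minor)
		return NNC_COMMANDS_ONLY;
	return NNC_NO_CHANGE;
}

/* QAIC only needs the major number to match; the minor is not checked. */
static bool qaic_can_use_qsm(struct nnc_version qsm)
{
	return qsm.major == HYP_DRIVER_NNC_MAJOR;
}
```

Under this model, a QSM reporting 4.3 is usable by a driver built for major 4
even if userspace was written against 4.1, while 5.0 is not.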

uAPI
====

QAIC defines a number of driver-specific IOCTLs as part of the userspace API.
This section describes those APIs.

DRM_IOCTL_QAIC_MANAGE
  This IOCTL allows userspace to send an NNC request to QSM. The call will
  block until a response is received or the request times out.

DRM_IOCTL_QAIC_CREATE_BO
  This IOCTL allows userspace to allocate a buffer object (BO) which can be
  used to send data to, or receive data from, a workload. The call returns a
  GEM handle that represents the allocated buffer. The BO is not usable until
  it has been sliced (see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).

DRM_IOCTL_QAIC_MMAP_BO
  This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
  userspace process.

DRM_IOCTL_QAIC_ATTACH_SLICE_BO
  This IOCTL allows userspace to slice a BO in preparation for sending the BO
  to the device. Slicing is the operation of describing which portions of a BO
  are sent to which locations in a workload. This requires a set of DMA
  transfers for the DMA Bridge and, as such, locks the BO to a specific DMA
  Bridge Channel (DBC).

DRM_IOCTL_QAIC_EXECUTE_BO
  This IOCTL allows userspace to submit a set of sliced BOs to the device. The
  call is non-blocking. Success only indicates that the BOs have been queued
  to the device; it does not guarantee that they have been executed.

DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
  This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace
  to shrink the BOs sent to the device for this specific call. If a BO
  typically has N inputs, but only a subset of those is available, this IOCTL
  allows userspace to indicate that only the first M bytes of the BO should be
  sent to the device, minimizing data transfer overhead. This IOCTL dynamically
  recomputes the slicing, and therefore has some processing overhead before the
  BOs can be queued to the device.

DRM_IOCTL_QAIC_WAIT_BO
  This IOCTL allows userspace to determine when a particular BO has been
  processed by the device. The call will block until either the BO has been
  processed and can be re-queued to the device, or a timeout occurs.

DRM_IOCTL_QAIC_PERF_STATS_BO
  This IOCTL allows userspace to collect performance statistics on the most
  recent execution of a BO. This allows userspace to construct an end-to-end
  timeline of the BO processing for performance analysis.

DRM_IOCTL_QAIC_PART_DEV
  This IOCTL allows userspace to request a duplicate "shadow device". This extra
  accelN device is associated with a specific partition of resources on the
  AIC100 device and can be used to limit a process to a subset of those
  resources.
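
The slice recomputation done for DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO can be
pictured as truncating the slice list at M bytes. The sketch below is a
hypothetical model rather than the uAPI: struct hyp_slice and partial_slices()
are invented names, and the sketch assumes slices are sorted by BO offset and
do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice: a contiguous range of the BO sent to one destination. */
struct hyp_slice {
	uint64_t offset;   /* byte offset within the BO */
	uint64_t size;     /* bytes in this slice */
	uint64_t dev_addr; /* destination on the device */
};

/*
 * Recompute the slicing so only the first `resize` bytes of the BO are
 * transferred. Slices entirely below `resize` are kept, the slice that
 * straddles `resize` is shortened, and later slices are dropped. Writes the
 * result to `out` (capacity n) and returns the number of slices kept.
 */
static size_t partial_slices(const struct hyp_slice *in, size_t n,
			     uint64_t resize, struct hyp_slice *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].offset >= resize)
			break;                  /* beyond the requested prefix */
		out[kept] = in[i];
		if (in[i].offset + in[i].size > resize)
			out[kept].size = resize - in[i].offset; /* truncate */
		kept++;
	}
	return kept;
}
```

For three 100-byte slices and M = 150, the first slice is kept whole, the
second is cut to 50 bytes, and the third is dropped. This per-call work is the
processing overhead mentioned for the partial execute path.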

Userspace Client Isolation
==========================

AIC100 supports multiple clients. Multiple DBCs can be consumed by a single
client, and multiple clients can each consume one or more DBCs. Workloads
may contain sensitive information; therefore, only the client that owns the
workload should be allowed to interface with the DBC.

Clients are identified by the instance associated with their open(). A client
may only use memory it allocates and DBCs that are assigned to its
workloads. Attempts to access resources assigned to other clients will be
rejected.
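
One way to picture this rule is an ownership tag on every BO and DBC, compared
against the calling open() instance. The sketch below is hypothetical: the
structures, check_access(), and the use of -EPERM as the rejection code are
assumptions, not QAIC's implementation. Identity is compared by instance
pointer rather than by any numeric field, matching the idea that a client is
its open() instance.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-open() client instance. */
struct hyp_client { int id; };

/* Hypothetical resources, each tagged with the client that owns it. */
struct hyp_bo  { const struct hyp_client *owner; };
struct hyp_dbc { const struct hyp_client *owner; }; /* NULL: unassigned */

/*
 * Allow a request only if the caller allocated the BO and the target DBC is
 * assigned to the caller's workload; otherwise reject it.
 */
static int check_access(const struct hyp_client *caller,
			const struct hyp_bo *bo, const struct hyp_dbc *dbc)
{
	assert(caller != NULL && bo != NULL && dbc != NULL);
	if (bo->owner != caller || dbc->owner != caller)
		return -EPERM;
	return 0;
}
```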

Module parameters
=================

QAIC supports the following module parameters:

**datapath_polling (bool)**

Configures QAIC to use a polling thread for datapath events instead of relying
on the device interrupts. Useful for platforms with broken multi-MSI. Must be
set at QAIC driver initialization. Default is 0 (off).

**mhi_timeout_ms (unsigned int)**

Sets the timeout value for MHI operations in milliseconds (ms). Must be set
at the time the driver detects a device. Default is 2000 (2 seconds).

**control_resp_timeout_s (unsigned int)**

Sets the timeout value for QSM responses to NNC messages in seconds (s). Must
be set before the driver sends the request to QSM. Default is 60 (one
minute).

**wait_exec_default_timeout_ms (unsigned int)**

Sets the default timeout for the wait_exec ioctl in milliseconds (ms). Must be
set prior to the wait_exec ioctl call. A value specified in the ioctl call
overrides this for that call. Default is 5000 (5 seconds).
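
The override rule for wait_exec_default_timeout_ms can be sketched as a
one-line selection. This assumes, hypothetically, that a timeout of 0 in the
ioctl means "not specified"; effective_wait_timeout_ms() is an invented name,
and the parameter is modeled as a plain global that may change between calls.

```c
#include <assert.h>
#include <stdint.h>

/* Models the module parameter, using its documented default of 5000 ms. */
static unsigned int wait_exec_default_timeout_ms = 5000;

/*
 * Pick the timeout for one wait call. Assumes 0 means "not specified" and
 * falls back to the module parameter; any other value overrides the default
 * for this call only.
 */
static unsigned int effective_wait_timeout_ms(uint32_t ioctl_timeout_ms)
{
	return ioctl_timeout_ms ? ioctl_timeout_ms
				: wait_exec_default_timeout_ms;
}
```

Because the default is read at call time in this model, changing the parameter
affects only subsequent calls that do not supply their own timeout.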

**datapath_poll_interval_us (unsigned int)**

Sets the polling interval in microseconds (us) when datapath polling is active.
Takes effect at the next polling interval. Default is 100 (100 us).