.. SPDX-License-Identifier: GPL-2.0-only

===============================
 Qualcomm Cloud AI 100 (AIC100)
===============================

Overview
========

The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
inference workloads. They are AI accelerators.

The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
(x8). An individual SoC on a card can have up to 16 NSPs for running workloads.
Each SoC has an A53 management CPU. A card can carry up to 32 GB of DDR.

Multiple AIC100 cards can be hosted in a single system to scale overall
performance. AIC100 cards are multi-user capable and able to execute workloads
from multiple users concurrently.

Hardware Description
====================

An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of
miscellaneous peripherals (PMICs, etc.).

An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
or a Dual M.2 card. Both use PCIe to connect to the host system.

As a PCIe endpoint/adapter, AIC100 uses the standard Vendor ID (VID)/
Device ID (DID) combination to uniquely identify itself to the host. AIC100
uses the standard Qualcomm VID (0x17cb). All AIC100 SKUs use the same
AIC100 DID (0xa100).

AIC100 does not implement FLR (function level reset).

AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge).

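As a rough illustration of the MSI budget described above, the following
sketch maps each interrupt source to a vector index. The numbering (vector 0
for MHI, vectors 1 through 16 for DMA Bridge channels 0 through 15) and all
identifiers are assumptions made for this example; the text above only fixes
the totals.

```c
#include <stdint.h>

#define AIC100_NUM_DBC 16                   /* DMA Bridge channels */
#define AIC100_NUM_MSI (1 + AIC100_NUM_DBC) /* 1 for MHI + 16 for the DMA Bridge */

/* Assumed layout: vector 0 belongs to MHI. */
int aic100_mhi_vector(void)
{
	return 0;
}

/*
 * Assumed layout: DBC n uses vector n + 1.
 * Returns -1 if dbc_id is not a valid channel number.
 */
int aic100_dbc_vector(unsigned int dbc_id)
{
	if (dbc_id >= AIC100_NUM_DBC)
		return -1;
	return 1 + (int)dbc_id;
}
```

A host driver would request all ``AIC100_NUM_MSI`` vectors up front, since the
device cannot operate with fewer and MSI-X is not available as a fallback.
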
As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
hardware. AIC100 provides three 64-bit BARs.

* The first BAR is 4K in size, and exposes the MHI interface to the host.

* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
  host.

* The third BAR is variable in size based on an individual AIC100's
  configuration, but defaults to 64K. This BAR currently has no purpose.

From the host perspective, AIC100 has several key hardware components:

* MHI (Modem Host Interface)
* QSM (QAIC Service Manager)
* NSPs (Neural Signal Processors)
* DMA Bridge
* DDR

MHI
---

AIC100 has one MHI interface over PCIe. MHI itself is documented at
Documentation/mhi/index.rst. MHI is the mechanism the host uses to communicate
with the QSM. Except for workload data via the DMA Bridge, all interaction with
the device occurs via MHI.

QSM
---

QAIC Service Manager. This is an ARM A53 CPU that runs the primary
firmware of the card and performs on-card management tasks. It also
communicates with the host via MHI. Each AIC100 has one of
these.

NSP
---

Neural Signal Processor. Each AIC100 has up to 16 of these. These are
the processors that run the workloads on AIC100. Each NSP is a Qualcomm Hexagon
(Q6) DSP with HVX and HMX. Each NSP can only run one workload at a time, but
multiple NSPs may be assigned to a single workload. Since each NSP can only run
one workload, AIC100 is limited to 16 concurrent workloads. Workload
"scheduling" is under the purview of the host. AIC100 does not automatically
timeslice.

DMA Bridge
----------

The DMA Bridge is a custom DMA engine that manages the flow of data
in and out of workloads. AIC100 has one of these. The DMA Bridge has 16
channels, each consisting of a set of request/response FIFOs. Each active
workload is assigned a single DMA Bridge channel. The DMA Bridge exposes
hardware registers to manage the FIFOs (head/tail pointers), but requires host
memory to store the FIFOs.

DDR
---

AIC100 has on-card DDR. In total, an AIC100 can have up to 32 GB of DDR.
This DDR stores workloads and the data for those workloads, and is also used by
the QSM for managing the device. NSPs are granted access to sections of the DDR
by the QSM. The host does not have direct access to the DDR, and must make
requests to the QSM to transfer data to the DDR.

High-level Use Flow
===================

AIC100 is a multi-user, programmable accelerator typically used for running
neural networks in inferencing mode to efficiently perform AI operations.
AIC100 is not intended for training neural networks. AIC100 can be utilized
for generic compute workloads.

Assuming a user wants to utilize AIC100, they would follow these steps:

1. Compile the workload into an ELF targeting the NSP(s).
2. Make requests to the QSM to load the workload and related artifacts into the
   device DDR.
3. Make a request to the QSM to activate the workload onto a set of idle NSPs.
4. Make requests to the DMA Bridge to send input data to the workload to be
   processed, and other requests to receive processed output data from the
   workload.
5. Once the workload is no longer required, make a request to the QSM to
   deactivate the workload, thus putting the NSPs back into an idle state.
6. Once the workload and related artifacts are no longer needed for future
   sessions, make requests to the QSM to unload the data from DDR. This frees
   the DDR to be used by other users.


Boot Flow
=========

AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.

When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
from ROM. PBL enumerates the PCIe link, and initializes the BHI (Boot Host
Interface) component of MHI.

Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
image. The PBL pulls the image from the host, validates it, and begins
execution of SBL.

SBL initializes MHI, and uses MHI to notify the host that the device has entered
the SBL stage. SBL performs a number of operations:

* SBL initializes the majority of hardware (anything PBL left uninitialized),
  including DDR.
* SBL offloads the bootlog to the host.
* SBL synchronizes timestamps with the host for future logging.
* SBL uses the Sahara protocol to obtain the runtime firmware images from the
  host.

Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
of reset, and jumps into the QSM.

The QSM uses MHI to notify the host that the device has entered the QSM stage
(AMSS in MHI terms). At this point, the AIC100 device is fully functional, and
ready to process workloads.

Userspace components
====================

Compiler
--------

An open compiler for AIC100 based on upstream LLVM can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc

Usermode Driver (UMD)
---------------------

An open UMD that interfaces with the qaic kernel driver can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100

Sahara loader
-------------

An open implementation of the Sahara protocol called kickstart can be found at:
https://github.com/andersson/qdl

MHI Channels
============

AIC100 defines a number of MHI channels for different purposes. This is a list
of the defined channels, and their uses.

+----------------+---------+----------+----------------------------------------+
| Channel name   | IDs     | EEs      | Purpose                                |
+================+=========+==========+========================================+
| QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any data sent to the device on this    |
|                |         |          | channel is sent back to the host.      |
+----------------+---------+----------+----------------------------------------+
| QAIC_SAHARA    | 2 & 3   | SBL      | Used by SBL to obtain the runtime      |
|                |         |          | firmware from the host.                |
+----------------+---------+----------+----------------------------------------+
| QAIC_DIAG      | 4 & 5   | AMSS     | Used to communicate with QSM via the   |
|                |         |          | DIAG protocol.                         |
+----------------+---------+----------+----------------------------------------+
| QAIC_SSR       | 6 & 7   | AMSS     | Used to notify the host of subsystem   |
|                |         |          | restart events, and to offload SSR     |
|                |         |          | crashdumps.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_QDSS      | 8 & 9   | AMSS     | Used for the Qualcomm Debug Subsystem. |
+----------------+---------+----------+----------------------------------------+
| QAIC_CONTROL   | 10 & 11 | AMSS     | Used for the Neural Network Control    |
|                |         |          | (NNC) protocol. This is the primary    |
|                |         |          | channel between host and QSM for       |
|                |         |          | managing workloads.                    |
+----------------+---------+----------+----------------------------------------+
| QAIC_LOGGING   | 12 & 13 | SBL      | Used by the SBL to send the bootlog to |
|                |         |          | the host.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_STATUS    | 14 & 15 | AMSS     | Used to notify the host of Reliability,|
|                |         |          | Availability, Serviceability (RAS)     |
|                |         |          | events.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TELEMETRY | 16 & 17 | AMSS     | Used to get/set power/thermal/etc      |
|                |         |          | attributes.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 20 & 21 | SBL/AMSS | Used to synchronize timestamps in the  |
|                |         |          | device side logs with the host time    |
|                |         |          | source.                                |
+----------------+---------+----------+----------------------------------------+

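The channel table can be carried in code as a small lookup table, sketched
below. The structure and function names are hypothetical. The sketch also
assumes that the first ID of each pair carries host-to-device traffic and the
second carries device-to-host traffic, which is the usual MHI convention but is
not stated in the table itself.

```c
#include <stddef.h>
#include <string.h>

/* One row of the channel table. Field names are illustrative. */
struct qaic_chan {
	const char *name;
	unsigned int to_device_id;   /* assumed: first ID of the pair */
	unsigned int from_device_id; /* assumed: second ID of the pair */
	const char *ee;              /* execution environment(s) */
};

static const struct qaic_chan qaic_chans[] = {
	{ "QAIC_LOOPBACK",   0,  1, "AMSS" },
	{ "QAIC_SAHARA",     2,  3, "SBL" },
	{ "QAIC_DIAG",       4,  5, "AMSS" },
	{ "QAIC_SSR",        6,  7, "AMSS" },
	{ "QAIC_QDSS",       8,  9, "AMSS" },
	{ "QAIC_CONTROL",   10, 11, "AMSS" },
	{ "QAIC_LOGGING",   12, 13, "SBL" },
	{ "QAIC_STATUS",    14, 15, "AMSS" },
	{ "QAIC_TELEMETRY", 16, 17, "AMSS" },
	{ "QAIC_DEBUG",     18, 19, "AMSS" },
	{ "QAIC_TIMESYNC",  20, 21, "SBL/AMSS" },
};

/* Look up a channel by name; returns NULL if the name is not in the table. */
const struct qaic_chan *qaic_find_chan(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(qaic_chans) / sizeof(qaic_chans[0]); i++) {
		if (strcmp(qaic_chans[i].name, name) == 0)
			return &qaic_chans[i];
	}
	return NULL;
}
```

For example, ``qaic_find_chan("QAIC_CONTROL")`` yields the row with IDs 10 and
11, the pair used by the NNC protocol.
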
DMA Bridge
==========

Overview
--------

The DMA Bridge is one of the main interfaces to the host from the device
(the other being MHI). As part of activating a workload to run on NSPs, the QSM
assigns that workload a DMA Bridge channel. A workload's DMA Bridge channel
(DBC for short) is solely for the use of that workload and is not shared with
other workloads.

Each DBC is a pair of FIFOs that manage data in and out of the workload. One
FIFO is the request FIFO. The other FIFO is the response FIFO.

Each DBC contains 4 registers in hardware:

* Request FIFO head pointer (offset 0x0). Read only by the host. Indicates the
  latest item in the FIFO the device has consumed.
* Request FIFO tail pointer (offset 0x4). Read/write by the host. Host
  increments this register to add new items to the FIFO.
* Response FIFO head pointer (offset 0x8). Read/write by the host. Indicates
  the latest item in the FIFO the host has consumed.
* Response FIFO tail pointer (offset 0xc). Read only by the host. Device
  increments this register to add new items to the FIFO.

The values in each register are indexes into the FIFO. To get the location of
the FIFO element pointed to by the register: FIFO base address + register *
element size.

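The index arithmetic above can be sketched as a few helpers. The address
formula comes straight from the text. The wrap-around, empty, and full
conventions (head equal to tail means empty, one slot is kept unused) are
common ring-buffer assumptions rather than documented AIC100 behavior, and all
function names are hypothetical.

```c
#include <stdint.h>

/* Location of the element a head/tail register value refers to. */
uint64_t fifo_elem_addr(uint64_t fifo_base, uint32_t reg, uint32_t elem_size)
{
	return fifo_base + (uint64_t)reg * elem_size;
}

/* Index following idx in a FIFO of nelem slots (assumed to wrap to 0). */
uint32_t fifo_next(uint32_t idx, uint32_t nelem)
{
	return (idx + 1) % nelem;
}

/* Assumption: the FIFO is empty when head and tail are equal. */
int fifo_is_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Assumption: one slot stays unused so that full differs from empty. */
int fifo_is_full(uint32_t head, uint32_t tail, uint32_t nelem)
{
	return fifo_next(tail, nelem) == head;
}
```

With a base address of 0x1000 and 64-byte elements, a register value of 3
refers to the element at 0x1000 + 3 * 64 = 0x10c0.
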
DBC registers are exposed to the host via the second BAR. Each DBC consumes
4KB of space in the BAR.

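A minimal sketch of locating a DBC register inside the second BAR follows. The
4KB stride and the four register offsets come from the text. The offset at
which DBC 0 starts within the BAR is not given here, so it is passed in as a
hypothetical ``dbc_region_base`` parameter; the macro and function names are
likewise invented for the example.

```c
#include <stdint.h>

#define DBC_STRIDE       0x1000u /* each DBC consumes 4KB of the BAR */
#define DBC_REQ_HEAD_OFF 0x0u    /* request FIFO head, read only */
#define DBC_REQ_TAIL_OFF 0x4u    /* request FIFO tail, read/write */
#define DBC_RSP_HEAD_OFF 0x8u    /* response FIFO head, read/write */
#define DBC_RSP_TAIL_OFF 0xcu    /* response FIFO tail, read only */

/*
 * Byte offset of a DBC register from the start of the second BAR.
 * dbc_region_base is where DBC 0 begins inside the BAR (an assumption,
 * supplied by the caller).
 */
uint32_t dbc_reg_offset(uint32_t dbc_region_base, uint32_t dbc_id,
			uint32_t reg_off)
{
	return dbc_region_base + dbc_id * DBC_STRIDE + reg_off;
}
```

All 16 DBCs together need 16 * 4KB = 64KB, which fits comfortably inside the
2M BAR.
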
The actual FIFOs are backed by host memory. When sending a request to the QSM
to activate a workload, the host must donate memory to be used for the FIFOs.
Due to internal mapping limitations of the device, a single contiguous chunk of
memory must be provided per DBC, which hosts both FIFOs. The request FIFO will
consume the beginning of the memory chunk, and the response FIFO will consume
the end of the memory chunk.

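One way to size and split such a chunk is sketched below. It assumes both FIFOs
hold the same number of elements and that no padding separates them, neither of
which is stated above. The element sizes (64 bytes per request, 4 bytes per
response) follow from the element structures given in the Request FIFO and
Response FIFO sections. All names are hypothetical.

```c
#include <stddef.h>

#define REQ_ELEM_SIZE 64u /* size of one request FIFO element */
#define RSP_ELEM_SIZE 4u  /* size of one response FIFO element */

struct dbc_mem_layout {
	size_t total;   /* bytes of contiguous memory to donate */
	size_t req_off; /* request FIFO starts at the beginning */
	size_t rsp_off; /* response FIFO occupies the end */
};

/* nelem: slots per FIFO, assumed identical for both FIFOs. */
struct dbc_mem_layout dbc_layout(size_t nelem)
{
	struct dbc_mem_layout l;

	l.total = nelem * (REQ_ELEM_SIZE + RSP_ELEM_SIZE);
	l.req_off = 0;
	l.rsp_off = l.total - nelem * RSP_ELEM_SIZE;
	return l;
}
```

For 128 slots this gives a chunk of 8704 bytes, with the response FIFO
starting at byte 8192.
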
Request FIFO
------------

A request FIFO element has the following structure:

.. code-block:: c

  struct request_elem {
	u16 req_id;
	u8  seq_id;
	u8  pcie_dma_cmd;
	u32 reserved0;
	u64 pcie_dma_source_addr;
	u64 pcie_dma_dest_addr;
	u32 pcie_dma_len;
	u32 reserved1;
	u64 doorbell_addr;
	u8  doorbell_attr;
	u8  reserved2;
	u16 reserved3;
	u32 doorbell_data;
	u32 sem_cmd0;
	u32 sem_cmd1;
	u32 sem_cmd2;
	u32 sem_cmd3;
  };

Request field descriptions:

req_id
	request ID. A request FIFO element and a response FIFO element with
	the same request ID refer to the same command.

seq_id
	sequence ID within a request. Ignored by the DMA Bridge.

pcie_dma_cmd
	describes the DMA element of this request.

	* Bit(7) is the force MSI flag, which overrides the DMA Bridge MSI logic
	  and generates an MSI when this request is complete, provided QSM
	  has configured the DMA Bridge to look at this bit.
	* Bits(6:5) are reserved.
	* Bit(4) is the completion code flag, and indicates that the DMA Bridge
	  shall generate a response FIFO element when this request is
	  complete.
	* Bit(3) indicates if this request is a linked list transfer(0) or a bulk
	  transfer(1).
	* Bit(2) is reserved.
	* Bits(1:0) indicate the type of transfer. No transfer(0), to device(1),
	  from device(2). Value 3 is illegal.

pcie_dma_source_addr
	source address for a bulk transfer, or the address of the linked list.

pcie_dma_dest_addr
	destination address for a bulk transfer.

pcie_dma_len
	length of the bulk transfer. Note that the size of this field
	limits transfers to 4G in size.

doorbell_addr
	address of the doorbell to ring when this request is complete.

doorbell_attr
	doorbell attributes.

	* Bit(7) indicates if a write to a doorbell is to occur.
	* Bits(6:2) are reserved.
	* Bits(1:0) contain the encoding of the doorbell length. 0 is 32-bit,
	  1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
	  must be naturally aligned to the specified length.

doorbell_data
	data to write to the doorbell. Only the bits corresponding to
	the doorbell length are valid.

sem_cmdN
	semaphore command.

	* Bit(31) indicates this semaphore command is enabled.
	* Bit(30) is the to-device DMA fence. Block this request until all
	  to-device DMA transfers are complete.
	* Bit(29) is the from-device DMA fence. Block this request until all
	  from-device DMA transfers are complete.
	* Bits(28:27) are reserved.
	* Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
	  specified value. 2 is increment. 3 is decrement. 4 is wait
	  until the semaphore is equal to the specified value. 5 is wait
	  until the semaphore is greater than or equal to the specified
	  value. 6 is "P", wait until semaphore is greater than 0, then
	  decrement by 1. 7 is reserved.
	* Bit(23) is reserved.
	* Bit(22) is the semaphore sync. 0 is post sync, which means that the
	  semaphore operation is done after the DMA transfer. 1 is
	  presync, which gates the DMA transfer. Only one presync is
	  allowed per request.
	* Bit(21) is reserved.
	* Bits(20:16) are the index of the semaphore to operate on.
	* Bits(15:12) are reserved.
	* Bits(11:0) are the semaphore value to use in operations.

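The bit layouts above can be packed with small encoder helpers, sketched here.
The bit positions are taken from the field descriptions; the enum and function
names are hypothetical and exist only for this example. Reserved bits are left
at zero.

```c
#include <stdint.h>

enum dma_xfer_type { DMA_NONE = 0, DMA_TO_DEV = 1, DMA_FROM_DEV = 2 };
enum db_len { DB_LEN_32 = 0, DB_LEN_16 = 1, DB_LEN_8 = 2 };
enum sem_op {
	SEM_NOP = 0, SEM_INIT = 1, SEM_INC = 2, SEM_DEC = 3,
	SEM_WAIT_EQ = 4, SEM_WAIT_GE = 5, SEM_P = 6,
};

/* pcie_dma_cmd: bit 7 force MSI, bit 4 completion, bit 3 bulk, bits 1:0 type. */
uint8_t encode_pcie_dma_cmd(int force_msi, int completion, int bulk,
			    enum dma_xfer_type type)
{
	uint8_t v = (uint8_t)((unsigned int)type & 0x3u);

	if (force_msi)
		v |= 1u << 7;
	if (completion)
		v |= 1u << 4;
	if (bulk)
		v |= 1u << 3;
	return v;
}

/* doorbell_attr: bit 7 enables the doorbell write, bits 1:0 encode its length. */
uint8_t encode_doorbell_attr(int write_db, enum db_len len)
{
	uint8_t v = (uint8_t)((unsigned int)len & 0x3u);

	if (write_db)
		v |= 1u << 7;
	return v;
}

/* sem_cmdN: always sets the enable bit (31); see the bit list for the rest. */
uint32_t encode_sem_cmd(enum sem_op op, int presync, unsigned int index,
			unsigned int value, int to_dev_fence, int from_dev_fence)
{
	uint32_t v = 1u << 31;

	if (to_dev_fence)
		v |= 1u << 30;
	if (from_dev_fence)
		v |= 1u << 29;
	v |= ((uint32_t)op & 0x7u) << 24;
	if (presync)
		v |= 1u << 22;
	v |= ((uint32_t)index & 0x1fu) << 16;
	v |= (uint32_t)value & 0xfffu;
	return v;
}
```

For instance, a bulk to-device transfer that should produce a response element
encodes as ``encode_pcie_dma_cmd(0, 1, 1, DMA_TO_DEV)``, which is 0x19.
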
Overall, a request is processed in 4 steps:

1. If specified, the presync semaphore condition must be true
2. If enabled, the DMA transfer occurs
3. If specified, the postsync semaphore conditions must be true
4. If enabled, the doorbell is written

By using the semaphores in conjunction with the workload running on the NSPs,
the data pipeline can be synchronized such that the host can queue multiple
requests of data for the workload to process, but the DMA Bridge will only copy
the data into the memory of the workload when the workload is ready to process
the next input.

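To make the pipeline concrete, the sketch below fills in one request element
for a bulk to-device transfer that is gated by a presync semaphore wait. It
restates the element structure with fixed-width userspace types, numbers the
reserved fields so the structure compiles, and assumes a little-endian host.
The choice of "wait until the semaphore is greater than or equal to 1" as the
ready signal is purely illustrative; the real meaning of each semaphore is
agreed between the host and the workload. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct request_elem {
	uint16_t req_id;
	uint8_t  seq_id;
	uint8_t  pcie_dma_cmd;
	uint32_t reserved0;
	uint64_t pcie_dma_source_addr;
	uint64_t pcie_dma_dest_addr;
	uint32_t pcie_dma_len;
	uint32_t reserved1;
	uint64_t doorbell_addr;
	uint8_t  doorbell_attr;
	uint8_t  reserved2;
	uint16_t reserved3;
	uint32_t doorbell_data;
	uint32_t sem_cmd0;
	uint32_t sem_cmd1;
	uint32_t sem_cmd2;
	uint32_t sem_cmd3;
};

_Static_assert(sizeof(struct request_elem) == 64,
	       "request element is expected to be 64 bytes");

/* Build a bulk host-to-device copy gated by a presync wait on sem_index. */
void build_bulk_to_device(struct request_elem *e, uint16_t req_id,
			  uint64_t src, uint64_t dst, uint32_t len,
			  unsigned int sem_index)
{
	memset(e, 0, sizeof(*e));
	e->req_id = req_id;
	/* completion code (bit 4) | bulk (bit 3) | to device (bits 1:0 = 1) */
	e->pcie_dma_cmd = (1u << 4) | (1u << 3) | 1u;
	e->pcie_dma_source_addr = src;
	e->pcie_dma_dest_addr = dst;
	e->pcie_dma_len = len;
	/* enabled | cmd 5 (wait >=) | presync | semaphore index | value 1 */
	e->sem_cmd0 = (1u << 31) | (5u << 24) | (1u << 22) |
		      (((uint32_t)sem_index & 0x1fu) << 16) | 1u;
}
```

No doorbell is requested here, so ``doorbell_attr`` stays zero and step 4 of
the processing sequence is skipped for this element.
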
Response FIFO
-------------

Once a request is fully processed, a response FIFO element is generated if
specified in pcie_dma_cmd. The structure of a response FIFO element:

.. code-block:: c

  struct response_elem {
	u16 req_id;
	u16 completion_code;
  };

req_id
	matches the req_id of the request that generated this element.

completion_code
	status of this request. 0 is success. Non-zero is an error.

The DMA Bridge will generate an MSI to the host as a reaction to activity in the
response FIFO of a DBC. The DMA Bridge hardware has an IRQ storm mitigation
algorithm, where it will only generate an MSI when the response FIFO transitions
from empty to non-empty (unless force MSI is enabled and triggered). In
response to this MSI, the host is expected to drain the response FIFO, and must
take care to handle any race conditions between draining the FIFO, and the
device inserting elements into the FIFO.

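One possible shape for the drain loop is sketched below as a software model.
The head and tail registers are represented by plain ``volatile`` fields rather
than real MMIO accesses, memory barriers are omitted, and the FIFO is assumed
to wrap at ``nelem`` and to be empty when head equals tail. All names are
hypothetical. The point of the sketch is the re-read of the tail after the head
has been published: elements the device appended while the host was draining
are picked up in the same pass instead of being left behind without a new MSI.

```c
#include <stdint.h>

struct response_elem {
	uint16_t req_id;
	uint16_t completion_code;
};

/* Software model of the response side of one DBC. */
struct rsp_fifo {
	volatile uint32_t head; /* written by the host */
	volatile uint32_t tail; /* written by the device */
	const struct response_elem *ring;
	uint32_t nelem;
};

typedef void (*rsp_handler)(const struct response_elem *e, void *ctx);

/* Example handler: counts responses whose completion code signals an error. */
void count_errors(const struct response_elem *e, void *ctx)
{
	if (e->completion_code != 0)
		(*(unsigned int *)ctx)++;
}

/* Drain the FIFO; returns the number of elements handled. */
unsigned int drain_responses(struct rsp_fifo *f, rsp_handler fn, void *ctx)
{
	unsigned int handled = 0;
	uint32_t head = f->head;
	uint32_t tail = f->tail;

	while (head != tail) {
		while (head != tail) {
			fn(&f->ring[head], ctx);
			head = (head + 1) % f->nelem;
			handled++;
		}
		f->head = head; /* publish what the host has consumed */
		tail = f->tail; /* re-read: the device may have added more */
	}
	return handled;
}
```

The loop only exits once the published head matches a freshly read tail, so the
FIFO is empty from the device's point of view and the next insertion is an
empty to non-empty transition.
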
Neural Network Control (NNC) Protocol
=====================================

The NNC protocol is how the host makes requests to the QSM to manage workloads.
It uses the QAIC_CONTROL MHI channel.

Each NNC request is packaged into a message. Each message is a series of
transactions. A passthrough type transaction can contain elements known as
commands.

QSM requires NNC messages to be little endian encoded and the fields to be
naturally aligned. Since there are 64-bit elements in some NNC messages, 64-bit
alignment must be maintained.

A message contains a header and then a series of transactions. A message may be
at most 4K in size from QSM to the host. From the host to the QSM, a message
can be at most 64K (maximum size of a single MHI packet), but there is a
continuation feature where message N+1 can be marked as a continuation of
message N. This is used for exceedingly large dma_xfer transactions.

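The alignment and size rules above reduce to a few checks, sketched here. The
limits are read as 4096 and 65536 bytes. The macro and function names are
hypothetical, and the sketch deliberately says nothing about the message header
or transaction layout, which this document does not describe.

```c
#include <stddef.h>

#define NNC_MAX_FROM_QSM 4096u  /* 4K: largest message from QSM to host */
#define NNC_MAX_TO_QSM   65536u /* 64K: largest single message to QSM */

/* Round a length up to the next multiple of 8 to keep 64-bit alignment. */
size_t nnc_align8(size_t len)
{
	return (len + 7u) & ~(size_t)7u;
}

/* Nonzero if a host-to-QSM message of len bytes fits in one MHI packet. */
int nnc_fits_to_qsm(size_t len)
{
	return len <= NNC_MAX_TO_QSM;
}

/* Nonzero if a QSM-to-host message of len bytes is within the 4K limit. */
int nnc_fits_from_qsm(size_t len)
{
	return len <= NNC_MAX_FROM_QSM;
}
```

A host-to-QSM message that fails ``nnc_fits_to_qsm()`` is a candidate for the
continuation feature, where the remainder is carried in a follow-up message.
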
Transaction descriptions
------------------------

passthrough
	Allows userspace to send an opaque payload directly to the QSM.
	This is used for NNC commands. Userspace is responsible for managing
	the QSM message requirements in the payload.

dma_xfer
	DMA transfer. Describes an object that the QSM should DMA into the
	device via address and size tuples.

activate
	Activate a workload onto NSPs. The host must provide memory to be
	used by the DBC.

deactivate
	Deactivate an active workload and return the NSPs to idle.

status
	Query the QSM about its NNC implementation. Returns the NNC version,
	and if CRC is used.

terminate
	Release a user's resources.

dma_xfer_cont
	Continuation of a previous DMA transfer. If a DMA transfer
	cannot be specified in a single message (highly fragmented), this
	transaction can be used to specify more ranges.

validate_partition
	Query the QSM to determine if a partition identifier is valid.

Each message is tagged with a user id, and a partition id. The user id allows
QSM to track resources, and release them when the user goes away (e.g. the
process crashes). A partition id identifies the resource partition that QSM
manages, which this message applies to.

Messages may have CRCs. Messages should have CRCs applied until the QSM
reports via the status transaction that CRCs are not needed. The QSM on the
SA9000P requires CRCs for black channel safing.

Subsystem Restart (SSR)
=======================

SSR is the concept of limiting the impact of an error. An AIC100 device may
have multiple users, each with their own workload running. If the workload of
one user crashes, the fallout of that should be limited to that workload and not
impact other workloads. SSR accomplishes this.

If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
channel. This notification identifies the workload by its assigned DBC. A
multi-stage recovery process is then used to clean up both sides, and get the
DBC/NSPs into a working state.

When SSR occurs, any state in the workload is lost. Any inputs that were in
process, or queued but not yet serviced, are lost. The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it desires to recover the workload.

Reliability, Availability, Serviceability (RAS)
===============================================

AIC100 is expected to be deployed in server systems where RAS ideology is
applied. Simply put, RAS is the concept of detecting, classifying, and
reporting errors. While PCIe has AER (Advanced Error Reporting) which factors
into RAS, AER does not allow for a device to report details about internal
errors. Therefore, AIC100 implements a custom RAS mechanism. When a RAS event
occurs, QSM will report the event with appropriate details via the QAIC_STATUS
MHI channel. A sysadmin may determine that a particular device needs
additional service based on RAS reports.

Telemetry
=========

QSM has the ability to report various physical attributes of the device, and in
some cases, to allow the host to control them. Examples include thermal limits,
thermal readings, and power readings. These items are communicated via the
QAIC_TELEMETRY MHI channel.