.. SPDX-License-Identifier: GPL-2.0-only

===============================
 Qualcomm Cloud AI 100 (AIC100)
===============================

Overview
========

The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P, part of
Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
inference workloads. They are AI accelerators.

The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
(x8). An individual SoC on a card can have up to 16 NSPs for running workloads.
Each SoC has an A53 management CPU. Each card can have up to 32 GB of on-card
DDR.

Multiple AIC100 cards can be hosted in a single system to scale overall
performance. AIC100 cards are multi-user capable and can execute workloads
from multiple users concurrently.

Hardware Description
====================

An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of
miscellaneous peripherals (PMICs, etc.).

An AIC100 card can be either a PCIe HHHL (half-height, half-length) form factor
card (a traditional PCIe card) or a Dual M.2 card. Both use PCIe to connect to
the host system.

As a PCIe endpoint/adapter, AIC100 uses the standard Vendor ID (VID)/
Device ID (DID) combination to uniquely identify itself to the host. AIC100
uses the standard Qualcomm VID (0x17cb). All AIC100 SKUs use the same
AIC100 DID (0xa100).

AIC100 does not implement FLR (Function Level Reset).

AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge).

As a PCIe device, AIC100 uses BARs to provide host interfaces to the device
hardware. AIC100 provides three 64-bit BARs.

* The first BAR is 4K in size, and exposes the MHI interface to the host.

* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
  host.

* The third BAR is variable in size based on an individual AIC100's
  configuration, but defaults to 64K. This BAR is currently unused.

From the host perspective, AIC100 has several key hardware components:

* MHI (Modem Host Interface)
* QSM (QAIC Service Manager)
* NSPs (Neural Signal Processors)
* DMA Bridge
* DDR

MHI
---

AIC100 has one MHI interface over PCIe.
MHI itself is documented at
Documentation/mhi/index.rst. MHI is the mechanism the host uses to communicate
with the QSM. Except for workload data via the DMA Bridge, all interaction with
the device occurs via MHI.

QSM
---

QAIC Service Manager. This is an ARM A53 CPU that runs the primary firmware of
the card and performs on-card management tasks. It also communicates with the
host via MHI. Each AIC100 has one of these.

NSP
---

Neural Signal Processor. Each AIC100 has up to 16 of these. These are
the processors that run the workloads on AIC100. Each NSP is a Qualcomm Hexagon
(Q6) DSP with HVX and HMX. Each NSP can only run one workload at a time, but
multiple NSPs may be assigned to a single workload. Since each NSP runs at most
one workload at a time, AIC100 is limited to 16 concurrent workloads. Workload
"scheduling" is under the purview of the host. AIC100 does not automatically
timeslice.

DMA Bridge
----------

The DMA Bridge is a custom DMA engine that manages the flow of data
in and out of workloads. AIC100 has one of these. The DMA Bridge has 16
channels, each consisting of a set of request/response FIFOs. Each active
workload is assigned a single DMA Bridge channel.
The DMA Bridge exposes
hardware registers to manage the FIFOs (head/tail pointers), but requires host
memory to store the FIFOs.

DDR
---

AIC100 has on-card DDR. In total, an AIC100 can have up to 32 GB of DDR.
This DDR stores workloads and the data for those workloads, and is also used by
the QSM to manage the device. NSPs are granted access to sections of the DDR by
the QSM. The host does not have direct access to the DDR, and must make
requests to the QSM to transfer data to the DDR.

High-level Use Flow
===================

AIC100 is a multi-user, programmable accelerator typically used for running
neural networks in inferencing mode to efficiently perform AI operations.
AIC100 is not intended for training neural networks. AIC100 can also be used
for generic compute workloads.

Assuming a user wants to use AIC100, they would follow these steps:

1. Compile the workload into an ELF targeting the NSP(s).
2. Make requests to the QSM to load the workload and related artifacts into the
   device DDR.
3. Make a request to the QSM to activate the workload onto a set of idle NSPs.
4. Make requests to the DMA Bridge to send input data to the workload to be
   processed, and other requests to receive processed output data from the
   workload.
5. Once the workload is no longer required, make a request to the QSM to
   deactivate the workload, thus putting the NSPs back into an idle state.
6. Once the workload and related artifacts are no longer needed for future
   sessions, make requests to the QSM to unload the data from DDR. This frees
   the DDR to be used by other users.

Boot Flow
=========

AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.

When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
from ROM. PBL enumerates the PCIe link, and initializes the BHI (Boot Host
Interface) component of MHI.

Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
image. The PBL pulls the image from the host, validates it, and begins
execution of SBL.

SBL initializes MHI, and uses MHI to notify the host that the device has entered
the SBL stage. SBL performs a number of operations:

* SBL initializes the majority of hardware (anything PBL left uninitialized),
  including DDR.
* SBL offloads the bootlog to the host.
* SBL synchronizes timestamps with the host for future logging.
* SBL uses the Sahara protocol to obtain the runtime firmware images from the
  host.

Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
of reset and jumps into the QSM.

The QSM uses MHI to notify the host that the device has entered the QSM stage
(AMSS in MHI terms). At this point, the AIC100 device is fully functional and
ready to process workloads.

Userspace components
====================

Compiler
--------

An open compiler for AIC100 based on upstream LLVM can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc

Usermode Driver (UMD)
---------------------

An open UMD that interfaces with the qaic kernel driver can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100

Sahara loader
-------------

An open implementation of the Sahara protocol called kickstart can be found at:
https://github.com/andersson/qdl

MHI Channels
============

AIC100 defines a number of MHI channels for different
purposes. The following table lists the defined channels and their uses. The
EEs column gives the MHI execution environments in which each channel is used.

+----------------+---------+----------+----------------------------------------+
| Channel name   | IDs     | EEs      | Purpose                                |
+================+=========+==========+========================================+
| QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any data sent to the device on this    |
|                |         |          | channel is sent back to the host.      |
+----------------+---------+----------+----------------------------------------+
| QAIC_SAHARA    | 2 & 3   | SBL      | Used by SBL to obtain the runtime      |
|                |         |          | firmware from the host.                |
+----------------+---------+----------+----------------------------------------+
| QAIC_DIAG      | 4 & 5   | AMSS     | Used to communicate with QSM via the   |
|                |         |          | DIAG protocol.                         |
+----------------+---------+----------+----------------------------------------+
| QAIC_SSR       | 6 & 7   | AMSS     | Used to notify the host of subsystem   |
|                |         |          | restart events, and to offload SSR     |
|                |         |          | crashdumps.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_QDSS      | 8 & 9   | AMSS     | Used for the Qualcomm Debug Subsystem. |
+----------------+---------+----------+----------------------------------------+
| QAIC_CONTROL   | 10 & 11 | AMSS     | Used for the Neural Network Control    |
|                |         |          | (NNC) protocol. This is the primary    |
|                |         |          | channel between host and QSM for       |
|                |         |          | managing workloads.                    |
+----------------+---------+----------+----------------------------------------+
| QAIC_LOGGING   | 12 & 13 | SBL      | Used by the SBL to send the bootlog to |
|                |         |          | the host.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_STATUS    | 14 & 15 | AMSS     | Used to notify the host of Reliability,|
|                |         |          | Availability, Serviceability (RAS)     |
|                |         |          | events.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TELEMETRY | 16 & 17 | AMSS     | Used to get/set power/thermal/etc.     |
|                |         |          | attributes.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 20 & 21 | SBL/AMSS | Used to synchronize timestamps in the  |
|                |         |          | device side logs with the host time    |
|                |         |          | source.                                |
+----------------+---------+----------+----------------------------------------+

DMA Bridge
==========

Overview
--------

The DMA Bridge is one of the main interfaces to the host from the device
(the other being MHI). As part of activating a workload to run on NSPs, the QSM
assigns that workload a DMA Bridge channel.
A workload's DMA Bridge channel
(DBC for short) is reserved for that workload and is not shared with
other workloads.

Each DBC is a pair of FIFOs that manage data in and out of the workload: a
request FIFO and a response FIFO.

Each DBC contains 4 registers in hardware:

* Request FIFO head pointer (offset 0x0). Read-only for the host. Indicates the
  latest item in the FIFO the device has consumed.
* Request FIFO tail pointer (offset 0x4). Read/write for the host. The host
  increments this register to add new items to the FIFO.
* Response FIFO head pointer (offset 0x8). Read/write for the host. Indicates
  the latest item in the FIFO the host has consumed.
* Response FIFO tail pointer (offset 0xc). Read-only for the host. The device
  increments this register to add new items to the FIFO.

The value in each register is an index into the FIFO. The location of the FIFO
element that a register points to is
``FIFO base address + register value * element size``.

DBC registers are exposed to the host via the second BAR. Each DBC consumes
4KB of space in the BAR.

The actual FIFOs are backed by host memory. When sending a request to the QSM
to activate a workload, the host must donate memory to be used for the FIFOs.
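
As a rough illustration of the register layout and index arithmetic described
above, the following C sketch computes the offset of a DBC register within the
second BAR and the host address of the FIFO element that a head/tail value
refers to. It is a hypothetical helper, not the qaic driver's code:
``dbc_reg_offset()``, ``fifo_elem_addr()`` and the ``dbc_region_base``
parameter are illustrative names. ``dbc_region_base`` stands in for the offset
of the first DBC window inside the BAR, which this document does not specify.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Per-DBC register offsets, as listed above. */
    #define DBC_REQ_HEAD    0x0  /* request head: read-only for the host */
    #define DBC_REQ_TAIL    0x4  /* request tail: host increments to enqueue */
    #define DBC_RSP_HEAD    0x8  /* response head: host advances after consuming */
    #define DBC_RSP_TAIL    0xc  /* response tail: read-only for the host */

    #define DBC_WINDOW_SIZE 0x1000 /* each DBC consumes 4KB of the second BAR */
    #define DBC_COUNT       16     /* the DMA Bridge has 16 channels */

    /*
     * Hypothetical helper: byte offset of one DBC register inside the
     * second BAR. dbc_region_base (where the first DBC window starts)
     * is an assumption; this document does not give its value.
     */
    static size_t dbc_reg_offset(size_t dbc_region_base, unsigned int dbc_id,
                                 unsigned int reg)
    {
        assert(dbc_id < DBC_COUNT);
        assert(reg <= DBC_RSP_TAIL && reg % 4 == 0);
        return dbc_region_base + (size_t)dbc_id * DBC_WINDOW_SIZE + reg;
    }

    /*
     * Host address of the FIFO element a head/tail register refers to:
     * FIFO base address + register value * element size.
     */
    static uint64_t fifo_elem_addr(uint64_t fifo_base, uint32_t index,
                                   size_t elem_size)
    {
        return fifo_base + (uint64_t)index * elem_size;
    }

For example, with 64-byte request elements, a request tail value of 3 refers to
the element at ``fifo_base + 0xc0``.
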

Due to internal mapping limitations of the device, a single contiguous chunk of
memory must be provided per DBC to hold both FIFOs. The request FIFO occupies
the beginning of the memory chunk, and the response FIFO occupies the end of
the memory chunk.

Request FIFO
------------

A request FIFO element has the following structure:

.. code-block:: c

    /* 64 bytes; every field is naturally aligned, so there is no padding. */
    struct request_elem {
        u16 req_id;
        u8  seq_id;
        u8  pcie_dma_cmd;
        u32 reserved0;
        u64 pcie_dma_source_addr;
        u64 pcie_dma_dest_addr;
        u32 pcie_dma_len;
        u32 reserved1;
        u64 doorbell_addr;
        u8  doorbell_attr;
        u8  reserved2;
        u16 reserved3;
        u32 doorbell_data;
        u32 sem_cmd0;
        u32 sem_cmd1;
        u32 sem_cmd2;
        u32 sem_cmd3;
    };

Request field descriptions:

req_id
    Request ID. A request FIFO element and a response FIFO element with
    the same request ID refer to the same command.

seq_id
    Sequence ID within a request. Ignored by the DMA Bridge.

pcie_dma_cmd
    Describes the DMA element of this request.

    * Bit(7) is the force MSI flag. If the QSM has configured the DMA Bridge
      to look at this bit, it overrides the DMA Bridge MSI logic and
      generates an MSI when this request is complete.
    * Bits(6:5) are reserved.
    * Bit(4) is the completion code flag, and indicates that the DMA Bridge
      shall generate a response FIFO element when this request is
      complete.
    * Bit(3) indicates whether this request is a linked list transfer (0) or
      a bulk transfer (1).
    * Bit(2) is reserved.
    * Bits(1:0) indicate the type of transfer: no transfer (0), to device (1),
      or from device (2). Value 3 is illegal.

pcie_dma_source_addr
    Source address for a bulk transfer, or the address of the linked list.

pcie_dma_dest_addr
    Destination address for a bulk transfer.

pcie_dma_len
    Length of the bulk transfer. Note that the size of this field
    limits transfers to 4G in size.

doorbell_addr
    Address of the doorbell to ring when this request is complete.

doorbell_attr
    Doorbell attributes.

    * Bit(7) indicates whether a write to the doorbell is to occur.
    * Bits(6:2) are reserved.
    * Bits(1:0) contain the encoding of the doorbell length: 0 is 32-bit,
      1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
      must be naturally aligned to the specified length.

doorbell_data
    Data to write to the doorbell. Only the bits corresponding to
    the doorbell length are valid.

sem_cmdN
    Semaphore command.

    * Bit(31) indicates this semaphore command is enabled.
    * Bit(30) is the to-device DMA fence. Block this request until all
      to-device DMA transfers are complete.
    * Bit(29) is the from-device DMA fence. Block this request until all
      from-device DMA transfers are complete.
    * Bits(28:27) are reserved.
    * Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
      specified value. 2 is increment. 3 is decrement. 4 is wait
      until the semaphore is equal to the specified value. 5 is wait
      until the semaphore is greater than or equal to the specified value.
      6 is "P": wait until the semaphore is greater than 0, then
      decrement it by 1. 7 is reserved.
    * Bit(23) is reserved.
    * Bit(22) is the semaphore sync. 0 is postsync, which means that the
      semaphore operation is done after the DMA transfer. 1 is
      presync, which gates the DMA transfer. Only one presync is
      allowed per request.
    * Bit(21) is reserved.
    * Bits(20:16) are the index of the semaphore to operate on.
    * Bits(15:12) are reserved.
    * Bits(11:0) are the semaphore value to use in operations.

Overall, a request is processed in 4 steps:

1. If specified, the presync semaphore condition must be true.
2. If enabled, the DMA transfer occurs.
3. If specified, the postsync semaphore conditions must be true.
4. If enabled, the doorbell is written.

By using the semaphores in conjunction with the workload running on the NSPs,
the data pipeline can be synchronized so that the host can queue multiple
input requests for the workload, while the DMA Bridge only copies the data
into the memory of the workload when the workload is ready to process the next
input.

Response FIFO
-------------

Once a request is fully processed, a response FIFO element is generated if the
completion code flag is set in pcie_dma_cmd. The structure of a response FIFO
element:

.. code-block:: c

    struct response_elem {
        u16 req_id;
        u16 completion_code;
    };

req_id
    Matches the req_id of the request that generated this element.

completion_code
    Status of this request. 0 is success. Non-zero is an error.

The DMA Bridge generates an MSI to the host in reaction to activity in the
response FIFO of a DBC. The DMA Bridge hardware has an IRQ storm mitigation
algorithm: it only generates an MSI when the response FIFO transitions from
empty to non-empty (unless force MSI is enabled and triggered). In response to
this MSI, the host is expected to drain the response FIFO, and must take care
to handle any race conditions between draining the FIFO and the device
inserting new elements into the FIFO.

Neural Network Control (NNC) Protocol
=====================================

The NNC protocol is how the host makes requests to the QSM to manage workloads.
It uses the QAIC_CONTROL MHI channel.

Each NNC request is packaged into a message. Each message is a series of
transactions. A passthrough type transaction can contain elements known as
commands.

The QSM requires that NNC messages be little-endian encoded and that their
fields be naturally aligned. Since some NNC messages contain 64-bit elements,
64-bit alignment must be maintained.

A message contains a header and then a series of transactions. A message from
the QSM to the host may be at most 4K in size.
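
The encoding rules above (little-endian fields, natural alignment, 64-bit
alignment maintained) can be illustrated with a small C sketch that appends
fields to a message buffer, inserting zero padding so each field starts at a
multiple of its own size relative to the start of the message. The
``msg_buf`` type and ``msg_put_le()`` are hypothetical, and the sketch says
nothing about the actual NNC header or transaction layouts, which this document
does not define. The 64K capacity mirrors the host-to-QSM message limit of a
single MHI packet.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical host-side message buffer; 64K is the host-to-QSM cap. */
    struct msg_buf {
        uint8_t data[64 * 1024];
        size_t len;
    };

    /*
     * Append the low `size` bytes of `val` (size is 1, 2, 4 or 8) in
     * little-endian byte order. Zero padding is inserted first so the
     * field is naturally aligned relative to the start of the message.
     * Returns 0 on success, -1 if the field would not fit.
     */
    static int msg_put_le(struct msg_buf *m, uint64_t val, size_t size)
    {
        size_t pad;

        assert(size == 1 || size == 2 || size == 4 || size == 8);
        pad = (size - m->len % size) % size;
        if (m->len + pad + size > sizeof(m->data))
            return -1;

        for (; pad > 0; pad--)
            m->data[m->len++] = 0;
        /* Least significant byte first. */
        for (size_t i = 0; i < size; i++)
            m->data[m->len++] = (uint8_t)(val >> (8 * i));
        return 0;
    }

For example, appending a 16-bit field and then a 64-bit field produces the 2
bytes of the 16-bit value, 6 bytes of zero padding, and then the 8 bytes of the
64-bit value, least significant byte first.
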

From the host to the QSM, a message
can be at most 64K (the maximum size of a single MHI packet), but there is a
continuation feature where message N+1 can be marked as a continuation of
message N. This is used for exceedingly large dma_xfer transactions.

Transaction descriptions
------------------------

passthrough
    Allows userspace to send an opaque payload directly to the QSM.
    This is used for NNC commands. Userspace is responsible for meeting
    the QSM message requirements in the payload.

dma_xfer
    DMA transfer. Describes an object that the QSM should DMA into the
    device, as a set of address and size tuples.

activate
    Activate a workload onto NSPs. The host must provide memory to be
    used by the DBC.

deactivate
    Deactivate an active workload and return the NSPs to idle.

status
    Query the QSM about its NNC implementation. Returns the NNC version,
    and whether CRCs are used.

terminate
    Release a user's resources.

dma_xfer_cont
    Continuation of a previous DMA transfer. If a DMA transfer
    cannot be specified in a single message (because it is highly
    fragmented), this transaction can be used to specify more ranges.

validate_partition
    Query the QSM to determine whether a partition identifier is valid.

Each message is tagged with a user ID and a partition ID. The user ID allows
the QSM to track resources, and release them when the user goes away (e.g., the
process crashes). The partition ID identifies the resource partition, managed
by the QSM, to which the message applies.

Messages may have CRCs. Messages should have CRCs applied until the QSM
reports via the status transaction that CRCs are not needed. The QSM on the
SA9000P requires CRCs for black channel safing.

Subsystem Restart (SSR)
=======================

SSR is the concept of limiting the impact of an error. An AIC100 device may
have multiple users, each with their own workload running. If the workload of
one user crashes, the fallout should be limited to that workload and not
impact other workloads. SSR accomplishes this.

If a particular workload crashes, the QSM notifies the host via the QAIC_SSR
MHI channel. This notification identifies the workload by its assigned DBC. A
multi-stage recovery process is then used to clean up both sides and return
the DBC/NSPs to a working state.

When SSR occurs, any state in the workload is lost. Any inputs that were in
process, or queued but not yet serviced, are lost.
The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it wants to recover the workload.

Reliability, Availability, Serviceability (RAS)
===============================================

AIC100 is expected to be deployed in server systems where RAS ideology is
applied. Simply put, RAS is the concept of detecting, classifying, and
reporting errors. While PCIe has AER (Advanced Error Reporting), which factors
into RAS, AER does not allow a device to report details about internal
errors. Therefore, AIC100 implements a custom RAS mechanism. When a RAS event
occurs, the QSM reports the event with appropriate details via the QAIC_STATUS
MHI channel. A sysadmin may determine that a particular device needs
additional service based on RAS reports.

Telemetry
=========

The QSM can report various physical attributes of the device, and in some
cases allows the host to control them. Examples include thermal limits,
thermal readings, and power readings. These items are communicated via the
QAIC_TELEMETRY MHI channel.