.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi user devices in a large server. This
  type of device can be stand-alone or an IP inside an SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device.
  In addition, these devices will usually have some tools, such as a profiler
  and a debugger.

- Training data-center - Similar to Inference data-center cards, but typically
  have more computational power and memory bandwidth (e.g. HBM) and will likely
  have a method of scaling-up/out, i.e. connecting to other training cards
  inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made to their hardware. In addition, they will also probably
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because these devices can be an IP inside GPUs or have characteristics
similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers that have experience with this type of
device. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphic software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to the user space with the dedicated
261 major number and will have the following convention:

- device char files - /dev/accel/accel\*
- sysfs             - /sys/class/accel/accel\*/
- debugfs           - /sys/kernel/debug/accel/\*/

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, but it also
contains all the information on how to contribute, the Code Of Conduct and
the coding style and documentation requirements. All of that is the same for
the accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
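
Once such a kernel is running and an accel driver is loaded, the device char
file convention described earlier can be checked from user space. The following
is a minimal sketch, not part of the subsystem itself: the helper names
(``is_accel_rdev``, ``find_accel_nodes``) are hypothetical, and it only assumes
the ``/dev/accel/accel*`` naming and the 261 major number stated above.

```python
import glob
import os
import stat

# Major number dedicated to accel devices, as stated in this document.
ACCEL_MAJOR = 261

# Glob pattern for the accel device char files, following the documented
# /dev/accel/accel* convention.
ACCEL_NODE_PATTERN = "/dev/accel/accel*"


def is_accel_rdev(mode, rdev):
    """Return True if (mode, rdev) describe a char device with the accel major.

    Hypothetical helper for illustration; `mode` and `rdev` are the st_mode
    and st_rdev fields of an os.stat() result.
    """
    return stat.S_ISCHR(mode) and os.major(rdev) == ACCEL_MAJOR


def find_accel_nodes(pattern=ACCEL_NODE_PATTERN):
    """Return the sorted list of existing nodes that carry the accel major."""
    found = []
    for path in sorted(glob.glob(pattern)):
        try:
            st = os.stat(path)
        except OSError:
            continue  # node disappeared or is not accessible; skip it
        if is_accel_rdev(st.st_mode, st.st_rdev):
            found.append(path)
    return found


if __name__ == "__main__":
    # Prints one path per line; prints nothing when no accel device exists.
    for node in find_accel_nodes():
        print(node)
```

On a machine without any accel device the script simply prints nothing, so it
can also serve as a quick sanity check after loading a new driver.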

To expose your device as an accelerator, two changes are needed in your
driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set up the correct file operations structure.

External References
===================

email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)