.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference on an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. a laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single-user or multi-user devices in a large server.
  This type of device can be stand-alone or an IP inside an SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually come with tools such as a profiler and
  a debugger.

- Training data-center - Similar to inference data-center cards, but these
  typically have more computational power and memory bandwidth (e.g. HBM) and
  will likely have a method of scaling up/out, i.e. connecting to other
  training cards inside the same server or in other servers, respectively.

Each of these devices typically has its own runtime user-space software
stack, tailor-made for its hardware. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP inside a GPU or have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user-space with the dedicated
major number 261 and will follow this naming convention:

- device char files - /dev/accel/accel\*
- sysfs             - /sys/class/accel/accel\*/
- debugfs           - /sys/kernel/debug/accel/\*/

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only does it explain how to write a new DRM driver, it also
contains all the information on how to contribute, the Code of Conduct and
the coding style and documentation guidelines. All of that applies to the
accel subsystem as well.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

To expose your device as an accelerator, two changes are needed in your
driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to
  define a file_operations structure with the correct function pointers.
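
The two driver changes above might look roughly like the following kernel-side
fragment. It is only a sketch, assuming a driver that manages its buffers with
GEM: the name my_accel and the field values are hypothetical, and device
probing, drm_dev_register() and the driver's ioctl table are omitted. It is
meant to be built as part of a kernel driver, not as a stand-alone program.

```c
#include <linux/module.h>
#include <drm/drm_accel.h>
#include <drm/drm_drv.h>
#include <drm/drm_gem.h>

/*
 * Defines a static struct file_operations named my_accel_fops whose
 * .open callback is accel_open(). "my_accel" is a hypothetical name.
 */
DEFINE_DRM_ACCEL_FOPS(my_accel_fops);

static const struct drm_driver my_accel_driver = {
	/*
	 * DRIVER_COMPUTE_ACCEL exposes the device under /dev/accel/ and must
	 * not be combined with DRIVER_RENDER or DRIVER_MODESET. DRIVER_GEM is
	 * set only because this sketch assumes the driver uses GEM objects.
	 */
	.driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
	.fops            = &my_accel_fops,
	.name            = "my_accel",            /* hypothetical */
	.desc            = "Example accelerator", /* hypothetical */
	.major           = 1,
	.minor           = 0,
};
```

Once such a driver registers its device like any other DRM driver, its device
node should appear under /dev/accel/ rather than /dev/dri/.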

External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)