Buffer Sharing and Synchronization
==================================

The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.

This is used, for example, by drm "prime" multi-GPU support, but is of
course not limited to GPU use cases.

The three main components of this are: (1) dma-buf, representing a
sg_table and exposed to userspace as a file descriptor to allow passing
between devices, (2) fence, which provides a mechanism to signal when
one device has finished access, and (3) reservation, which manages the
shared or exclusive fence(s) associated with the buffer.

Shared DMA Buffers
------------------

This document serves as a guide for device-driver writers on what the dma-buf
buffer sharing API is and how to use it for exporting and using shared buffers.

Any device driver which wishes to be a part of DMA buffer sharing can do so as
either the 'exporter' of buffers, or the 'user' or 'importer' of buffers.

Say a driver A wants to use buffers created by driver B; then we call B the
exporter, and A the buffer-user/importer.

The exporter

 - implements and manages operations in :c:type:`struct dma_buf_ops
   <dma_buf_ops>` for the buffer,
 - allows other users to share the buffer by using dma_buf sharing APIs,
 - manages the details of buffer allocation, wrapped in a :c:type:`struct
   dma_buf <dma_buf>`,
 - decides about the actual backing storage where this allocation happens,
 - and takes care of any migration of scatterlist - for all (shared) users of
   this buffer.

The buffer-user

 - is one of (many) sharing users of the buffer.
 - doesn't need to worry about how the buffer is allocated, or where.
 - and needs a mechanism to get access to the scatterlist that makes up this
   buffer in memory, mapped into its own address space, so it can access the
   same area of memory. This interface is provided by :c:type:`struct
   dma_buf_attachment <dma_buf_attachment>`.

Any exporters or users of the dma-buf buffer sharing framework must have a
'select DMA_SHARED_BUFFER' in their respective Kconfigs.

Userspace Interface Notes
~~~~~~~~~~~~~~~~~~~~~~~~~

Mostly a DMA buffer file descriptor is simply an opaque object for userspace,
and hence the generic interface exposed is very minimal. There are a few things
to consider though:

- Since kernel 3.12 the dma-buf FD supports the llseek system call, but only
  with offset=0 and whence=SEEK_END or SEEK_SET. SEEK_SET is supported to allow
  the usual size discovery pattern size = SEEK_END(0); SEEK_SET(0). Every other
  llseek operation will report -EINVAL.

  If llseek on dma-buf FDs isn't supported, the kernel will report -ESPIPE for
  all cases. Userspace can use this to detect support for discovering the
  dma-buf size using llseek.

- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set
  on the file descriptor.  This is not just a resource leak, but a
  potential security hole.  It could give the newly exec'd application
  access to buffers, via the leaked fd, to which it should otherwise
  not be permitted access.

  The problem with doing this via a separate fcntl() call, versus doing it
  atomically when the fd is created, is that this is inherently racy in a
  multi-threaded app[3].  The issue is made worse when it is library code
  opening/creating the file descriptor, as the application may not even be
  aware of the fds.

  To avoid this problem, userspace must have a way to request that the
  O_CLOEXEC flag be set when the dma-buf fd is created.  So any API provided
  by the exporting driver to create a dmabuf fd must provide a way to let
  userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().

- Memory mapping the contents of the DMA buffer is also supported. See the
  discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.

- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below for
  details.
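
As a minimal userspace sketch of the llseek and FD_CLOEXEC points above, the
following helpers show the size discovery pattern and a check that a received
fd carries FD_CLOEXEC. The helper names ``dmabuf_fd_size()`` and
``fd_is_cloexec()`` are hypothetical and exist only for illustration; the only
real interfaces used are ``lseek()`` and ``fcntl()``.

.. code-block:: c

   #include <errno.h>
   #include <fcntl.h>
   #include <sys/types.h>
   #include <unistd.h>

   /*
    * Hypothetical helper: size = SEEK_END(0); SEEK_SET(0).
    * Returns the size in bytes, or a negative errno value. -ESPIPE
    * indicates that llseek is not supported on this fd.
    */
   off_t dmabuf_fd_size(int fd)
   {
       off_t size = lseek(fd, 0, SEEK_END);

       if (size < 0)
           return -errno;
       /* Rewind so the file offset is left where callers expect it. */
       if (lseek(fd, 0, SEEK_SET) < 0)
           return -errno;
       return size;
   }

   /*
    * Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd, 0 if it
    * is not, or a negative errno value if the flags cannot be read.
    */
   int fd_is_cloexec(int fd)
   {
       int flags = fcntl(fd, F_GETFD);

       if (flags < 0)
           return -errno;
       return (flags & FD_CLOEXEC) ? 1 : 0;
   }

A caller might treat ``-ESPIPE`` from ``dmabuf_fd_size()`` as "size discovery
not supported" and rely on size information from the exporting API instead.
Note that ``fd_is_cloexec()`` only verifies the flag after the fact; it does
not remove the race described above, which is why the flag has to be requested
when the fd is created.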

Basic Operation and Device DMA Access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: dma buf device access

CPU Access to DMA Buffer Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: cpu access
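
For userspace that accesses a buffer through mmap(), a rough sketch of
bracketing CPU access could look as follows. It assumes the
``DMA_BUF_IOCTL_SYNC`` interface and ``struct dma_buf_sync`` from
``include/uapi/linux/dma-buf.h``; the wrapper name ``dmabuf_cpu_sync()`` is
hypothetical and not part of any kernel or libc API.

.. code-block:: c

   #include <errno.h>
   #include <sys/ioctl.h>
   #include <linux/dma-buf.h>

   /*
    * Hypothetical wrapper around DMA_BUF_IOCTL_SYNC. flags is a
    * combination of DMA_BUF_SYNC_START or DMA_BUF_SYNC_END with
    * DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW.
    * Returns 0 on success or a negative errno value.
    */
   int dmabuf_cpu_sync(int fd, __u64 flags)
   {
       struct dma_buf_sync sync = { .flags = flags };

       for (;;) {
           if (ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync) == 0)
               return 0;
           /* Restart if the call was interrupted or asked to retry. */
           if (errno != EINTR && errno != EAGAIN)
               return -errno;
       }
   }

A read of a mapped buffer would then be bracketed by
``dmabuf_cpu_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)`` before the
access and ``dmabuf_cpu_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)`` after
it.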

Implicit Fence Poll Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: implicit fence polling
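
From userspace, waiting on a pollable dma-buf fd might be sketched as below.
The helper name ``dmabuf_poll()`` is hypothetical. The sketch assumes the
semantics described in the kernel-doc section referenced above, where
``POLLIN`` is understood to wait for pending write access and ``POLLOUT`` for
all pending access; check the documentation of the kernel in use before
relying on this.

.. code-block:: c

   #include <errno.h>
   #include <poll.h>

   /*
    * Hypothetical helper: wait until the requested events are reported
    * for fd or until timeout_ms expires (-1 waits without a timeout).
    * Returns 1 if an event was reported, 0 on timeout, or a negative
    * errno value on failure.
    */
   int dmabuf_poll(int fd, short events, int timeout_ms)
   {
       struct pollfd pfd = { .fd = fd, .events = events };
       int ret;

       /* Simplification: a restart after EINTR uses the full timeout. */
       do {
           ret = poll(&pfd, 1, timeout_ms);
       } while (ret < 0 && errno == EINTR);

       if (ret < 0)
           return -errno;
       if (ret == 0)
           return 0;
       /* Anything other than a requested event is treated as an error. */
       return (pfd.revents & events) ? 1 : -EIO;
   }

For example, ``dmabuf_poll(fd, POLLIN, 100)`` would wait up to 100 ms before
the CPU reads the buffer contents.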

DMA-BUF statistics
~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf-sysfs-stats.c
   :doc: overview

Kernel Functions and Structures Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :export:

.. kernel-doc:: include/linux/dma-buf.h
   :internal:

Reservation Objects
-------------------

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :doc: Reservation Object Overview

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :export:

.. kernel-doc:: include/linux/dma-resv.h
   :internal:

DMA Fences
----------

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: DMA fences overview

DMA Fence Cross-Driver Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence cross-driver contract

DMA Fence Signalling Annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence signalling annotation

DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :export:

.. kernel-doc:: include/linux/dma-fence.h
   :internal:

Seqno Hardware Fences
~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/seqno-fence.h
   :internal:

DMA Fence Array
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
   :export:

.. kernel-doc:: include/linux/dma-fence-array.h
   :internal:

DMA Fence uABI/Sync File
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/sync_file.c
   :export:

.. kernel-doc:: include/linux/sync_file.h
   :internal:

Indefinite DMA Fences
~~~~~~~~~~~~~~~~~~~~~

At various times &dma_fence with an indefinite time until dma_fence_wait()
finishes have been proposed. Examples include:

* Future fences, used in HWC1 to signal when a buffer isn't used by the display
  any longer, and created with the screen update that makes the buffer visible.
  The time this fence completes is entirely under userspace's control.

* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
  been set. Used to asynchronously delay command submission.

* Userspace fences or gpu futexes, fine-grained locking within a command buffer
  that userspace uses for synchronization across engines or with the CPU, which
  are then imported as a DMA fence for integration into existing winsys
  protocols.

* Long-running compute command buffers, while still using traditional end of
  batch DMA fences for memory management instead of context preemption DMA
  fences which get reattached when the compute job is rescheduled.

Common to all these schemes is that userspace controls the dependencies of these
fences and controls when they fire. Mixing indefinite fences with normal
in-kernel DMA fences does not work, even when a fallback timeout is included to
protect against malicious userspace:

* Only the kernel knows about all DMA fence dependencies; userspace is not aware
  of dependencies injected due to memory management or scheduler decisions.

* Only userspace knows about all dependencies in indefinite fences and when
  exactly they will complete; the kernel has no visibility.

Furthermore the kernel has to be able to hold up userspace command submission
for memory management needs, which means we must support indefinite fences being
dependent upon DMA fences. If the kernel also supports indefinite fences in the
kernel like a DMA fence, as any of the above proposals would, there is the
potential for deadlocks.

.. kernel-render:: DOT
   :alt: Indefinite Fencing Dependency Cycle
   :caption: Indefinite Fencing Dependency Cycle

   digraph "Fencing Cycle" {
      node [shape=box bgcolor=grey style=filled]
      kernel [label="Kernel DMA Fences"]
      userspace [label="userspace controlled fences"]
      kernel -> userspace [label="memory management"]
      userspace -> kernel [label="Future fence, fence proxy, ..."]

      { rank=same; kernel userspace }
   }

This means that the kernel might accidentally create deadlocks
through memory management dependencies which userspace is unaware of, which
randomly hangs workloads until the timeout kicks in, even though these
workloads do not contain a deadlock from userspace's perspective.  In such a
mixed fencing architecture there is no single entity with knowledge of all
dependencies. Therefore preventing such deadlocks from within the kernel is not
possible.
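
To make the argument concrete, the dependency cycle in the diagram above can be
modelled as a small wait-for graph. The sketch below is purely illustrative:
``struct wait_graph`` and ``wait_graph_has_cycle()`` are hypothetical names, not
kernel interfaces. It shows the check that an entity with knowledge of *all*
edges could run; in a mixed fencing architecture neither the kernel nor
userspace holds the complete graph, so neither can perform it.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   #define MAX_NODES 8

   /* wait_for[a][b] is true when node a cannot complete before node b. */
   struct wait_graph {
       size_t nodes;
       bool wait_for[MAX_NODES][MAX_NODES];
   };

   enum visit_state { UNVISITED, IN_PROGRESS, DONE };

   /* Depth-first search; reaching an IN_PROGRESS node closes a cycle. */
   static bool visit(const struct wait_graph *g, size_t n,
                     enum visit_state *state)
   {
       state[n] = IN_PROGRESS;
       for (size_t m = 0; m < g->nodes; m++) {
           if (!g->wait_for[n][m])
               continue;
           if (state[m] == IN_PROGRESS)
               return true;
           if (state[m] == UNVISITED && visit(g, m, state))
               return true;
       }
       state[n] = DONE;
       return false;
   }

   /* Returns true if the complete wait-for graph contains a cycle. */
   bool wait_graph_has_cycle(const struct wait_graph *g)
   {
       enum visit_state state[MAX_NODES] = { UNVISITED };

       for (size_t n = 0; n < g->nodes; n++)
           if (state[n] == UNVISITED && visit(g, n, state))
               return true;
       return false;
   }

With node 0 standing for kernel DMA fences and node 1 for userspace controlled
fences, the "memory management" edge alone is acyclic; adding the reverse edge
for a future or proxy fence creates the cycle.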

The only solution to avoid dependency loops is to not allow indefinite
fences in the kernel. This means:

* No future fences, proxy fences or userspace fences imported as DMA fences,
  with or without a timeout.

* No DMA fences that signal end of batchbuffer for command submission where
  userspace is allowed to use userspace fencing or long running compute
  workloads. This also means no implicit fencing for shared buffers in these
  cases.