.. SPDX-License-Identifier: GPL-2.0

=====================================
Asynchronous Transfers/Transforms API
=====================================

.. Contents

  1. INTRODUCTION

  2. GENEALOGY

  3. USAGE
  3.1 General format of the API
  3.2 Supported operations
  3.3 Descriptor management
  3.4 When does the operation execute?
  3.5 When does the operation complete?
  3.6 Constraints
  3.7 Example

  4. DRIVER DEVELOPMENT NOTES
  4.1 Conformance points
  4.2 "My application needs exclusive control of hardware channels"

  5. SOURCE

1. Introduction
===============

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the API can optimize for asynchronous operation and
the API will fit the chain of operations to the available offload
resources.

2. Genealogy
============

The API was initially designed to offload the memory copy and
xor parity calculations of the md-raid5 driver using the offload engines
present in the Intel(R) XScale series of I/O processors.  It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines.  The following design
features surfaced as a result:

1. implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities.  The
   operation will be offloaded when an engine is available and carried out
   in software otherwise.
2. cross-channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case.  The
   API automatically handles cases where the transition from one operation
   to another implies a hardware channel switch.
3. dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'
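
The implicit synchronous path of feature 1 can be pictured with a small
user-space model.  This is only a sketch: fb_engine, fb_desc and fb_copy()
are hypothetical names that do not exist in the kernel, and the model merely
assumes that a queued operation is reported through a non-NULL descriptor
while the software fallback returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an offload engine. */
struct fb_engine {
	int queued;		/* operations handed to the engine */
};

/* Hypothetical stand-in for a descriptor tracking a queued copy. */
struct fb_desc {
	void *dest;
	const void *src;
	size_t len;
};

/*
 * Copy len bytes from src to dest.  With an engine the copy is only
 * queued and the filled-in descriptor is returned.  Without one
 * (eng == NULL) the copy is carried out in software right away and NULL
 * is returned, so the caller never probes for hardware itself.
 */
struct fb_desc *fb_copy(struct fb_engine *eng, struct fb_desc *slot,
			void *dest, const void *src, size_t len)
{
	assert(dest && src);

	if (eng) {
		assert(slot);
		slot->dest = dest;
		slot->src = src;
		slot->len = len;
		eng->queued++;
		return slot;		/* asynchronous path */
	}

	memcpy(dest, src, len);		/* synchronous fallback */
	return NULL;
}
```

A caller written against such an interface behaves the same on both kinds
of platform; it only has to treat a NULL return as an operation that has
already finished.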

3. Usage
========

3.1 General format of the API
-----------------------------

::

  struct dma_async_tx_descriptor *
  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)

3.2 Supported operations
------------------------

========  ====================================================================
memcpy    memory copy between a source and a destination buffer
memset    fill a destination buffer with a byte value
xor       xor a series of source buffers and write the result to a
          destination buffer
xor_val   xor a series of source buffers and set a flag if the
          result is zero.  The implementation attempts to prevent
          writes to memory
pq        generate the p+q (raid6 syndrome) from a series of source buffers
pq_val    validate that a p and/or q buffer is in sync with a given series of
          sources
datap     (raid6_datap_recov) recover a raid6 data block and the p block
          from the given sources
2data     (raid6_2data_recov) recover 2 raid6 data blocks from the given
          sources
========  ====================================================================
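
The semantics of xor and xor_val in the table above can be illustrated in
plain C.  The sketch below is a user-space software model, not the code in
crypto/async_tx/; the names sw_xor() and sw_xor_val() are made up for this
illustration, and page handling and offsets are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Model of "xor": dest = srcs[0] ^ srcs[1] ^ ... ^ srcs[src_cnt - 1]. */
void sw_xor(unsigned char *dest, unsigned char *const *srcs,
	    int src_cnt, size_t len)
{
	assert(src_cnt > 0);

	memcpy(dest, srcs[0], len);
	for (int i = 1; i < src_cnt; i++)
		for (size_t j = 0; j < len; j++)
			dest[j] ^= srcs[i][j];
}

/*
 * Model of "xor_val": report whether the xor of all sources is zero.
 * The running value is kept in a local byte, so no buffer is written.
 */
bool sw_xor_val(unsigned char *const *srcs, int src_cnt, size_t len)
{
	for (size_t j = 0; j < len; j++) {
		unsigned char acc = 0;

		for (int i = 0; i < src_cnt; i++)
			acc ^= srcs[i][j];
		if (acc)
			return false;
	}
	return true;
}
```

With a parity buffer produced by sw_xor(), running sw_xor_val() over the
data buffers plus the parity buffer returns true for as long as none of the
buffers has changed.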

3.3 Descriptor management
-------------------------

The return value is non-NULL and points to a 'descriptor' when the operation
has been queued to execute asynchronously.  Descriptors are recycled
resources, under control of the offload engine driver, to be reused as
operations complete.  When an application needs to submit a chain of
operations, it must guarantee that a descriptor is not automatically
recycled before the operation that depends on it is submitted.  For this
reason the offload engine driver is not allowed to recycle (or free) a
descriptor until the application has acknowledged it.  A descriptor can be
acked by one of the following methods:

1. setting the ASYNC_TX_ACK flag if no child operations are to be submitted
2. submitting an unacknowledged descriptor as a dependency to another
   async_tx call, which implicitly sets the acknowledged state
3. calling async_tx_ack() on the descriptor
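
The three acknowledgement methods can be modelled in user space as follows.
The types and helpers (ack_desc, ack_submit(), ack_now(), ack_can_recycle(),
MODEL_ACK) are hypothetical and only mimic the rules described above; the
real state lives in struct dma_async_tx_descriptor and is managed by the
offload engine driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ACK 0x1	/* plays the role of ASYNC_TX_ACK in this model */

struct ack_desc {
	bool done;	/* the operation has finished */
	bool acked;	/* the client no longer needs the descriptor */
};

/* Method 3: explicit acknowledgement. */
void ack_now(struct ack_desc *d)
{
	d->acked = true;
}

/*
 * Prepare a descriptor.  Method 1 acks it through the flag; method 2 acks
 * the parent, because it has now been consumed as a dependency.
 */
void ack_submit(struct ack_desc *d, struct ack_desc *depend,
		unsigned int flags)
{
	assert(d != NULL);

	d->done = false;
	d->acked = (flags & MODEL_ACK) != 0;
	if (depend)
		ack_now(depend);
}

/* The driver may recycle only descriptors that are both done and acked. */
bool ack_can_recycle(const struct ack_desc *d)
{
	return d->done && d->acked;
}
```

In this model a finished but unacknowledged descriptor stays alive, which is
what allows a later operation to name it as a dependency.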

3.4 When does the operation execute?
------------------------------------

Operations do not immediately issue after return from the
async_<operation> call.  Offload engine drivers batch operations to
improve performance by reducing the number of MMIO cycles needed to
manage the channel.  Once a driver-specific threshold is met the driver
automatically issues pending operations.  An application can force this
event by calling async_tx_issue_pending_all().  This operates on all
channels since the application has no knowledge of the channel-to-operation
mapping.
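
The batching behaviour can be sketched with a toy channel model.  Everything
here is an assumption made for illustration: the threshold of 4, the
batch_chan structure and the batch_*() helpers are not part of dmaengine,
and a real driver decides for itself when to issue.

```c
#include <assert.h>

#define MODEL_BATCH 4	/* hypothetical driver-specific threshold */

struct batch_chan {
	int pending;	/* queued but not yet visible to the hardware */
	int issued;	/* handed to the hardware */
	int doorbells;	/* MMIO writes spent on issuing */
};

/* Flush everything pending on one channel with a single doorbell write. */
void batch_issue_pending(struct batch_chan *c)
{
	if (!c->pending)
		return;
	c->issued += c->pending;
	c->pending = 0;
	c->doorbells++;
}

/* Queue one operation; issue automatically once the threshold is met. */
void batch_submit(struct batch_chan *c)
{
	if (++c->pending >= MODEL_BATCH)
		batch_issue_pending(c);
}

/* Client-side flush: walk every channel, since the mapping is unknown. */
void batch_issue_pending_all(struct batch_chan *chans, int n)
{
	assert(n >= 0);

	for (int i = 0; i < n; i++)
		batch_issue_pending(&chans[i]);
}
```

Five submissions on one channel cost two doorbell writes in this model
instead of five: one when the threshold is reached and one for the forced
flush of the remaining operation.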

3.5 When does the operation complete?
-------------------------------------

There are two methods for an application to learn about the completion
of an operation.

1. Call dma_wait_for_async_tx().  This call causes the CPU to spin while
   it polls for the completion of the operation.  It handles dependency
   chains and issuing pending operations.
2. Specify a completion callback.  The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software.  The callback can be set in the call to
   async_<operation>, or when the application needs to submit a chain of
   unknown length it can use the async_trigger_callback() routine to set a
   completion interrupt/callback at the end of the chain.
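
Both completion methods can be pictured with a toy descriptor.  This is a
user-space sketch with hypothetical names (cmpl_desc, cmpl_engine_step(),
cmpl_wait(), cmpl_set_flag()); in particular the polling loop drives the
pretend engine itself, whereas dma_wait_for_async_tx() polls real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cmpl_cb_t)(void *param);

struct cmpl_desc {
	int remaining;		/* work units left; 0 means complete */
	cmpl_cb_t cb;		/* optional completion callback */
	void *cb_param;
};

/* One step of engine progress; fires the callback on the final step. */
void cmpl_engine_step(struct cmpl_desc *d)
{
	if (d->remaining > 0 && --d->remaining == 0 && d->cb)
		d->cb(d->cb_param);
}

/* Method 1: spin, polling until the operation has completed. */
int cmpl_wait(struct cmpl_desc *d)
{
	int polls = 0;

	assert(d != NULL);

	while (d->remaining > 0) {
		cmpl_engine_step(d);	/* stands in for hardware progress */
		polls++;
	}
	return polls;
}

/* Method 2: a callback that records completion for its owner. */
void cmpl_set_flag(void *param)
{
	*(bool *)param = true;
}
```

The polling variant ties up the caller until the work is finished, while the
callback variant lets the caller continue and be notified later.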

3.6 Constraints
---------------

1. Calls to async_<operation> are not permitted in IRQ context.  Other
   contexts are permitted provided constraint #2 is not violated.
2. Completion callback routines cannot submit new operations.  Doing so
   results in recursion in the synchronous case and in spinlocks being
   acquired twice in the asynchronous case.
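
Constraint 2 can be made concrete with a model of the synchronous case,
where the callback runs before the submitting call returns.  The guard flag
and the guard_*() names below are hypothetical; the sketch does not claim
that async_tx performs such a check, it only shows where the re-entry would
occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*guard_cb_t)(void *param);

static bool guard_in_callback;	/* true while a completion callback runs */

/*
 * Synchronous-path model: the operation "runs" inline and the callback is
 * invoked before returning.  A submission made from inside a callback is
 * refused here instead of being allowed to recurse.
 */
int guard_submit(guard_cb_t cb, void *param)
{
	if (guard_in_callback)
		return -1;		/* constraint 2 violated */

	if (cb) {
		guard_in_callback = true;
		cb(param);
		guard_in_callback = false;
	}
	return 0;
}

/* A misbehaving callback that tries to submit another operation. */
void guard_bad_cb(void *param)
{
	assert(param != NULL);
	*(int *)param = guard_submit(NULL, NULL);
}
```

A callback that needs follow-up work would instead record that fact and let
the original submitting context issue the next operation.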

3.7 Example
-----------

Perform an xor->copy->xor operation where each operation depends on the
result from the previous operation::

    #include <linux/async_tx.h>
    #include <linux/completion.h>

    /* number of xor sources; the value is chosen for illustration */
    #define NDISKS 2

    /* completion callback: wake up the waiter in run_xor_copy_xor() */
    static void callback(void *param)
    {
	    struct completion *cmp = param;

	    complete(cmp);
    }

    static void run_xor_copy_xor(struct page **xor_srcs,
				 struct page *xor_dest,
				 size_t xor_len,
				 struct page *copy_src,
				 struct page *copy_dest,
				 size_t copy_len)
    {
	    struct dma_async_tx_descriptor *tx;
	    struct async_submit_ctl submit;
	    addr_conv_t addr_conv[NDISKS];
	    struct completion cmp;

	    /* first xor: no dependency, no callback */
	    init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
			    addr_conv);
	    tx = async_xor(xor_dest, xor_srcs, 0, NDISKS, xor_len, &submit);

	    /* copy: depends on the first xor */
	    submit.depend_tx = tx;
	    tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit);

	    /* second xor: depends on the copy, is acked, signals cmp */
	    init_completion(&cmp);
	    init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx,
			    callback, &cmp, addr_conv);
	    tx = async_xor(xor_dest, xor_srcs, 0, NDISKS, xor_len, &submit);

	    async_tx_issue_pending_all();

	    wait_for_completion(&cmp);
    }

See include/linux/async_tx.h for more information on the flags.  See the
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
implementation examples.

4. Driver Development Notes
===========================

4.1 Conformance points
----------------------

There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:

1. Completion callbacks are expected to happen in tasklet context
2. dma_async_tx_descriptor fields are never manipulated in IRQ context
3. Use async_tx_run_dependencies() in the descriptor clean up path to
   handle submission of dependent operations
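
Conformance point 3 can be pictured as follows.  The model is a user-space
sketch with made-up names (dep_desc, dep_cleanup(), dep_run_dependencies());
it assumes a dependent operation is held back until its parent has completed
and says nothing about how a particular driver stores the dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dep_desc {
	int chan;		/* channel the operation is bound to */
	bool submitted;		/* handed to its channel */
	bool done;		/* operation has finished */
	struct dep_desc *next;	/* operation waiting on this one, if any */
};

/* Submit the operation that was held back waiting for d, if there is one. */
void dep_run_dependencies(struct dep_desc *d)
{
	struct dep_desc *child = d->next;

	if (child && !child->submitted)
		child->submitted = true;
}

/* Clean-up path model: retire a finished descriptor, then release its child. */
void dep_cleanup(struct dep_desc *d)
{
	assert(d->submitted);

	d->done = true;
	dep_run_dependencies(d);
}
```

Because the child is released from the parent's clean-up path, the two
operations may sit on different channels without the client having to
coordinate them.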

4.2 "My application needs exclusive control of hardware channels"
-----------------------------------------------------------------

Primarily this requirement arises from cases where a DMA engine driver
is being used to support device-to-memory operations.  A channel that is
performing these operations cannot, for many platform-specific reasons,
be shared.  For these cases the dma_request_channel() interface is
provided.

The interface is::

  struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
				       dma_filter_fn filter_fn,
				       void *filter_param);

Where dma_filter_fn is defined as::

  typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

When the optional 'filter_fn' parameter is set to NULL,
dma_request_channel() simply returns the first channel that satisfies the
capability mask.  Otherwise, when the mask parameter is insufficient for
specifying the necessary channel, the filter_fn routine can be used to
select among the available channels in the system.  The filter_fn routine
is called once for each free channel in the system.  Upon seeing a
suitable channel filter_fn returns true, which flags that channel to
be the return value from dma_request_channel().  A channel allocated via
this interface is exclusive to the caller until dma_release_channel()
is called.
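
The selection logic described above can be modelled in a few lines of
user-space C.  The flt_* names are hypothetical stand-ins for struct
dma_chan, dma_filter_fn, dma_request_channel() and dma_release_channel(),
capabilities are reduced to a plain bit mask, and the model assumes the
filter is consulted only for free channels that already satisfy the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct flt_chan {
	unsigned int caps;	/* capability bits offered by the channel */
	int id;
	bool busy;		/* already owned by a caller */
};

typedef bool (*flt_fn)(struct flt_chan *chan, void *param);

/*
 * Return the first free channel that satisfies mask and, when a filter is
 * given, that the filter accepts.  The channel is marked busy, which makes
 * it exclusive to the caller until flt_release().
 */
struct flt_chan *flt_request(struct flt_chan *chans, int n,
			     unsigned int mask, flt_fn fn, void *param)
{
	for (int i = 0; i < n; i++) {
		struct flt_chan *c = &chans[i];

		if (c->busy || (c->caps & mask) != mask)
			continue;
		if (fn && !fn(c, param))
			continue;
		c->busy = true;
		return c;
	}
	return NULL;
}

void flt_release(struct flt_chan *c)
{
	assert(c->busy);
	c->busy = false;
}

/* Example filter: accept only the channel whose id matches *param. */
bool flt_match_id(struct flt_chan *chan, void *param)
{
	return chan->id == *(int *)param;
}
```

Passing NULL as the filter yields the first capable channel, while
flt_match_id() shows how a caller can insist on one specific channel when
the mask alone cannot tell candidates apart.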

The DMA_PRIVATE capability flag is used to tag DMA devices that should
not be used by the general-purpose allocator.  It can be set at
initialization time if it is known that a channel will always be
private.  Alternatively, it is set when dma_request_channel() finds an
unused "public" channel.

A couple of caveats to note when implementing a driver and consumer:

1. Once a channel has been privately allocated it will no longer be
   considered by the general-purpose allocator even after a call to
   dma_release_channel().
2. Since capabilities are specified at the device level a dma_device
   with multiple channels will either have all channels public, or all
   channels private.
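
Caveat 2 follows from where the flag is stored.  The sketch below is a
hypothetical user-space model (priv_device, priv_general_get(),
priv_request()) in which the private tag is a single per-device field, so
that taking one channel exclusively hides every channel of that device from
the general-purpose path.  It is a simplification and deliberately ignores
what happens on release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PRIV_NCHAN 2	/* channels per device in this model */

struct priv_device {
	bool is_private;		/* plays the role of DMA_PRIVATE */
	bool chan_busy[PRIV_NCHAN];	/* exclusively owned channels */
};

/* General-purpose path: refuses any device that is tagged private. */
int priv_general_get(const struct priv_device *dev)
{
	assert(dev != NULL);

	if (dev->is_private)
		return -1;
	return 0;	/* channels of a public device are shared */
}

/*
 * Exclusive allocation: take a free channel and tag the whole device as
 * private, because the flag exists once per device, not per channel.
 */
int priv_request(struct priv_device *dev)
{
	for (int i = 0; i < PRIV_NCHAN; i++) {
		if (!dev->chan_busy[i]) {
			dev->chan_busy[i] = true;
			dev->is_private = true;
			return i;
		}
	}
	return -1;	/* no free channel left */
}
```

After a single priv_request() the second channel is still free, yet
priv_general_get() already rejects the device as a whole.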

5. Source
=========

include/linux/dmaengine.h:
    core header file for DMA drivers and API users
drivers/dma/dmaengine.c:
    offload engine channel management routines
drivers/dma/:
    location for offload engine drivers
include/linux/async_tx.h:
    core header file for the async_tx API
crypto/async_tx/async_tx.c:
    async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c:
    copy offload
crypto/async_tx/async_xor.c:
    xor and xor zero sum offload