.. SPDX-License-Identifier: GPL-2.0

=====================================
Asynchronous Transfers/Transforms API
=====================================

.. Contents

  1. INTRODUCTION

  2. GENEALOGY

  3. USAGE
  3.1 General format of the API
  3.2 Supported operations
  3.3 Descriptor management
  3.4 When does the operation execute?
  3.5 When does the operation complete?
  3.6 Constraints
  3.7 Example

  4. DMAENGINE DRIVER DEVELOPER NOTES
  4.1 Conformance points
  4.2 "My application needs exclusive control of hardware channels"

  5. SOURCE

1. Introduction
===============

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the API can optimize for asynchronous operation, and
the API will fit the chain of operations to the available offload
resources.

2. Genealogy
============

The API was initially designed to offload the memory copy and
xor-parity calculations of the md-raid5 driver using the offload engines
present in the Intel(R) Xscale series of I/O processors. It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines. The following design
features surfaced as a result:

1. implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities. The
   operation will be offloaded when an engine is available and carried out
   in software otherwise.
2. cross channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case. The
   API automatically handles cases where the transition from one operation
   to another implies a hardware channel switch.
3. dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'.

3. Usage
========

3.1 General format of the API
-----------------------------

::

  struct dma_async_tx_descriptor *
  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)

3.2 Supported operations
------------------------

======== ====================================================================
memcpy   memory copy between a source and a destination buffer
memset   fill a destination buffer with a byte value
xor      xor a series of source buffers and write the result to a
         destination buffer
xor_val  xor a series of source buffers and set a flag if the
         result is zero. The implementation attempts to prevent
         writes to memory
pq       generate the p+q (raid6 syndrome) from a series of source buffers
pq_val   validate that a p and/or q buffer are in sync with a given series of
         sources
datap    (raid6_datap_recov) recover a raid6 data block and the p block
         from the given sources
2data    (raid6_2data_recov) recover 2 raid6 data blocks from the given
         sources
======== ====================================================================

3.3 Descriptor management
-------------------------

The return value is non-NULL and points to a 'descriptor' when the operation
has been queued to execute asynchronously. Descriptors are recycled
resources, under control of the offload engine driver, to be reused as
operations complete. When an application needs to submit a chain of
operations, it must guarantee that the descriptor is not automatically
recycled before the dependency is submitted. This requires that all
descriptors be acknowledged by the application before the offload engine
driver is allowed to recycle (or free) the descriptor. A descriptor can be
acked by one of the following methods:

1. setting the ASYNC_TX_ACK flag if no child operations are to be submitted.
2. submitting an unacknowledged descriptor as a dependency to another
   async_tx call, which implicitly sets the acknowledged state.
3. calling async_tx_ack() on the descriptor.

3.4 When does the operation execute?
------------------------------------

Operations are not issued immediately when the async_<operation> call
returns. Offload engine drivers batch operations to improve performance
by reducing the number of mmio cycles needed to manage the channel. Once
a driver-specific threshold is met, the driver automatically issues
pending operations. An application can force this event by calling
async_tx_issue_pending_all(). This operates on all channels, since the
application has no knowledge of the channel-to-operation mapping.

3.5 When does the operation complete?
-------------------------------------

There are two methods for an application to learn about the completion
of an operation:

1. Call dma_wait_for_async_tx(). This call causes the CPU to spin while
   it polls for the completion of the operation. It handles dependency
   chains and issuing pending operations.
2. Specify a completion callback. The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software. The callback can be set in the call to
   async_<operation>, or, when the application needs to submit a chain of
   unknown length, it can use the async_trigger_callback() routine to set
   a completion interrupt/callback at the end of the chain.

3.6 Constraints
---------------

1. Calls to async_<operation> are not permitted in IRQ context. Other
   contexts are permitted provided constraint #2 is not violated.
2. Completion callback routines cannot submit new operations. Doing so
   results in recursion in the synchronous case and in spin_locks being
   acquired twice in the asynchronous case.

3.7 Example
-----------

Perform a xor->copy->xor operation where each operation depends on the
result from the previous operation::

    void callback(void *param)
    {
            struct completion *cmp = param;

            complete(cmp);
    }

    void run_xor_copy_xor(struct page **xor_srcs,
                          int xor_src_cnt,
                          struct page *xor_dest,
                          size_t xor_len,
                          struct page *copy_src,
                          struct page *copy_dest,
                          size_t copy_len)
    {
            struct dma_async_tx_descriptor *tx;
            addr_conv_t addr_conv[xor_src_cnt];
            struct async_submit_ctl submit;
            struct completion cmp;

            /* first xor: combine the sources into xor_dest */
            init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
                              addr_conv);
            tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

            /* copy: depends on the first xor, which implicitly acks it */
            submit.depend_tx = tx;
            tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit);

            /* second xor: depends on the copy; acked up front since nothing
             * else depends on it, and callback() completes cmp when done */
            init_completion(&cmp);
            init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx,
                              callback, &cmp, addr_conv);
            tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

            /* flush any operations the drivers are still batching */
            async_tx_issue_pending_all();

            /* sleep until callback() signals that the chain has finished */
            wait_for_completion(&cmp);
    }

See include/linux/async_tx.h for more information on the flags. See the
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
implementation examples.

4. Driver Development Notes
===========================

4.1 Conformance points
----------------------

There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:

1. Completion callbacks are expected to happen in tasklet context.
2. dma_async_tx_descriptor fields are never manipulated in IRQ context.
3. Use async_tx_run_dependencies() in the descriptor clean up path to
   handle submission of dependent operations.

4.2 "My application needs exclusive control of hardware channels"
-----------------------------------------------------------------

Primarily this requirement arises from cases where a DMA engine driver
is being used to support device-to-memory operations. A channel that is
performing these operations cannot, for many platform-specific reasons,
be shared. For these cases the dma_request_channel() interface is
provided.

The interface is::

  struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
                                       dma_filter_fn filter_fn,
                                       void *filter_param);

Where dma_filter_fn is defined as::

  typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

When the optional 'filter_fn' parameter is set to NULL,
dma_request_channel() simply returns the first channel that satisfies the
capability mask. Otherwise, when the mask parameter is insufficient for
specifying the necessary channel, the filter_fn routine can be used to
evaluate the available channels in the system. The filter_fn routine
is called once for each free channel in the system. Upon seeing a
suitable channel, filter_fn returns true, which flags that channel to
be the return value from dma_request_channel(). A channel allocated via
this interface is exclusive to the caller until dma_release_channel()
is called.

The DMA_PRIVATE capability flag is used to tag dma devices that should
not be used by the general-purpose allocator. It can be set at
initialization time if it is known that a channel will always be
private. Alternatively, it is set when dma_request_channel() finds an
unused "public" channel.

A couple of caveats to note when implementing a driver and consumer:

1. Once a channel has been privately allocated, it will no longer be
   considered by the general-purpose allocator, even after a call to
   dma_release_channel().
2. Since capabilities are specified at the device level, a dma_device
   with multiple channels will either have all channels public or all
   channels private.

5. Source
=========

include/linux/dmaengine.h:
    core header file for DMA drivers and api users
drivers/dma/dmaengine.c:
    offload engine channel management routines
drivers/dma/:
    location for offload engine drivers
include/linux/async_tx.h:
    core header file for the async_tx api
crypto/async_tx/async_tx.c:
    async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c:
    copy offload
crypto/async_tx/async_xor.c:
    xor and xor zero sum offload