.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets. This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently. A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs. The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks. For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.
Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks. (Each pair consists of a parallel and a serial
cpumask.) The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above. The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses. So it is
legal to supply a cpumask to padata that contains offline CPUs. Once an
offline CPU in the user-supplied cpumask comes online, padata is going to use
it.

Changing the CPU masks is an expensive operation, so it should not be done
frequently.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if not, the cb_cpu pointer
is updated to point to the CPU actually chosen). The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.
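
The embedding of padata_priv in a larger, work-specific structure can be
sketched in user space. This is not kernel code: the ``struct padata_priv``
below is a stand-in containing only the two documented callbacks, the
``container_of()`` macro is a simplified equivalent of the kernel's, and
``struct my_job``, ``my_job_init()`` and ``run_job_inline()`` are hypothetical
names chosen for illustration. The callbacks are invoked directly here, where
real code would hand the job to padata_do_parallel() and padata_do_serial().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel structure: only the documented callbacks. */
struct padata_priv {
    void (*parallel)(struct padata_priv *padata);
    void (*serial)(struct padata_priv *padata);
};

/* Simplified user-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: work-specific data wrapping the padata_priv. */
struct my_job {
    int input;
    int result;
    int serialized;
    struct padata_priv padata;
};

/* parallel(): recover the enclosing job and do the (placeholder) work. */
static void my_parallel(struct padata_priv *padata)
{
    struct my_job *job = container_of(padata, struct my_job, padata);

    job->result = job->input * 2;
}

/* serial(): recover the enclosing job and mark it as serialized. */
static void my_serial(struct padata_priv *padata)
{
    struct my_job *job = container_of(padata, struct my_job, padata);

    job->serialized = 1;
}

/* Zero the whole job (including padata_priv), then set the callbacks. */
static void my_job_init(struct my_job *job, int input)
{
    memset(job, 0, sizeof(*job));
    job->input = input;
    job->padata.parallel = my_parallel;
    job->padata.serial = my_serial;
}

/* Stand-in driver: calls the callbacks inline instead of using padata. */
int run_job_inline(int input)
{
    struct my_job job;

    my_job_init(&job, input);
    job.padata.parallel(&job.padata);
    job.padata.serial(&job.padata);
    assert(job.serialized);
    return job.result;
}
```

Zeroing the whole enclosing structure is one simple way to satisfy the
requirement that padata_priv be zeroed before its callbacks are filled in.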

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point. The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.
That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished. padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job. First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.
This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread. Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any. Prepare the shared state, which is
typically allocated on the main thread's stack. Last, call
padata_do_multithreaded(), which will return once the job is finished.

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c
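
The chunked calling pattern described under Running Multithreaded Jobs can be
mimicked in user space. The sketch below is not padata: the hypothetical
``run_chunks_sequentially()`` stands in for padata_do_multithreaded() and calls
the thread function once per chunk on a single thread, whereas padata chooses
chunk sizes itself and spreads chunks across helper threads. The names
``struct sum_state``, ``sum_chunk()`` and ``sum_with_chunks()`` are
illustrative, and treating ``end`` as exclusive is an assumption of this
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state, allocated on the caller's stack. */
struct sum_state {
    const int *values;
    long total;
};

/*
 * Thread function: processes the range [start, end) (end exclusive is
 * assumed here). With real concurrent helpers the update of total would
 * need synchronization; this sketch runs chunks one after another.
 */
static void sum_chunk(unsigned long start, unsigned long end, void *arg)
{
    struct sum_state *st = arg;
    unsigned long i;

    for (i = start; i < end; i++)
        st->total += st->values[i];
}

/* Stand-in driver: split [start, start + size) into chunks, call fn on each. */
static void run_chunks_sequentially(unsigned long start, unsigned long size,
                                    unsigned long chunk,
                                    void (*fn)(unsigned long, unsigned long,
                                               void *),
                                    void *arg)
{
    unsigned long end = start + size;

    assert(chunk > 0);
    while (start < end) {
        /* The final chunk may be shorter than the others. */
        unsigned long stop = (end - start < chunk) ? end : start + chunk;

        fn(start, stop, arg);
        start = stop;
    }
}

/* Prepare the shared state, run the job, and return once it is finished. */
long sum_with_chunks(const int *values, unsigned long n, unsigned long chunk)
{
    struct sum_state st = { .values = values, .total = 0 };

    run_chunks_sequentially(0, n, chunk, sum_chunk, &st);
    return st.total;
}
```

Because every element falls in exactly one chunk, the result does not depend
on the chunk size.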