.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets. This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently. A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs. The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks. For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.
Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks. (Each pair consists of a parallel and a serial
cpumask.) The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above. The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses. So it is
legal to supply a cpumask to padata that contains offline CPUs. Once an
offline CPU in the user-supplied cpumask comes online, padata will use it.

Changing the CPU masks is an expensive operation, so it should not be done
frequently.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above. cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if it is not, the value
cb_cpu points to is updated to the CPU actually chosen). The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.
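
The embedding described above can be illustrated with a minimal userspace
sketch. It is not kernel code: struct padata_priv here is a two-field stand-in
for the real structure, container_of() is redefined locally, and struct my_job,
my_parallel(), my_serial() and my_job_init() are hypothetical names chosen for
illustration. The callbacks are meant to be invoked directly by the caller
rather than through padata.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in: the real struct padata_priv has more (private) fields. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Local equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: the padata_priv is embedded next to the job's own data. */
struct my_job {
	int input;
	int result;
	int finished;
	struct padata_priv padata;
};

/* parallel(): recover the enclosing job and do the actual work. */
static void my_parallel(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->result = job->input * 2;
}

/* serial(): recover the enclosing job and mark it as finished. */
static void my_serial(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->finished = 1;
}

/* Zero the whole job (and with it the padata_priv), then set the callbacks. */
void my_job_init(struct my_job *job, int input)
{
	memset(job, 0, sizeof(*job));
	job->input = input;
	job->padata.parallel = my_parallel;
	job->padata.serial = my_serial;
}
```

In this sketch a caller would run my_job_init() and then call
job.padata.parallel(&job.padata) itself. In the kernel, the address of the
embedded member is instead what gets passed as the padata argument of
padata_do_parallel().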

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point. The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.
That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished. padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job. First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.
This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread. Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any. Prepare the shared state, which is
typically allocated on the main thread's stack. Last, call
padata_do_multithreaded(), which will return once the job is finished.

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c
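
As a closing illustration of the thread function contract from the Running
Multithreaded Jobs section, here is a userspace sketch. It does not call
padata_do_multithreaded(); run_chunks_sequentially() is a hypothetical
single-threaded stand-in that merely hands consecutive ranges to the thread
function, and struct sum_state and sum_range() are likewise illustrative names.
Treating the range as half-open, [start, end), and using unsigned long for its
bounds are assumptions of this sketch.

```c
/* Shared state for the job; the caller typically keeps it on its stack. */
struct sum_state {
	unsigned long total;
	unsigned int calls;
};

/* Thread function: handle one chunk, assumed here to be [start, end). */
void sum_range(unsigned long start, unsigned long end, void *arg)
{
	struct sum_state *state = arg;
	unsigned long i;

	for (i = start; i < end; i++)
		state->total += i;
	state->calls++;
}

/*
 * Hypothetical stand-in: feeds chunks to fn one after another, no threads.
 * chunk must be nonzero.
 */
void run_chunks_sequentially(void (*fn)(unsigned long, unsigned long, void *),
			     void *arg, unsigned long start,
			     unsigned long size, unsigned long chunk)
{
	unsigned long end = start + size;

	while (start < end) {
		unsigned long stop = start + chunk;

		/* The last chunk may be shorter than the others. */
		if (stop > end)
			stop = end;
		fn(start, stop, arg);
		start = stop;
	}
}
```

Because this stand-in runs every chunk on the calling thread, sum_range() can
update the shared state without locking. With real helper threads running
chunks concurrently, such updates would need synchronization.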