.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.  IPsec
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.
Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently.  A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.
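
As a minimal sketch of the two allocation steps above, a user might set things
up as follows.  The instance name "my_padata" and every my_* identifier are
hypothetical; the sketch assumes both allocators return NULL on failure::

    #include <linux/errno.h>
    #include <linux/padata.h>

    /* Hypothetical module-wide handles; not part of padata itself. */
    static struct padata_instance *my_pinst;
    static struct padata_shell *my_ps;

    static int my_padata_setup(void)
    {
        /* Step 1: the instance controls how and where jobs are run. */
        my_pinst = padata_alloc("my_padata");
        if (!my_pinst)
            return -ENOMEM;

        /* Step 2: the shell is what jobs are actually submitted to. */
        my_ps = padata_alloc_shell(my_pinst);
        if (!my_ps) {
            padata_free(my_pinst);
            my_pinst = NULL;
            return -ENOMEM;
        }

        return 0;
    }

A user needing several independently ordered series of jobs would allocate one
shell per series against the same instance.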

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways, programmatically with
padata_set_cpumask() or via sysfs.  The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.
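
For illustration, the following sketch restricts an instance's parallel cpumask
to the first four CPUs.  The function name my_limit_parallel_cpus() and the
choice of four CPUs are assumptions made for the example; the cpumask helpers
are the generic ones from <linux/cpumask.h>::

    #include <linux/cpumask.h>
    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/padata.h>

    /* Hypothetical helper: run parallel work on CPUs 0-3 only. */
    static int my_limit_parallel_cpus(struct padata_instance *pinst)
    {
        cpumask_var_t mask;
        unsigned int cpu;
        int err;

        /* Allocate a zeroed temporary mask. */
        if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;

        for (cpu = 0; cpu < 4 && cpu < nr_cpu_ids; cpu++)
            cpumask_set_cpu(cpu, mask);

        err = padata_set_cpumask(pinst, PADATA_CPU_PARALLEL, mask);

        /* The temporary mask is no longer needed after the call. */
        free_cpumask_var(mask);
        return err;
    }

Passing PADATA_CPU_SERIAL instead would change the set of CPUs eligible to run
the serialization callback.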

There may be sysfs files for an instance's cpumasks.  For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.  Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks.  (Each pair consists of a parallel and a serial
cpumask.)  The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above.  The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses.  So it is
legal to supply a cpumask to padata that contains offline CPUs.  Once an
offline CPU in the user-supplied cpumask comes online, padata starts using
it.
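
The relationship between the two pairs of masks can be modelled in a few lines
of ordinary C.  This is a user-space illustration only, not padata code:
demo_usable_mask() is a hypothetical helper, and a plain 64-bit integer stands
in for the kernel's cpumask type (bit N set means CPU N is in the mask)::

    #include <stdint.h>

    /*
     * Model of the 'usable' cpumask: the CPUs that are both requested
     * by the user and currently online.
     */
    uint64_t demo_usable_mask(uint64_t user_mask, uint64_t online_mask)
    {
        return user_mask & online_mask;
    }

With a user-supplied mask of 0xf (CPUs 0-3) and only CPUs 0 and 2 online, the
helper yields 0x5; once CPU 1 comes online as well, the result becomes 0x7
without the user-supplied mask having changed.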

Changing the CPU masks is an expensive operation, so it should not be done with
great frequency.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided.  Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above.  cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if it is not, the value
cb_cpu points to is updated to the CPU actually chosen).  The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress.  -EBUSY means that the instance's CPU mask is being changed
elsewhere, while -EINVAL is a complaint about cb_cpu not being in the serial
cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs.  parallel() runs with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point.  The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.
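
The container_of() step can be tried out in user space.  The sketch below is an
illustration under stated assumptions: demo_container_of() is a simplified
stand-in for the kernel macro, and struct demo_priv and struct demo_request are
hypothetical types that merely play the roles of padata_priv and the enclosing
job structure::

    #include <stddef.h>

    /* Simplified stand-in for the kernel's container_of(). */
    #define demo_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Stand-in for struct padata_priv; the real fields are private. */
    struct demo_priv {
        int cb_cpu;
    };

    /* Hypothetical job: work-specific data with the priv embedded. */
    struct demo_request {
        int packet_id;
        struct demo_priv padata;
    };

    /* What a parallel() callback would do with its lone parameter. */
    int demo_packet_id(struct demo_priv *padata)
    {
        struct demo_request *req =
            demo_container_of(padata, struct demo_request, padata);

        return req->packet_id;
    }

Because the callback only ever receives the embedded member, everything else
the job needs has to live in the enclosing structure.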

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.  Note that this call may be
deferred for a while since the padata code takes pains to ensure that jobs
are completed in the order in which they were submitted.
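
Putting submission and serialization together, a job's life cycle might look
like the following sketch.  struct my_job and the my_* functions are
hypothetical, the memset() is only a placeholder for real work, and the
submitter is assumed to run in process context so that GFP_KERNEL is
acceptable::

    #include <linux/kernel.h>
    #include <linux/padata.h>
    #include <linux/slab.h>
    #include <linux/smp.h>
    #include <linux/string.h>

    /* Hypothetical job: a buffer to process, with padata_priv embedded. */
    struct my_job {
        struct padata_priv padata;
        void *data;
        size_t len;
    };

    /* Runs on one of the parallel CPUs; must not sleep. */
    static void my_parallel(struct padata_priv *padata)
    {
        struct my_job *job = container_of(padata, struct my_job, padata);

        memset(job->data, 0, job->len);    /* placeholder for real work */

        /* Tell padata the job is finished and may be serialized. */
        padata_do_serial(padata);
    }

    /* Called in submission order on the callback CPU; must not sleep. */
    static void my_serial(struct padata_priv *padata)
    {
        struct my_job *job = container_of(padata, struct my_job, padata);

        kfree(job);
    }

    static int my_submit(struct padata_shell *ps, void *data, size_t len)
    {
        /* Prefer the submitting CPU; padata may pick another one. */
        int cb_cpu = raw_smp_processor_id();
        struct my_job *job;
        int err;

        /* kzalloc() also zeroes the embedded padata_priv. */
        job = kzalloc(sizeof(*job), GFP_KERNEL);
        if (!job)
            return -ENOMEM;

        job->data = data;
        job->len = len;
        job->padata.parallel = my_parallel;
        job->padata.serial = my_serial;

        err = padata_do_parallel(ps, &job->padata, &cb_cpu);
        if (err)
            kfree(job);    /* the job was never queued */

        return err;
    }

In this sketch the job is freed in my_serial(), since that is the last point at
which padata touches it.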

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation functions, in reverse order::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.
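
A teardown helper following that order could be as small as the sketch below;
my_padata_teardown() is a hypothetical name, and the caller is assumed to have
already waited for every submitted job to reach its serial() callback::

    #include <linux/padata.h>

    /* Hypothetical helper: release the shell first, then the instance. */
    static void my_padata_teardown(struct padata_instance *pinst,
                                   struct padata_shell *ps)
    {
        padata_free_shell(ps);
        padata_free(pinst);
    }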

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished.  padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job.  First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.  This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread.  Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any.  Prepare the shared state, which is
typically allocated on the main thread's stack.  Last, call
padata_do_multithreaded(), which will return once the job is finished.
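
The steps above can be sketched as follows for a job that sums an array.  All
my_* names are hypothetical and the tuning values are arbitrary.  The sketch
assumes the padata_mt_job field names found in include/linux/padata.h at the
time of writing, that ``end`` is exclusive, and that padata_do_multithreaded()
is an __init function, which is why the caller is marked __init too; check
these against your tree::

    #include <linux/atomic.h>
    #include <linux/init.h>
    #include <linux/padata.h>
    #include <linux/types.h>

    /* Shared state, allocated on the main thread's stack. */
    struct my_sum_state {
        const u32 *values;
        atomic_long_t total;
    };

    /* Thread function: handles one chunk, [start, end). */
    static void __init my_sum_chunk(unsigned long start, unsigned long end,
                                    void *arg)
    {
        struct my_sum_state *state = arg;
        unsigned long i;
        long sum = 0;

        for (i = start; i < end; i++)
            sum += state->values[i];

        /* Chunks run concurrently, so publish the result atomically. */
        atomic_long_add(sum, &state->total);
    }

    static long __init my_sum(const u32 *values, unsigned long count)
    {
        struct my_sum_state state = { .values = values };
        struct padata_mt_job job = {
            .thread_fn   = my_sum_chunk,
            .fn_arg      = &state,
            .start       = 0,
            .size        = count,
            .align       = 1,       /* no alignment requirement */
            .min_chunk   = 1024,    /* arbitrary lower bound per chunk */
            .max_threads = 4,       /* arbitrary cap on threads */
        };

        atomic_long_set(&state.total, 0);

        /* Returns only after every chunk has been processed. */
        padata_do_multithreaded(&job);

        return atomic_long_read(&state.total);
    }

Accumulating into a local variable and publishing once per chunk keeps
contention on the shared counter low.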

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c