.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.  This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently.  A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.
Both padata_alloc() and padata_alloc_shell() return NULL on failure.
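
The independence of shells can be pictured with a small userspace model.  It
is not kernel code: model_shell, model_instance and model_submit() are
hypothetical names, and the per-shell counter merely stands in for whatever
bookkeeping padata uses internally to restore order::

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model of one shell: it numbers only its own jobs. */
    struct model_shell {
        unsigned int next_seq;  /* number given to the next submitted job */
    };

    /* One instance carrying two shells, i.e. two independent job series. */
    struct model_instance {
        struct model_shell shells[2];
    };

    /*
     * Record a submission on shell 'ps' and return the job's position in
     * that shell's series.  Other shells are not consulted, so in this
     * model ordering constraints never cross shell boundaries.
     */
    static unsigned int model_submit(struct model_shell *ps)
    {
        assert(ps != NULL);
        return ps->next_seq++;
    }

Submitting alternately to shells[0] and shells[1] yields positions 0, 0, 1, 1:
each series advances on its own, so in this model a slow job on one shell does
not hold back serialization on the other.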

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways: programmatically with
padata_set_cpumask() or via sysfs.  The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is either PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL.  A
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel, and a serial cpumask defines which
processors may be used as the serialization callback processor.  The cpumask
argument specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks.  For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.  Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file.  For example, to restrict
parallel processing for the pencrypt instance to CPUs 0-3::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.
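
The value echoed into these files is a hexadecimal bitmap in which bit n stands
for CPU n, so ``f`` selects CPUs 0 to 3.  The userspace sketch below decodes a
single hex word that way.  model_parse_cpumask() and model_cpu_in_mask() are
hypothetical helpers, and the 64-CPU limit is an assumption made for brevity;
the kernel's bitmap parser also accepts longer masks written as
comma-separated 32-bit groups::

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Userspace illustration only: decode one hex word such as "f" or
     * "f\n" (echo appends a newline) into a 64-bit CPU bitmap.
     * Returns 0 on success or -1 if the text is not a hex number.
     */
    static int model_parse_cpumask(const char *s, uint64_t *mask)
    {
        char *end;
        unsigned long long v;

        assert(s != NULL && mask != NULL);
        errno = 0;
        v = strtoull(s, &end, 16);
        if (end == s || errno == ERANGE)
            return -1;
        if (*end != '\0' && !(*end == '\n' && end[1] == '\0'))
            return -1;
        *mask = (uint64_t)v;
        return 0;
    }

    /* Non-zero if CPU 'cpu' is set in 'mask'. */
    static int model_cpu_in_mask(uint64_t mask, unsigned int cpu)
    {
        return cpu < 64 && ((mask >> cpu) & 1);
    }

Parsing "f" yields 0xf, for which model_cpu_in_mask() reports CPUs 0 through 3
as set and CPU 4 as clear.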

Padata maintains two pairs of cpumasks internally: the user-supplied cpumasks
and the 'usable' cpumasks.  (Each pair consists of a parallel and a serial
cpumask.)  The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above.  The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses.  It is
therefore legal to supply a cpumask to padata that contains offline CPUs: once
an offline CPU in the user-supplied cpumask comes online, padata starts using
it.
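
The relationship between the two kinds of mask reduces to a bitwise AND.  In
the userspace sketch below, masks are 64-bit bitmaps (bit n is CPU n) rather
than the kernel's struct cpumask, and model_usable_mask() is a hypothetical
name::

    #include <assert.h>
    #include <stdint.h>

    /*
     * Userspace model: the usable mask is the user-supplied mask restricted
     * to the CPUs that are currently online.
     */
    static uint64_t model_usable_mask(uint64_t user_mask, uint64_t online_mask)
    {
        uint64_t usable = user_mask & online_mask;

        /* Always a subset of the user-supplied mask. */
        assert((usable & ~user_mask) == 0);
        return usable;
    }

With a user-supplied mask of 0xf and only CPUs 0 and 1 online (0x3), the usable
mask is 0x3.  Once CPU 2 comes online (0x7), recomputing gives 0x7, so CPU 2 is
picked up without any change to the user-supplied mask.  A result of 0
corresponds to having no online CPUs in the mask.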

Changing the CPU masks is an expensive operation, so it should not be done
with great frequency.

Running A Job
-------------

Submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided.  Those functions will
be called in the process of getting the work done, as described below.
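
The embedding pattern can be tried out in userspace.  The sketch below uses a
stand-in, model_padata_priv, holding only the two callbacks, and a locally
defined model_container_of() that mirrors the kernel's container_of();
struct my_job and its fields are hypothetical.  Kernel code would use the real
struct padata_priv and container_of() instead::

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Userspace stand-in for struct padata_priv: only the two callbacks. */
    struct model_padata_priv {
        void (*parallel)(struct model_padata_priv *padata);
        void (*serial)(struct model_padata_priv *padata);
    };

    /* Local equivalent of the kernel's container_of(). */
    #define model_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Hypothetical job structure with the padata_priv embedded in it. */
    struct my_job {
        int input;
        int result;
        struct model_padata_priv padata;
    };

    static void my_parallel(struct model_padata_priv *padata)
    {
        /* Recover the enclosing job from the embedded member. */
        struct my_job *job = model_container_of(padata, struct my_job, padata);

        /* Stand-in for the real work; in the kernel this must not sleep. */
        job->result = job->input * 2;
    }

    static void my_serial(struct model_padata_priv *padata)
    {
        struct my_job *job = model_container_of(padata, struct my_job, padata);

        /* A real serial() would hand the finished job to its consumer. */
        assert(job->result == job->input * 2);
    }

    /* Zero the whole job first, then fill in the callbacks. */
    static void my_job_init(struct my_job *job, int input)
    {
        memset(job, 0, sizeof(*job));
        job->input = input;
        job->padata.parallel = my_parallel;
        job->padata.serial = my_serial;
    }

Because the callbacks receive only the embedded member, each job's own data
(input and result here) travels with it without any global lookup.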

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above.  cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done.  It should be in the instance's serial cpumask; if it is not, padata
chooses another CPU from that mask and updates *cb_cpu to the CPU actually
chosen.  The return value from padata_do_parallel() is zero on success,
indicating that the job is in progress.  A return value of -EBUSY means that
the instance's CPU mask is being changed elsewhere, while -EINVAL indicates
that the serial cpumask has no usable CPU for the callback, that there are no
online CPUs in the parallel or serial cpumasks, or that the instance is
stopped.
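
How a fallback callback CPU might be chosen is sketched below as a userspace
model.  Masks are 64-bit bitmaps, model_pick_cb_cpu() is a hypothetical name,
and picking the (cb_cpu mod weight)-th CPU of the serial mask is an assumption
for illustration; the exact choice padata makes is an implementation detail.
What the sketch does reflect from the text above is that a usable preferred
CPU is kept, any other value is replaced through the pointer, and a serial mask
with no usable CPU is an error::

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>

    /* Number of CPUs set in 'mask'. */
    static unsigned int model_weight(uint64_t mask)
    {
        unsigned int n = 0;

        for (; mask; mask &= mask - 1)  /* clear the lowest set bit */
            n++;
        return n;
    }

    /*
     * Userspace model of validating the serial callback CPU.  Keeps *cb_cpu
     * if it is set in serial_mask; otherwise writes a CPU from serial_mask
     * back through cb_cpu.  Returns 0, or -EINVAL if serial_mask is empty.
     */
    static int model_pick_cb_cpu(uint64_t serial_mask, int *cb_cpu)
    {
        unsigned int weight = model_weight(serial_mask);
        unsigned int index, cpu;

        assert(cb_cpu && *cb_cpu >= 0);
        if (*cb_cpu < 64 && ((serial_mask >> *cb_cpu) & 1))
            return 0;               /* preferred CPU is usable */
        if (weight == 0)
            return -EINVAL;         /* no CPU can run serial() */

        /* Assumed fallback: the (cb_cpu % weight)-th CPU in the mask. */
        index = (unsigned int)*cb_cpu % weight;
        for (cpu = 0; cpu < 64; cpu++) {
            if (!((serial_mask >> cpu) & 1))
                continue;
            if (index == 0)
                break;
            index--;
        }
        *cb_cpu = (int)cpu;
        return 0;
    }

With a serial mask of 0xc (CPUs 2 and 3), a preferred CPU of 2 is kept, 5
becomes 3, 4 becomes 2, and an empty mask yields -EINVAL with cb_cpu untouched.
This is why a caller should read cb_cpu back after a successful submission
rather than assume its preference was honored.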

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs.  parallel() runs with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is typically obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point.  The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the callback CPU chosen in the initial call to padata_do_parallel() (see
cb_cpu above); it, too, is run with local software interrupts disabled.  Note
that this call may be deferred for a while, since the padata code takes pains
to ensure that jobs are completed in the order in which they were submitted.
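
The ordering guarantee can be illustrated with a userspace reorder buffer.
This is a simplified model, not padata's implementation: jobs carry
hypothetical sequence numbers 0, 1, 2, ... in submission order,
model_do_serial() plays the role of padata_do_serial(), and recording a number
in order[] stands in for running serial()::

    #include <assert.h>
    #include <stdbool.h>

    #define MODEL_MAX_JOBS 16   /* arbitrary bound for the illustration */

    struct model_reorder {
        bool done[MODEL_MAX_JOBS];          /* parallel phase finished for seq n */
        unsigned int next;                  /* next seq whose serial() may run */
        unsigned int order[MODEL_MAX_JOBS]; /* seqs in the order serial() ran */
        unsigned int nserialized;           /* entries used in order[] */
    };

    /*
     * Job 'seq' has finished its parallel phase.  Mark it, then release
     * every job at the head of the queue whose predecessors are all done.
     */
    static void model_do_serial(struct model_reorder *r, unsigned int seq)
    {
        assert(seq < MODEL_MAX_JOBS && !r->done[seq]);
        r->done[seq] = true;

        while (r->next < MODEL_MAX_JOBS && r->done[r->next]) {
            r->order[r->nserialized++] = r->next;   /* "run" serial() */
            r->next++;
        }
    }

If job 2 finishes first, nothing is released; when job 0 finishes, its serial()
runs; when job 1 finishes, jobs 1 and 2 are released together.  Job 2's
serial() is therefore deferred until every earlier job has finished, which is
the kind of delay described above.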

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation, in reverse order::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before either of these functions is called.
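
One way to meet this requirement is for the user to count jobs from submission
until their serial() callback has run, and to tear down only once the count
reaches zero.  The userspace sketch below shows that bookkeeping with C11
atomics; struct model_inflight and its helpers are hypothetical and not part of
padata.  Kernel code would more likely use an atomic_t or refcount_t together
with a wait queue or completion so that the caller can sleep until the count
drops::

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical caller-side counter of jobs not yet serialized. */
    struct model_inflight {
        atomic_int count;
    };

    /* Call before submitting a job (and undo it if submission fails). */
    static void model_job_submitted(struct model_inflight *f)
    {
        atomic_fetch_add(&f->count, 1);
    }

    /* Call as the last step of the serial() callback. */
    static void model_job_serialized(struct model_inflight *f)
    {
        int prev = atomic_fetch_sub(&f->count, 1);

        /* More completions than submissions would be a caller bug. */
        assert(prev > 0);
    }

    /* True once every submitted job has reached the end of serial(). */
    static bool model_safe_to_free(struct model_inflight *f)
    {
        return atomic_load(&f->count) == 0;
    }

After two submissions the counter reads 2 and teardown must wait; only after
both serial() callbacks have finished does model_safe_to_free() return true.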

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished.  padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.
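
The chunk idea can be modeled in userspace as repeatedly claiming the next
slice of a shared range.  In this single-threaded sketch, struct model_range
and model_claim_chunk() are hypothetical, and the caller supplies the chunk
size; padata chooses its own chunk size, and with several threads the shared
range would need a lock around each claim::

    #include <assert.h>

    /* Hypothetical shared description of the work still to be handed out. */
    struct model_range {
        unsigned long start;    /* first unclaimed unit */
        unsigned long size;     /* number of unclaimed units */
    };

    /*
     * Claim the next chunk of at most chunk_size units, reporting it as the
     * half-open range [*start, *end).  Returns 0 once no work is left.
     */
    static int model_claim_chunk(struct model_range *r, unsigned long chunk_size,
                                 unsigned long *start, unsigned long *end)
    {
        unsigned long n;

        assert(chunk_size > 0);
        if (r->size == 0)
            return 0;

        n = r->size < chunk_size ? r->size : chunk_size;
        *start = r->start;
        *end = r->start + n;
        r->start += n;
        r->size -= n;
        return 1;
    }

For a job with start 10 and size 25 and a chunk size of 10, successive claims
return [10, 20), [20, 30) and [30, 35), after which the job is exhausted.  In
this model, whichever thread finishes a chunk first simply claims the next one,
so faster threads end up doing more of the work.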

A user has to do four things to run a multithreaded job.  First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.  This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread.  Second, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any.  Third, prepare the shared state,
which is typically allocated on the main thread's stack.  Last, call
padata_do_multithreaded(), which will return once the job is finished.
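
The thread function's shape and the shared state can be shown in userspace
with a sequential stand-in for padata_do_multithreaded().  model_run_job(),
struct square_state and square_chunk() are hypothetical; in kernel code the
thread function and its argument would be referenced from a struct
padata_mt_job and run by padata on several threads::

    #include <assert.h>

    /* Thread-function shape described above: a range plus shared state. */
    typedef void (*model_thread_fn)(unsigned long start, unsigned long end,
                                    void *arg);

    /* Hypothetical shared state for a job that squares array elements. */
    struct square_state {
        long *values;           /* array updated in place */
    };

    /* Handle units [start, end); each call touches a disjoint slice. */
    static void square_chunk(unsigned long start, unsigned long end, void *arg)
    {
        struct square_state *st = arg;

        for (unsigned long i = start; i < end; i++)
            st->values[i] *= st->values[i];
    }

    /*
     * Sequential stand-in for padata_do_multithreaded(), for illustration
     * only: calls fn once per chunk on the current thread and returns when
     * the whole range [start, start + size) has been processed.
     */
    static void model_run_job(model_thread_fn fn, void *arg, unsigned long start,
                              unsigned long size, unsigned long chunk)
    {
        assert(fn && chunk > 0);
        while (size > 0) {
            unsigned long n = size < chunk ? size : chunk;

            fn(start, start + n, arg);
            start += n;
            size -= n;
        }
    }

Because each call touches only values[start] to values[end - 1], chunks never
overlap and the thread function needs no locking for the array itself.  Shared
state that every chunk updates, such as a running total, would need its own
synchronization once several threads are involved.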

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c