=========
Workqueue
=========

:Date: September, 2010
:Author: Tejun Heo <tj@kernel.org>
:Author: Florian Mickler <florian@mickler.org>


Introduction
============

There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.

When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue.  An
independent thread serves as the asynchronous execution context.  The
queue is called workqueue and the thread is called worker.

While there are work items on the workqueue the worker executes the
functions associated with the work items one after the other.  When
there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.


Why Concurrency Managed Workqueue?
==================================

In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
thread system-wide.  A single MT wq needed to keep around the same
number of workers as the number of CPUs.  The kernel grew a lot of MT
wq users over the years and with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.

Although MT wq wasted a lot of resources, the level of concurrency
provided was unsatisfactory.  The limitation was common to both ST and
MT wq albeit less severe on MT.  Each wq maintained its own separate
worker pool.  An MT wq could provide only one execution context per CPU
while an ST wq one for the whole system.  Work items had to compete for
those very limited execution contexts leading to various problems
including proneness to deadlocks around the single execution context.

The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time.  As
MT wq don't provide much better concurrency, users which required a
higher level of concurrency, like async or fscache, had to implement
their own thread pools.

Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
focus on the following goals.

* Maintain compatibility with the original workqueue API.

* Use per-CPU unified worker pools shared by all wq to provide a
  flexible level of concurrency on demand without wasting a lot of
  resources.

* Automatically regulate worker pool and level of concurrency so that
  the API users don't need to worry about such details.


The Design
==========

In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.

A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously.  Whenever a driver or subsystem
wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.
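
As a rough illustration of the pattern just described (not taken from
the kernel sources; the structure and function names are made up), a
driver typically embeds a ``struct work_struct`` in its own data
structure, initializes it with ``INIT_WORK()`` and queues it with
``schedule_work()`` or ``queue_work()``. ::

  #include <linux/workqueue.h>

  struct frob_dev {
          struct work_struct work;        /* the work item */
          int pending_events;
  };

  /* Executed asynchronously by a worker thread. */
  static void frob_work_fn(struct work_struct *work)
  {
          struct frob_dev *dev = container_of(work, struct frob_dev, work);

          /* process dev->pending_events ... */
  }

  static void frob_init(struct frob_dev *dev)
  {
          INIT_WORK(&dev->work, frob_work_fn);
  }

  static void frob_event(struct frob_dev *dev)
  {
          /* queue the work item on one of the system workqueues */
          schedule_work(&dev->work);
  }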

Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other.  If no work is queued, the
worker threads become idle.  These worker threads are managed in so
called worker-pools.

The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages worker-pools and processes the queued work items.

There are two worker-pools, one for normal work items and the other
for high priority ones, for each possible CPU and some extra
worker-pools to serve work items queued on unbound workqueues - the
number of these backing pools is dynamic.

Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit. They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on. These flags include
things like CPU locality, concurrency limits, priority and more.  To
get a detailed overview refer to the API description of
``alloc_workqueue()`` below.

When a work item is queued to a workqueue, the target worker-pool is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the worker-pool.  For example,
unless specifically overridden, a work item of a bound workqueue will
be queued on the worklist of either normal or highpri worker-pool that
is associated to the CPU the issuer is running on.

For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue.  cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.

Each worker-pool bound to an actual CPU implements concurrency
management by hooking into the scheduler.  The worker-pool is notified
whenever an active worker wakes up or sleeps and keeps track of the
number of the currently runnable workers.  Generally, work items are
not expected to hog a CPU and consume many cycles.  That means
maintaining just enough concurrency to prevent work processing from
stalling should be optimal.  As long as there are one or more runnable
workers on the CPU, the worker-pool doesn't start execution of a new
work, but, when the last running worker goes to sleep, it immediately
schedules a new worker so that the CPU doesn't sit idle while there
are pending work items.  This allows using a minimal number of workers
without losing execution bandwidth.

Keeping idle workers around doesn't cost anything other than the
memory space for kthreads, so cmwq holds onto idle ones for a while
before killing them.

For unbound workqueues, the number of backing pools is dynamic.
An unbound workqueue can be assigned custom attributes using
``apply_workqueue_attrs()`` and the workqueue will automatically
create backing worker pools matching the attributes.  The
responsibility of regulating the concurrency level is on the users.
There is also a flag to mark a bound wq to ignore the concurrency
management.  Please refer to the API section for details.

The forward progress guarantee relies on workers being created when
more execution contexts are necessary, which in turn is guaranteed
through the use of rescue workers.  All work items which might be used
on code paths that handle memory reclaim are required to be queued on
wq's that have a rescue-worker reserved for execution under memory
pressure.  Otherwise it is possible that the worker-pool deadlocks
waiting for execution contexts to free up.


Application Programming Interface (API)
=======================================

``alloc_workqueue()`` allocates a wq.  The original
``create_*workqueue()`` functions are deprecated and scheduled for
removal.  ``alloc_workqueue()`` takes three arguments - ``@name``,
``@flags`` and ``@max_active``.  ``@name`` is the name of the wq and
also used as the name of the rescuer thread if there is one.

A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes. ``@flags``
and ``@max_active`` control how work items are assigned execution
resources, scheduled and executed.
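
As a hedged sketch (the workqueue name and flag choice below are
illustrative only), allocating and releasing a dedicated wq looks like
this. ::

  static struct workqueue_struct *frob_wq;

  static int frob_wq_init(void)
  {
          /* 0 for @max_active selects the default limit */
          frob_wq = alloc_workqueue("frob_wq", WQ_UNBOUND | WQ_FREEZABLE, 0);
          if (!frob_wq)
                  return -ENOMEM;
          return 0;
  }

  static void frob_wq_exit(void)
  {
          /* waits for pending work items and releases the wq */
          destroy_workqueue(frob_wq);
  }

Work items are then queued on it with ``queue_work(frob_wq, &work)``
instead of ``schedule_work(&work)``.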


``flags``
---------

``WQ_UNBOUND``
  Work items queued to an unbound wq are served by the special
  worker-pools which host workers which are not bound to any
  specific CPU.  This makes the wq behave as a simple execution
  context provider without concurrency management.  The unbound
  worker-pools try to start execution of work items as soon as
  possible.  Unbound wq sacrifices locality but is useful for
  the following cases.

  * Wide fluctuation in the concurrency level requirement is
    expected and using bound wq may end up creating large number
    of mostly unused workers across different CPUs as the issuer
    hops through different CPUs.

  * Long running CPU intensive workloads which can be better
    managed by the system scheduler.

``WQ_FREEZABLE``
  A freezable wq participates in the freeze phase of the system
  suspend operations.  Work items on the wq are drained and no
  new work item starts execution until thawed.

``WQ_MEM_RECLAIM``
  All wq which might be used in the memory reclaim paths **MUST**
  have this flag set.  The wq is guaranteed to have at least one
  execution context regardless of memory pressure.

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target cpu.  Highpri worker-pools are
  served by worker threads with elevated nice level.

  Note that normal and highpri worker-pools don't interact with
  each other.  Each maintains its separate pool of workers and
  implements concurrency management among its workers.

``WQ_CPU_INTENSIVE``
  Work items of a CPU intensive wq do not contribute to the
  concurrency level.  In other words, runnable CPU intensive
  work items will not prevent other work items in the same
  worker-pool from starting execution.  This is useful for bound
  work items which are expected to hog CPU cycles so that their
  execution is regulated by the system scheduler.

  Although CPU intensive work items don't contribute to the
  concurrency level, start of their executions is still
  regulated by the concurrency management and runnable
  non-CPU-intensive work items can delay execution of CPU
  intensive work items.

  This flag is meaningless for unbound wq.


``max_active``
--------------

``@max_active`` determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example, with
``@max_active`` of 16, at most 16 work items of the wq can be executing
at the same time per CPU. This is always a per-CPU attribute, even for
unbound workqueues.

The maximum limit for ``@max_active`` is 512 and the default value used
when 0 is specified is 256. These values are chosen sufficiently high
such that they are not the limiting factor while providing protection in
runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time.  Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.

Some users depend on the strict execution ordering of ST wq.  The
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` was used to
achieve this behavior.  Work items on such wq were always queued to the
unbound worker-pools and only one work item could be active at any given
time thus achieving the same ordering property as ST wq.

In the current implementation the above configuration only guarantees
ST behavior within a given NUMA node. Instead, ``alloc_ordered_workqueue()``
should be used to achieve system-wide ST behavior.
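
For instance (a hedged sketch; the name and the ``WQ_MEM_RECLAIM``
flag are illustrative), an ordered wq which executes at most one work
item at a time, system-wide and in queueing order, is created with: ::

  static struct workqueue_struct *frob_ordered_wq;

  static int frob_ordered_init(void)
  {
          frob_ordered_wq = alloc_ordered_workqueue("frob_ordered",
                                                    WQ_MEM_RECLAIM);
          if (!frob_ordered_wq)
                  return -ENOMEM;
          return 0;
  }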


Example Execution Scenarios
===========================

The following example execution scenarios try to illustrate how cmwq
behaves under different configurations.

 Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
 w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
 again before finishing.  w1 and w2 burn CPU for 5ms then sleep for
 10ms.

Ignoring all other tasks, works and processing overhead, and assuming
simple FIFO scheduling, the following is one highly simplified version
of possible sequences of events with the original wq. ::

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 starts and burns CPU
 25		w1 sleeps
 35		w1 wakes up and finishes
 35		w2 starts and burns CPU
 40		w2 sleeps
 50		w2 wakes up and finishes

And with cmwq with ``@max_active`` >= 3, ::

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 starts and burns CPU
 10		w1 sleeps
 10		w2 starts and burns CPU
 15		w2 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 25		w2 wakes up and finishes

If ``@max_active`` == 2, ::

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 starts and burns CPU
 10		w1 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 20		w2 starts and burns CPU
 25		w2 sleeps
 35		w2 wakes up and finishes

Now, let's assume w1 and w2 are queued to a different wq q1 which has
``WQ_CPU_INTENSIVE`` set, ::

 TIME IN MSECS	EVENT
 0		w0 starts and burns CPU
 5		w0 sleeps
 5		w1 and w2 start and burn CPU
 10		w1 sleeps
 15		w2 sleeps
 15		w0 wakes up and burns CPU
 20		w0 finishes
 20		w1 wakes up and finishes
 25		w2 wakes up and finishes


Guidelines
==========

* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
  items which are used during memory reclaim.  Each wq with
  ``WQ_MEM_RECLAIM`` set has an execution context reserved for it.  If
  there is a dependency among multiple work items used during memory
  reclaim, they should be queued to separate wqs, each with
  ``WQ_MEM_RECLAIM`` (see the sketch following this list).

* Unless strict ordering is required, there is no need to use ST wq.

* Unless there is a specific need, using 0 for @max_active is
  recommended.  In most use cases, concurrency level usually stays
  well under the default limit.

* A wq serves as a domain for forward progress guarantee
  (``WQ_MEM_RECLAIM``), flush and work item attributes.  Work items
  which are not involved in memory reclaim and don't need to be
  flushed as a part of a group of work items, and don't require any
  special attribute, can use one of the system wq.  There is no
  difference in execution characteristics between using a dedicated wq
  and a system wq.

* Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
  level of locality in wq operations and work item execution.
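
The first guideline above can be illustrated with the following hedged
sketch (the wq and function names are made up): two work items that
depend on each other on the reclaim path each get their own
``WQ_MEM_RECLAIM`` wq so that one rescuer cannot be blocked behind the
other. ::

  static struct workqueue_struct *frob_io_wq;
  static struct workqueue_struct *frob_done_wq;

  static int frob_reclaim_wqs_init(void)
  {
          frob_io_wq = alloc_workqueue("frob_io", WQ_MEM_RECLAIM, 0);
          if (!frob_io_wq)
                  return -ENOMEM;

          frob_done_wq = alloc_workqueue("frob_done", WQ_MEM_RECLAIM, 0);
          if (!frob_done_wq) {
                  destroy_workqueue(frob_io_wq);
                  return -ENOMEM;
          }
          return 0;
  }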


Affinity Scopes
===============

An unbound workqueue groups CPUs according to its affinity scope to improve
cache locality. For example, if a workqueue is using the default affinity
scope of "cache", it will group CPUs according to last level cache
boundaries. A work item queued on the workqueue will be assigned to a worker
on one of the CPUs which share the last level cache with the issuing CPU.
Once started, the worker may or may not be allowed to move outside the scope
depending on the ``affinity_strict`` setting of the scope.

Workqueue currently supports the following affinity scopes.

``default``
  Use the scope in module parameter ``workqueue.default_affinity_scope``
  which is always set to one of the scopes below.

``cpu``
  CPUs are not grouped. A work item issued on one CPU is processed by a
  worker on the same CPU. This makes unbound workqueues behave as per-cpu
  workqueues without concurrency management.

``smt``
  CPUs are grouped according to SMT boundaries. This usually means that the
  logical threads of each physical CPU core are grouped together.

``cache``
  CPUs are grouped according to cache boundaries. Which specific cache
  boundary is used is determined by the arch code. L3 is used in a lot of
  cases. This is the default affinity scope.

``numa``
  CPUs are grouped according to NUMA boundaries.

``system``
  All CPUs are put in the same group. Workqueue makes no effort to process a
  work item on a CPU close to the issuing CPU.

The default affinity scope can be changed with the module parameter
``workqueue.default_affinity_scope`` and a specific workqueue's affinity
scope can be changed using ``apply_workqueue_attrs()``.
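
As a hedged sketch only (``apply_workqueue_attrs()`` is not available
to all callers and the exact attribute field names may differ between
kernel versions), switching a wq to a strict NUMA scope could look
roughly like this on a kernel with affinity scopes. ::

  static int frob_set_numa_scope(struct workqueue_struct *wq)
  {
          struct workqueue_attrs *attrs;
          int ret;

          attrs = alloc_workqueue_attrs();
          if (!attrs)
                  return -ENOMEM;

          attrs->affn_scope = WQ_AFFN_NUMA;       /* group workers by NUMA node */
          attrs->affn_strict = true;              /* keep workers inside the scope */

          ret = apply_workqueue_attrs(wq, attrs);
          free_workqueue_attrs(attrs);
          return ret;
  }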

If ``WQ_SYSFS`` is set, the workqueue will have the following affinity scope
related interface files under its ``/sys/devices/virtual/workqueue/WQ_NAME/``
directory.

``affinity_scope``
  Read to see the current affinity scope. Write to change.

  When default is the current scope, reading this file will also show the
  current effective scope in parentheses, for example, ``default (cache)``.

``affinity_strict``
  0 by default indicating that affinity scopes are not strict. When a work
  item starts execution, workqueue makes a best-effort attempt to ensure
  that the worker is inside its affinity scope, which is called
  repatriation. Once started, the scheduler is free to move the worker
  anywhere in the system as it sees fit. This enables benefiting from scope
  locality while still being able to utilize other CPUs if necessary and
  available.

  If set to 1, all workers of the scope are guaranteed always to be in the
  scope. This may be useful when crossing affinity scopes has other
  implications, for example, in terms of power consumption or workload
  isolation. Strict NUMA scope can also be used to match the workqueue
  behavior of older kernels.


Affinity Scopes and Performance
===============================

It'd be ideal if an unbound workqueue's behavior were optimal for the
vast majority of use cases without further tuning. Unfortunately, in the
current kernel, there exists a pronounced trade-off between locality and
utilization necessitating explicit configuration when workqueues are
heavily used.

Higher locality leads to higher efficiency where more work is performed for
the same number of consumed CPU cycles. However, higher locality may also
cause lower overall system utilization if the work items are not spread
enough across the affinity scopes by the issuers. The following performance
testing with dm-crypt clearly illustrates this trade-off.

The tests are run on a CPU with 12-cores/24-threads split across four L3
caches (AMD Ryzen 9 3900x). CPU clock boost is turned off for consistency.
``/dev/dm-0`` is a dm-crypt device created on NVME SSD (Samsung 990 PRO) and
opened with ``cryptsetup`` with default settings.


Scenario 1: Enough issuers and work spread across the machine
-------------------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k --ioengine=libaio \
    --iodepth=64 --runtime=60 --numjobs=24 --time_based --group_reporting \
    --name=iops-test-job --verify=sha512

There are 24 issuers, each issuing 64 IOs concurrently. ``--verify=sha512``
makes ``fio`` generate and read back the content each time which makes
execution locality matter between the issuer and ``kcryptd``. The following
are the read bandwidths and CPU utilizations depending on different affinity
scope settings on ``kcryptd`` measured over five runs. Bandwidths are in
MiBps, and CPU utilization in percent.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 1159.40 ±1.34
     - 99.31 ±0.02

   * - cache
     - 1166.40 ±0.89
     - 99.34 ±0.01

   * - cache (strict)
     - 1166.00 ±0.71
     - 99.35 ±0.01

With enough issuers spread across the system, there is no downside to
"cache", strict or otherwise. All three configurations saturate the whole
machine but the cache-affine ones outperform by 0.6% thanks to improved
locality.


Scenario 2: Fewer issuers, enough work for saturation
-----------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 \
    --time_based --group_reporting --name=iops-test-job --verify=sha512

The only difference from the previous scenario is ``--numjobs=8``. There are
a third as many issuers but still enough total work to saturate the
system.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 1155.40 ±0.89
     - 97.41 ±0.05

   * - cache
     - 1154.40 ±1.14
     - 96.15 ±0.09

   * - cache (strict)
     - 1112.00 ±4.64
     - 93.26 ±0.35

This is more than enough work to saturate the system. Both "system" and
"cache" are nearly saturating the machine but not fully. "cache" is using
less CPU but the better efficiency puts it at the same bandwidth as
"system".

Eight issuers moving around over four L3 cache scopes still allow "cache
(strict)" to mostly saturate the machine but the loss of work conservation
is now starting to hurt with a 3.7% bandwidth loss.


Scenario 3: Even fewer issuers, not enough work to saturate
-----------------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=4 \
    --time_based --group_reporting --name=iops-test-job --verify=sha512

Again, the only difference is ``--numjobs=4``. With the number of issuers
reduced to four, there now isn't enough work to saturate the whole system
and the bandwidth becomes dependent on completion latencies.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 993.60 ±1.82
     - 75.49 ±0.06

   * - cache
     - 973.40 ±1.52
     - 74.90 ±0.07

   * - cache (strict)
     - 828.20 ±4.49
     - 66.84 ±0.29

Now, the tradeoff between locality and utilization is clearer. "cache" shows
a 2% bandwidth loss compared to "system" and "cache (strict)" a whopping 20%.


Conclusion and Recommendations
------------------------------

In the above experiments, the efficiency advantage of the "cache" affinity
scope over "system" is, while consistent and noticeable, small. However, the
impact is dependent on the distances between the scopes and may be more
pronounced in processors with more complex topologies.

While the loss of work-conservation in certain scenarios hurts, it is still
a lot better than with "cache (strict)", and maximizing workqueue
utilization is unlikely to be the common case anyway. As such, "cache" is
the default affinity scope for unbound pools.

* As there is no one option which is great for most cases, workqueue usages
  that may consume a significant amount of CPU are recommended to configure
  the workqueues using ``apply_workqueue_attrs()`` and/or enable
  ``WQ_SYSFS``.

* An unbound workqueue with strict "cpu" affinity scope behaves the same as
  a ``WQ_CPU_INTENSIVE`` per-cpu workqueue. There is no real advantage to
  the latter and an unbound workqueue provides a lot more flexibility.

* Affinity scopes were introduced in Linux v6.5. To emulate the previous
  behavior, use strict "numa" affinity scope.

* The loss of work-conservation in non-strict affinity scopes likely
  originates from the scheduler. There is no theoretical reason why the
  kernel wouldn't be able to do the right thing and maintain
  work-conservation in most cases. As such, it is possible that future
  scheduler improvements may make most of these tunables unnecessary.


Examining Configuration
=======================

Use tools/workqueue/wq_dump.py to examine unbound CPU affinity
configuration, worker pools and how workqueues map to the pools: ::

  $ tools/workqueue/wq_dump.py
  Affinity Scopes
  ===============
  wq_unbound_cpumask=0000000f

  CPU
    nr_pods  4
    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
    pod_node [0]=0 [1]=0 [2]=1 [3]=1
    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

  SMT
    nr_pods  4
    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
    pod_node [0]=0 [1]=0 [2]=1 [3]=1
    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

  CACHE (default)
    nr_pods  2
    pod_cpus [0]=00000003 [1]=0000000c
    pod_node [0]=0 [1]=1
    cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

  NUMA
    nr_pods  2
    pod_cpus [0]=00000003 [1]=0000000c
    pod_node [0]=0 [1]=1
    cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

  SYSTEM
    nr_pods  1
    pod_cpus [0]=0000000f
    pod_node [0]=-1
    cpu_pod  [0]=0 [1]=0 [2]=0 [3]=0

  Worker Pools
  ============
  pool[00] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  0
  pool[01] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  0
  pool[02] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  1
  pool[03] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  1
  pool[04] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  2
  pool[05] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  2
  pool[06] ref= 1 nice=  0 idle/workers=  3/  3 cpu=  3
  pool[07] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  3
  pool[08] ref=42 nice=  0 idle/workers=  6/  6 cpus=0000000f
  pool[09] ref=28 nice=  0 idle/workers=  3/  3 cpus=00000003
  pool[10] ref=28 nice=  0 idle/workers= 17/ 17 cpus=0000000c
  pool[11] ref= 1 nice=-20 idle/workers=  1/  1 cpus=0000000f
  pool[12] ref= 2 nice=-20 idle/workers=  1/  1 cpus=00000003
  pool[13] ref= 2 nice=-20 idle/workers=  1/  1 cpus=0000000c

  Workqueue CPU -> pool
  =====================
  [    workqueue \ CPU              0  1  2  3 dfl]
  events                   percpu   0  2  4  6
  events_highpri           percpu   1  3  5  7
  events_long              percpu   0  2  4  6
  events_unbound           unbound  9  9 10 10  8
  events_freezable         percpu   0  2  4  6
  events_power_efficient   percpu   0  2  4  6
  events_freezable_power_  percpu   0  2  4  6
  rcu_gp                   percpu   0  2  4  6
  rcu_par_gp               percpu   0  2  4  6
  slub_flushwq             percpu   0  2  4  6
  netns                    ordered  8  8  8  8  8
  ...

See the command's help message for more info.


Monitoring
==========

Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::

  $ tools/workqueue/wq_monitor.py events
                              total  infl  CPUtime  CPUhog CMW/RPR  mayday rescued
  events                      18545     0      6.1       0       5       -       -
  events_highpri                  8     0      0.0       0       0       -       -
  events_long                     3     0      0.0       0       0       -       -
  events_unbound              38306     0      0.1       -       7       -       -
  events_freezable                0     0      0.0       0       0       -       -
  events_power_efficient      29598     0      0.2       0       0       -       -
  events_freezable_power_        10     0      0.0       0       0       -       -
  sock_diag_events                0     0      0.0       0       0       -       -

                              total  infl  CPUtime  CPUhog CMW/RPR  mayday rescued
  events                      18548     0      6.1       0       5       -       -
  events_highpri                  8     0      0.0       0       0       -       -
  events_long                     3     0      0.0       0       0       -       -
  events_unbound              38322     0      0.1       -       7       -       -
  events_freezable                0     0      0.0       0       0       -       -
  events_power_efficient      29603     0      0.2       0       0       -       -
  events_freezable_power_        10     0      0.0       0       0       -       -
  sock_diag_events                0     0      0.0       0       0       -       -

  ...

See the command's help message for more info.


Debugging
=========

Because the work functions are executed by generic worker threads
there are a few tricks needed to shed some light on misbehaving
workqueue users.

Worker threads show up in the process list as: ::

  root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
  root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
  root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
  root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]

If kworkers are going crazy (using too much cpu), there are two types
of possible problems:

	1. Something being scheduled in rapid succession
	2. A single work item that consumes lots of cpu cycles

The first one can be tracked using tracing: ::

	$ echo workqueue:workqueue_queue_work > /sys/kernel/tracing/set_event
	$ cat /sys/kernel/tracing/trace_pipe > out.txt
	(wait a few secs)
	^C

If something is busy looping on work queueing, it would be dominating
the output and the offender can be determined with the work item
function.

For the second type of problems it should be possible to just check
the stack trace of the offending worker thread. ::

	$ cat /proc/THE_OFFENDING_KWORKER/stack

The work item's function should be trivially visible in the stack
trace.


Non-reentrance Conditions
=========================

Workqueue guarantees that a work item cannot be re-entrant if the following
conditions hold after a work item gets queued:

        1. The work function hasn't been changed.
        2. No one queues the work item to another workqueue.
        3. The work item hasn't been reinitiated.

In other words, if the above conditions hold, the work item is guaranteed to be
executed by at most one worker system-wide at any given time.

Note that requeuing the work item (to the same queue) from within its own
work function doesn't break these conditions, so it's safe to do.
Otherwise, caution is required when breaking the conditions inside a work
function.
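
As a hedged sketch of the safe self-requeueing pattern mentioned above
(the structure, field and function names are made up), a work function
may requeue its own work item to the same workqueue without breaking
the non-reentrancy guarantee. ::

  #include <linux/workqueue.h>

  struct frob_dev {
          struct work_struct work;
          bool more_to_do;        /* hypothetical completion condition */
  };

  static void frob_poll_fn(struct work_struct *work)
  {
          struct frob_dev *dev = container_of(work, struct frob_dev, work);

          /* ... process one unit of work ... */

          if (dev->more_to_do)
                  queue_work(system_wq, &dev->work);  /* same wq: non-reentrancy preserved */
  }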


Kernel Inline Documentations Reference
======================================

.. kernel-doc:: include/linux/workqueue.h

.. kernel-doc:: kernel/workqueue.c