====================================
Concurrency Managed Workqueue (cmwq)
====================================

:Date: September, 2010
:Author: Tejun Heo <tj@kernel.org>
:Author: Florian Mickler <florian@mickler.org>


Introduction
============

There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.

When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue.  An
independent thread serves as the asynchronous execution context.  The
queue is called a workqueue and the thread is called a worker.

While there are work items on the workqueue, the worker executes the
functions associated with the work items one after the other.  When
there is no work item left on the workqueue, the worker becomes idle.
When a new work item gets queued, the worker begins executing again.


Why cmwq?
=========

In the original wq implementation, a multi-threaded (MT) wq had one
worker thread per CPU and a single-threaded (ST) wq had one worker
thread system-wide.  A single MT wq needed to keep around the same
number of workers as the number of CPUs.  The kernel grew a lot of MT
wq users over the years and, with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.

Although MT wq wasted a lot of resources, the level of concurrency
provided was unsatisfactory.  The limitation was common to both ST and
MT wq, albeit less severe on MT.  Each wq maintained its own separate
worker pool.  An MT wq could provide only one execution context per
CPU while an ST wq provided one for the whole system.  Work items had
to compete for those very limited execution contexts, leading to
various problems including proneness to deadlocks around the single
execution context.

The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time.  As
MT wq didn't provide much better concurrency, users that required a
higher level of concurrency, like async or fscache, had to implement
their own thread pools.


Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
a focus on the following goals.

* Maintain compatibility with the original workqueue API.

* Use per-CPU unified worker pools shared by all wq to provide a
  flexible level of concurrency on demand without wasting a lot of
  resources.

* Automatically regulate the worker pool and level of concurrency so
  that the API users don't need to worry about such details.


The Design
==========

In order to ease the asynchronous execution of functions, a new
abstraction, the work item, is introduced.

A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously.  Whenever a driver or subsystem
wants a function to be executed asynchronously, it has to set up a
work item pointing to that function and queue that work item on a
workqueue.

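For example, a driver might embed a work item in its own state struct
and point it at a handler function.  The following is a minimal,
illustrative sketch; ``my_driver``, ``my_handler()`` and friends are
made-up names, not part of the API. ::

  #include <linux/workqueue.h>

  /* Hypothetical driver state embedding a work item. */
  struct my_driver {
          struct work_struct irq_work;
          /* ... driver-specific fields ... */
  };

  /* The function the work item points to; runs in a worker thread. */
  static void my_handler(struct work_struct *work)
  {
          struct my_driver *drv = container_of(work, struct my_driver,
                                               irq_work);
          /* do the deferred processing for drv here */
  }

  static void my_driver_init(struct my_driver *drv)
  {
          INIT_WORK(&drv->irq_work, my_handler);  /* set up once */
  }

  static void my_driver_kick(struct my_driver *drv)
  {
          schedule_work(&drv->irq_work);  /* queue on the system wq */
  }
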
Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other.  If no work is queued, the
worker threads become idle.  These worker threads are managed in
so-called worker-pools.

The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages worker-pools and processes the queued work items.

There are two worker-pools, one for normal work items and the other
for high priority ones, for each possible CPU and some extra
worker-pools to serve work items queued on unbound workqueues - the
number of these backing pools is dynamic.

Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit.  They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on.  These flags include
things like CPU locality, concurrency limits, priority and more.  To
get a detailed overview refer to the API description of
``alloc_workqueue()`` below.

When a work item is queued to a workqueue, the target worker-pool is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the worker-pool.  For example,
unless specifically overridden, a work item of a bound workqueue will
be queued on the worklist of either the normal or the highpri
worker-pool associated with the CPU the issuer is running on.

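As a brief sketch of the default and the override (``my_wq`` and
``irq_work`` are the made-up names from the sketch above): ::

  /* Served by the worker-pool of the CPU we are running on. */
  queue_work(my_wq, &drv->irq_work);

  /* Explicitly target CPU 3's worker-pool instead. */
  queue_work_on(3, my_wq, &drv->irq_work);
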
For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue.  cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.

Each worker-pool bound to an actual CPU implements concurrency
management by hooking into the scheduler.  The worker-pool is notified
whenever an active worker wakes up or sleeps and keeps track of the
number of currently runnable workers.  Generally, work items are not
expected to hog a CPU and consume many cycles.  That means maintaining
just enough concurrency to prevent work processing from stalling
should be optimal.  As long as there are one or more runnable workers
on the CPU, the worker-pool doesn't start execution of a new work
item, but, when the last running worker goes to sleep, it immediately
schedules a new worker so that the CPU doesn't sit idle while there
are pending work items.  This allows using a minimal number of workers
without losing execution bandwidth.

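The idea behind the scheduler hook can be sketched as follows.  This
is a deliberate simplification for illustration, not the actual code
in kernel/workqueue.c. ::

  /* Conceptually invoked when a running worker is about to sleep. */
  static void pool_worker_sleeping(struct worker_pool *pool)
  {
          /* Last runnable worker blocking with work still pending? */
          if (--pool->nr_running == 0 && !list_empty(&pool->worklist))
                  wake_up_worker(pool);   /* keep the CPU busy */
  }
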
Keeping idle workers around doesn't cost anything other than the
memory space for kthreads, so cmwq holds onto idle ones for a while
before killing them.

For unbound workqueues, the number of backing pools is dynamic.  An
unbound workqueue can be assigned custom attributes using
``apply_workqueue_attrs()`` and the workqueue subsystem will
automatically create backing worker pools matching the attributes.
The responsibility of regulating the concurrency level is on the
users.  There is also a flag to mark a bound wq to ignore the
concurrency management.  Please refer to the API section for details.

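As a hedged sketch (this interface is mainly used by in-kernel core
users and its exact signatures have varied across kernel versions),
assigning a nice level and a cpumask to an unbound wq could look
like: ::

  struct workqueue_attrs *attrs;
  int ret = -ENOMEM;

  attrs = alloc_workqueue_attrs();
  if (attrs) {
          attrs->nice = -10;      /* boost the backing workers */
          cpumask_copy(attrs->cpumask, cpumask_of(2));
          ret = apply_workqueue_attrs(unbound_wq, attrs);
          free_workqueue_attrs(attrs);
  }
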
The forward progress guarantee relies on workers being created when
more execution contexts are necessary, which in turn is guaranteed
through the use of rescue workers.  All work items which might be used
on code paths that handle memory reclaim are required to be queued on
wq's that have a rescue-worker reserved for execution under memory
pressure.  Otherwise, it is possible that the worker-pool deadlocks
waiting for execution contexts to free up.

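For example, a driver whose work items may sit on the writeback path
could allocate its wq as below (the name is illustrative). ::

  wq = alloc_workqueue("myblk_io", WQ_MEM_RECLAIM, 0);
  if (!wq)
          return -ENOMEM;
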

Application Programming Interface (API)
=======================================

``alloc_workqueue()`` allocates a wq.  The original
``create_*workqueue()`` functions are deprecated and scheduled for
removal.  ``alloc_workqueue()`` takes three arguments - ``@name``,
``@flags`` and ``@max_active``.  ``@name`` is the name of the wq and
is also used as the name of the rescuer thread if there is one.

A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.
``@flags`` and ``@max_active`` control how work items are assigned
execution resources, scheduled and executed.

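As an illustrative sketch for a hypothetical driver, a typical
allocation and teardown pair might look like the following. ::

  struct workqueue_struct *wq;

  /* name, flags, max_active (0 selects the default limit) */
  wq = alloc_workqueue("mydrv", WQ_UNBOUND | WQ_FREEZABLE, 0);
  if (!wq)
          return -ENOMEM;

  /* ... queue work items with queue_work(wq, ...) ... */

  destroy_workqueue(wq);          /* drains remaining work items */
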

``flags``
---------

``WQ_UNBOUND``
  Work items queued to an unbound wq are served by the special
  worker-pools which host workers not bound to any specific CPU.
  This makes the wq behave as a simple execution context provider
  without concurrency management.  The unbound worker-pools try to
  start execution of work items as soon as possible.  Unbound wq
  sacrifices locality but is useful for the following cases.

  * Wide fluctuation in the concurrency level requirement is
    expected and using a bound wq may end up creating a large
    number of mostly unused workers across different CPUs as the
    issuer hops through different CPUs.

  * Long running CPU intensive workloads which can be better
    managed by the system scheduler.

``WQ_FREEZABLE``
  A freezable wq participates in the freeze phase of the system
  suspend operations.  Work items on the wq are drained and no
  new work item starts execution until thawed.

``WQ_MEM_RECLAIM``
  All wq which might be used in the memory reclaim paths **MUST**
  have this flag set.  The wq is guaranteed to have at least one
  execution context regardless of memory pressure.

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target CPU.  Highpri worker-pools are
  served by worker threads with elevated nice level.

  Note that normal and highpri worker-pools don't interact with
  each other.  Each maintains its separate pool of workers and
  implements concurrency management among its workers.

``WQ_CPU_INTENSIVE``
  Work items of a CPU intensive wq do not contribute to the
  concurrency level.  In other words, runnable CPU intensive
  work items will not prevent other work items in the same
  worker-pool from starting execution.  This is useful for bound
  work items which are expected to hog CPU cycles so that their
  execution is regulated by the system scheduler.

  Although CPU intensive work items don't contribute to the
  concurrency level, the start of their execution is still
  regulated by the concurrency management and runnable
  non-CPU-intensive work items can delay execution of CPU
  intensive work items.

  This flag is meaningless for unbound wq.

Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
workqueues are now non-reentrant - any work item is guaranteed to be
executed by at most one worker system-wide at any given time.


``max_active``
--------------

``@max_active`` determines the maximum number of execution contexts
per CPU which can be assigned to the work items of a wq.  For example,
with ``@max_active`` of 16, at most 16 work items of the wq can be
executing at the same time per CPU.

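For instance, a wq that should never execute more than four of its
work items concurrently per CPU could be allocated as below (the name
is illustrative). ::

  wq = alloc_workqueue("throttled", 0, 4);        /* @max_active = 4 */
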
Currently, for a bound wq, the maximum limit for ``@max_active`` is
512 and the default value used when 0 is specified is 256.  For an
unbound wq, the limit is the higher of 512 and 4 *
``num_possible_cpus()``.  These values are chosen sufficiently high
such that they are not the limiting factor while providing protection
in runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time.  Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.

Some users depend on the strict execution ordering of ST wq.  The
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` was used to
achieve this behavior.  Work items on such wq were always queued to
the unbound worker-pools and only one work item could be active at any
given time, thus achieving the same ordering property as ST wq.

In the current implementation the above configuration only guarantees
ST behavior within a given NUMA node.  Instead,
``alloc_ordered_workqueue()`` should be used to achieve system-wide ST
behavior.

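A minimal sketch of an ordered wq, which executes at most one work
item at a time in queueing order, system-wide (the name is
illustrative): ::

  struct workqueue_struct *ordered_wq;

  /* extra flags such as WQ_MEM_RECLAIM may be OR'd into argument 2 */
  ordered_wq = alloc_ordered_workqueue("mydrv_ordered", 0);
  if (!ordered_wq)
          return -ENOMEM;
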

Example Execution Scenarios
===========================

The following example execution scenarios try to illustrate how cmwq
behaves under different configurations.

 Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
 w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
 again before finishing.  w1 and w2 burn CPU for 5ms then sleep for
 10ms.

Ignoring all other tasks, work items and processing overhead, and
assuming simple FIFO scheduling, the following is one highly
simplified version of possible sequences of events with the original
wq. ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 starts and burns CPU
 25             w1 sleeps
 35             w1 wakes up and finishes
 35             w2 starts and burns CPU
 40             w2 sleeps
 50             w2 wakes up and finishes

And with cmwq with ``@max_active`` >= 3, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 10             w2 starts and burns CPU
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes

If ``@max_active`` == 2, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 20             w2 starts and burns CPU
 25             w2 sleeps
 35             w2 wakes up and finishes

Now, let's assume w1 and w2 are queued to a different wq q1 which has
``WQ_CPU_INTENSIVE`` set, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 and w2 start and burn CPU
 10             w1 sleeps
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes


Guidelines
==========

* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
  items which are used during memory reclaim.  Each wq with
  ``WQ_MEM_RECLAIM`` set has an execution context reserved for it.  If
  there is a dependency among multiple work items used during memory
  reclaim, they should be queued to separate wq's, each with
  ``WQ_MEM_RECLAIM``, as in the sketch below.
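
  A minimal sketch, assuming two hypothetical work items where one
  waits for the other on a reclaim path: ::

    /* each wq gets its own rescuer, so the wait cannot deadlock */
    wq_a = alloc_workqueue("reclaim_a", WQ_MEM_RECLAIM, 0);
    wq_b = alloc_workqueue("reclaim_b", WQ_MEM_RECLAIM, 0);

    queue_work(wq_a, &work_a);  /* work_a's handler flushes work_b */
    queue_work(wq_b, &work_b);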

* Unless strict ordering is required, there is no need to use ST wq.

* Unless there is a specific need, using 0 for ``@max_active`` is
  recommended.  In most use cases, the concurrency level usually stays
  well under the default limit.

* A wq serves as a domain for forward progress guarantee
  (``WQ_MEM_RECLAIM``), flush and work item attributes.  Work items
  which are not involved in memory reclaim, don't need to be flushed
  as a part of a group of work items, and don't require any special
  attribute can use one of the system wqs, as in the sketch below.
  There is no difference in execution characteristics between using a
  dedicated wq and a system wq.
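
  For example, queueing a work item that needs none of these
  guarantees on the default system wq (``my_work_fn()`` is a made-up
  handler): ::

    static void my_work_fn(struct work_struct *work);
    static DECLARE_WORK(my_work, my_work_fn);

    schedule_work(&my_work);    /* executes on system_wq */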

* Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
  level of locality in wq operations and work item execution.


Debugging
=========

Because the work functions are executed by generic worker threads
there are a few tricks needed to shed some light on misbehaving
workqueue users.

Worker threads show up in the process list as: ::

  root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
  root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
  root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
  root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]

If kworkers are going crazy (using too much CPU), there are two types
of possible problems:

        1. Something being scheduled in rapid succession
        2. A single work item that consumes lots of CPU cycles

The first one can be tracked using tracing: ::

        $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
        $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
        (wait a few secs)
        ^C

If something is busy looping on work queueing, it will dominate the
output and the offender can be determined with the work item function.

For the second type of problem it should be possible to just check
the stack trace of the offending worker thread. ::

        $ cat /proc/THE_OFFENDING_KWORKER/stack

The work item's function should be trivially visible in the stack
trace.


Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/workqueue.h

.. kernel-doc:: kernel/workqueue.c