====================================
Concurrency Managed Workqueue (cmwq)
====================================

:Date: September, 2010
:Author: Tejun Heo <tj@kernel.org>
:Author: Florian Mickler <florian@mickler.org>


Introduction
============

There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.

When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue.  An
independent thread serves as the asynchronous execution context.  The
queue is called workqueue and the thread is called worker.

While there are work items on the workqueue the worker executes the
functions associated with the work items one after the other.  When
there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.
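
For instance, here is a minimal sketch of this pattern
(``example_work_fn`` and ``example_work`` are illustrative names, not
part of the API). ::

  #include <linux/printk.h>
  #include <linux/workqueue.h>

  static void example_work_fn(struct work_struct *work)
  {
          pr_info("executing asynchronously\n");
  }

  static DECLARE_WORK(example_work, example_work_fn);

  /* from process context: queue the item on the system-wide workqueue */
  schedule_work(&example_work);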


Why cmwq?
=========

In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
thread system-wide.  A single MT wq needed to keep around the same
number of workers as the number of CPUs.  The kernel grew a lot of MT
wq users over the years and with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.

Although MT wq wasted a lot of resources, the level of concurrency
provided was unsatisfactory.  The limitation was common to both ST and
MT wq albeit less severe on MT.  Each wq maintained its own separate
worker pool.  An MT wq could provide only one execution context per
CPU while an ST wq one for the whole system.  Work items had to
compete for those very limited execution contexts leading to various
problems including proneness to deadlocks around the single execution
context.

The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time.  As
MT wq didn't provide much better concurrency, users which required a
higher level of concurrency, like async or fscache, had to implement
their own thread pool.

Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
focus on the following goals.

* Maintain compatibility with the original workqueue API.

* Use per-CPU unified worker pools shared by all wq to provide
  flexible level of concurrency on demand without wasting a lot of
  resource.

* Automatically regulate worker pool and level of concurrency so that
  the API users don't need to worry about such details.


The Design
==========

In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.

A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously.  Whenever a driver or subsystem
wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.

Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other.  If no work is queued, the
worker threads become idle.  These worker threads are managed in so
called worker-pools.

The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages worker-pools and processes the queued work items.

There are two worker-pools, one for normal work items and the other
for high priority ones, for each possible CPU and some extra
worker-pools to serve work items queued on unbound workqueues - the
number of these backing pools is dynamic.

Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit.  They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on.  These flags include
things like CPU locality, concurrency limits, priority and more.  To
get a detailed overview refer to the API description of
``alloc_workqueue()`` below.
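
As an illustration, a common pattern embeds the work item in a
driver-private structure and recovers it with ``container_of()``
(``struct my_ctx``, ``my_ctx_work_fn`` and ``my_wq`` below are
hypothetical names). ::

  struct my_ctx {
          struct work_struct work;
          int payload;
  };

  static void my_ctx_work_fn(struct work_struct *work)
  {
          struct my_ctx *ctx = container_of(work, struct my_ctx, work);

          pr_info("processing payload %d\n", ctx->payload);
  }

  /* during setup, before the first queueing */
  INIT_WORK(&ctx->work, my_ctx_work_fn);

  /* hand the item to a workqueue; returns false if already pending */
  queue_work(my_wq, &ctx->work);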

When a work item is queued to a workqueue, the target worker-pool is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the worker-pool.  For example,
unless specifically overridden, a work item of a bound workqueue will
be queued on the worklist of either the normal or the highpri
worker-pool that is associated with the CPU the issuer is running on.

For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue.  cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.

Each worker-pool bound to an actual CPU implements concurrency
management by hooking into the scheduler.  The worker-pool is notified
whenever an active worker wakes up or sleeps and keeps track of the
number of the currently runnable workers.  Generally, work items are
not expected to hog a CPU and consume many cycles.  That means
maintaining just enough concurrency to prevent work processing from
stalling should be optimal.  As long as there are one or more runnable
workers on the CPU, the worker-pool doesn't start execution of a new
work, but, when the last running worker goes to sleep, it immediately
schedules a new worker so that the CPU doesn't sit idle while there
are pending work items.  This allows using a minimal number of workers
without losing execution bandwidth.

Keeping idle workers around doesn't cost anything other than the
memory space for kthreads, so cmwq holds onto idle ones for a while
before killing them.

For unbound workqueues, the number of backing pools is dynamic.  An
unbound workqueue can be assigned custom attributes using
``apply_workqueue_attrs()`` and workqueue will automatically create
backing worker pools matching the attributes.  The responsibility of
regulating the concurrency level is on the users.  There is also a
flag to mark a bound wq to ignore the concurrency management.  Please
refer to the API section for details.
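
As a rough sketch of the attributes mechanism (``unbound_wq`` is a
hypothetical wq; note that depending on the kernel version these
helpers may be internal to ``kernel/workqueue.c`` rather than callable
from modules). ::

  struct workqueue_attrs *attrs;
  int ret;

  attrs = alloc_workqueue_attrs();
  if (!attrs)
          return -ENOMEM;

  attrs->nice = -5;                             /* elevate the backing workers */
  cpumask_copy(attrs->cpumask, cpumask_of(0));  /* confine them to CPU 0 */

  ret = apply_workqueue_attrs(unbound_wq, attrs);  /* wq must be WQ_UNBOUND */
  free_workqueue_attrs(attrs);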

The forward progress guarantee relies on workers being creatable when
more execution contexts are necessary, which in turn is guaranteed
through the use of rescue workers.  All work items which might be used
on code paths that handle memory reclaim are required to be queued on
wq's that have a rescue-worker reserved for execution under memory
pressure.  Otherwise it is possible that the worker-pool deadlocks
waiting for execution contexts to free up.


Application Programming Interface (API)
=======================================

``alloc_workqueue()`` allocates a wq.  The original
``create_*workqueue()`` functions are deprecated and scheduled for
removal.  ``alloc_workqueue()`` takes three arguments - ``@name``,
``@flags`` and ``@max_active``.  ``@name`` is the name of the wq and
also used as the name of the rescuer thread if there is one.

A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.
``@flags`` and ``@max_active`` control how work items are assigned
execution resources, scheduled and executed.
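
For example, a minimal sketch of allocating and tearing down a wq (the
name and the flag choice are illustrative; the flags are described
below). ::

  struct workqueue_struct *my_wq;

  my_wq = alloc_workqueue("my_wq", WQ_UNBOUND, 0);  /* 0: default max_active */
  if (!my_wq)
          return -ENOMEM;

  /* ... queue_work(my_wq, ...) ... */

  destroy_workqueue(my_wq);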


``flags``
---------

``WQ_UNBOUND``
  Work items queued to an unbound wq are served by the special
  worker-pools which host workers which are not bound to any
  specific CPU.  This makes the wq behave as a simple execution
  context provider without concurrency management.  The unbound
  worker-pools try to start execution of work items as soon as
  possible.  Unbound wq sacrifices locality but is useful for
  the following cases.

  * Wide fluctuation in the concurrency level requirement is
    expected and using bound wq may end up creating large number
    of mostly unused workers across different CPUs as the issuer
    hops through different CPUs.

  * Long running CPU intensive workloads which can be better
    managed by the system scheduler.

``WQ_FREEZABLE``
  A freezable wq participates in the freeze phase of the system
  suspend operations.  Work items on the wq are drained and no
  new work item starts execution until thawed.

``WQ_MEM_RECLAIM``
  All wq which might be used in the memory reclaim paths **MUST**
  have this flag set.  The wq is guaranteed to have at least one
  execution context regardless of memory pressure.

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target CPU.  Highpri worker-pools are
  served by worker threads with elevated nice level.

  Note that normal and highpri worker-pools don't interact with
  each other.  Each maintains its separate pool of workers and
  implements concurrency management among its workers.

``WQ_CPU_INTENSIVE``
  Work items of a CPU intensive wq do not contribute to the
  concurrency level.  In other words, runnable CPU intensive
  work items will not prevent other work items in the same
  worker-pool from starting execution.  This is useful for bound
  work items which are expected to hog CPU cycles so that their
  execution is regulated by the system scheduler.

  Although CPU intensive work items don't contribute to the
  concurrency level, start of their executions is still
  regulated by the concurrency management and runnable
  non-CPU-intensive work items can delay execution of CPU
  intensive work items.

  This flag is meaningless for unbound wq.

Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
workqueues are now non-reentrant - any work item is guaranteed to be
executed by at most one worker system-wide at any given time.


``max_active``
--------------

``@max_active`` determines the maximum number of execution contexts
per CPU which can be assigned to the work items of a wq.  For example,
with ``@max_active`` of 16, at most 16 work items of the wq can be
executing at the same time per CPU.

Currently, for a bound wq, the maximum limit for ``@max_active`` is
512 and the default value used when 0 is specified is 256.  For an
unbound wq, the limit is the higher of 512 and 4 *
``num_possible_cpus()``.  These values are chosen sufficiently high
such that they are not the limiting factor while providing protection
in runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time.  Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.
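
As a sketch (the names are hypothetical), the cap can be set at
allocation time and adjusted later with
``workqueue_set_max_active()``. ::

  struct workqueue_struct *throttled_wq;

  /* at most 4 work items of this wq executing concurrently per CPU */
  throttled_wq = alloc_workqueue("ex_throttled", 0, 4);

  /* the cap can be raised at runtime if the workload grows */
  workqueue_set_max_active(throttled_wq, 16);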

Some users depend on the strict execution ordering of ST wq.  The
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` was used to
achieve this behavior.  Work items on such wq were always queued to
the unbound worker-pools and only one work item could be active at any
given time thus achieving the same ordering property as ST wq.

In the current implementation the above configuration only guarantees
ST behavior within a given NUMA node.  Instead
``alloc_ordered_workqueue()`` should be used to achieve system-wide ST
behavior.
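
For example, a minimal sketch (the name is illustrative). ::

  struct workqueue_struct *ordered_wq;

  /* work items on this wq execute one at a time, in queueing order */
  ordered_wq = alloc_ordered_workqueue("ex_ordered", 0);
  if (!ordered_wq)
          return -ENOMEM;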


Example Execution Scenarios
===========================

The following example execution scenarios try to illustrate how cmwq
behaves under different configurations.

 Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
 w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
 again before finishing.  w1 and w2 burn CPU for 5ms then sleep for
 10ms.

Ignoring all other tasks, work items and processing overhead, and
assuming simple FIFO scheduling, the following is one highly
simplified version of possible sequences of events with the original
wq. ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 starts and burns CPU
 25             w1 sleeps
 35             w1 wakes up and finishes
 35             w2 starts and burns CPU
 40             w2 sleeps
 50             w2 wakes up and finishes

And with cmwq with ``@max_active`` >= 3, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 10             w2 starts and burns CPU
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes

If ``@max_active`` == 2, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 20             w2 starts and burns CPU
 25             w2 sleeps
 35             w2 wakes up and finishes

Now, let's assume w1 and w2 are queued to a different wq q1 which has
``WQ_CPU_INTENSIVE`` set, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 and w2 start and burn CPU
 10             w1 sleeps
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes


Guidelines
==========

* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
  items which are used during memory reclaim.  Each wq with
  ``WQ_MEM_RECLAIM`` set has an execution context reserved for it.  If
  there is dependency among multiple work items used during memory
  reclaim, they should be queued to separate wq each with
  ``WQ_MEM_RECLAIM``, as in the sketch after this list.

* Unless strict ordering is required, there is no need to use ST wq.

* Unless there is a specific need, using 0 for ``@max_active`` is
  recommended.  In most use cases, concurrency level usually stays
  well under the default limit.

* A wq serves as a domain for forward progress guarantee
  (``WQ_MEM_RECLAIM``), flush and work item attributes.  Work items
  which are not involved in memory reclaim and don't need to be
  flushed as a part of a group of work items, and don't require any
  special attribute, can use one of the system wq.  There is no
  difference in execution characteristics between using a dedicated wq
  and a system wq.

* Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
  level of locality in wq operations and work item execution.
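
A sketch of the first guideline (the scenario and names are
hypothetical): if a work item on one reclaim-path wq may wait on a
work item on another, each wq needs its own rescuer. ::

  struct workqueue_struct *writeback_wq, *complete_wq;

  /*
   * Hypothetical scenario: completion work can block waiting on
   * writeback work while under memory pressure.  Each wq gets its
   * own WQ_MEM_RECLAIM rescuer, so neither can starve the other.
   */
  writeback_wq = alloc_workqueue("ex_writeback", WQ_MEM_RECLAIM, 0);
  complete_wq = alloc_workqueue("ex_complete", WQ_MEM_RECLAIM, 0);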


Debugging
=========

Because the work functions are executed by generic worker threads
there are a few tricks needed to shed some light on misbehaving
workqueue users.

Worker threads show up in the process list as: ::

 root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
 root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
 root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
 root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]

If kworkers are going crazy (using too much cpu), there are two types
of possible problems:

 1. Something being scheduled in rapid succession
 2. A single work item that consumes lots of cpu cycles

The first one can be tracked using tracing: ::

 $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
 $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
 (wait a few secs)
 ^C

If something is busy looping on work queueing, it would be dominating
the output and the offender can be determined with the work item
function.

For the second type of problems it should be possible to just check
the stack trace of the offending worker thread. ::

 $ cat /proc/THE_OFFENDING_KWORKER/stack

The work item's function should be trivially visible in the stack
trace.


Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/workqueue.h

.. kernel-doc:: kernel/workqueue.c