=========
Workqueue
=========

:Date: September, 2010
:Author: Tejun Heo <tj@kernel.org>
:Author: Florian Mickler <florian@mickler.org>


Introduction
============

There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.

When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue.  An
independent thread serves as the asynchronous execution context.  The
queue is called workqueue and the thread is called worker.

While there are work items on the workqueue the worker executes the
functions associated with the work items one after the other.  When
there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.


Why Concurrency Managed Workqueue?
==================================

In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
thread system-wide.  A single MT wq needed to keep around the same
number of workers as the number of CPUs.  The kernel grew a lot of MT
wq users over the years and with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.

Although MT wq wasted a lot of resources, the level of concurrency
provided was unsatisfactory.  The limitation was common to both ST and
MT wq albeit less severe on MT.  Each wq maintained its own separate
worker pool.  An MT wq could provide only one execution context per CPU
while an ST wq one for the whole system.  Work items had to compete for
those very limited execution contexts leading to various problems
including proneness to deadlocks around the single execution context.

The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time.
As MT wq don't provide much better concurrency, users which require a
higher level of concurrency, like async or fscache, had to implement
their own thread pool.

Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
focus on the following goals.

* Maintain compatibility with the original workqueue API.

* Use per-CPU unified worker pools shared by all wq to provide a
  flexible level of concurrency on demand without wasting a lot of
  resources.

* Automatically regulate worker pool and level of concurrency so that
  the API users don't need to worry about such details.


The Design
==========

In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.

A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously.  Whenever a driver or subsystem
wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.

Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other.  If no work is queued, the
worker threads become idle.  These worker threads are managed in so
called worker-pools.

The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages worker-pools and processes the queued work items.

There are two worker-pools, one for normal work items and the other
for high priority ones, for each possible CPU and some extra
worker-pools to serve work items queued on unbound workqueues - the
number of these backing pools is dynamic.

Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit.  They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on.  These flags include
things like CPU locality, concurrency limits, priority and more.  To
get a detailed overview refer to the API description of
``alloc_workqueue()`` below.
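
As a minimal sketch of the pattern described above (the ``frob_*``
names are hypothetical and only for illustration), a driver typically
embeds a work item in its own structure, initializes it with
``INIT_WORK()`` and queues it with ``queue_work()``. ::

  #include <linux/workqueue.h>

  struct frob_device {
          struct work_struct frob_work;   /* the work item */
          /* ... driver specific fields ... */
  };

  /* executed asynchronously by a worker thread */
  static void frob_work_fn(struct work_struct *work)
  {
          struct frob_device *dev =
                  container_of(work, struct frob_device, frob_work);

          /* do the deferred processing using @dev */
  }

  static void frob_init(struct frob_device *dev)
  {
          INIT_WORK(&dev->frob_work, frob_work_fn);
  }

  static void frob_kick(struct frob_device *dev)
  {
          /* e.g. from an interrupt handler; system_wq is the default wq */
          queue_work(system_wq, &dev->frob_work);
  }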

When a work item is queued to a workqueue, the target worker-pool is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the worker-pool.  For example,
unless specifically overridden, a work item of a bound workqueue will
be queued on the worklist of either the normal or the highpri
worker-pool that is associated with the CPU the issuer is running on.

For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue.  cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.

Each worker-pool bound to an actual CPU implements concurrency
management by hooking into the scheduler.  The worker-pool is notified
whenever an active worker wakes up or sleeps and keeps track of the
number of currently runnable workers.  Generally, work items are not
expected to hog a CPU and consume many cycles.  That means maintaining
just enough concurrency to prevent work processing from stalling
should be optimal.  As long as there are one or more runnable workers
on the CPU, the worker-pool doesn't start execution of a new work
item, but, when the last running worker goes to sleep, it immediately
schedules a new worker so that the CPU doesn't sit idle while there
are pending work items.  This allows using a minimal number of workers
without losing execution bandwidth.

Keeping idle workers around doesn't cost anything other than the
memory space for kthreads, so cmwq holds onto idle ones for a while
before killing them.

For unbound workqueues, the number of backing pools is dynamic.  An
unbound workqueue can be assigned custom attributes using
``apply_workqueue_attrs()`` and workqueue will automatically create
backing worker pools matching the attributes.  The responsibility of
regulating the concurrency level is on the users.  There is also a
flag to mark a bound wq to ignore the concurrency management.  Please
refer to the API section for details.
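
As a rough sketch of this interface (the ``frob_*`` names are
hypothetical, and the attribute helpers are kernel-internal, so their
availability to out-of-tree or modular code depends on the kernel
version), custom attributes can be applied as follows. ::

  #include <linux/workqueue.h>
  #include <linux/topology.h>

  static struct workqueue_struct *frob_wq;

  static int frob_setup_wq(void)
  {
          struct workqueue_attrs *attrs;
          int ret;

          frob_wq = alloc_workqueue("frob_wq", WQ_UNBOUND, 0);
          if (!frob_wq)
                  return -ENOMEM;

          attrs = alloc_workqueue_attrs();
          if (!attrs) {
                  destroy_workqueue(frob_wq);
                  return -ENOMEM;
          }

          attrs->nice = -5;                                 /* elevated priority */
          cpumask_copy(attrs->cpumask, cpumask_of_node(0)); /* node 0 CPUs only */

          ret = apply_workqueue_attrs(frob_wq, attrs);      /* backing pools are created on demand */
          free_workqueue_attrs(attrs);
          return ret;
  }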

Forward progress guarantee relies on workers being created when more
execution contexts are necessary, which in turn is guaranteed through
the use of rescue workers.  All work items which might be used on code
paths that handle memory reclaim are required to be queued on wq's
that have a rescue-worker reserved for execution under memory
pressure.  Otherwise it is possible that the worker-pool deadlocks
waiting for execution contexts to free up.


Application Programming Interface (API)
========================================

``alloc_workqueue()`` allocates a wq.  The original
``create_*workqueue()`` functions are deprecated and scheduled for
removal.  ``alloc_workqueue()`` takes three arguments - ``@name``,
``@flags`` and ``@max_active``.  ``@name`` is the name of the wq and
is also used as the name of the rescuer thread if there is one.

A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.  ``@flags``
and ``@max_active`` control how work items are assigned execution
resources, scheduled and executed.


``flags``
---------

``WQ_UNBOUND``
  Work items queued to an unbound wq are served by the special
  worker-pools which host workers which are not bound to any
  specific CPU.  This makes the wq behave as a simple execution
  context provider without concurrency management.  The unbound
  worker-pools try to start execution of work items as soon as
  possible.  Unbound wq sacrifices locality but is useful for
  the following cases.

  * Wide fluctuation in the concurrency level requirement is
    expected and using a bound wq may end up creating a large
    number of mostly unused workers across different CPUs as the
    issuer hops through different CPUs.

  * Long running CPU intensive workloads which can be better
    managed by the system scheduler.

``WQ_FREEZABLE``
  A freezable wq participates in the freeze phase of the system
  suspend operations.  Work items on the wq are drained and no
  new work item starts execution until thawed.

``WQ_MEM_RECLAIM``
  All wq which might be used in the memory reclaim paths **MUST**
  have this flag set.  The wq is guaranteed to have at least one
  execution context regardless of memory pressure.

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target cpu.
  Highpri worker-pools are served by worker threads with an
  elevated nice level.

  Note that normal and highpri worker-pools don't interact with
  each other.  Each maintains its separate pool of workers and
  implements concurrency management among its workers.

``WQ_CPU_INTENSIVE``
  Work items of a CPU intensive wq do not contribute to the
  concurrency level.  In other words, runnable CPU intensive
  work items will not prevent other work items in the same
  worker-pool from starting execution.  This is useful for bound
  work items which are expected to hog CPU cycles so that their
  execution is regulated by the system scheduler.

  Although CPU intensive work items don't contribute to the
  concurrency level, the start of their execution is still
  regulated by the concurrency management and runnable
  non-CPU-intensive work items can delay execution of CPU
  intensive work items.

  This flag is meaningless for unbound wq.


``max_active``
--------------

``@max_active`` determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq.  For example, with
``@max_active`` of 16, at most 16 work items of the wq can be executing
at the same time per CPU.  This is always a per-CPU attribute, even for
unbound workqueues.

The maximum limit for ``@max_active`` is 512 and the default value used
when 0 is specified is 256.  These values are chosen sufficiently high
such that they are not the limiting factor while providing protection
in runaway cases.

The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time.  Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.
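
As a hypothetical sketch of how these parameters are passed (the
workqueue names are made up for illustration), typical
``alloc_workqueue()`` calls look like the following. ::

  struct workqueue_struct *wq;

  /* unbound, freezable wq with at most 4 active work items per CPU */
  wq = alloc_workqueue("frob_scan", WQ_UNBOUND | WQ_FREEZABLE, 4);
  if (!wq)
          return -ENOMEM;         /* alloc_workqueue() returns NULL on failure */

  /*
   * Other typical combinations:
   *
   *   alloc_workqueue("frob_io", WQ_MEM_RECLAIM, 0);
   *           usable in the reclaim path, default max_active
   *
   *   alloc_workqueue("frob_crunch", WQ_CPU_INTENSIVE, 0);
   *           bound work items which are expected to burn a lot of CPU
   */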

Some users depend on the strict execution ordering of ST wq.  The
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` was used to
achieve this behavior.  Work items on such wq were always queued to the
unbound worker-pools and only one work item could be active at any
given time thus achieving the same ordering property as ST wq.

In the current implementation the above configuration only guarantees
ST behavior within a given NUMA node.  Instead
``alloc_ordered_workqueue()`` should be used to achieve system-wide ST
behavior.
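
A sketch of creating such an ordered workqueue (the name is
hypothetical); ``alloc_ordered_workqueue()`` takes the same name and
flags arguments but no ``@max_active``. ::

  struct workqueue_struct *ordered_wq;

  /* work items execute one at a time, in queueing order, system-wide */
  ordered_wq = alloc_ordered_workqueue("frob_ordered", WQ_MEM_RECLAIM);
  if (!ordered_wq)
          return -ENOMEM;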


Example Execution Scenarios
===========================

The following example execution scenarios try to illustrate how cmwq
behaves under different configurations.

 Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
 w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
 again before finishing.  w1 and w2 burn CPU for 5ms then sleep for
 10ms.

Ignoring all other tasks, works and processing overhead, and assuming
simple FIFO scheduling, the following is one highly simplified version
of possible sequences of events with the original wq. ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 starts and burns CPU
 25             w1 sleeps
 35             w1 wakes up and finishes
 35             w2 starts and burns CPU
 40             w2 sleeps
 50             w2 wakes up and finishes

And with cmwq with ``@max_active`` >= 3, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 10             w2 starts and burns CPU
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes

If ``@max_active`` == 2, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 starts and burns CPU
 10             w1 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 20             w2 starts and burns CPU
 25             w2 sleeps
 35             w2 wakes up and finishes

Now, let's assume w1 and w2 are queued to a different wq q1 which has
``WQ_CPU_INTENSIVE`` set, ::

 TIME IN MSECS  EVENT
 0              w0 starts and burns CPU
 5              w0 sleeps
 5              w1 and w2 start and burn CPU
 10             w1 sleeps
 15             w2 sleeps
 15             w0 wakes up and burns CPU
 20             w0 finishes
 20             w1 wakes up and finishes
 25             w2 wakes up and finishes


Guidelines
==========

* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
  items which are used during memory reclaim.  Each wq with
  ``WQ_MEM_RECLAIM`` set has an execution context reserved for it.  If
  there is a dependency among multiple work items used during memory
  reclaim, they should be queued to separate wq each with
  ``WQ_MEM_RECLAIM`` (see the sketch after this list).

* Unless strict ordering is required, there is no need to use ST wq.

* Unless there is a specific need, using 0 for ``@max_active`` is
  recommended.  In most use cases, the concurrency level usually stays
  well under the default limit.

* A wq serves as a domain for forward progress guarantee
  (``WQ_MEM_RECLAIM``), flush and work item attributes.  Work items
  which are not involved in memory reclaim, don't need to be flushed
  as a part of a group of work items, and don't require any special
  attribute can use one of the system wq.  There is no difference in
  execution characteristics between using a dedicated wq and a system
  wq.

* Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
  level of locality in wq operations and work item execution.
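
To illustrate the first guideline above, here is a sketch with
hypothetical names: two work items that depend on each other and run
in the reclaim path get separate ``WQ_MEM_RECLAIM`` workqueues, and
therefore separate rescuers. ::

  static struct workqueue_struct *frob_wb_wq;
  static struct workqueue_struct *frob_done_wq;

  /*
   * frob_wb_work may wait for frob_done_work to complete.  Both run
   * during memory reclaim, so each gets its own WQ_MEM_RECLAIM wq;
   * sharing a single wq could deadlock if its one guaranteed
   * execution context is occupied by the waiting work item.
   */
  frob_wb_wq = alloc_workqueue("frob_wb", WQ_MEM_RECLAIM, 0);
  frob_done_wq = alloc_workqueue("frob_done", WQ_MEM_RECLAIM, 0);

  queue_work(frob_wb_wq, &frob_wb_work);
  queue_work(frob_done_wq, &frob_done_work);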


Affinity Scopes
===============

An unbound workqueue groups CPUs according to its affinity scope to improve
cache locality.  For example, if a workqueue is using the default affinity
scope of "cache", it will group CPUs according to last level cache
boundaries.  A work item queued on the workqueue will be assigned to a worker
on one of the CPUs which share the last level cache with the issuing CPU.
Once started, the worker may or may not be allowed to move outside the scope
depending on the ``affinity_strict`` setting of the scope.

Workqueue currently supports the following affinity scopes.

``default``
  Use the scope in module parameter ``workqueue.default_affinity_scope``
  which is always set to one of the scopes below.

``cpu``
  CPUs are not grouped.  A work item issued on one CPU is processed by a
  worker on the same CPU.  This makes unbound workqueues behave as per-cpu
  workqueues without concurrency management.

``smt``
  CPUs are grouped according to SMT boundaries.  This usually means that the
  logical threads of each physical CPU core are grouped together.

``cache``
  CPUs are grouped according to cache boundaries.  Which specific cache
  boundary is used is determined by the arch code.  L3 is used in a lot of
  cases.  This is the default affinity scope.

``numa``
  CPUs are grouped according to NUMA boundaries.

``system``
  All CPUs are put in the same group.  Workqueue makes no effort to process a
  work item on a CPU close to the issuing CPU.

The default affinity scope can be changed with the module parameter
``workqueue.default_affinity_scope`` and a specific workqueue's affinity
scope can be changed using ``apply_workqueue_attrs()``.

If ``WQ_SYSFS`` is set, the workqueue will have the following affinity scope
related interface files under its ``/sys/devices/virtual/workqueue/WQ_NAME/``
directory.

``affinity_scope``
  Read to see the current affinity scope.  Write to change.

  When default is the current scope, reading this file will also show the
  current effective scope in parentheses, for example, ``default (cache)``.

``affinity_strict``
  0 by default indicating that affinity scopes are not strict.  When a work
  item starts execution, workqueue makes a best-effort attempt to ensure
  that the worker is inside its affinity scope, which is called
  repatriation.  Once started, the scheduler is free to move the worker
  anywhere in the system as it sees fit.  This enables benefiting from scope
  locality while still being able to utilize other CPUs if necessary and
  available.

  If set to 1, all workers of the scope are guaranteed always to be in the
  scope.  This may be useful when crossing affinity scopes has other
  implications, for example, in terms of power consumption or workload
  isolation.  Strict NUMA scope can also be used to match the workqueue
  behavior of older kernels.
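
For example, assuming a workqueue created with ``WQ_SYSFS`` and named
``frob_wq`` (a hypothetical name), the scope can be examined and
changed from user space. ::

  $ cat /sys/devices/virtual/workqueue/frob_wq/affinity_scope
  default (cache)
  $ echo numa > /sys/devices/virtual/workqueue/frob_wq/affinity_scope
  $ echo 1 > /sys/devices/virtual/workqueue/frob_wq/affinity_strict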


Affinity Scopes and Performance
===============================

It'd be ideal if an unbound workqueue's behavior were optimal for the
vast majority of use cases without further tuning.  Unfortunately, in
the current kernel, there exists a pronounced trade-off between
locality and utilization necessitating explicit configuration when
workqueues are heavily used.

Higher locality leads to higher efficiency where more work is performed for
the same number of consumed CPU cycles.  However, higher locality may also
cause lower overall system utilization if the work items are not spread
enough across the affinity scopes by the issuers.  The following performance
testing with dm-crypt clearly illustrates this trade-off.

The tests are run on a CPU with 12-cores/24-threads split across four L3
caches (AMD Ryzen 9 3900x).  CPU clock boost is turned off for consistency.
``/dev/dm-0`` is a dm-crypt device created on an NVME SSD (Samsung 990 PRO)
and opened with ``cryptsetup`` with default settings.


Scenario 1: Enough issuers and work spread across the machine
--------------------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k --ioengine=libaio \
    --iodepth=64 --runtime=60 --numjobs=24 --time_based --group_reporting \
    --name=iops-test-job --verify=sha512

There are 24 issuers, each issuing 64 IOs concurrently.  ``--verify=sha512``
makes ``fio`` generate and read back the content each time which makes
execution locality matter between the issuer and ``kcryptd``.  The following
are the read bandwidths and CPU utilizations depending on different affinity
scope settings on ``kcryptd`` measured over five runs.  Bandwidths are in
MiBps, and CPU util in percents.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 1159.40 ±1.34
     - 99.31 ±0.02

   * - cache
     - 1166.40 ±0.89
     - 99.34 ±0.01

   * - cache (strict)
     - 1166.00 ±0.71
     - 99.35 ±0.01

With enough issuers spread across the system, there is no downside to
"cache", strict or otherwise.  All three configurations saturate the whole
machine but the cache-affine ones outperform by 0.6% thanks to improved
locality.


Scenario 2: Fewer issuers, enough work for saturation
------------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 \
    --time_based --group_reporting --name=iops-test-job --verify=sha512

The only difference from the previous scenario is ``--numjobs=8``.  There
are a third as many issuers but there is still enough total work to saturate
the system.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 1155.40 ±0.89
     - 97.41 ±0.05

   * - cache
     - 1154.40 ±1.14
     - 96.15 ±0.09

   * - cache (strict)
     - 1112.00 ±4.64
     - 93.26 ±0.35

This is more than enough work to saturate the system.  Both "system" and
"cache" are nearly saturating the machine but not fully.  "cache" is using
less CPU but the better efficiency puts it at the same bandwidth as
"system".

Eight issuers moving around over four L3 cache scopes still allow "cache
(strict)" to mostly saturate the machine but the loss of work conservation
is now starting to hurt with a 3.7% bandwidth loss.


Scenario 3: Even fewer issuers, not enough work to saturate
------------------------------------------------------------

The command used: ::

  $ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
    --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=4 \
    --time_based --group_reporting --name=iops-test-job --verify=sha512

Again, the only difference is ``--numjobs=4``.  With the number of issuers
reduced to four, there now isn't enough work to saturate the whole system
and the bandwidth becomes dependent on completion latencies.

.. list-table::
   :widths: 16 20 20
   :header-rows: 1

   * - Affinity
     - Bandwidth (MiBps)
     - CPU util (%)

   * - system
     - 993.60 ±1.82
     - 75.49 ±0.06

   * - cache
     - 973.40 ±1.52
     - 74.90 ±0.07

   * - cache (strict)
     - 828.20 ±4.49
     - 66.84 ±0.29

Now, the tradeoff between locality and utilization is clearer.  "cache"
shows a 2% bandwidth loss compared to "system" and "cache (strict)" a
whopping 20%.


Conclusion and Recommendations
------------------------------

In the above experiments, the efficiency advantage of the "cache" affinity
scope over "system" is, while consistent and noticeable, small.  However,
the impact is dependent on the distances between the scopes and may be more
pronounced in processors with more complex topologies.

While the loss of work-conservation in certain scenarios hurts, it is a lot
better than "cache (strict)" and maximizing workqueue utilization is
unlikely to be the common case anyway.  As such, "cache" is the default
affinity scope for unbound pools.

* As there is no one option which is great for most cases, workqueue usages
  that may consume a significant amount of CPU are recommended to configure
  the workqueues using ``apply_workqueue_attrs()`` and/or enable
  ``WQ_SYSFS``.

* An unbound workqueue with strict "cpu" affinity scope behaves the same as
  a ``WQ_CPU_INTENSIVE`` per-cpu workqueue.  There is no real advantage to
  the latter and an unbound workqueue provides a lot more flexibility.

* Affinity scopes are introduced in Linux v6.5.  To emulate the previous
  behavior, use the strict "numa" affinity scope.

* The loss of work-conservation in non-strict affinity scopes is likely
  originating from the scheduler.  There is no theoretical reason why the
  kernel wouldn't be able to do the right thing and maintain
  work-conservation in most cases.  As such, it is possible that future
  scheduler improvements may make most of these tunables unnecessary.


Examining Configuration
=======================

Use tools/workqueue/wq_dump.py to examine unbound CPU affinity
configuration, worker pools and how workqueues map to the pools: ::

  $ tools/workqueue/wq_dump.py
  Affinity Scopes
  ===============
  wq_unbound_cpumask=0000000f

  CPU
    nr_pods  4
    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
    pod_node [0]=0 [1]=0 [2]=1 [3]=1
    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

  SMT
    nr_pods  4
    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
    pod_node [0]=0 [1]=0 [2]=1 [3]=1
    cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

  CACHE (default)
    nr_pods  2
    pod_cpus [0]=00000003 [1]=0000000c
    pod_node [0]=0 [1]=1
    cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

  NUMA
    nr_pods  2
    pod_cpus [0]=00000003 [1]=0000000c
    pod_node [0]=0 [1]=1
    cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

  SYSTEM
    nr_pods  1
    pod_cpus [0]=0000000f
    pod_node [0]=-1
    cpu_pod  [0]=0 [1]=0 [2]=0 [3]=0

  Worker Pools
  ============
  pool[00] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  0
  pool[01] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  0
  pool[02] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  1
  pool[03] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  1
  pool[04] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  2
  pool[05] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  2
  pool[06] ref= 1 nice=  0 idle/workers=  3/  3 cpu=  3
  pool[07] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  3
  pool[08] ref=42 nice=  0 idle/workers=  6/  6 cpus=0000000f
  pool[09] ref=28 nice=  0 idle/workers=  3/  3 cpus=00000003
  pool[10] ref=28 nice=  0 idle/workers= 17/ 17 cpus=0000000c
  pool[11] ref= 1 nice=-20 idle/workers=  1/  1 cpus=0000000f
  pool[12] ref= 2 nice=-20 idle/workers=  1/  1 cpus=00000003
  pool[13] ref= 2 nice=-20 idle/workers=  1/  1 cpus=0000000c

  Workqueue CPU -> pool
  =====================
  [    workqueue \ CPU              0  1  2  3 dfl]
  events                   percpu   0  2  4  6
  events_highpri           percpu   1  3  5  7
  events_long              percpu   0  2  4  6
  events_unbound           unbound  9  9 10 10  8
  events_freezable         percpu   0  2  4  6
  events_power_efficient   percpu   0  2  4  6
  events_freezable_power_  percpu   0  2  4  6
  rcu_gp                   percpu   0  2  4  6
  rcu_par_gp               percpu   0  2  4  6
  slub_flushwq             percpu   0  2  4  6
  netns                    ordered  8  8  8  8  8
  ...

See the command's help message for more info.


Monitoring
==========

Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::

  $ tools/workqueue/wq_monitor.py events
                              total  infl CPUtime  CPUhog CMW/RPR  mayday rescued
  events                      18545     0     6.1       0       5       -       -
  events_highpri                  8     0     0.0       0       0       -       -
  events_long                     3     0     0.0       0       0       -       -
  events_unbound              38306     0     0.1       -       7       -       -
  events_freezable                0     0     0.0       0       0       -       -
  events_power_efficient      29598     0     0.2       0       0       -       -
  events_freezable_power_        10     0     0.0       0       0       -       -
  sock_diag_events                0     0     0.0       0       0       -       -

                              total  infl CPUtime  CPUhog CMW/RPR  mayday rescued
  events                      18548     0     6.1       0       5       -       -
  events_highpri                  8     0     0.0       0       0       -       -
  events_long                     3     0     0.0       0       0       -       -
  events_unbound              38322     0     0.1       -       7       -       -
  events_freezable                0     0     0.0       0       0       -       -
  events_power_efficient      29603     0     0.2       0       0       -       -
  events_freezable_power_        10     0     0.0       0       0       -       -
  sock_diag_events                0     0     0.0       0       0       -       -

  ...

See the command's help message for more info.


Debugging
=========

Because the work functions are executed by generic worker threads
there are a few tricks needed to shed some light on misbehaving
workqueue users.

Worker threads show up in the process list as: ::

  root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
  root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
  root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
  root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]

If kworkers are going crazy (using too much cpu), there are two types
of possible problems:

1. Something being scheduled in rapid succession
2. A single work item that consumes lots of cpu cycles

The first one can be tracked using tracing: ::

  $ echo workqueue:workqueue_queue_work > /sys/kernel/tracing/set_event
  $ cat /sys/kernel/tracing/trace_pipe > out.txt
  (wait a few secs)
  ^C

If something is busy looping on work queueing, it would be dominating
the output and the offender can be determined with the work item
function.

For the second type of problem it should be possible to just check
the stack trace of the offending worker thread. ::

  $ cat /proc/THE_OFFENDING_KWORKER/stack

The work item's function should be trivially visible in the stack
trace.


Non-reentrance Conditions
=========================

Workqueue guarantees that a work item cannot be re-entrant if the
following conditions hold after a work item gets queued:

1. The work function hasn't been changed.
2. No one queues the work item to another workqueue.
3. The work item hasn't been reinitiated.

In other words, if the above conditions hold, the work item is guaranteed
to be executed by at most one worker system-wide at any given time.

Note that requeuing the work item (to the same queue) from within its own
work function doesn't break these conditions, so it's safe to do.
Otherwise, caution is required when breaking the conditions inside a work
function.


Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/workqueue.h

.. kernel-doc:: kernel/workqueue.c