==========================================
Reducing OS jitter due to per-cpu kthreads
==========================================

This document lists per-CPU kthreads in the Linux kernel and presents
options to control their OS jitter. Note that non-per-CPU kthreads are
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
them to a "housekeeping" CPU dedicated to such work.

References
==========

- Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.

- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.

- man taskset: Using the taskset command to bind tasks to sets
  of CPUs.

- man sched_setaffinity: Using the sched_setaffinity() system
  call to bind tasks to sets of CPUs.

- /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
  writing "0" to offline and "1" to online.

- In order to locate kernel-generated OS jitter on CPU N::

    cd /sys/kernel/tracing
    echo 1 > max_graph_depth # Increase the "1" for more detail
    echo function_graph > current_tracer
    # run workload
    cat per_cpu/cpuN/trace

kthreads
========

Name:
  ehca_comp/%u

Purpose:
  Periodically process Infiniband-related work.

To reduce its OS jitter, do any of the following:

1. Don't use eHCA Infiniband hardware, instead choosing hardware
   that does not require per-CPU kthreads. This will prevent these
   kthreads from being created in the first place. (This will
   work for most people, as this hardware, though important, is
   relatively old and is produced in relatively low unit volumes.)
2. Do all eHCA-Infiniband-related work on other CPUs, including
   interrupts.
3. Rework the eHCA driver so that its per-CPU kthreads are
   provisioned only on selected CPUs.


Name:
  irq/%d-%s

Purpose:
  Handle threaded interrupts.

To reduce its OS jitter, do the following:

1. Use irq affinity to force the irq threads to execute on
   some other CPU.

Name:
  kcmtpd_ctr_%d

Purpose:
  Handle Bluetooth work.

To reduce its OS jitter, do one of the following:

1. Don't use Bluetooth, in which case these kthreads won't be
   created in the first place.
2. Use irq affinity to force Bluetooth-related interrupts to
   occur on some other CPU and furthermore initiate all
   Bluetooth activity on some other CPU.

Name:
  ksoftirqd/%u

Purpose:
  Execute softirq handlers when threaded or when under heavy load.

To reduce its OS jitter, each softirq vector must be handled
separately as follows:

TIMER_SOFTIRQ
-------------

Do all of the following:

1. To the extent possible, keep the CPU out of the kernel when it
   is non-idle, for example, by avoiding system calls and by forcing
   both kernel threads and interrupts to execute elsewhere.
2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
   the CPU offline, then bring it back online. This forces
   recurring timers to migrate elsewhere. If you are concerned
   with multiple CPUs, force them all offline before bringing the
   first one back online. Once you have onlined the CPUs in question,
   do not offline any other CPUs, because doing so could force the
   timer back onto one of the CPUs in question.

NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
---------------------------------

Do all of the following:

1. Force networking interrupts onto other CPUs.
2. Initiate any network I/O on other CPUs.
3. Once your application has started, prevent CPU-hotplug operations
   from being initiated from tasks that might run on the CPU to
   be de-jittered. (It is OK to force this CPU offline and then
   bring it back online before you start your application.)

BLOCK_SOFTIRQ
-------------

Do all of the following:

1. Force block-device interrupts onto some other CPU.
2. Initiate any block I/O on other CPUs.
3. Once your application has started, prevent CPU-hotplug operations
   from being initiated from tasks that might run on the CPU to
   be de-jittered. (It is OK to force this CPU offline and then
   bring it back online before you start your application.)
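
Several of the steps above ask that interrupts be forced onto other CPUs.
One way to do that is sketched below: it computes a hexadecimal CPU mask
that leaves out the CPUs to be de-jittered and writes it to each IRQ's
smp_affinity file (see Documentation/core-api/irq/irq-affinity.rst). The
function names and the optional directory argument are illustrative
assumptions, not existing tools, and the arithmetic assumes a shell with
64-bit integers and fewer than 63 CPUs.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any kernel tool.

# Print a hex mask of CPUs 0..(ncpus-1) with the listed CPUs cleared.
# usage: cpu_mask_excluding NCPUS CPU...
cpu_mask_excluding() (
    ncpus=$1; shift
    mask=$(( (1 << ncpus) - 1 ))          # all CPUs set
    for cpu in "$@"; do
        mask=$(( mask & ~(1 << cpu) ))    # clear each de-jittered CPU
    done
    printf '%x\n' "$mask"
)

# Write MASK to every smp_affinity file below IRQ_ROOT (default /proc/irq).
# Some interrupts cannot be moved; those are reported and skipped.
# usage: set_all_irq_affinity MASK [IRQ_ROOT]
set_all_irq_affinity() (
    mask=$1
    irq_root=${2:-/proc/irq}
    for f in "$irq_root"/*/smp_affinity; do
        [ -e "$f" ] || continue
        { echo "$mask" > "$f"; } 2>/dev/null ||
            echo "could not move ${f%/smp_affinity}" >&2
    done
)
```

On an 8-CPU system where CPUs 2 and 3 are to be de-jittered, this might be
run as root as ``set_all_irq_affinity "$(cpu_mask_excluding 8 2 3)"``.
Interrupts registered later are not covered, so the call may need to be
repeated after drivers load.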

IRQ_POLL_SOFTIRQ
----------------

Do all of the following:

1. Force block-device interrupts onto some other CPU.
2. Initiate any block I/O and block-I/O polling on other CPUs.
3. Once your application has started, prevent CPU-hotplug operations
   from being initiated from tasks that might run on the CPU to
   be de-jittered. (It is OK to force this CPU offline and then
   bring it back online before you start your application.)

TASKLET_SOFTIRQ
---------------

Do one or more of the following:

1. Avoid use of drivers that use tasklets. (Such drivers will contain
   calls to things like tasklet_schedule().)
2. Convert all drivers that you must use from tasklets to workqueues.
3. Force interrupts for drivers using tasklets onto other CPUs,
   and also do I/O involving these drivers on other CPUs.

SCHED_SOFTIRQ
-------------

Do all of the following:

1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
   for example, ensure that at most one runnable kthread is present
   on that CPU. If a thread that expects to run on the de-jittered
   CPU awakens, the scheduler will send an IPI that can result in
   a subsequent SCHED_SOFTIRQ.
2. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
   de-jittered is marked as an adaptive-ticks CPU using the
   "nohz_full=" boot parameter. This reduces the number of
   scheduler-clock interrupts that the de-jittered CPU receives,
   minimizing its chances of being selected to do the load balancing
   work that runs in SCHED_SOFTIRQ context.
3. To the extent possible, keep the CPU out of the kernel when it
   is non-idle, for example, by avoiding system calls and by
   forcing both kernel threads and interrupts to execute elsewhere.
   This further reduces the number of scheduler-clock interrupts
   received by the de-jittered CPU.

HRTIMER_SOFTIRQ
---------------

Do all of the following:

1. To the extent possible, keep the CPU out of the kernel when it
   is non-idle. For example, avoid system calls and force both
   kernel threads and interrupts to execute elsewhere.
2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
   CPU offline, then bring it back online. This forces recurring
   timers to migrate elsewhere. If you are concerned with multiple
   CPUs, force them all offline before bringing the first one
   back online. Once you have onlined the CPUs in question, do not
   offline any other CPUs, because doing so could force the timer
   back onto one of the CPUs in question.

RCU_SOFTIRQ
-----------

Do at least one of the following:

1. Offload callbacks and keep the CPU in either dyntick-idle or
   adaptive-ticks state by doing all of the following:

   a. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
      de-jittered is marked as an adaptive-ticks CPU using the
      "nohz_full=" boot parameter. Bind the rcuo kthreads to
      housekeeping CPUs, which can tolerate OS jitter.
   b. To the extent possible, keep the CPU out of the kernel
      when it is non-idle, for example, by avoiding system
      calls and by forcing both kernel threads and interrupts
      to execute elsewhere.

2. Enable RCU to do its processing remotely via dyntick-idle by
   doing all of the following:

   a. Build with CONFIG_NO_HZ=y.
   b. Ensure that the CPU goes idle frequently, allowing other
      CPUs to detect that it has passed through an RCU quiescent
      state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
      userspace execution also allows other CPUs to detect that
      the CPU in question has passed through a quiescent state.
   c. To the extent possible, keep the CPU out of the kernel
      when it is non-idle, for example, by avoiding system
      calls and by forcing both kernel threads and interrupts
      to execute elsewhere.

Name:
  kworker/%u:%d%s (cpu, id, priority)

Purpose:
  Execute workqueue requests.

To reduce its OS jitter, do any of the following:

1. Run your workload at a real-time priority, which will allow
   preempting the kworker daemons.
2. A given workqueue can be made visible in the sysfs filesystem
   by passing WQ_SYSFS to that workqueue's alloc_workqueue().
   Such a workqueue can be confined to a given subset of the
   CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
   files. The set of WQ_SYSFS workqueues can be displayed using
   "ls /sys/devices/virtual/workqueue". That said, the workqueues
   maintainer would like to caution people against indiscriminately
   sprinkling WQ_SYSFS across all the workqueues. The reason for
   caution is that it is easy to add WQ_SYSFS, but because sysfs is
   part of the formal user/kernel API, it can be nearly impossible
   to remove it, even if its addition was a mistake.
3. Do any of the following needed to avoid jitter that your
   application cannot tolerate:

   a. Build your kernel with CONFIG_SLUB=y rather than
      CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
      use of each CPU's workqueues to run its cache_reap()
      function.
   b. Avoid using oprofile, thus avoiding OS jitter from
      wq_sync_buffer().
   c. Limit your CPU frequency so that a CPU-frequency
      governor is not required, possibly enlisting the aid of
      special heatsinks or other cooling technologies. If done
      correctly, and if your CPU architecture permits, you should
      be able to build your kernel with CONFIG_CPU_FREQ=n to
      avoid the CPU-frequency governor periodically running
      on each CPU, including cs_dbs_timer() and od_dbs_timer().

      WARNING: Please check your CPU specifications to
      make sure that this is safe on your particular system.
   d. As of v3.18, Christoph Lameter's on-demand vmstat workers
      commit prevents OS jitter due to vmstat_update() on
      CONFIG_SMP=y systems. Before v3.18, it is not possible
      to entirely get rid of the OS jitter, but you can
      decrease its frequency by writing a large value to
      /proc/sys/vm/stat_interval. The default value is HZ,
      for an interval of one second. Larger values will, of
      course, make your virtual-memory statistics update more
      slowly. You can also run your workload at a real-time
      priority, thus preempting vmstat_update(), but if your
      workload is CPU-bound, this is a bad idea.
      However, there is an RFC patch from Christoph Lameter
      (based on an earlier one from Gilad Ben-Yossef) that
      reduces or even eliminates vmstat overhead for some
      workloads at https://lore.kernel.org/r/00000140e9dfd6bd-40db3d4f-c1be-434f-8132-7820f81bb586-000000@email.amazonses.com.
   e. If running on high-end powerpc servers, build with
      CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
      daemon from running on each CPU every second or so.
      (This will require editing Kconfig files and will defeat
      this platform's RAS functionality.) This avoids jitter
      due to the rtas_event_scan() function.

      WARNING: Please check your CPU specifications to
      make sure that this is safe on your particular system.
   f. If running on Cell Processor, build your kernel with
      CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
      spu_gov_work().

      WARNING: Please check your CPU specifications to
      make sure that this is safe on your particular system.
   g. If running on PowerMAC, build your kernel with
      CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
      avoiding OS jitter from rackmeter_do_timer().
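
Confining WQ_SYSFS workqueues, as described in item 2 above, might be
scripted roughly as follows. This is a sketch under assumptions: the
function name and its optional directory argument are hypothetical, and
the mask is a hexadecimal CPU mask naming the housekeeping CPUs. Only the
``cpumask`` files mentioned above are touched, and only where they exist
and are writable.

```shell
#!/bin/sh
# Sketch only: confine_workqueues is a hypothetical helper name.

# Write MASK to the cpumask file of every workqueue directory below
# WQ_ROOT (default /sys/devices/virtual/workqueue).
# usage: confine_workqueues MASK [WQ_ROOT]
confine_workqueues() (
    mask=$1
    wq_root=${2:-/sys/devices/virtual/workqueue}
    for f in "$wq_root"/*/cpumask; do
        [ -w "$f" ] || continue       # skip workqueues without a cpumask
        if echo "$mask" > "$f"; then
            echo "confined ${f%/cpumask} to $mask"
        else
            echo "failed to confine ${f%/cpumask}" >&2
        fi
    done
)
```

For example, ``confine_workqueues 3`` run as root would ask each listed
workqueue to run only on CPUs 0 and 1. Workqueues created without WQ_SYSFS
are not affected.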

Name:
  rcuc/%u

Purpose:
  Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.

To reduce its OS jitter, do at least one of the following:

1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
   kthreads from being created in the first place, and also obviates
   the need for RCU priority boosting. This approach is feasible
   for workloads that do not require high degrees of responsiveness.
2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
   kthreads from being created in the first place. This approach
   is feasible only if your workload never requires RCU priority
   boosting, for example, if you ensure frequent idle time on all
   CPUs that might execute within the kernel.
3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
   boot parameter offloading RCU callbacks from all CPUs susceptible
   to OS jitter. This approach prevents the rcuc/%u kthreads from
   having any work to do, so that they are never awakened.
4. Ensure that the CPU never enters the kernel, and, in particular,
   avoid initiating any CPU hotplug operations on this CPU. This is
   another way of preventing any callbacks from being queued on the
   CPU, again preventing the rcuc/%u kthreads from having any work
   to do.
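
When relying on the rcu_nocbs= or nohz_full= boot parameters, it can help
to confirm that the running kernel was actually booted with them. The
sketch below reads the kernel command line from /proc/cmdline and prints
the value of a named parameter. The helper names and the optional file
argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Sketch only: boot_param and report_offload_params are hypothetical names.

# Print the value of each "NAME=value" word on the kernel command line,
# one line per occurrence. Prints nothing if NAME is absent.
# usage: boot_param NAME [CMDLINE_FILE]
boot_param() (
    name=$1
    cmdline=${2:-/proc/cmdline}
    set -f                            # no globbing while splitting words
    for word in $(cat "$cmdline"); do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)

# Show the two parameters this document relies on.
# usage: report_offload_params [CMDLINE_FILE]
report_offload_params() (
    for p in nohz_full rcu_nocbs; do
        v=$(boot_param "$p" "$@")
        echo "$p=${v:-<not set>}"
    done
)
```

Running ``report_offload_params`` with no arguments reports on the booted
kernel. A parameter being present shows only that it was passed, not that
the kernel was built with the matching CONFIG option.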

Name:
  rcuop/%d and rcuos/%d

Purpose:
  Offload RCU callbacks from the corresponding CPU.

To reduce its OS jitter, do at least one of the following:

1. Use affinity, cgroups, or other mechanism to force these kthreads
   to execute on some other CPU.
2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
   kthreads from being created in the first place. However, please
   note that this will not eliminate OS jitter, but will instead
   shift it to RCU_SOFTIRQ.
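
Item 1 above can be approximated with the taskset command listed in the
References section. The sketch below finds kthreads whose names begin with
"rcuo" using pgrep and binds each one to a list of housekeeping CPUs. The
function name is hypothetical, and the choice of housekeeping CPUs is an
assumption left to the caller.

```shell
#!/bin/sh
# Sketch only: bind_rcuo_kthreads is a hypothetical helper name.

# Bind every kthread whose name starts with "rcuo" to CPULIST, a CPU list
# in taskset's -c format such as "0" or "0-1".
# usage: bind_rcuo_kthreads CPULIST
bind_rcuo_kthreads() (
    cpus=$1
    for pid in $(pgrep '^rcuo'); do
        # -c takes a CPU list rather than a mask; -p targets an existing PID.
        taskset -cp "$cpus" "$pid"
    done
)
```

For example, ``bind_rcuo_kthreads 0`` run as root would move all matching
kthreads to CPU 0. Kthreads created afterwards, for instance following a
CPU-hotplug operation, would need the same treatment.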