.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving a trip through the scheduler, which normally
costs on the order of a few microseconds, although performance benefits are
workload dependent. If no wakeup source arrives during the polling interval, or
some other task on the runqueue is runnable, the scheduler is invoked. Thus
halt polling is especially useful on workloads with very short wakeup periods,
where the time spent halt polling is minimised and the time saved by not
invoking the scheduler is noticeable.
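
The behaviour described above can be pictured with a small user-space
simulation. This is only a sketch under stated assumptions, not the kernel's
implementation: ``struct sim_host``, ``halt_poll()`` and the outcome names are
hypothetical, and time is modelled as a counter that advances by ``tick_ns``
(which must be non-zero) on every polling iteration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a host cpu; none of these names exist in KVM. */
struct sim_host {
    uint64_t now_ns;           /* simulated clock */
    uint64_t tick_ns;          /* cost of one polling iteration, must be > 0 */
    uint64_t wakeup_at_ns;     /* time at which the wakeup source arrives */
    bool other_task_runnable;  /* another task is waiting for this cpu */
};

enum poll_outcome {
    POLL_WOKEN,    /* wakeup seen while polling, scheduler trip avoided */
    POLL_GIVE_UP   /* stop polling and hand the cpu to the scheduler */
};

/* Poll for at most poll_ns before giving up the cpu. */
static enum poll_outcome halt_poll(struct sim_host *h, uint64_t poll_ns)
{
    uint64_t deadline = h->now_ns + poll_ns;

    while (h->now_ns < deadline) {
        if (h->now_ns >= h->wakeup_at_ns)
            return POLL_WOKEN;
        if (h->other_task_runnable)
            break;               /* never starve another runnable task */
        h->now_ns += h->tick_ns; /* one more polling iteration */
    }
    return POLL_GIVE_UP;
}
```

In this model a wakeup that arrives inside the polling window returns
``POLL_WOKEN`` without a scheduler round trip, while a late wakeup or a
contended cpu returns ``POLL_GIVE_UP``.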

The generic halt polling code is implemented in:

        virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

        arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

        kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

        kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

During polling if a wakeup source is received within the halt polling interval,
the interval is left unchanged. In the event that a wakeup source isn't
received during the polling interval (and thus schedule is invoked) there are
two options, either the polling interval and total block time[0] were less than
the global max polling interval (see module params below), or the total block
time was greater than the global max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval then the polling interval can be increased in
the hope that next time, during the longer polling interval, the wakeup source
will be received while the host is polling and the latency benefits will be
realised. The polling interval is grown in the function grow_halt_poll_ns(): it
is multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when growing from zero.

In the event that the total block time was greater than the global max polling
interval then the host will never poll for long enough (limited by the global
max) for the wakeup to arrive during the polling interval, so the interval may
as well be shrunk in order to avoid pointless polling. The polling interval is
shrunk in the function shrink_halt_poll_ns() and is divided by the module
parameter halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval but will only really do a good job for wakeups
which come at an approximately constant rate, otherwise there will be constant
adjustment of the polling interval.

[0] total block time:
                      the time between when the halt polling function is
                      invoked and a wakeup source received (irrespective of
                      whether the scheduler is invoked within that function).
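
The adjustment rules in this section can be summarised with a short user-space
sketch. It follows the prose above rather than the kernel source, so treat it
as an illustration only: ``struct halt_poll_params``, ``grow_interval()``,
``shrink_interval()`` and ``adjust_interval()`` are hypothetical names, and two
details are assumptions of the sketch: a grow factor of 0 leaves the interval
unchanged, and the grown value is clamped to the global max.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the module parameters described in this file. */
struct halt_poll_params {
    uint64_t max_ns;      /* halt_poll_ns: global max polling interval */
    uint64_t grow;        /* halt_poll_ns_grow */
    uint64_t grow_start;  /* halt_poll_ns_grow_start */
    uint64_t shrink;      /* halt_poll_ns_shrink */
};

/* Multiply by the grow factor, starting from grow_start when at zero. */
static uint64_t grow_interval(uint64_t cur, const struct halt_poll_params *p)
{
    uint64_t val;

    if (p->grow == 0)          /* assumption: 0 means "never grow" */
        return cur;
    val = cur * p->grow;
    if (val < p->grow_start)
        val = p->grow_start;
    if (val > p->max_ns)       /* assumption: the global max is the ceiling */
        val = p->max_ns;
    return val;
}

/* Divide by the shrink factor, or drop to 0 when the factor is 0. */
static uint64_t shrink_interval(uint64_t cur, const struct halt_poll_params *p)
{
    return p->shrink == 0 ? 0 : cur / p->shrink;
}

/* Return the next polling interval given how long the vcpu was blocked. */
static uint64_t adjust_interval(uint64_t cur, uint64_t block_ns,
                                const struct halt_poll_params *p)
{
    if (block_ns <= cur)       /* wakeup arrived while polling */
        return cur;
    if (block_ns > p->max_ns)  /* polling could never have caught it */
        return shrink_interval(cur, p);
    if (cur < p->max_ns && block_ns < p->max_ns)
        return grow_interval(cur, p);
    return cur;
}
```

With a global max of 200000 ns, a grow factor of 2, a grow start of 10000 ns
and a shrink factor of 0, an interval of 0 grows to 10000 ns after a 5000 ns
block, and any interval collapses to 0 after a block longer than 200000 ns.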

Module Parameters
=================

The kvm module has 4 tunable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       | Description               | Default Value           |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    |                         |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      | (per arch value)        |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

        /sys/module/kvm/parameters/

Note that these module parameters are system-wide values and cannot be tuned
on a per-VM basis.

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a
  large value has the potential to drive the cpu usage to 100% on a machine
  which would otherwise be almost entirely idle. This is because even if a
  guest has wakeups during which very little work is done and which are quite
  far apart, if the period is shorter than the global max polling interval
  (halt_poll_ns) then the host will always poll for the entire block time and
  thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and
  latency, and the module parameters should be used to tune the balance
  between the two. Idle cpu time is essentially converted to host kernel time
  with the aim of decreasing latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
  runnable on that cpu, otherwise the polling will cease immediately and
  schedule will be invoked to allow that other task to run. Thus this doesn't
  allow a guest to mount a denial of service attack on the cpu.
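
As a companion to the Module Parameters section, the current value of a
parameter can be inspected by reading its file under
/sys/module/kvm/parameters/. The helper below is a minimal sketch, assuming the
file holds a single unsigned decimal number; ``read_kvm_param()`` is a
hypothetical name, not an existing kernel or library interface.

```c
#include <stdio.h>

/*
 * Read a numeric module parameter from "<dir>/<name>", for example
 * dir = "/sys/module/kvm/parameters" and name = "halt_poll_ns".
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_kvm_param(const char *dir, const char *name,
                          unsigned long long *out)
{
    char path[256];
    FILE *f;
    int n, parsed;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;               /* path did not fit in the buffer */

    f = fopen(path, "r");
    if (!f)
        return -1;               /* module not loaded, or no permission */

    parsed = fscanf(f, "%llu", out);
    fclose(f);
    return parsed == 1 ? 0 : -1;
}
```

Calling ``read_kvm_param("/sys/module/kvm/parameters", "halt_poll_ns", &v)``
on a host with the kvm module loaded should fill ``v`` with the global max
polling interval in nanoseconds. Changing a value means writing to the same
file, which normally requires root privileges.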