.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving a trip through the scheduler, which normally
costs on the order of a few microseconds, although performance benefits are
workload dependent. If no wakeup source arrives during the polling interval, or
some other task on the runqueue becomes runnable, the scheduler is invoked.
Halt polling is therefore especially useful for workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
saved by not invoking the scheduler is noticeable.
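
The poll-then-schedule behaviour described above can be sketched as a small
user-space model. This is only an illustration, not the kernel code:
simulate_halt(), enum halt_result and the delay arguments are hypothetical
names introduced here, and the handling of a wakeup that lands exactly at the
end of the polling window is an assumption.

```c
#include <stdint.h>

/* Hypothetical outcome of a single halt, as seen by the host. */
enum halt_result {
	HALT_WOKEN_WHILE_POLLING,	/* wakeup caught by the poll loop */
	HALT_SCHEDULED_OUT,		/* polling ended, scheduler invoked */
};

/*
 * Model one halt. All times are in nanoseconds, measured from the moment
 * the vcpu cedes. Pass UINT64_MAX as other_task_delay_ns if no other task
 * becomes runnable.
 */
static enum halt_result simulate_halt(uint64_t wakeup_delay_ns,
				      uint64_t other_task_delay_ns,
				      uint64_t halt_poll_ns)
{
	/* Polling stops at the interval limit or when another task is runnable. */
	uint64_t poll_end_ns = halt_poll_ns;

	if (other_task_delay_ns < poll_end_ns)
		poll_end_ns = other_task_delay_ns;

	/* Assumption: a wakeup exactly at the end of the window still counts. */
	if (wakeup_delay_ns <= poll_end_ns)
		return HALT_WOKEN_WHILE_POLLING;

	return HALT_SCHEDULED_OUT;
}
```

In this model a wakeup after 5000 ns with a 10000 ns polling interval is caught
while polling, whereas the same wakeup is missed if another task becomes
runnable after 1000 ns, because polling stops early to let that task run.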

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per-vcpu (or per-vcore) value.

If a wakeup source is received within the halt polling interval, the interval
is left unchanged. If a wakeup source isn't received during the polling
interval (and thus schedule is invoked), there are two cases: either the
polling interval and total block time[0] were both less than the global max
polling interval (see module params below), or the total block time was
greater than the global max polling interval.
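
The three outcomes above can be written down as a small classification
function. This is a sketch of the rules as stated in this document, not a copy
of the kernel code: classify_block() and enum poll_adjust are hypothetical
names, and the fall-through case for situations the text does not cover is an
assumption.

```c
/* Hypothetical labels for the outcomes described in the text. */
enum poll_adjust {
	POLL_KEEP,	/* leave the polling interval unchanged */
	POLL_GROW,	/* poll for longer next time */
	POLL_SHRINK,	/* polling was pointless, back off */
};

/*
 * block_ns:      total block time of the halt that just finished
 * vcpu_poll_ns:  current per-vcpu (or per-vcore) polling interval
 * global_max_ns: global max polling interval (halt_poll_ns parameter)
 */
static enum poll_adjust classify_block(unsigned int block_ns,
				       unsigned int vcpu_poll_ns,
				       unsigned int global_max_ns)
{
	/* The wakeup arrived while the host was still polling. */
	if (block_ns <= vcpu_poll_ns)
		return POLL_KEEP;

	/* The block outlasted even the global ceiling. */
	if (block_ns > global_max_ns)
		return POLL_SHRINK;

	/* A longer poll, still below the ceiling, could have caught it. */
	if (vcpu_poll_ns < global_max_ns && block_ns < global_max_ns)
		return POLL_GROW;

	/* Assumption: anything not covered by the text leaves the interval alone. */
	return POLL_KEEP;
}
```

For example, with a 10000 ns interval and an assumed global max of 200000 ns,
a block of 50000 ns is classified as POLL_GROW and a block of 300000 ns as
POLL_SHRINK.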

If both the polling interval and total block time were less than the global
max polling interval, the polling interval can be increased in the hope that
next time, during the longer polling interval, the wakeup source will be
received while the host is polling and the latency benefits will be realised.
The polling interval is grown in the function grow_halt_poll_ns(): it is
multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when growing from zero.
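
A minimal sketch of the grow step, based on the parameter descriptions in the
Module Parameters table, might look as follows. grow_interval() is a
hypothetical stand-alone function rather than the kernel's
grow_halt_poll_ns(); treating a grow factor of 0 as "do not grow" and ignoring
integer overflow are assumptions made for the example.

```c
/*
 * The arguments mirror the module parameters: grow is halt_poll_ns_grow,
 * grow_start_ns is halt_poll_ns_grow_start and global_max_ns is halt_poll_ns.
 */
static unsigned int grow_interval(unsigned int cur_ns, unsigned int grow,
				  unsigned int grow_start_ns,
				  unsigned int global_max_ns)
{
	unsigned int next_ns;

	/* Assumption: a grow factor of 0 leaves the interval unchanged. */
	if (grow == 0)
		return cur_ns;

	/* Overflow is ignored here; values stay small in this sketch. */
	next_ns = cur_ns * grow;

	/* Growing from zero (or a very small value) jumps to the start value. */
	if (next_ns < grow_start_ns)
		next_ns = grow_start_ns;

	/* The global max is the ceiling for every per-vcpu interval. */
	if (next_ns > global_max_ns)
		next_ns = global_max_ns;

	return next_ns;
}
```

With grow = 2, grow_start = 10000 and an assumed ceiling of 200000 ns, repeated
growth from zero yields 10000, 20000, 40000, 80000, 160000 and finally 200000.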

If the total block time was greater than the global max polling interval, the
host will never poll for long enough (limited by the global max) to receive
the wakeup during the polling interval, so the interval may as well be shrunk
in order to avoid pointless polling. The polling interval is shrunk in the
function shrink_halt_poll_ns(): it is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
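
The shrink rule is simple enough to express in a few lines. The function below
is a hypothetical stand-alone sketch of that rule, not the kernel's
shrink_halt_poll_ns() itself.

```c
/* shrink corresponds to the halt_poll_ns_shrink module parameter. */
static unsigned int shrink_interval(unsigned int cur_ns, unsigned int shrink)
{
	/* A shrink value of 0 resets the interval, disabling polling for now. */
	if (shrink == 0)
		return 0;

	/* Otherwise divide the interval by the shrink factor. */
	return cur_ns / shrink;
}
```

With the default shrink value of 0, a single long block drops an 80000 ns
interval straight to 0; with a shrink value of 2 it would drop to 40000 ns.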

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval, but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time:
		      the time between when the halt polling function is
		      invoked and when a wakeup source is received (irrespective
		      of whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 4 tunable module parameters to adjust the global max
polling interval, the initial value used when growing the polling interval
from zero, and the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       | Description               | Default Value           |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    | (per arch value)        |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      |                         |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note: these module parameters are system-wide values and cannot be tuned on a
      per-vm basis.
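
As an illustration of inspecting these values from user space, the sketch
below reads one unsigned integer from a parameter file. read_uint_param() is a
hypothetical helper written for this example; it assumes the file contains a
single decimal number, and it takes the path as an argument so that it can be
tried on any file.

```c
#include <stdio.h>

/*
 * Read an unsigned integer module parameter from a sysfs-style file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_uint_param(const char *path, unsigned int *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	/* The file is expected to hold one decimal number and a newline. */
	parsed = (fscanf(f, "%u", value) == 1);
	fclose(f);

	return parsed ? 0 : -1;
}
```

A caller would pass a path such as /sys/module/kvm/parameters/halt_poll_ns and
check the return value before using the result. Changing a parameter means
writing a new value to the same file, which normally requires root privileges.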

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a large value
  has the potential to drive the cpu usage to 100% on a machine which would be almost
  entirely idle otherwise. This is because even if a guest has wakeups during which very
  little work is done and which are quite far apart, if the period is shorter than the
  global max polling interval (halt_poll_ns) then the host will always poll for the
  entire block time and thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and latency, and
  the module parameters should be used to tune this trade-off. Idle cpu time is
  essentially converted to host kernel time with the aim of decreasing latency when
  entering the guest.

- Halt polling will only be conducted by the host when no other tasks are runnable on
  that cpu; otherwise the polling will cease immediately and schedule will be invoked to
  allow that other task to run. Thus halt polling doesn't allow a guest to mount a
  denial of service attack on the cpu.