.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

========================
CPU Idle Time Management
========================

:Copyright: |copy| 2019 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


CPU Idle Time Management Subsystem
==================================

Every time one of the logical CPUs in the system (the entities that appear to
fetch and execute instructions: hardware threads, if present, or processor
cores) is idle after an interrupt or equivalent wakeup event, which means that
there are no tasks to run on it except for the special "idle" task associated
with it, there is an opportunity to save energy for the processor that it
belongs to.  That can be done by making the idle logical CPU stop fetching
instructions from memory and putting some of the processor's functional units
that it depends on into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a
situation, so it may be necessary to find the most suitable one (from the
kernel's perspective) and ask the processor to use (or "enter") that particular
idle state.  That is the role of the CPU idle time management subsystem in the
kernel, called ``CPUIdle``.

The design of ``CPUIdle`` is modular and based on the principle of avoiding
code duplication, so the generic code, which in principle need not depend on
hardware or platform design details, is kept separate from the code that
interacts with the hardware.  It is generally divided into three categories of
functional units: *governors* responsible for selecting idle states to ask the
processor to enter, *drivers* that pass the governors' decisions on to the
hardware, and the *core* providing a common framework for them.


CPU Idle Time Governors
=======================

A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
one of the logical CPUs in the system turns out to be idle.  Its role is to
select an idle state to ask the processor to enter in order to save some energy.

``CPUIdle`` governors are generic and each of them can be used on any hardware
platform that the Linux kernel can run on.  For this reason, the data
structures they operate on cannot depend on any hardware architecture or
platform design details either.

The governor itself is represented by a struct cpuidle_governor object
containing four callback pointers (:c:member:`enable`, :c:member:`disable`,
:c:member:`select` and :c:member:`reflect`), a :c:member:`rating` field
described below, and a name (string) used for identifying it.

For the governor to be available at all, that object needs to be registered
with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
a pointer to it passed as the argument.  If successful, that causes the core to
add the governor to the global list of available governors.  The new governor
is then used from that point on (there can be only one ``CPUIdle`` governor in
use at a time) if it is the only one in the list (that is, the list was empty
before), if the value of its :c:member:`rating` field is greater than the value
of that field for the governor currently in use, or if its name was passed to
the kernel as the value of the ``cpuidle.governor=`` command line parameter.
Also, user space can choose the ``CPUIdle`` governor to use at run time via
``sysfs``.
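
The switching rule above can be modelled in a few lines of userspace C.  This
is only a sketch.  ``struct sim_governor`` and ``sim_governor_takes_over()`` are
hypothetical names, and the ``param`` argument stands in for the value of
``cpuidle.governor=``.  The kernel's actual implementation may differ in
details, such as how governor names are compared.

::

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  /* Hypothetical, simplified stand-in for struct cpuidle_governor. */
  struct sim_governor {
      const char *name;
      unsigned int rating;
  };

  /*
   * Return true if a newly registered governor should become the one in use.
   * cur is NULL when no governor has been registered yet; param models the
   * cpuidle.governor= value and is NULL when the parameter was not given.
   */
  static bool sim_governor_takes_over(const struct sim_governor *cur,
                                      const struct sim_governor *new_gov,
                                      const char *param)
  {
      if (cur == NULL)
          return true;                /* the list was empty before */
      if (param != NULL && strcmp(param, new_gov->name) == 0)
          return true;                /* requested on the command line */
      return new_gov->rating > cur->rating;
  }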

Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.

The interface between ``CPUIdle`` governors and the core consists of four
callbacks:

:c:member:`enable`
	::

	  int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	The role of this callback is to prepare the governor for handling the
	(logical) CPU represented by the struct cpuidle_device object pointed
	to by the ``dev`` argument.  The struct cpuidle_driver object pointed
	to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
	with that CPU (among other things, it should contain the list of
	struct cpuidle_state objects representing idle states that the
	processor holding the given CPU can be asked to enter).

	It may fail, in which case it is expected to return a negative error
	code, and that causes the kernel to run the architecture-specific
	default code for idle CPUs on the CPU in question instead of ``CPUIdle``
	until the ``->enable()`` governor callback is invoked for that CPU
	again.

:c:member:`disable`
	::

	  void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	Called to make the governor stop handling the (logical) CPU represented
	by the struct cpuidle_device object pointed to by the ``dev``
	argument.

	It is expected to reverse any changes made by the ``->enable()``
	callback when it was last invoked for the target CPU, free all memory
	allocated by that callback and so on.

:c:member:`select`
	::

	  int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
	                 bool *stop_tick);

	Called to select an idle state for the processor holding the (logical)
	CPU represented by the struct cpuidle_device object pointed to by the
	``dev`` argument.

	The list of idle states to take into consideration is represented by the
	:c:member:`states` array of struct cpuidle_state objects held by the
	struct cpuidle_driver object pointed to by the ``drv`` argument (which
	represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
	value returned by this callback is interpreted as an index into that
	array (unless it is a negative error code).

	The ``stop_tick`` argument is used to indicate whether or not to stop
	the scheduler tick before asking the processor to enter the selected
	idle state.  When the ``bool`` variable pointed to by it (which is set
	to ``true`` before invoking this callback) is cleared to ``false``, the
	processor will be asked to enter the selected idle state without
	stopping the scheduler tick on the given CPU (if the tick has been
	stopped on that CPU already, however, it will not be restarted before
	asking the processor to enter the idle state).

	This callback is mandatory (i.e. the :c:member:`select` callback pointer
	in struct cpuidle_governor must not be ``NULL`` for the registration
	of the governor to succeed).

:c:member:`reflect`
	::

	  void (*reflect) (struct cpuidle_device *dev, int index);

	Called to allow the governor to evaluate the accuracy of the idle state
	selection made by the ``->select()`` callback (when it was invoked last
	time) and possibly use the result of that to improve the accuracy of
	idle state selections in the future.
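
To make the callback interface above more concrete, here is a hypothetical
userspace sketch of a trivial governor.  The ``sim_`` types are simplified
stand-ins that only mirror the callback signatures listed above.  They are not
the kernel's data structures.  ``sim_register_governor()`` models only the
rule that the ``->select()`` pointer must not be ``NULL``.  Leaving
``->enable()`` and ``->disable()`` unset is an assumption of this sketch.

::

  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical, simplified stand-ins; NOT the kernel's definitions. */
  struct sim_driver;              /* contents not needed by this sketch */

  struct sim_device {
      unsigned int cpu;
      int last_index;             /* last index passed to ->reflect() */
  };

  struct sim_governor {
      const char *name;
      unsigned int rating;
      int  (*enable)(struct sim_driver *drv, struct sim_device *dev);
      void (*disable)(struct sim_driver *drv, struct sim_device *dev);
      int  (*select)(struct sim_driver *drv, struct sim_device *dev,
                     bool *stop_tick);
      void (*reflect)(struct sim_device *dev, int index);
  };

  /* Always pick state 0 and ask the core to keep the scheduler tick. */
  static int shallow_select(struct sim_driver *drv, struct sim_device *dev,
                            bool *stop_tick)
  {
      (void)drv;
      (void)dev;
      *stop_tick = false;
      return 0;                   /* index into the driver's states array */
  }

  /* Record the index; a real governor would assess its prediction here. */
  static void shallow_reflect(struct sim_device *dev, int index)
  {
      dev->last_index = index;
  }

  static struct sim_governor shallow_governor = {
      .name    = "shallow",
      .rating  = 1,
      .select  = shallow_select,
      .reflect = shallow_reflect,
      /* .enable and .disable are left NULL in this sketch */
  };

  /* Model of registration: reject a governor without ->select(). */
  static int sim_register_governor(const struct sim_governor *gov)
  {
      if (gov == NULL || gov->select == NULL)
          return -1;
      return 0;
  }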

In addition, ``CPUIdle`` governors are required to take power management
quality of service (PM QoS) constraints on the processor wakeup latency into
account when selecting idle states.  In order to obtain the current effective
PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
expected to pass the number of the CPU to
:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
callback must not return the index of an idle state whose
:c:member:`exit_latency` value is greater than the number returned by that
function.
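
The following is a minimal userspace sketch of how a ``->select()``
implementation might honor the PM QoS constraint.  It rests on three
assumptions:

* ``latency_req`` stands in for the value returned by
  :c:func:`cpuidle_governor_latency_req()`.
* ``predicted_us`` is a hypothetical estimate of the upcoming idle duration.
  How a governor obtains that estimate is not covered here.
* ``struct sim_state`` is a stand-in that holds only two fields of
  struct cpuidle_state.

::

  /* Hypothetical stand-in with two fields of struct cpuidle_state (in us). */
  struct sim_state {
      unsigned int exit_latency;
      unsigned int target_residency;
  };

  /*
   * Return the index of the deepest state whose exit latency does not exceed
   * latency_req and whose target residency fits in predicted_us.  State 0 is
   * accepted as a fallback whenever its exit latency is acceptable.  Returns
   * -1 if no state qualifies.  The array is assumed to be sorted by
   * target_residency in ascending order.
   */
  static int sim_select(const struct sim_state *states, int count,
                        unsigned int latency_req, unsigned int predicted_us)
  {
      int idx = -1;

      for (int i = 0; i < count; i++) {
          if (states[i].exit_latency > latency_req)
              continue;       /* would violate the PM QoS constraint */
          if (i > 0 && states[i].target_residency > predicted_us)
              continue;       /* not expected to pay off this time */
          idx = i;
      }
      return idx;
  }

Take an illustrative table of four states whose exit latency and target
residency are 0/0, 2/2, 50/100 and 150/400 microseconds.  With a 100 us latency
limit and a predicted idle time of 1000 us, the sketch picks index 2, because
the deepest state's 150 us exit latency exceeds the limit.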


CPU Idle Time Management Drivers
================================

CPU idle time management (``CPUIdle``) drivers provide an interface between the
other parts of ``CPUIdle`` and the hardware.

First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
of struct cpuidle_state objects included in the struct cpuidle_driver object
representing it.  Going forward, this array will represent the list of
available idle states, shared by all of the logical CPUs handled by the given
driver, that the processor hardware can be asked to enter.

The entries in the :c:member:`states` array are expected to be sorted by the
value of the :c:member:`target_residency` field in struct cpuidle_state in
ascending order (that is, index 0 should correspond to the idle state with
the minimum value of :c:member:`target_residency`).  [Since the
:c:member:`target_residency` value is expected to reflect the "depth" of the
idle state represented by the struct cpuidle_state object holding it, this
sorting order should be the same as the ascending sorting order by the idle
state "depth".]
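
A driver could verify the expected ordering with a check along the following
lines.  The helper and ``struct sim_state`` are hypothetical userspace
stand-ins.  Accepting equal residencies in neighboring entries is an
assumption of this sketch.

::

  #include <stdbool.h>

  /* Hypothetical stand-in with one field of struct cpuidle_state. */
  struct sim_state {
      const char *name;
      unsigned int target_residency;  /* microseconds */
  };

  /* Return true if target_residency never decreases along the array. */
  static bool sim_states_sorted(const struct sim_state *states, int count)
  {
      for (int i = 1; i < count; i++) {
          if (states[i].target_residency < states[i - 1].target_residency)
              return false;
      }
      return true;
  }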

Three fields in struct cpuidle_state are used by the existing ``CPUIdle``
governors for computations related to idle state selection:

:c:member:`target_residency`
	Minimum time to spend in this idle state, including the time needed to
	enter it (which may be substantial), for it to save more energy than
	could be saved by staying in a shallower idle state for the same amount
	of time, in microseconds.

:c:member:`exit_latency`
	Maximum time it will take a CPU asking the processor to enter this idle
	state to start executing the first instruction after a wakeup from it,
	in microseconds.

:c:member:`flags`
	Flags representing idle state properties.  Currently, governors only use
	the ``CPUIDLE_FLAG_POLLING`` flag, which is set if the given object
	does not represent a real idle state, but an interface to a software
	"loop" that can be used in order to avoid asking the processor to enter
	any idle state at all.  [There are other flags used by the ``CPUIdle``
	core in special situations.]
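
The sketch below shows how these fields might look for an imaginary processor
with one polling entry and two real idle states.  All names, flag values and
numbers are made up for illustration.  ``SIM_FLAG_POLLING`` merely plays the
role of ``CPUIDLE_FLAG_POLLING``.  The helper shows how a governor could skip
the polling entry when it needs a real idle state.

::

  /* Hypothetical stand-in for CPUIDLE_FLAG_POLLING; value is arbitrary. */
  #define SIM_FLAG_POLLING  (1U << 0)

  struct sim_state {
      const char *name;
      unsigned int exit_latency;      /* microseconds */
      unsigned int target_residency;  /* microseconds */
      unsigned int flags;
  };

  /* Illustrative values for an imaginary processor, not real hardware. */
  static const struct sim_state sim_states[] = {
      { "POLL", 0,   0,   SIM_FLAG_POLLING },
      { "C1",   2,   2,   0 },
      { "C6",   150, 400, 0 },
  };

  /* Return the index of the shallowest real idle state, or -1 if none. */
  static int sim_first_real_state(const struct sim_state *states, int count)
  {
      for (int i = 0; i < count; i++) {
          if (!(states[i].flags & SIM_FLAG_POLLING))
              return i;
      }
      return -1;
  }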

The :c:member:`enter` callback pointer in struct cpuidle_state, which must not
be ``NULL``, points to the routine to execute in order to ask the processor to
enter this particular idle state:

::

  int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
                int index);

The first two arguments of it point to the struct cpuidle_device object
representing the logical CPU running this callback and the
struct cpuidle_driver object representing the driver itself, respectively,
and the last one is an index of the struct cpuidle_state entry in the driver's
:c:member:`states` array representing the idle state to ask the processor to
enter.  The return value is the index of the idle state that has actually been
entered, or a negative error code on failure.
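
The following hypothetical userspace sketch models how the core might invoke
the ``->enter()`` callback for the index chosen by a governor.  The ``sim_``
types and ``sim_enter_state()`` are illustrative stand-ins, not kernel APIs.
In this sketch the callback reports the index of the state it has entered,
and an out-of-range index is rejected before any callback runs.

::

  struct sim_device;
  struct sim_driver;

  /* Hypothetical, simplified stand-ins for the kernel structures. */
  struct sim_state {
      const char *name;
      int (*enter)(struct sim_device *dev, struct sim_driver *drv, int index);
  };

  struct sim_driver {
      struct sim_state states[2];
      int state_count;
  };

  struct sim_device {
      unsigned int cpu;
      int entered;        /* index recorded by the last ->enter() call */
  };

  /* Placeholder for the code that would program the hardware. */
  static int sim_enter(struct sim_device *dev, struct sim_driver *drv,
                       int index)
  {
      (void)drv;
      dev->entered = index;
      return index;       /* report the state that was actually entered */
  }

  /* Validate the governor's choice, then run that state's callback. */
  static int sim_enter_state(struct sim_device *dev, struct sim_driver *drv,
                             int index)
  {
      if (index < 0 || index >= drv->state_count)
          return -1;
      return drv->states[index].enter(dev, drv, index);
  }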

The analogous ``->enter_s2idle()`` callback in struct cpuidle_state is used
only for implementing the suspend-to-idle system-wide power management feature.
The difference between it and ``->enter()`` is that it must not re-enable
interrupts at any point (even temporarily) or attempt to change the states of
clock event devices, which the ``->enter()`` callback may do sometimes.

Once the :c:member:`states` array has been populated, the number of valid
entries in it has to be stored in the :c:member:`state_count` field of the
struct cpuidle_driver object representing the driver.  Moreover, if any
entries in the :c:member:`states` array represent "coupled" idle states (that
is, idle states that can only be asked for if multiple related logical CPUs are
idle), the :c:member:`safe_state_index` field in struct cpuidle_driver needs
to be the index of an idle state that is not "coupled" (that is, one that can be
asked for if only one logical CPU is idle).
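
The two rules above can be expressed as a sanity check.  This sketch uses
hypothetical names.  ``SIM_FLAG_COUPLED`` stands in for however a driver
marks "coupled" states.  The check only verifies that ``state_count`` fits
within the array, not that every counted entry has been filled in.

::

  #include <stdbool.h>

  #define SIM_FLAG_COUPLED  (1U << 1)   /* hypothetical "coupled" marker */
  #define SIM_MAX_STATES    10

  struct sim_state {
      const char *name;
      unsigned int flags;
  };

  /* Hypothetical stand-in with three fields of struct cpuidle_driver. */
  struct sim_driver {
      struct sim_state states[SIM_MAX_STATES];
      int state_count;
      int safe_state_index;
  };

  /*
   * state_count must be within the bounds of the states array, and if any
   * counted state is coupled, safe_state_index must name one that is not.
   */
  static bool sim_driver_valid(const struct sim_driver *drv)
  {
      bool has_coupled = false;

      if (drv->state_count <= 0 || drv->state_count > SIM_MAX_STATES)
          return false;

      for (int i = 0; i < drv->state_count; i++) {
          if (drv->states[i].flags & SIM_FLAG_COUPLED)
              has_coupled = true;
      }
      if (!has_coupled)
          return true;

      if (drv->safe_state_index < 0 ||
          drv->safe_state_index >= drv->state_count)
          return false;
      return !(drv->states[drv->safe_state_index].flags & SIM_FLAG_COUPLED);
  }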

In addition to that, if the given ``CPUIdle`` driver is only going to handle a
subset of logical CPUs in the system, the :c:member:`cpumask` field in its
struct cpuidle_driver object must point to the set (mask) of CPUs that will be
handled by it.
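
Here is a hypothetical sketch of the per-CPU test implied by the
:c:member:`cpumask` field.  A single 64-bit word stands in for struct cpumask.
Treating a missing mask as covering every CPU is an assumption of this model.
It corresponds to the case where the driver handles all logical CPUs.

::

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical driver stand-in; one 64-bit word replaces struct cpumask. */
  struct sim_driver {
      const uint64_t *cpumask;    /* NULL: assumed to mean every CPU */
  };

  static bool sim_driver_handles_cpu(const struct sim_driver *drv,
                                     unsigned int cpu)
  {
      if (cpu >= 64)
          return false;           /* beyond what this model can track */
      if (drv->cpumask == NULL)
          return true;
      return (*drv->cpumask >> cpu) & 1U;
  }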

A ``CPUIdle`` driver can only be used after it has been registered.  If there
are no "coupled" idle state entries in the driver's :c:member:`states` array,
that can be accomplished by passing the driver's struct cpuidle_driver object
to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
should be used for this purpose.

However, it also is necessary to register struct cpuidle_device objects for
all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
help of :c:func:`cpuidle_register_device()` after the driver has been
registered, and :c:func:`cpuidle_register_driver()`, unlike
:c:func:`cpuidle_register()`, does not do that automatically.  For this reason,
drivers that use :c:func:`cpuidle_register_driver()` to register themselves
must also take care of registering the struct cpuidle_device objects as needed,
so it is generally recommended to use :c:func:`cpuidle_register()` for
``CPUIdle`` driver registration in all cases.

The registration of a struct cpuidle_device object causes the ``CPUIdle``
``sysfs`` interface to be created and the governor's ``->enable()`` callback to
be invoked for the logical CPU represented by it, so it must take place after
registering the driver that will handle the CPU in question.
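
The ordering constraints above can be modelled with a small userspace
registry.  All ``sim_`` names are hypothetical.  ``sim_register()`` plays the
role of :c:func:`cpuidle_register()`, and the ``enable_calls`` counter stands
in for the governor's ``->enable()`` callback.  The ``sysfs`` side effects are
not modelled.

::

  #include <stdbool.h>

  #define SIM_NR_CPUS 4

  /* Hypothetical registry modelling the ordering rules described above. */
  struct sim_registry {
      bool driver_registered;
      bool device_registered[SIM_NR_CPUS];
      int  enable_calls;          /* times the governor ->enable() ran */
  };

  static int sim_register_driver(struct sim_registry *r)
  {
      r->driver_registered = true;
      return 0;
  }

  /* Registering a device requires the driver and triggers ->enable(). */
  static int sim_register_device(struct sim_registry *r, unsigned int cpu)
  {
      if (!r->driver_registered || cpu >= SIM_NR_CPUS)
          return -1;
      r->device_registered[cpu] = true;
      r->enable_calls++;
      return 0;
  }

  /* Combined helper: the driver first, then one device per CPU. */
  static int sim_register(struct sim_registry *r)
  {
      if (sim_register_driver(r))
          return -1;
      for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
          if (sim_register_device(r, cpu))
              return -1;
      }
      return 0;
  }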

``CPUIdle`` drivers and struct cpuidle_device objects can be unregistered
when they are no longer needed, which allows some resources associated with
them to be released.  Due to dependencies between them, all of the
struct cpuidle_device objects representing CPUs handled by the given
``CPUIdle`` driver must be unregistered, with the help of
:c:func:`cpuidle_unregister_device()`, before calling
:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
along with all of the struct cpuidle_device objects representing CPUs handled
by it.
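
A similar hypothetical model of the teardown order follows, in which
``sim_unregister()`` plays the role of :c:func:`cpuidle_unregister()`.  Making
``sim_unregister_driver()`` fail while a device is still registered is a
choice of this sketch.  It turns the required ordering into a checkable error,
whereas the text above only says that the devices have to go first.

::

  #include <stdbool.h>

  #define SIM_NR_CPUS 4

  struct sim_registry {
      bool driver_registered;
      bool device_registered[SIM_NR_CPUS];
  };

  static void sim_unregister_device(struct sim_registry *r, unsigned int cpu)
  {
      if (cpu < SIM_NR_CPUS)
          r->device_registered[cpu] = false;
  }

  /* Refuse to drop the driver while any device still depends on it. */
  static int sim_unregister_driver(struct sim_registry *r)
  {
      for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
          if (r->device_registered[cpu])
              return -1;
      }
      r->driver_registered = false;
      return 0;
  }

  /* Combined teardown: all devices first, then the driver. */
  static int sim_unregister(struct sim_registry *r)
  {
      for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
          sim_unregister_device(r, cpu);
      return sim_unregister_driver(r);
  }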

``CPUIdle`` drivers can respond to runtime system configuration changes that
lead to modifications of the list of available processor idle states (which can
happen, for example, when the system's power source is switched from AC to
battery or the other way around).  Upon a notification of such a change,
a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
all of the struct cpuidle_device objects representing CPUs affected by that
change.  Next, it can update its :c:member:`states` array in accordance with
the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
all of the relevant struct cpuidle_device objects and invoke
:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
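
The update sequence can be sketched as follows.  The enum values are
hypothetical markers for :c:func:`cpuidle_pause_and_lock()`,
:c:func:`cpuidle_disable_device()`, the update of the :c:member:`states`
array, :c:func:`cpuidle_enable_device()` and
:c:func:`cpuidle_resume_and_unlock()`.  The model simply records the order of
these steps for two CPUs that are assumed to be affected by the change.

::

  #define SIM_NR_CPUS    2
  #define SIM_MAX_STEPS  16

  /* Hypothetical markers, one per step of the sequence described above. */
  enum sim_step {
      SIM_PAUSE_AND_LOCK,     /* cpuidle_pause_and_lock() */
      SIM_DISABLE_DEVICE,     /* cpuidle_disable_device(), once per CPU */
      SIM_UPDATE_STATES,      /* rewrite the driver's states array */
      SIM_ENABLE_DEVICE,      /* cpuidle_enable_device(), once per CPU */
      SIM_RESUME_AND_UNLOCK,  /* cpuidle_resume_and_unlock() */
  };

  struct sim_ctx {
      enum sim_step steps[SIM_MAX_STEPS];
      int nsteps;
      int state_count;
  };

  static void sim_log(struct sim_ctx *c, enum sim_step step)
  {
      if (c->nsteps < SIM_MAX_STEPS)
          c->steps[c->nsteps++] = step;
  }

  /* React to a hypothetical "power source changed" notification. */
  static void sim_update_states(struct sim_ctx *c, int new_count)
  {
      sim_log(c, SIM_PAUSE_AND_LOCK);
      for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
          sim_log(c, SIM_DISABLE_DEVICE);

      c->state_count = new_count;     /* the states array would change here */
      sim_log(c, SIM_UPDATE_STATES);

      for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
          sim_log(c, SIM_ENABLE_DEVICE);
      sim_log(c, SIM_RESUME_AND_UNLOCK);
  }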