.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

========================
CPU Idle Time Management
========================

:Copyright: |copy| 2019 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


CPU Idle Time Management Subsystem
==================================

Every time one of the logical CPUs in the system (the entities that appear to
fetch and execute instructions: hardware threads, if present, or processor
cores) is idle after an interrupt or equivalent wakeup event, which means that
there are no tasks to run on it except for the special "idle" task associated
with it, there is an opportunity to save energy for the processor that it
belongs to. That can be done by making the idle logical CPU stop fetching
instructions from memory and putting some of the processor's functional units
that it depends on into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a
situation in principle, so it may be necessary to find the most suitable one
(from the kernel perspective) and ask the processor to use (or "enter") that
particular idle state. That is the role of the CPU idle time management
subsystem in the kernel, called ``CPUIdle``.

The design of ``CPUIdle`` is modular and based on the code duplication avoidance
principle, so the generic code that in principle need not depend on the hardware
or platform design details is kept separate from the code that interacts with
the hardware. It generally is divided into three categories of functional
units: *governors* responsible for selecting idle states to ask the processor
to enter, *drivers* that pass the governors' decisions on to the hardware and
the *core* providing a common framework for them.


CPU Idle Time Governors
=======================

A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
one of the logical CPUs in the system turns out to be idle. Its role is to
select an idle state to ask the processor to enter in order to save some energy.

``CPUIdle`` governors are generic and each of them can be used on any hardware
platform that the Linux kernel can run on. For this reason, data structures
operated on by them cannot depend on any hardware architecture or platform
design details either.

The governor itself is represented by a struct cpuidle_governor object
containing four callback pointers, :c:member:`enable`, :c:member:`disable`,
:c:member:`select`, :c:member:`reflect`, a :c:member:`rating` field described
below, and a name (string) used for identifying it.
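
As a rough illustration, the object described above can be modelled in plain C
as shown here. This is a simplified user-space sketch, not the kernel
definition: the real structure is declared in ``include/linux/cpuidle.h`` and
carries additional bookkeeping, the two structure types are left opaque, and
the ``demo_`` names as well as the rating value are made up for the example.
The callback prototypes follow the ones given in the callback descriptions of
this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers to them are used. */
struct cpuidle_driver;
struct cpuidle_device;

/* Simplified model of struct cpuidle_governor (user-space sketch only). */
struct demo_governor {
	const char *name;	/* string identifying the governor */
	unsigned int rating;	/* compared against other governors' ratings */
	int (*enable)(struct cpuidle_driver *drv, struct cpuidle_device *dev);
	void (*disable)(struct cpuidle_driver *drv, struct cpuidle_device *dev);
	int (*select)(struct cpuidle_driver *drv, struct cpuidle_device *dev,
		      bool *stop_tick);
	void (*reflect)(struct cpuidle_device *dev, int index);
};

/* Hypothetical policy: always pick state 0 and keep the tick running. */
static int demo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
		       bool *stop_tick)
{
	(void)drv;
	(void)dev;
	assert(stop_tick != NULL);
	*stop_tick = false;
	return 0;
}

/* Only ->select() is filled in here; the other callbacks are left NULL. */
static struct demo_governor demo_gov = {
	.name = "demo",
	.rating = 10,
	.select = demo_select,
};
```

In kernel code, an object like ``demo_gov`` would be of type
struct cpuidle_governor and a pointer to it would be passed to
:c:func:`cpuidle_register_governor()`.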

For the governor to be available at all, that object needs to be registered
with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
a pointer to it passed as the argument. If successful, that causes the core to
add the governor to the global list of available governors and, if it is the
only one in the list (that is, the list was empty before) or the value of its
:c:member:`rating` field is greater than the value of that field for the
governor currently in use, or the name of the new governor was passed to the
kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
governor will be used from that point on (there can be only one ``CPUIdle``
governor in use at a time). Also, user space can choose the ``CPUIdle``
governor to use at run time via ``sysfs``.

Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.

The interface between ``CPUIdle`` governors and the core consists of four
callbacks:

:c:member:`enable`
        ::

          int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

        The role of this callback is to prepare the governor for handling the
        (logical) CPU represented by the struct cpuidle_device object pointed
        to by the ``dev`` argument.
        The struct cpuidle_driver object pointed
        to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
        with that CPU (among other things, it should contain the list of
        struct cpuidle_state objects representing idle states that the
        processor holding the given CPU can be asked to enter).

        It may fail, in which case it is expected to return a negative error
        code, and that causes the kernel to run the architecture-specific
        default code for idle CPUs on the CPU in question instead of ``CPUIdle``
        until the ``->enable()`` governor callback is invoked for that CPU
        again.

:c:member:`disable`
        ::

          void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

        Called to make the governor stop handling the (logical) CPU represented
        by the struct cpuidle_device object pointed to by the ``dev``
        argument.

        It is expected to reverse any changes made by the ``->enable()``
        callback when it was last invoked for the target CPU, free all memory
        allocated by that callback and so on.

:c:member:`select`
        ::

          int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
                         bool *stop_tick);

        Called to select an idle state for the processor holding the (logical)
        CPU represented by the struct cpuidle_device object pointed to by the
        ``dev`` argument.

        The list of idle states to take into consideration is represented by the
        :c:member:`states` array of struct cpuidle_state objects held by the
        struct cpuidle_driver object pointed to by the ``drv`` argument (which
        represents the ``CPUIdle`` driver to be used with the CPU at hand). The
        value returned by this callback is interpreted as an index into that
        array (unless it is a negative error code).

        The ``stop_tick`` argument is used to indicate whether or not to stop
        the scheduler tick before asking the processor to enter the selected
        idle state. When the ``bool`` variable pointed to by it (which is set
        to ``true`` before invoking this callback) is cleared to ``false``, the
        processor will be asked to enter the selected idle state without
        stopping the scheduler tick on the given CPU (if the tick has been
        stopped on that CPU already, however, it will not be restarted before
        asking the processor to enter the idle state).

        This callback is mandatory (i.e.
        the :c:member:`select` callback pointer
        in struct cpuidle_governor must not be ``NULL`` for the registration
        of the governor to succeed).

:c:member:`reflect`
        ::

          void (*reflect) (struct cpuidle_device *dev, int index);

        Called to allow the governor to evaluate the accuracy of the idle state
        selection made by the ``->select()`` callback (when it was invoked last
        time) and possibly use the result of that to improve the accuracy of
        idle state selections in the future.

In addition, ``CPUIdle`` governors are required to take power management
quality of service (PM QoS) constraints on the processor wakeup latency into
account when selecting idle states. In order to obtain the current effective
PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
expected to pass the number of the CPU to
:c:func:`cpuidle_governor_latency_req()`. Then, the governor's ``->select()``
callback must not return the index of an idle state whose
:c:member:`exit_latency` value is greater than the number returned by that
function.


CPU Idle Time Management Drivers
================================

CPU idle time management (``CPUIdle``) drivers provide an interface between the
other parts of ``CPUIdle`` and the hardware.

First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
of struct cpuidle_state objects included in the struct cpuidle_driver object
representing it. Going forward, this array will represent the list of available
idle states that the processor hardware can be asked to enter, shared by all of
the logical CPUs handled by the given driver.

The entries in the :c:member:`states` array are expected to be sorted by the
value of the :c:member:`target_residency` field in struct cpuidle_state in
ascending order (that is, index 0 should correspond to the idle state with
the minimum value of :c:member:`target_residency`). [Since the
:c:member:`target_residency` value is expected to reflect the "depth" of the
idle state represented by the struct cpuidle_state object holding it, this
sorting order should be the same as the ascending sorting order by the idle
state "depth".]

Three fields in struct cpuidle_state are used by the existing ``CPUIdle``
governors for computations related to idle state selection:

:c:member:`target_residency`
        Minimum time to spend in this idle state including the time needed to
        enter it (which may be substantial) to save more energy than could
        be saved by staying in a shallower idle state for the same amount of
        time, in microseconds.

:c:member:`exit_latency`
        Maximum time it will take a CPU asking the processor to enter this idle
        state to start executing the first instruction after a wakeup from it,
        in microseconds.

:c:member:`flags`
        Flags representing idle state properties. Currently, governors only use
        the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
        does not represent a real idle state, but an interface to a software
        "loop" that can be used in order to avoid asking the processor to enter
        any idle state at all. [There are other flags used by the ``CPUIdle``
        core in special situations.]

The :c:member:`enter` callback pointer in struct cpuidle_state, which must not
be ``NULL``, points to the routine to execute in order to ask the processor to
enter this particular idle state:

::

  int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
                int index);

The first two arguments of it point to the struct cpuidle_device object
representing the logical CPU running this callback and the
struct cpuidle_driver object representing the driver itself, respectively,
and the last one is an index of the struct cpuidle_state entry in the driver's
:c:member:`states` array representing the idle state to ask the processor to
enter.
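
To illustrate how the :c:member:`target_residency` and :c:member:`exit_latency`
fields can feed into idle state selection, here is a minimal user-space sketch.
It is not the algorithm of any in-kernel governor: the ``demo_`` types, the
``demo_pick_state()`` helper, the flag value and the numbers in the table are
all assumptions made for the example. It walks a table sorted by
``target_residency`` and returns the deepest entry whose ``target_residency``
fits the expected idle duration and whose ``exit_latency`` stays within a
latency limit.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FLAG_POLLING 0x1	/* stand-in for CPUIDLE_FLAG_POLLING */

/* Reduced model of struct cpuidle_state; times are in microseconds. */
struct demo_state {
	const char *name;
	unsigned int target_residency;
	unsigned int exit_latency;
	unsigned int flags;
};

/* Hypothetical table, sorted by target_residency in ascending order. */
static const struct demo_state demo_states[] = {
	{ "POLL",    0,   0, DEMO_FLAG_POLLING },
	{ "C1",      2,   2, 0 },
	{ "C2",    100,  50, 0 },
	{ "C3",   1000, 200, 0 },
};
#define DEMO_STATE_COUNT (sizeof(demo_states) / sizeof(demo_states[0]))

/*
 * Return the index of the deepest state that pays off for the expected idle
 * duration and does not exceed the latency limit (both in microseconds).
 * Index 0 is the fallback when nothing deeper qualifies.
 */
static int demo_pick_state(const struct demo_state *states, size_t count,
			   unsigned int expected_idle_us,
			   unsigned int latency_limit_us)
{
	int best = 0;
	size_t i;

	assert(count > 0);

	for (i = 1; i < count; i++) {
		if (states[i].target_residency > expected_idle_us)
			break;	/* sorted table: deeper states cannot fit either */
		if (states[i].exit_latency > latency_limit_us)
			continue;	/* would violate the latency limit */
		best = (int)i;
	}
	return best;
}
```

With this table, an expected idle duration of 500 microseconds and a latency
limit of 1000 microseconds yield index 2 ("C2"), because "C3" would need at
least 1000 microseconds of residency to pay off.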

The analogous ``->enter_s2idle()`` callback in struct cpuidle_state is used
only for implementing the suspend-to-idle system-wide power management feature.
The difference between it and ``->enter()`` is that it must not re-enable
interrupts at any point (even temporarily) or attempt to change the states of
clock event devices, which the ``->enter()`` callback may do sometimes.

Once the :c:member:`states` array has been populated, the number of valid
entries in it has to be stored in the :c:member:`state_count` field of the
struct cpuidle_driver object representing the driver. Moreover, if any
entries in the :c:member:`states` array represent "coupled" idle states (that
is, idle states that can only be asked for if multiple related logical CPUs are
idle), the :c:member:`safe_state_index` field in struct cpuidle_driver needs
to be the index of an idle state that is not "coupled" (that is, one that can be
asked for if only one logical CPU is idle).

In addition to that, if the given ``CPUIdle`` driver is only going to handle a
subset of logical CPUs in the system, the :c:member:`cpumask` field in its
struct cpuidle_driver object must point to the set (mask) of CPUs that will be
handled by it.

A ``CPUIdle`` driver can only be used after it has been registered.
If there
are no "coupled" idle state entries in the driver's :c:member:`states` array,
that can be accomplished by passing the driver's struct cpuidle_driver object
to :c:func:`cpuidle_register_driver()`. Otherwise, :c:func:`cpuidle_register()`
should be used for this purpose.

However, it also is necessary to register struct cpuidle_device objects for
all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
help of :c:func:`cpuidle_register_device()` after the driver has been
registered, and :c:func:`cpuidle_register_driver()`, unlike
:c:func:`cpuidle_register()`, does not do that automatically. For this reason,
the drivers that use :c:func:`cpuidle_register_driver()` to register themselves
must also take care of registering the struct cpuidle_device objects as needed,
so it is generally recommended to use :c:func:`cpuidle_register()` for
``CPUIdle`` driver registration in all cases.

The registration of a struct cpuidle_device object causes the ``CPUIdle``
``sysfs`` interface to be created and the governor's ``->enable()`` callback to
be invoked for the logical CPU represented by it, so it must take place after
registering the driver that will handle the CPU in question.

``CPUIdle`` drivers and struct cpuidle_device objects can be unregistered
when they are not necessary any more, which allows some resources associated
with them to be released.
Due to dependencies between them, all of the
struct cpuidle_device objects representing CPUs handled by the given
``CPUIdle`` driver must be unregistered, with the help of
:c:func:`cpuidle_unregister_device()`, before calling
:c:func:`cpuidle_unregister_driver()` to unregister the driver. Alternatively,
:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
along with all of the struct cpuidle_device objects representing CPUs handled
by it.

``CPUIdle`` drivers can respond to runtime system configuration changes that
lead to modifications of the list of available processor idle states (which can
happen, for example, when the system's power source is switched from AC to
battery or the other way around). Upon a notification of such a change,
a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
all of the struct cpuidle_device objects representing CPUs affected by that
change. Next, it can update its :c:member:`states` array in accordance with
the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
all of the relevant struct cpuidle_device objects and invoke
:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
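
The update sequence described above can be sketched as follows. To keep the
example self-contained and runnable outside the kernel, the four
``cpuidle_*()`` functions are replaced by stubs that merely record the order of
calls; in a real driver they come from ``<linux/cpuidle.h>``. The ``demo_``
names, the reduced struct cpuidle_device and the ``demo_update_states()`` hook
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cpuidle_device { int cpu; };	/* reduced stand-in for the kernel type */

/* Call log used by the stubs below: one letter per call. */
static char demo_log[32];

static void demo_record(char c)
{
	size_t len = strlen(demo_log);

	assert(len + 1 < sizeof(demo_log));
	demo_log[len] = c;
	demo_log[len + 1] = '\0';
}

/* User-space stubs standing in for the kernel functions of the same names. */
static void cpuidle_pause_and_lock(void) { demo_record('P'); }
static void cpuidle_resume_and_unlock(void) { demo_record('R'); }

static void cpuidle_disable_device(struct cpuidle_device *dev)
{
	(void)dev;
	demo_record('D');
}

static int cpuidle_enable_device(struct cpuidle_device *dev)
{
	(void)dev;
	demo_record('E');
	return 0;
}

/* Hypothetical driver hook that rewrites the states array. */
static void demo_update_states(void) { demo_record('U'); }

/* Handle a configuration change affecting the given devices. */
static int demo_states_changed(struct cpuidle_device *devs, size_t count)
{
	int ret = 0;
	size_t i;

	cpuidle_pause_and_lock();
	for (i = 0; i < count; i++)
		cpuidle_disable_device(&devs[i]);

	demo_update_states();

	for (i = 0; i < count; i++) {
		int err = cpuidle_enable_device(&devs[i]);

		if (err && !ret)
			ret = err;	/* remember the first failure */
	}
	cpuidle_resume_and_unlock();
	return ret;
}
```

The point of the ordering is that the :c:member:`states` array is only modified
while ``CPUIdle`` is paused and the affected devices are disabled, so that no
CPU can be asked to enter a state from a table that is being rewritten.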