.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``intel_pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_pstate`` is a part of the
:doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
(``CPUFreq``). It is a scaling driver for the Sandy Bridge and later
generations of Intel processors. Note, however, that some of those processors
may not be supported. [To understand ``intel_pstate`` it is necessary to know
how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
you have not done that yet.]

For the processors supported by ``intel_pstate``, the P-state concept is broader
than just an operating frequency or an operating performance point (see the
LinuxCon Europe 2015 presentation by Kristen Accardi [1]_ for more
information about that). For this reason, the representation of P-states used
by ``intel_pstate`` internally follows the hardware specification (for details
refer to Intel Software Developer’s Manual [2]_).
However, the ``CPUFreq`` core
uses frequencies for identifying operating performance points of CPUs and
frequencies are involved in the user space interface exposed by it, so
``intel_pstate`` maps its internal representation of P-states to frequencies too
(fortunately, that mapping is unambiguous). At the same time, it would not be
practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
available frequencies due to the possible size of it, so the driver does not do
that. Some functionality of the core is limited by that.

Since the hardware P-state selection interface used by ``intel_pstate`` is
available at the logical CPU level, the driver always works with individual
CPUs. Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
equivalent to CPUs. In particular, this means that they become "inactive" every
time the corresponding CPU is taken offline and need to be re-initialized when
it goes back online.

``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
only way to pass early-configuration-time parameters to it is via the kernel
command line. However, its configuration can be adjusted via ``sysfs`` to a
great extent.
In some configurations it even is possible to unregister it via
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).


Operation Modes
===============

``intel_pstate`` can operate in two different modes, active or passive. In the
active mode, it uses its own internal performance scaling governor algorithm or
allows the hardware to do performance scaling by itself, while in the passive
mode it responds to requests made by a generic ``CPUFreq`` governor implementing
a certain performance scaling algorithm. Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.

Active Mode
-----------

This is the default operation mode of ``intel_pstate`` for processors with
hardware-managed P-states (HWP) support. If it works in this mode, the
``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
contains the string "intel_pstate".

In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection.
Those algorithms
can be applied to ``CPUFreq`` policies in the same way as generic scaling
governors (that is, through the ``scaling_governor`` policy attribute in
``sysfs``). [Note that different P-state selection algorithms may be chosen for
different policies, but that is not recommended.]

They are not generic scaling governors, but their names are the same as the
names of some of those governors. Moreover, confusingly enough, they generally
do not work in the same way as the generic governors they share the names with.
For example, the ``powersave`` P-state selection algorithm provided by
``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
(roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).

There are two P-state selection algorithms provided by ``intel_pstate`` in the
active mode: ``powersave`` and ``performance``. The way they both operate
depends on whether or not the hardware-managed P-states (HWP) feature has been
enabled in the processor and possibly on the processor model.

Which of the P-state selection algorithms is used by default depends on the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.
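
As a rough illustration of the attributes mentioned above, the following Python
sketch lists the ``scaling_driver`` and ``scaling_governor`` values of every
policy and checks whether all policies use the same P-state selection
algorithm. The ``/sys/devices/system/cpu/cpufreq/policy*`` layout is assumed
from the usual ``CPUFreq`` ``sysfs`` interface, and the helper names
``read_policies`` and ``uses_single_algorithm`` are hypothetical.

```python
from pathlib import Path

# Assumed location of the CPUFreq policy directories in sysfs.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def read_policies(root=CPUFREQ_ROOT):
    """Return {policy name: (scaling_driver, scaling_governor)}."""
    result = {}
    for policy in sorted(Path(root).glob("policy*")):
        try:
            driver = (policy / "scaling_driver").read_text().strip()
            governor = (policy / "scaling_governor").read_text().strip()
        except OSError:
            continue  # Attribute missing or unreadable: skip this policy.
        result[policy.name] = (driver, governor)
    return result


def uses_single_algorithm(policies):
    """True if every policy uses the same P-state selection algorithm."""
    return len({governor for _, governor in policies.values()}) <= 1


if __name__ == "__main__":
    policies = read_policies()
    for name, (driver, governor) in policies.items():
        print(f"{name}: driver={driver} algorithm={governor}")
    if not uses_single_algorithm(policies):
        print("note: policies use different algorithms (not recommended)")
```

The root directory is a parameter only so that the helpers can be exercised
against a scratch directory instead of a live system.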

Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~

If the processor supports the HWP feature, it will be enabled during the
processor initialization and cannot be disabled after that. It is possible
to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
kernel in the command line.

If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
select P-states by itself, but still it can give hints to the processor's
internal P-state selection logic. What those hints are depends on which P-state
selection algorithm has been applied to the given policy (or to the CPU it
corresponds to).

Even though the P-state selection is carried out by the processor automatically,
``intel_pstate`` registers utilization update callbacks with the CPU scheduler
in this mode. However, they are not used for running a P-state selection
algorithm, but for periodic updates of the current CPU frequency information to
be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.

HWP + ``performance``
.....................

In this configuration ``intel_pstate`` will write 0 to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.

This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.

Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
(that is, the maximum P-state that the driver is allowed to use).

HWP + ``powersave``
...................

In this configuration ``intel_pstate`` will set the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
previously set to via ``sysfs`` (or whatever default value it was
set to by the platform firmware). This usually causes the processor's
internal P-state selection logic to be less performance-focused.
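
The EPP/EPB rules for the two HWP configurations can be summarized with a small
Python model. This is only an illustration of the rules stated above, not the
driver's code: the function names and the ``saved_epp`` parameter (standing for
the value set earlier via ``sysfs`` or left by the platform firmware) are
hypothetical.

```python
PERFORMANCE_EPP = 0  # Value written to EPP/EPB by the "performance" algorithm.


def effective_epp(algorithm, saved_epp):
    """Return the EPP/EPB value in effect under the given algorithm."""
    if algorithm == "performance":
        return PERFORMANCE_EPP  # Overrides whatever came from sysfs.
    if algorithm == "powersave":
        return saved_epp  # Previously set value or firmware default.
    raise ValueError(f"unknown algorithm: {algorithm!r}")


def epp_write_accepted(algorithm, new_epp):
    """Return True if a sysfs write of new_epp would be accepted."""
    # With "performance" applied, only the value 0 is accepted.
    return algorithm != "performance" or new_epp == PERFORMANCE_EPP
```

For example, with ``performance`` applied, ``effective_epp("performance", 128)``
yields 0 and ``epp_write_accepted("performance", 128)`` is false.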

Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~

This operation mode is optional for processors that do not support the HWP
feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
the command line. The active mode is used in those cases if the
``intel_pstate=active`` argument is passed to the kernel in the command line.
In this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it. [Note that ``intel_pstate`` will never refuse to work with
any processor with the HWP feature enabled.]

In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
``powersave`` or ``performance``, depending on the ``scaling_governor`` policy
setting in ``sysfs``. The current CPU frequency information to be made
available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
periodically updated by those utilization update callbacks too.

``performance``
...............

Without HWP, this P-state selection algorithm is always the same regardless of
the processor model and platform configuration.

It selects the maximum P-state it is allowed to use, subject to limits set via
``sysfs``, every time the driver configuration for the given CPU is updated
(e.g. via ``sysfs``).

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is set.

``powersave``
.............

Without HWP, this P-state selection algorithm is similar to the algorithm
implemented by the generic ``schedutil`` scaling governor except that the
utilization metric used by it is based on numbers coming from feedback
registers of the CPU. It generally selects P-states proportional to the
current CPU utilization.

This algorithm is run by the driver's utilization update callback for the
given CPU when it is invoked by the CPU scheduler, but not more often than
every 10 ms. Like in the ``performance`` case, the hardware configuration
is not touched if the new P-state turns out to be the same as the current
one.

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.
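
The ``powersave`` behavior described above can be approximated by the following
Python sketch. It is a simplified model rather than the driver's actual
algorithm: the plain linear mapping from utilization to P-state, the
``PowersaveModel`` class and its attribute names are illustrative assumptions.
Only the 10 ms minimum interval and the rule that the hardware is left alone
when the P-state does not change are taken from the text.

```python
SAMPLE_INTERVAL_NS = 10_000_000  # 10 ms minimum interval between runs.


class PowersaveModel:
    """Toy model of utilization-proportional P-state selection."""

    def __init__(self, min_pstate, max_pstate):
        self.min_pstate = min_pstate
        self.max_pstate = max_pstate
        self.current = min_pstate
        self.last_run_ns = None
        self.hw_writes = 0  # Counts simulated hardware updates.

    def update(self, now_ns, utilization):
        """Model one utilization update callback; utilization is in [0, 1]."""
        if (self.last_run_ns is not None
                and now_ns - self.last_run_ns < SAMPLE_INTERVAL_NS):
            return self.current  # Invoked too soon: keep the current P-state.
        self.last_run_ns = now_ns
        # Assumed linear mapping, clamped to the allowed range.
        target = round(self.max_pstate * utilization)
        target = max(self.min_pstate, min(self.max_pstate, target))
        if target != self.current:
            self.current = target
            self.hw_writes += 1  # Hardware is touched only on a change.
        return self.current
```

Calling ``update()`` twice within 10 ms of model time leaves the P-state
unchanged the second time, whatever utilization is passed in.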

Passive Mode
------------

This is the default operation mode of ``intel_pstate`` for processors without
hardware-managed P-states (HWP) support. It is always used if the
``intel_pstate=passive`` argument is passed to the kernel in the command line
regardless of whether or not the given processor supports HWP. [Note that the
``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
if it is not combined with ``intel_pstate=active``.] Like in the active mode
without HWP support, in this mode ``intel_pstate`` may refuse to work with
processors that are not recognized by it if HWP is prevented from being enabled
through the kernel command line.

If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
Then, the driver behaves like a regular ``CPUFreq`` scaling driver. That is,
it is invoked by generic scaling governors when necessary to talk to the
hardware in order to change the P-state of a CPU (in particular, the
``schedutil`` governor can invoke it directly from scheduler context).
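
Since the ``scaling_driver`` string differs between the two modes, it can be
used to tell them apart from user space. The Python sketch below shows one way
to do that; the ``operation_mode`` helper is hypothetical and the ``policy0``
path is assumed from the usual ``CPUFreq`` ``sysfs`` layout.

```python
from pathlib import Path

# Strings documented for the scaling_driver attribute in each mode.
DRIVER_TO_MODE = {
    "intel_pstate": "active",
    "intel_cpufreq": "passive",
}


def operation_mode(scaling_driver):
    """Map scaling_driver contents to a mode name, or None for other drivers."""
    return DRIVER_TO_MODE.get(scaling_driver.strip())


if __name__ == "__main__":
    # Assumed path; any policy directory would do, as all carry the same string.
    path = Path("/sys/devices/system/cpu/cpufreq/policy0/scaling_driver")
    try:
        print(operation_mode(path.read_text()))
    except OSError as err:
        print(f"cannot read {path}: {err}")
```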

While in this mode, ``intel_pstate`` can be used with all of the (generic)
scaling governors listed by the ``scaling_available_governors`` policy attribute
in ``sysfs`` (and the P-state selection algorithms described above are not
used). Then, it is responsible for the configuration of policy objects
corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
governors attached to the policy objects) with accurate information on the
maximum and minimum operating frequencies supported by the hardware (including
the so-called "turbo" frequency ranges). In other words, in the passive mode
the entire range of available P-states is exposed by ``intel_pstate`` to the
``CPUFreq`` core. However, in this mode the driver does not register
utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
information comes from the ``CPUFreq`` core (and is the last frequency selected
by the current scaling governor for the given policy).


.. _turbo:

Turbo P-states Support
======================

In the majority of cases, the entire range of P-states available to
``intel_pstate`` can be divided into two sub-ranges that correspond to
different types of processor behavior, above and below a boundary that
will be referred to as the "turbo threshold" in what follows.

The P-states above the turbo threshold are referred to as "turbo P-states" and
the whole sub-range of P-states they belong to is referred to as the "turbo
range". These names are related to the Turbo Boost technology allowing a
multicore processor to opportunistically increase the P-state of one or more
cores if there is enough power to do that and if that is not going to cause the
thermal envelope of the processor package to be exceeded.

Specifically, if software sets the P-state of a CPU core within the turbo range
(that is, above the turbo threshold), the processor is permitted to take over
performance scaling control for that core and put it into turbo P-states of its
choice going forward. However, that permission is interpreted differently by
different processor generations. Namely, the Sandy Bridge generation of
processors will never use any P-states above the last one set by software for
the given core, even if it is within the turbo range, whereas all of the later
processor generations will take it as a license to use any P-states from the
turbo range, even above the one set by software. In other words, on those
processors setting any P-state from the turbo range will enable the processor
to put the given core into all turbo P-states up to and including the maximum
supported one as it sees fit.

One important property of turbo P-states is that they are not sustainable.
More
precisely, there is no guarantee that any CPUs will be able to stay in any of
those states indefinitely, because the power distribution within the processor
package may change over time or the thermal envelope it was designed for might
be exceeded if a turbo P-state was used for too long.

In turn, the P-states below the turbo threshold generally are sustainable. In
fact, if one of them is set by software, the processor is not expected to change
it to a lower one unless in a thermal stress or a power limit violation
situation (a higher P-state may still be used if it is set for another CPU in
the same package at the same time, for example).

Some processors allow multiple cores to be in turbo P-states at the same time,
but the maximum P-state that can be set for them generally depends on the number
of cores running concurrently. The maximum turbo P-state that can be set for 3
cores at the same time usually is lower than the analogous maximum P-state for
2 cores, which in turn usually is lower than the maximum turbo P-state that can
be set for 1 core. The one-core maximum turbo P-state is thus the maximum
supported one overall.

The maximum supported turbo P-state, the turbo threshold (the maximum supported
non-turbo P-state) and the minimum supported P-state are specific to the
processor model and can be determined by reading the processor's model-specific
registers (MSRs).
Moreover, some processors support the Configurable TDP
(Thermal Design Power) feature and, when that feature is enabled, the turbo
threshold effectively becomes a configurable value that can be set by the
platform firmware.

Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).

Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).


Processor Support
=================

To handle a given processor ``intel_pstate`` requires a number of different
pieces of information on it to be known, including:

 * The minimum supported P-state.

 * The maximum supported `non-turbo P-state <turbo_>`_.

 * Whether or not turbo P-states are supported at all.

 * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
   are supported).

 * The scaling formula to translate the driver's internal representation
   of P-states into frequencies and the other way around.

Generally, ways to obtain that information are specific to the processor model
or family. Although it often is possible to obtain all of it from the processor
itself (using model-specific registers), there are cases in which hardware
manuals need to be consulted to get to it too.

For this reason, there is a list of supported processors in ``intel_pstate`` and
the driver initialization will fail if the detected processor is not in that
list, unless it supports the HWP feature. [The interface to obtain all of the
information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]


User Space Interface in ``sysfs``
=================================

Global Attributes
-----------------

``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level.
They are located in the
``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs.

Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
argument is passed to the kernel in the command line.

``max_perf_pct``
        Maximum P-state the driver is allowed to set in percent of the
        maximum supported performance level (the highest supported `turbo
        P-state <turbo_>`_).

        This attribute will not be exposed if the
        ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
        command line.

``min_perf_pct``
        Minimum P-state the driver is allowed to set in percent of the
        maximum supported performance level (the highest supported `turbo
        P-state <turbo_>`_).

        This attribute will not be exposed if the
        ``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
        command line.

``num_pstates``
        Number of P-states supported by the processor (between 0 and 255
        inclusive) including both turbo and non-turbo P-states (see
        `Turbo P-states Support`_).

        The value of this attribute is not affected by the ``no_turbo``
        setting described `below <no_turbo_attr_>`_.

        This attribute is read-only.

``turbo_pct``
        Ratio of the `turbo range <turbo_>`_ size to the size of the entire
        range of supported P-states, in percent.

        This attribute is read-only.

.. _no_turbo_attr:

``no_turbo``
        If set (equal to 1), the driver is not allowed to set any turbo P-states
        (see `Turbo P-states Support`_). If unset (equal to 0, which is the
        default), turbo P-states can be set by the driver.
        [Note that ``intel_pstate`` does not support the general ``boost``
        attribute (supported by some other scaling drivers) which is replaced
        by this one.]

        This attribute does not affect the maximum supported frequency value
        supplied to the ``CPUFreq`` core and exposed via the policy interface,
        but it affects the maximum possible value of per-policy P-state limits
        (see `Interpretation of Policy Attributes`_ below for details).

``hwp_dynamic_boost``
        This attribute is only present if ``intel_pstate`` works in the
        `active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
        the processor. If set (equal to 1), it causes the minimum P-state limit
        to be increased dynamically for a short time whenever a task previously
        waiting on I/O is selected to run on a given logical CPU (the purpose
        of this mechanism is to improve performance).

        This setting has no effect on logical CPUs whose minimum P-state limit
        is directly set to the highest non-turbo P-state or above it.

.. _status_attr:

``status``
        Operation mode of the driver: "active", "passive" or "off".

        "active"
                The driver is functional and in the `active mode
                <Active Mode_>`_.

        "passive"
                The driver is functional and in the `passive mode
                <Passive Mode_>`_.

        "off"
                The driver is not functional (it is not registered as a scaling
                driver with the ``CPUFreq`` core).

        This attribute can be written to in order to change the driver's
        operation mode or to unregister it. The string written to it must be
        one of the possible values of it and, if successful, the write will
        cause the driver to switch over to the operation mode represented by
        that string - or to be unregistered in the "off" case. [Actually,
        switching over from the active mode to the passive mode or the other
        way around causes the driver to be unregistered and registered again
        with a different set of callbacks, so all of its settings (the global
        as well as the per-policy ones) are then reset to their default
        values, possibly depending on the target operation mode.]

``energy_efficiency``
        This attribute is only present on platforms with CPUs matching the Kaby
        Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
        optimizations are disabled on these CPU models if HWP is enabled.
        Enabling energy-efficiency optimizations may limit maximum operating
        frequency with or without the HWP feature. With HWP enabled, the
        optimizations are done only in the turbo frequency range. Without it,
        they are done in the entire available frequency range. Setting this
        attribute to "1" enables the energy-efficiency optimizations and setting
        to "0" disables them.

Interpretation of Policy Attributes
-----------------------------------

The interpretation of some ``CPUFreq`` policy attributes described in
:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
and it generally depends on the driver's `operation mode <Operation Modes_>`_.

First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
multiplier to the internal P-state representation used by ``intel_pstate``.
Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.
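
The mapping and the cap described above amount to simple arithmetic, sketched
below in Python. The multiplier of 100000 kHz per P-state unit is an assumption
chosen for illustration (the real value is processor-specific), and the
function names are hypothetical.

```python
# Illustrative multiplier only; the real one depends on the processor.
ASSUMED_SCALING_KHZ = 100_000


def pstate_to_khz(pstate, scaling_khz=ASSUMED_SCALING_KHZ):
    """Translate an internal P-state value into a frequency in kHz."""
    return pstate * scaling_khz


def cap_scaling_freq(requested_khz, max_allowed_pstate,
                     scaling_khz=ASSUMED_SCALING_KHZ):
    """Cap a requested scaling_{min,max}_freq value at the allowed maximum."""
    return min(requested_khz, pstate_to_khz(max_allowed_pstate, scaling_khz))
```

Under that assumption, a maximum allowed P-state of 36 corresponds to
3600000 kHz, so a request for 4200000 kHz would be reduced to that value.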

If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
restored after unsetting ``no_turbo``, unless these attributes have been written
to after ``no_turbo`` was set.

If ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.

Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:

``scaling_available_governors``
        List of P-state selection algorithms provided by ``intel_pstate``.

``scaling_governor``
        P-state selection algorithm provided by ``intel_pstate`` currently in
        use with the given policy.

``scaling_cur_freq``
        Frequency of the average P-state of the CPU represented by the given
        policy for the time interval between the last two invocations of the
        driver's utilization update callback by the CPU scheduler for that CPU.

One more policy attribute is present if the HWP feature is enabled in the
processor:

``base_frequency``
        Shows the base frequency of the CPU. Any frequency above this will be
        in the turbo frequency range.

The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
same as for other scaling drivers.

Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver.  Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).

Coordination of P-State Limits
------------------------------

``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes.
The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:

 1. All CPUs are affected by the global limits (that is, none of them can be
    requested to run faster than the global maximum and none of them can be
    requested to run slower than the global minimum).

 2. Each individual CPU is affected by its own per-policy limits (that is, it
    cannot be requested to run faster than its own per-policy maximum and it
    cannot be requested to run slower than its own per-policy minimum). The
    effective performance depends on whether the platform supports per-core
    P-states, whether hyper-threading is enabled, and on current performance
    requests from other CPUs. When the platform does not support per-core
    P-states, the effective performance can exceed the policy limits set on
    a CPU if other CPUs are requesting higher performance at that moment.
    Even with per-core P-states support, when hyper-threading is enabled and
    a sibling CPU is requesting higher performance, the other siblings will
    get higher performance than their policy limits.

 3. The global and per-policy limits can be set independently.
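
One way to read rules 1 and 2 together is that the tighter of the two bounds
wins on each side.  The following shell script fragment is a rough sketch of
that reading, not the driver's actual code: the helper name
``effective_limits`` is hypothetical, the global percentages are assumed to be
relative to the frequency of the highest supported turbo P-state, and the
driver itself works in P-state units with its own rounding and clamping::

 # effective_limits TURBO_MAX_KHZ MAX_PERF_PCT MIN_PERF_PCT \
 #                  SCALING_MAX_KHZ SCALING_MIN_KHZ
 # Prints "<effective minimum> <effective maximum>" in kHz.
 effective_limits() {
     # Convert the global percentages into frequencies.
     global_max=$(( $1 * $2 / 100 ))
     global_min=$(( $1 * $3 / 100 ))
     # The lower maximum and the higher minimum take effect.
     eff_max=$(( $4 < global_max ? $4 : global_max ))
     eff_min=$(( $5 > global_min ? $5 : global_min ))
     # Keep the pair consistent if the minimum ends up above the maximum.
     if [ "$eff_min" -gt "$eff_max" ]; then
         eff_min=$eff_max
     fi
     echo "$eff_min $eff_max"
 }

For example, with a hypothetical 4 GHz maximum turbo frequency,
``effective_limits 4000000 80 20 3500000 1000000`` prints ``1000000 3200000``:
the global maximum (80% of 4 GHz) is tighter than the per-policy maximum, while
the per-policy minimum is tighter than the global one.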

In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
resulting effective values are written into hardware registers whenever the
limits change in order to request the processor's internal P-state selection
logic to always set P-states within these limits.  Otherwise, the limits are
taken into account by scaling governors (in the `passive mode
<Passive Mode_>`_) and by the driver every time before setting a new P-state
for a CPU.

Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.


Energy vs Performance Hints
---------------------------

If the hardware-managed P-states (HWP) feature is enabled in the processor,
additional attributes, intended to allow user space to help ``intel_pstate`` to
adjust the processor's internal P-state selection logic by focusing it on
performance or on energy-efficiency, or somewhere between the two extremes, are
present in every ``CPUFreq`` policy directory in ``sysfs``.  They are:

``energy_performance_preference``
        Current value of the energy vs performance hint for the given policy
        (or the CPU represented by it).

        The hint can be changed by writing to this attribute.

``energy_performance_available_preferences``
        List of strings that can be written to the
        ``energy_performance_preference`` attribute.

        They represent different energy vs performance hints and should be
        self-explanatory, except that ``default`` represents whatever hint
        value was set by the platform firmware.

Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob. It is also possible to write an
integer value between 0 and 255, if the EPP feature is present. If the EPP
feature is not present, writing an integer value to this attribute is not
supported. In this case, users can use the
"/sys/devices/system/cpu/cpu*/power/energy_perf_bias" interface.

[Note that tasks may be migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
set for those CPUs, that may lead to undesirable outcomes.  To avoid such
issues it is better to set the same energy vs performance hint for all CPUs
or to pin every task potentially sensitive to them to a specific CPU.]

.. _acpi-cpufreq:

``intel_pstate`` vs ``acpi-cpufreq``
====================================

On the majority of systems supported by ``intel_pstate``, the ACPI tables
provided by the platform firmware contain ``_PSS`` objects returning information
that can be used for CPU performance scaling (refer to the ACPI specification
[3]_ for details on the ``_PSS`` objects and the format of the information
returned by them).

The information returned by the ACPI ``_PSS`` objects is used by the
``acpi-cpufreq`` scaling driver.  On systems supported by ``intel_pstate``
the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
interface, but the set of P-states it can use is limited by the ``_PSS``
output.

On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one).
By convention, the frequency returned by ``_PSS`` for that item is greater by
1 MHz than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").

The list of P-states returned by ``_PSS`` is reflected by the table of
available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
scaling governors and the minimum and maximum supported frequencies reported by
it come from that list as well.  In particular, given the special representation
of the turbo range described above, this means that the maximum supported
frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
affects decisions made by the scaling governors, except for ``powersave`` and
``performance``.

For example, if a given governor attempts to select a frequency proportional to
estimated CPU load and maps the load of 100% to the maximum supported frequency
(possibly multiplied by a constant), then it will tend to choose P-states below
the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
in that case the turbo range corresponds to a small fraction of the frequency
band it can use (1 MHz vs 1 GHz or more).  In consequence, it will only go to
the turbo range for the highest loads and the other loads above 50% that might
benefit from running at turbo frequencies will be given non-turbo P-states
instead.

One more issue related to that may appear on systems supporting the
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
turbo threshold.  Namely, if that is not coordinated with the lists of P-states
returned by ``_PSS`` properly, there may be more than one item corresponding to
a turbo P-state in those lists and there may be a problem with avoiding the
turbo range (if desirable or necessary).  Usually, to avoid using turbo
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
the list returned by it.
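
To put numbers on the load-proportional governor case described earlier in this
section, the shell script fragment below models a deliberately simplified
governor that requests ``load * maximum frequency`` with no extra constant.
The helper name ``picks_turbo`` and the frequencies used with it are
hypothetical, and real governors use more elaborate formulas, so this is only a
sketch of the arithmetic::

 # picks_turbo LOAD_PCT NONTURBO_KHZ
 # Prints "turbo" if the requested frequency lands above the highest
 # non-turbo frequency, "non-turbo" otherwise.
 picks_turbo() {
     # The single _PSS turbo entry is 1 MHz above the highest non-turbo one.
     max_khz=$(( $2 + 1000 ))
     # Frequency proportional to load, with 100% mapped to the maximum.
     target_khz=$(( $1 * max_khz / 100 ))
     if [ "$target_khz" -gt "$2" ]; then
         echo turbo
     else
         echo non-turbo
     fi
 }

With a highest non-turbo frequency of 2.9 GHz, ``picks_turbo 99 2900000``
prints ``non-turbo`` and only ``picks_turbo 100 2900000`` prints ``turbo``,
which reflects how narrow the turbo band looks to such a governor.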

Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
`passive mode <Passive Mode_>`_, except that the number of P-states it can set
is limited to the ones listed by the ACPI ``_PSS`` objects.


Kernel Command Line Options for ``intel_pstate``
================================================

Several kernel command line options can be used to pass early-configuration-time
parameters to ``intel_pstate`` in order to enforce specific behavior of it.  All
of them have to be prepended with the ``intel_pstate=`` prefix.

``disable``
        Do not register ``intel_pstate`` as the scaling driver even if the
        processor is supported by it.

``active``
        Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
        with.

``passive``
        Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
        start with.

``force``
        Register ``intel_pstate`` as the scaling driver instead of
        ``acpi-cpufreq`` even if the latter is preferred on the given system.

        This may prevent some platform features (such as thermal controls and
        power capping) that rely on the availability of ACPI P-states
        information from functioning as expected, so it should be used with
        caution.

        This option does not work with processors that are not supported by
        ``intel_pstate`` and on platforms where the ``pcc-cpufreq`` scaling
        driver is used instead of ``acpi-cpufreq``.

``no_hwp``
        Do not enable the hardware-managed P-states (HWP) feature even if it is
        supported by the processor.

``hwp_only``
        Register ``intel_pstate`` as the scaling driver only if the
        hardware-managed P-states (HWP) feature is supported by the processor.

``support_acpi_ppc``
        Take ACPI ``_PPC`` performance limits into account.

        If the preferred power management profile in the FADT (Fixed ACPI
        Description Table) is set to "Enterprise Server" or "Performance
        Server", the ACPI ``_PPC`` limits are taken into account by default
        and this option has no effect.

``per_cpu_perf_limits``
        Use per-logical-CPU P-State limits (see `Coordination of P-state
        Limits`_ for details).
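
To check which of these options, if any, the running kernel was booted with,
the command line exposed in ``/proc/cmdline`` can be inspected.  The shell
script fragment below is a small sketch of that; the helper name
``pstate_cmdline_opts`` is hypothetical and its optional argument (a file to
read instead of ``/proc/cmdline``) exists only to make experimenting easier::

 # pstate_cmdline_opts [FILE]
 # Print the value of every intel_pstate= option found, one per line.
 pstate_cmdline_opts() {
     # Split the command line into words, then keep the matching ones
     # with the intel_pstate= prefix removed.
     tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^intel_pstate=//p'
 }

On a system booted with ``intel_pstate=passive``, for instance, this prints
``passive``, and it prints nothing if no ``intel_pstate=`` option was given.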


Diagnostics and Tuning
======================

Trace Events
------------

There are two static trace events that can be used for ``intel_pstate``
diagnostics.  One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``.  Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.

The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::

 # cd /sys/kernel/debug/tracing/
 # echo 1 > events/power/pstate_sample/enable
 # echo 1 > events/power/cpu_frequency/enable
 # cat trace
 gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
 cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).
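
When post-processing ``pstate_sample`` lines, it can help to recompute values
from the raw counters.  In the sample line above, ``core_busy`` is consistent
with ``100 * aperf / mperf`` rounded down; that relationship is an assumption
inferred from the sample rather than something stated here.  The shell script
fragment below, with the hypothetical helper name ``sample_busy_pct``, is a
sketch of extracting the two counters from one trace line and computing that
ratio::

 # sample_busy_pct LINE
 # Print 100 * aperf / mperf (integer division) for one pstate_sample line.
 sample_busy_pct() {
     mperf=$(printf '%s\n' "$1" | sed -n 's/.* mperf=\([0-9]*\).*/\1/p')
     aperf=$(printf '%s\n' "$1" | sed -n 's/.* aperf=\([0-9]*\).*/\1/p')
     # Refuse lines without usable counters rather than divide by zero.
     if [ -z "$mperf" ] || [ -z "$aperf" ] || [ "$mperf" -eq 0 ]; then
         return 1
     fi
     echo $(( 100 * aperf / mperf ))
 }

Applied to the ``pstate_sample`` line shown above (``mperf=1143818`` and
``aperf=1230607``), it prints ``107``, matching the ``core_busy`` field.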

``ftrace``
----------

The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``.  For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::

 # cd /sys/kernel/debug/tracing/
 # cat available_filter_functions | grep -i pstate
 intel_pstate_set_pstate
 intel_pstate_cpu_init
 ...
 # echo intel_pstate_set_pstate > set_ftrace_filter
 # echo function > current_tracer
 # cat trace | head -15
 # tracer: function
 #
 # entries-in-buffer/entries-written: 80/80   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
             Xorg-3129  [000] ..s.  2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
  gnome-terminal--4510  [002] ..s.  2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
      gnome-shell-3409  [001] ..s.  2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
           <idle>-0     [000] ..s.  2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func


References
==========

.. [1] Kristen Accardi, *Balancing Power and Performance in the Linux Kernel*,
       https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

.. [2] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

.. [3] *Advanced Configuration and Power Interface Specification*,
       https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf