.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the nest IMC counter data
and moves it to memory.

The Core and Thread IMC PMU counters are handled in the core. Core level PMU
counters give us the IMC counters' data per core, and thread level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree.
The event information contains:

- Event name
- Event offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for the events must be
inherited from the PMU.

The event offset is the location in memory where the counter data gets
accumulated.

The IMC catalog is available at:
    https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle instructions for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
==============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64 bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration.
On each overflow in the CPMCxSEL,
hardware snapshots the program counter along with event counts and writes them
into the memory pointed to by LDBAR.

LDBAR is a 64 bit special purpose per thread register. It has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around if the memory
buffer reaches the end.

*Currently the event monitored for trace-mode is fixed as cycle.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [....]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using perf report.
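
The record-and-report steps above can be wrapped in a small guard so they run
only on systems where the trace-imc PMU is registered. This is a minimal
sketch, not part of the IMC tooling: the helper name `have_pmu` is
hypothetical, the check assumes the PMU appears under the usual perf sysfs
directory `/sys/bus/event_source/devices/`, and the `sleep 1` workload is
only a placeholder.

```sh
#!/bin/sh
# have_pmu (hypothetical helper): succeed if the named PMU is registered.
# $1: PMU name, $2: optional sysfs root (defaults to /sys).
have_pmu() {
    [ -d "${2:-/sys}/bus/event_source/devices/$1" ]
}

# Record and report only when the trace_imc PMU is present.
if have_pmu trace_imc; then
    # Placeholder workload; substitute the application to be profiled.
    perf record -e trace_imc/trace_cycles/ -o perf.data -- sleep 1
    # Print the first lines of the text report for the recorded data.
    perf report -i perf.data --stdio | head -n 20
else
    echo "trace_imc PMU not found; skipping" >&2
fi
```

On machines without the PMU the script only prints a notice, so it is safe to
run anywhere.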

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC
trace mode snapshots the program counter and writes it to memory. This
also provides a way for the operating system to do instruction sampling in real
time without PMI processing overhead.

PMI interrupt counts when the `perf top` command is executed first without and
then with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI interrupt counts do not increment when using the `trace_imc` event.
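
To compare the counts without reading the columns by eye, the per-CPU values
on the PMI line can be summed before and after a run. The following is a
minimal sketch: the helper name `pmi_total` is hypothetical, and it assumes
the `PMI:` line has the layout shown above, with per-CPU counts followed by a
text description.

```sh
#!/bin/sh
# pmi_total (hypothetical helper): sum the per-CPU PMI counts.
# $1: optional path to an interrupts table (defaults to /proc/interrupts).
pmi_total() {
    awk '$1 == "PMI:" {
        total = 0
        # Per-CPU counts follow the label; stop at the first non-numeric field.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            total += $i
        print total
    }' "${1:-/proc/interrupts}"
}

# Example use around a trace-imc run (left commented out, needs the PMU):
# before=$(pmi_total)
# perf record -e trace_imc/trace_cycles/ -- sleep 5
# after=$(pmi_total)
# echo "PMI delta: $(( ${after:-0} - ${before:-0} ))"
```

The counts in `/proc/interrupts` are system-wide, so other PMU users running
at the same time can still raise the total.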