.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection Counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode, which runs in the OCC
(On-Chip Controller) complex. The microcode collects the Nest IMC counter data
and moves it to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters provide the IMC counter data per core, and thread-level PMU counters
provide the IMC counter data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The information for each
event contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties of the events must be
inherited from the PMU.

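The inheritance rule above can be sketched as follows. This is a minimal
illustration in Python, not kernel code: the dictionary keys, the sample values
and the helper name `resolve_scale_unit` are assumptions made for this example
and are not actual device tree property names.

.. code-block:: python

  def resolve_scale_unit(event, pmu):
      """Return (scale, unit) for an event, falling back to the PMU's values.

      Both arguments are plain dicts; a missing key means the property
      was not provided at that level.
      """
      scale = event.get("scale", pmu.get("scale"))
      unit = event.get("unit", pmu.get("unit"))
      return scale, unit


  # Hypothetical data: the PMU carries a common scale and unit, and the
  # event itself provides neither.
  pmu = {"name": "nest_mcs01", "scale": "1.2207e-4", "unit": "MiB"}
  event = {"name": "PM_MCS01_64B_RD_DISP_PORT01", "offset": 0x118}

  print(resolve_scale_unit(event, pmu))

An event that carries its own scale or unit keeps it; only missing properties
are taken from the PMU.
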
The event offset is the location in memory where the counter data accumulates.

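As an illustration of how an event offset is used, the sketch below reads a
counter value from a byte buffer that stands in for the counter memory. It
assumes each counter is a 64-bit big-endian value; the buffer contents, the
offset and the function name `read_counter` are made up for the example.

.. code-block:: python

  import struct


  def read_counter(buf, offset):
      """Read one counter from buf at the given event offset.

      Assumption: counters are 64-bit big-endian unsigned integers.
      """
      (value,) = struct.unpack_from(">Q", buf, offset)
      return value


  # Hypothetical counter memory: 32 bytes with one counter at offset 0x10.
  memory = bytearray(32)
  struct.pack_into(">Q", memory, 0x10, 123456)

  print(read_counter(memory, 0x10))

Each event of a PMU would use the same buffer with its own offset.
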
The IMC catalog is available at https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
===============

POWER9 supports two IMC modes: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register. It has bits that
indicate whether the hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

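The layout above can be turned into a register value as in the following
sketch. It assumes IBM bit numbering, where bit 0 is the most significant bit
of the 64-bit register, and it assumes that the counter address is placed in
the register unshifted, so that only the address bits that fall inside the 8:50
field are kept. PB scope is left at zero. The function and constant names are
illustrative and are not taken from the kernel.

.. code-block:: python

  def ibm_mask(first, last, width=64):
      """Mask covering IBM-numbered bits first..last (bit 0 is the MSB)."""
      nbits = last - first + 1
      shift = width - 1 - last
      return ((1 << nbits) - 1) << shift


  LDBAR_ENABLE = ibm_mask(0, 0)      # bit 0: enable
  LDBAR_TRACE = ibm_mask(1, 1)       # bit 1: 1 selects trace mode
  LDBAR_ADDR_MASK = ibm_mask(8, 50)  # bits 8:50: counter address


  def make_ldbar(counter_addr, trace_mode):
      """Compose an LDBAR value for an enabled counter buffer."""
      value = LDBAR_ENABLE | (counter_addr & LDBAR_ADDR_MASK)
      if trace_mode:
          value |= LDBAR_TRACE
      return value


  # Hypothetical buffer address, aligned so that its low 13 bits are zero.
  print(hex(make_ldbar(0x20000000, trace_mode=True)))

Under these assumptions, bits 51:63 are reserved, so the buffer address needs
its low 13 bits clear for the masking step to preserve it.
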
TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches its end.

*Currently, the event monitored in trace mode is fixed as cycles.*

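The TRACE_IMC_SCOM fields described above can be packed and unpacked as in the
sketch below. It again assumes IBM bit numbering (bit 0 is the most significant
bit). The field positions come from the table; the helper names and the field
values used in the example are arbitrary placeholders, not recommended
settings.

.. code-block:: python

  # Field name -> (first bit, last bit), IBM numbering, from the table.
  SCOM_FIELDS = {
      "SAMPSEL": (0, 1),
      "CPMC_LOAD": (2, 33),
      "CPMC1SEL": (34, 40),
      "CPMC2SEL": (41, 47),
      "BUFFERSIZE": (48, 50),
  }


  def pack_scom(**values):
      """Pack named field values into a 64-bit TRACE_IMC_SCOM word."""
      word = 0
      for name, value in values.items():
          first, last = SCOM_FIELDS[name]
          nbits = last - first + 1
          if not 0 <= value < (1 << nbits):
              raise ValueError(f"{name} does not fit in {nbits} bits")
          word |= value << (63 - last)
      return word


  def unpack_scom(word):
      """Split a 64-bit TRACE_IMC_SCOM word back into its fields."""
      return {
          name: (word >> (63 - last)) & ((1 << (last - first + 1)) - 1)
          for name, (first, last) in SCOM_FIELDS.items()
      }


  # Placeholder values chosen only to exercise every field.
  word = pack_scom(
      SAMPSEL=1, CPMC_LOAD=0x1000, CPMC1SEL=2, CPMC2SEL=3, BUFFERSIZE=4
  )
  print(hex(word), unpack_scom(word))

Rejecting values that do not fit their field keeps one field from silently
spilling into its neighbour.
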
Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [...]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

The following `perf top` session shows the PMI counts when `perf top` is run
first without and then with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI counts do not increment when the `trace_imc` event is used.

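
The comparison above can also be scripted. The following sketch parses the
`PMI:` line from two snapshots of `/proc/interrupts` and reports the per-CPU
difference. It works on text passed in as strings; the two sample lines are the
counts shown above with whitespace collapsed, and the function names are
illustrative.

.. code-block:: python

  def parse_pmi(interrupts_text):
      """Return the per-CPU PMI counts from /proc/interrupts text."""
      for line in interrupts_text.splitlines():
          fields = line.split()
          if fields and fields[0] == "PMI:":
              counts = []
              for field in fields[1:]:
                  if not field.isdigit():
                      break  # reached the textual description
                  counts.append(int(field))
              return counts
      raise ValueError("no PMI line found")


  def pmi_delta(before, after):
      """Per-CPU increase in PMI counts between two snapshots."""
      return [b - a for a, b in zip(parse_pmi(before), parse_pmi(after))]


  # Snapshots taken before and after running perf top with trace_imc.
  before = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"
  after = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"

  print(pmi_delta(before, after))

A delta of zero on every CPU corresponds to the unchanged counts in the session
above.
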