Using TopDown metrics
---------------------

TopDown metrics break apart performance bottlenecks. Starting at level
1 it is typical to get metrics on retiring, bad speculation, frontend
bound, and backend bound. Higher levels provide more detail into the
level 1 bottlenecks, such as at level 2: core bound, memory bound,
heavy operations, light operations, branch mispredicts, machine
clears, fetch latency and fetch bandwidth. For more details see [1][2][3].

perf stat --topdown implements this using available metrics that vary
per architecture.

% perf stat -a --topdown -I1000
#           time   % tma_retiring   % tma_backend_bound   % tma_frontend_bound   % tma_bad_speculation
     1.001141351             11.5                  34.9                   46.9                     6.7
     2.006141972             13.4                  28.1                   50.4                     8.1
     3.010162040             12.9                  28.1                   51.1                     8.0
     4.014009311             12.5                  28.6                   51.8                     7.2
     5.017838554             11.8                  33.0                   48.0                     7.2
     5.704818971             14.0                  27.5                   51.3                     7.3
...

New TopDown features in Intel Ice Lake
======================================

With Ice Lake CPUs the TopDown metrics are directly available as
fixed counters and do not require generic counters. This allows
TopDown to be collected at all times, in addition to other events.

Using TopDown through RDPMC in applications on Intel Ice Lake
=============================================================

For more fine grained measurements it can be useful to
access the new counters directly from user space. This is more
complicated, but drastically lowers overhead.

On Ice Lake, there is a new fixed counter 3: SLOTS, which reports
"pipeline SLOTS" (cycles multiplied by core issue width) and a
metric register that reports slots ratios for the different bottleneck
categories.

The metrics counter is CPU model specific and is not available on older
CPUs.

Example code
============

Library functions that implement the functionality described below
are also available in libjevents [4].

The application opens a group with fixed counter 3 (SLOTS) and any
metric event, and allows user programs to read the performance counters.

Fixed counter 3 is mapped to a pseudo event event=0x00, umask=0x04,
so the perf_event_attr structure should be initialized with
{ .config = 0x0400, .type = PERF_TYPE_RAW }.
The metric events are mapped to the pseudo event event=0x00, umask=0x8X.
For example, the perf_event_attr structure can be initialized with
{ .config = 0x8000, .type = PERF_TYPE_RAW } for the Retiring metric event.
Fixed counter 3 must be the leader of the group.

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Provide own perf_event_open stub because glibc doesn't */
__attribute__((weak))
int perf_event_open(struct perf_event_attr *attr, pid_t pid,
		    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open slots counter file descriptor for current task. */
struct perf_event_attr slots = {
	.type = PERF_TYPE_RAW,
	.size = sizeof(struct perf_event_attr),
	.config = 0x400,
	.exclude_kernel = 1,
};

int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);
if (slots_fd < 0)
	... error ...

/* Memory mapping the fd permits _rdpmc calls from userspace */
void *slots_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, slots_fd, 0);
if (slots_p == MAP_FAILED)	/* mmap reports failure as MAP_FAILED, not NULL */
	... error ...

/*
 * Open metrics event file descriptor for current task.
 * Set slots event as the leader of the group.
 */
struct perf_event_attr metrics = {
	.type = PERF_TYPE_RAW,
	.size = sizeof(struct perf_event_attr),
	.config = 0x8000,
	.exclude_kernel = 1,
};

int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);
if (metrics_fd < 0)
	... error ...

/* Memory mapping the fd permits _rdpmc calls from userspace */
void *metrics_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, metrics_fd, 0);
if (metrics_p == MAP_FAILED)	/* mmap reports failure as MAP_FAILED, not NULL */
	... error ...

Note: the file descriptors returned by the perf_event_open calls must be memory
mapped to permit calls to the _rdpmc intrinsic. Permission may also be granted
by writing the /sys/devices/cpu/rdpmc sysfs node.
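
A program may want to confirm that user-space reads are actually permitted
before issuing RDPMC. The sketch below assumes the mapped page follows the
struct perf_event_mmap_page layout from <linux/perf_event.h>, whose
cap_user_rdpmc bit and index field are described in perf_event_open(2). The
helper name rdpmc_usable() is made up for illustration, and whether the index
field is meaningful for the TopDown metric event is an assumption, so treat
this as the general perf convention rather than a TopDown-specific rule.

```c
#include <linux/perf_event.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the kernel advertises, through
 * the mmap'ed control page, that the event can currently be read with
 * RDPMC from user space. An index of 0 means the event is not currently
 * readable this way.
 */
static bool rdpmc_usable(const struct perf_event_mmap_page *pc)
{
	return pc->cap_user_rdpmc && pc->index != 0;
}
```

With the mappings above, a check such as rdpmc_usable(slots_p) before the
first read would catch the case where user-space RDPMC has been disabled.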

The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
to read slots and the TopDown metrics at different points of the program:

#include <stdint.h>
#include <x86intrin.h>

#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
#define RDPMC_METRIC	(1 << 29)	/* return metric counters */

#define FIXED_COUNTER_SLOTS		3
#define METRIC_COUNTER_TOPDOWN_L1_L2	0

static inline uint64_t read_slots(void)
{
	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
}

static inline uint64_t read_metrics(void)
{
	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1_L2);
}

Then the program can be instrumented to read these metrics at different
points.

Avoid doing this for very short code regions, as the parallelism and
overlap in CPU program execution will cause too much measurement
inaccuracy. For example, instrumenting individual basic blocks is
definitely too fine grained.

_rdpmc calls should not be mixed with reading the metrics and slots counters
through system calls, as the kernel will reset these counters after each system
call.

Decoding metrics values
=======================

The value reported by read_metrics() contains four 8-bit fields, each
holding a scaled ratio for one of the Level 1 bottlenecks.
All four fields add up to 0xff (= 100%).

The binary ratios in the metric value can be converted to float ratios:

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* L1 Topdown metric events */
#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)

/*
 * L2 Topdown metric events.
 * Available on Sapphire Rapids and later platforms.
 */
#define TOPDOWN_HEAVY_OPS(val)		((float)GET_METRIC(val, 4) / 0xff)
#define TOPDOWN_BR_MISPREDICT(val)	((float)GET_METRIC(val, 5) / 0xff)
#define TOPDOWN_FETCH_LAT(val)		((float)GET_METRIC(val, 6) / 0xff)
#define TOPDOWN_MEM_BOUND(val)		((float)GET_METRIC(val, 7) / 0xff)

and then converted to percent for printing.

The ratios in the metric accumulate for the time when the counter
is enabled.
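
The conversion of a raw metric value into Level 1 percentages can be sketched
in C as follows. This is only an illustration of the field layout described
above; the struct topdown_l1 type and the topdown_decode_l1() name are
hypothetical and not part of any kernel or library API.

```c
#include <stdint.h>

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* Hypothetical container for the four Level 1 percentages. */
struct topdown_l1 {
	double retiring;
	double bad_spec;
	double fe_bound;
	double be_bound;
};

/* Turn the four low 8-bit fields (0xff = 100%) into percentages. */
static struct topdown_l1 topdown_decode_l1(uint64_t metrics)
{
	struct topdown_l1 r = {
		.retiring = 100.0 * GET_METRIC(metrics, 0) / 0xff,
		.bad_spec = 100.0 * GET_METRIC(metrics, 1) / 0xff,
		.fe_bound = 100.0 * GET_METRIC(metrics, 2) / 0xff,
		.be_bound = 100.0 * GET_METRIC(metrics, 3) / 0xff,
	};
	return r;
}
```

A caller would pass the value returned by read_metrics() and print the four
members with a "%.2f%%" style format.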

For measuring programs it is often useful to measure
specific sections. This requires computing deltas on the metrics.

This can be done by scaling the metrics with the slots counter
read at the same time.

Then it's possible to take deltas of these slots counts
measured at different points, and determine the metrics
for that time period.

	slots_a = read_slots();
	metric_a = read_metrics();

	... larger code region ...

	slots_b = read_slots();
	metric_b = read_metrics();

	# compute scaled metrics for measurement a (in units of slots * 0xff)
	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a

	# compute delta scaled metrics between b and a
	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a

Later the individual ratios of L1 metric events for the measurement period can
be recreated from these counts.

	# GET_METRIC() fields are scaled so that 0xff = 100%, so the
	# scaled counts are divided by 0xff to obtain ratios in 0..1
	slots_delta = slots_b - slots_a
	retiring_ratio = (float)retiring_slots / slots_delta / 0xff
	bad_spec_ratio = (float)bad_spec_slots / slots_delta / 0xff
	fe_bound_ratio = (float)fe_bound_slots / slots_delta / 0xff
	be_bound_ratio = (float)be_bound_slots / slots_delta / 0xff

	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
		retiring_ratio * 100.,
		bad_spec_ratio * 100.,
		fe_bound_ratio * 100.,
		be_bound_ratio * 100.);

The individual ratios of L2 metric events for the measurement period can be
recreated from L1 and L2 metric counters. (Available on Sapphire Rapids and
later platforms)

	# compute scaled metrics for measurement a
	heavy_ops_slots_a = GET_METRIC(metric_a, 4) * slots_a
	br_mispredict_slots_a = GET_METRIC(metric_a, 5) * slots_a
	fetch_lat_slots_a = GET_METRIC(metric_a, 6) * slots_a
	mem_bound_slots_a = GET_METRIC(metric_a, 7) * slots_a

	# compute delta scaled metrics between b and a
	heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a
	br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a
	fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a
	mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a

	slots_delta = slots_b - slots_a
	heavy_ops_ratio = (float)heavy_ops_slots / slots_delta / 0xff
	light_ops_ratio = retiring_ratio - heavy_ops_ratio;

	br_mispredict_ratio = (float)br_mispredict_slots / slots_delta / 0xff
	machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio;

	fetch_lat_ratio = (float)fetch_lat_slots / slots_delta / 0xff
	fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio;

	mem_bound_ratio = (float)mem_bound_slots / slots_delta / 0xff
	core_bound_ratio = be_bound_ratio - mem_bound_ratio;

	printf("Heavy Operations %.2f%% Light Operations %.2f%% "
	       "Branch Mispredict %.2f%% Machine Clears %.2f%% "
	       "Fetch Latency %.2f%% Fetch Bandwidth %.2f%% "
	       "Mem Bound %.2f%% Core Bound %.2f%%\n",
		heavy_ops_ratio * 100.,
		light_ops_ratio * 100.,
		br_mispredict_ratio * 100.,
		machine_clears_ratio * 100.,
		fetch_lat_ratio * 100.,
		fetch_bw_ratio * 100.,
		mem_bound_ratio * 100.,
		core_bound_ratio * 100.);

Resetting metrics counters
==========================

Since the individual metrics are only 8 bits wide, they lose precision for
short regions over time: as the slots counter accumulates, each step of
the 8-bit fraction covers more and more slots. So the counters need to be
reset regularly.
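
To make the precision concern concrete, the sketch below restates the delta
computation from the previous section as compilable C and adds a helper that
shows how coarse one 8-bit step becomes as slots accumulate. The names
topdown_sample, topdown_slots_per_step() and topdown_delta_ratio() are
hypothetical, and the code assumes the field layout described under
"Decoding metrics values"; it is an illustration, not a reference
implementation.

```c
#include <stdint.h>

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* One snapshot: the values of read_slots() and read_metrics(). */
struct topdown_sample {
	uint64_t slots;
	uint64_t metrics;
};

/* Number of slots represented by one step of an 8-bit field (0xff = 100%). */
static uint64_t topdown_slots_per_step(uint64_t slots)
{
	return slots / 0xff;
}

/*
 * Fraction (0..1) of the slots elapsed between snapshots a and b that is
 * attributed to metric field `field`. Each field is scaled by the slots
 * count read at the same time, the scaled values are subtracted, and the
 * 0xff scale factor is divided back out.
 */
static double topdown_delta_ratio(struct topdown_sample a,
				  struct topdown_sample b, int field)
{
	uint64_t slots_delta = b.slots - a.slots;
	double scaled_a, scaled_b;

	if (slots_delta == 0)
		return 0.0;

	scaled_a = (double)GET_METRIC(a.metrics, field) * (double)a.slots;
	scaled_b = (double)GET_METRIC(b.metrics, field) * (double)b.slots;

	return (scaled_b - scaled_a) / 0xff / (double)slots_delta;
}
```

For instance, after 255,000,000 accumulated slots one step of a field stands
for 1,000,000 slots, so a region shorter than that can no longer move the
field by a full step.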

When using the kernel perf API the kernel resets on every read.
So as long as the reading is at reasonable intervals (every few
seconds) the precision is good.

When using perf stat it is recommended to always use the -I option,
with an interval no longer than a few seconds:

	perf stat -I 1000 --topdown ...

For user programs using RDPMC directly the counter can
be reset explicitly using ioctl:

	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);

This "opens" a new measurement period.

A program using RDPMC for TopDown should schedule such a reset
regularly, for example every few seconds.

Limits on Intel Ice Lake
========================

Four pseudo TopDown metric events are exposed for the end-users,
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
They can be used to collect the TopDown value under the following
rules:
- All the TopDown metric events must be in a group with the SLOTS event.
- The SLOTS event must be the leader of the group.
- The PERF_FORMAT_GROUP flag must be applied to each TopDown metric
  event.

The SLOTS event and the TopDown metric events can be counting members of
a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'

Extension on Intel Sapphire Rapids Server
=========================================

The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
metrics. Four more TopDown metric events are exposed for the end-users,
topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
topdown-mem-bound.

Each of the new level 2 metrics in the upper half is a subset of the
corresponding level 1 metric in the lower half. Software can deduce the
other four level 2 metrics by subtracting corresponding metrics as below.

	Light_Operations = Retiring - Heavy_Operations
	Machine_Clears = Bad_Speculation - Branch_Mispredicts
	Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
	Core_Bound = Backend_Bound - Memory_Bound


[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://sites.google.com/site/analysismethods/yasin-pubs
[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
[4] https://github.com/andikleen/pmu-tools/tree/master/jevents