perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] \-- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides the means for Shared Data C2C/HITM analysis. It
allows you to track down cacheline contention.
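
Cacheline contention frequently comes from false sharing: independent
variables, written by different threads, that happen to live in the same
cacheline. The C sketch below assumes a 64-byte cacheline (the real size is
CPU specific) and shows how two adjacent fields share one line while an
aligned layout gives each field its own line. `CACHELINE_SIZE`,
`same_cacheline()` and both structs are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines; the real size is CPU specific. */
#define CACHELINE_SIZE 64u

/* Two counters written by different threads: prone to false sharing. */
struct shared_counters {
	long a;
	long b;
};

/* Each counter aligned to the start of its own cacheline. */
struct padded_counters {
	alignas(CACHELINE_SIZE) long a;
	alignas(CACHELINE_SIZE) long b;
};

static_assert(sizeof(struct padded_counters) == 2 * CACHELINE_SIZE,
	      "each padded counter occupies a full cacheline");

/* True when both addresses fall into the same CACHELINE_SIZE-byte line. */
bool same_cacheline(const void *x, const void *y)
{
	uintptr_t mask = ~(uintptr_t)(CACHELINE_SIZE - 1);

	return ((uintptr_t)x & mask) == ((uintptr_t)y & mask);
}
```

Aligning or padding hot, independently written fields is one common remedy
once perf c2c points at such a line; it trades memory for fewer coherence
transfers.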

On Intel, the tool is based on the load latency and precise store facility
events provided by Intel CPUs. On PowerPC, the tool uses random instruction
sampling with the thresholding feature. On AMD, the tool uses the IBS op PMU
(due to hardware limitations, perf c2c is not supported on Zen3 CPUs). On
Arm64, it uses SPE to sample load and store operations, so hardware and
kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of Arm SPE sampling, not every memory operation
will be sampled.

These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides the means to record this data and report back access
details for the cachelines with the highest contention, i.e. the highest
number of HITM accesses.

The basic workflow with this tool follows the standard record/report phases:
the user runs the record command to record event data and the report command
to display it.
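
To try the record/report workflow, a workload with deliberate contention
helps. The following minimal sketch (hypothetical names `struct hot_pair` and
`run_false_sharing()`; POSIX threads assumed) starts two threads that each
write only their own field, yet the two fields sit next to each other and are
therefore expected to share a cacheline.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical workload: two neighbouring fields, likely in one cacheline. */
struct hot_pair {
	volatile long a; /* written only by the first thread */
	volatile long b; /* written only by the second thread */
};

struct worker_arg {
	volatile long *counter;
	long iters;
};

static void *worker(void *p)
{
	struct worker_arg *w = p;

	/* volatile forces a real store on every iteration; when the threads
	 * run on different cores, the shared line moves between them. */
	for (long i = 0; i < w->iters; i++)
		(*w->counter)++;
	return NULL;
}

/* Runs both writers to completion; returns 0 on success, -1 on error. */
int run_false_sharing(struct hot_pair *hp, long iters)
{
	pthread_t ta, tb;
	struct worker_arg wa = { &hp->a, iters };
	struct worker_arg wb = { &hp->b, iters };

	assert(hp != NULL && iters >= 0);
	if (pthread_create(&ta, NULL, worker, &wa) != 0)
		return -1;
	if (pthread_create(&tb, NULL, worker, &wb) != 0) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	return 0;
}
```

Calling `run_false_sharing()` from a small `main()`, building with
`gcc -O2 -pthread`, and running the binary under `perf c2c record` followed by
`perf c2c report` is one way to look for such a line in the report; whether
it shows up, and how many HITMs it collects, depends on the CPU and on thread
placement.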


RECORD OPTIONS
--------------
-e::
--event=::
	Select the PMU event. Use 'perf c2c record -e list'
	to list available events.

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-l::
--ldlat::
	Configure mem-loads latency. Supported on Intel and Arm64 processors
	only. Ignored on other archs.

-k::
--all-kernel::
	Configure all used events to run in kernel space.

-u::
--all-user::
	Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
	vmlinux pathname

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-i::
--input::
	Specify the input file to process.

-N::
--node-info::
	Show extra node info in report (see NODE INFO section).

-c::
--coalesce::
	Specify sorting fields for single cacheline display.
	The following fields are available: tid,pid,iaddr,dso
	(see COALESCE).

-g::
--call-graph::
	Set up callchain parameters.
	Please refer to the perf-report man page for details.

--stdio::
	Force the stdio output (see STDIO OUTPUT).

--stats::
	Display only statistics tables and force stdio mode.

--full-symbols::
	Display full length of symbols.

--no-source::
	Do not display Source:Line column.

--show-all::
	Show all captured HITM lines, regardless of the HITM % 0.0005 limit.

-f::
--force::
	Don't do ownership validation.

-d::
--display::
	Switch to HITM type (rmt, lcl) or peer snooping type (peer) to display
	and sort on. Total HITMs (tot) is the default, except on Arm64, which
	uses peer mode as the default.
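
The -d choice changes which counter a cacheline is ranked by. A minimal
sketch of that selection, assuming 'tot' means local plus remote HITMs and
'peer' means local plus remote peer loads; the struct, enum and function
names are hypothetical and are not perf's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cacheline counters; not perf's data structures. */
struct line_stats {
	unsigned long lcl_hitm;
	unsigned long rmt_hitm;
	unsigned long lcl_peer;
	unsigned long rmt_peer;
};

/* Mirrors the -d values: tot, rmt, lcl, peer. */
enum display_mode { DISPLAY_TOT, DISPLAY_RMT, DISPLAY_LCL, DISPLAY_PEER };

/* Returns the value a cacheline would be ranked by in the given mode.
 * Assumption: tot = lcl + rmt HITMs, peer = lcl + rmt peer loads. */
unsigned long display_metric(const struct line_stats *s, enum display_mode mode)
{
	assert(s != NULL);

	switch (mode) {
	case DISPLAY_RMT:
		return s->rmt_hitm;
	case DISPLAY_LCL:
		return s->lcl_hitm;
	case DISPLAY_PEER:
		return s->lcl_peer + s->rmt_peer;
	case DISPLAY_TOT:
	default:
		return s->lcl_hitm + s->rmt_hitm;
	}
}
```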

--stitch-lbr::
	Show callgraph with stitched LBRs, which may produce a more complete
	callgraph. The perf.data file must have been obtained using
	perf c2c record --call-graph lbr.
	Disabled by default. In common cases with call stack overflows,
	it can recreate better call stacks than the default lbr call stack
	output. But this approach is not foolproof. There can be cases
	where it creates incorrect call stacks from incorrect matches.
	The known limitations include exception handling such as
	setjmp/longjmp, where calls and returns do not match.

--double-cl::
	Group the detection of shared cacheline events into double cacheline
	granularity. Some architectures have an Adjacent Cacheline Prefetch
	feature, which causes cacheline sharing to behave as if the cacheline
	size were doubled.
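
In terms of addresses, double cacheline granularity amounts to masking with
twice the cacheline size, so two neighbouring lines fall into one group. A
sketch assuming 64-byte lines; `cl_group()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE 64u

/* Start address of the group containing addr: one cacheline normally,
 * two adjacent cachelines when double_cl is non-zero (as with --double-cl). */
uint64_t cl_group(uint64_t addr, int double_cl)
{
	uint64_t size = double_cl ? 2u * CACHELINE_SIZE : CACHELINE_SIZE;

	assert((size & (size - 1)) == 0); /* masking needs a power of two */
	return addr & ~(size - 1);
}
```

With 64-byte lines, 0x1000 and 0x1040 are different cachelines, but at double
granularity both map to the 128-byte group starting at 0x1000.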

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline
analysis and calls the standard perf record command.

The following perf record options are configured by default
(check the perf record man page for details):

  -W,-d,--phys-data,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default on Intel:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

the following on AMD:

  ibs_op//

and the following on PowerPC:

  cpu/mem-loads/
  cpu/mem-stores/

The user can pass any 'perf record' option after the '--' mark, for example
to enable callchains and system wide monitoring:

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.
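
Conceptually, c2c record places its default options and events in front of
whatever follows '--' and hands the result to perf record. The sketch below
models only that argument composition; `build_record_argv()` is hypothetical
and does not imply the exact order perf uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of how c2c record could assemble the
 * perf record command line; not perf's actual code. */
size_t build_record_argv(const char **out, size_t cap,
			 const char *const *events, size_t n_events,
			 const char *const *user_args, size_t n_user)
{
	/* Defaults documented in the C2C RECORD section. */
	static const char *const defaults[] = {
		"record", "-W", "-d", "--phys-data", "--sample-cpu",
	};
	const size_t n_defaults = sizeof(defaults) / sizeof(defaults[0]);
	size_t n = 0;

	assert(cap >= n_defaults + 2 * n_events + n_user);
	for (size_t i = 0; i < n_defaults; i++)
		out[n++] = defaults[i];
	for (size_t i = 0; i < n_events; i++) { /* one "-e <event>" pair each */
		out[n++] = "-e";
		out[n++] = events[i];
	}
	for (size_t i = 0; i < n_user; i++) /* options and command after "--" */
		out[n++] = user_args[i];
	return n;
}
```

In this model, the two Intel default events plus `-g -a ./workload` (a
hypothetical command) yield `record -W -d --phys-data --sample-cpu
-e cpu/mem-loads,ldlat=30/P -e cpu/mem-stores/P -g -a ./workload`.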

C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data
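
The first steps of that workflow can be sketched in C: sort samples by
cacheline address, then fold each run of samples into one per-line summary.
This assumes 64-byte lines and a hypothetical, heavily trimmed sample record
(`struct sample`, `summarize_lines()`); it is not how perf stores its data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_SIZE 64u /* assumption: 64-byte cachelines */

/* Hypothetical, heavily trimmed sample record. */
struct sample {
	uint64_t addr;  /* data address of the access */
	int is_store;   /* 1 for stores, 0 for loads */
	int is_hitm;    /* 1 if the access was classified as a HITM */
};

/* Per-cacheline totals, similar in spirit to the first report view. */
struct line_summary {
	uint64_t cl_addr;
	unsigned long records, loads, stores, hitms;
};

static uint64_t cl_of(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHELINE_SIZE - 1);
}

static int by_cacheline(const void *x, const void *y)
{
	uint64_t a = cl_of(((const struct sample *)x)->addr);
	uint64_t b = cl_of(((const struct sample *)y)->addr);

	return (a > b) - (a < b);
}

/* Sorts samples (in place) by cacheline, then folds each run of equal
 * cachelines into one summary. out must hold up to n entries.
 * Returns the number of summaries written. */
size_t summarize_lines(struct sample *s, size_t n, struct line_summary *out)
{
	size_t lines = 0;

	assert(n == 0 || (s != NULL && out != NULL));
	if (n == 0)
		return 0;
	qsort(s, n, sizeof(*s), by_cacheline);
	for (size_t i = 0; i < n; i++) {
		uint64_t cl = cl_of(s[i].addr);

		if (lines == 0 || out[lines - 1].cl_addr != cl)
			out[lines++] = (struct line_summary){ .cl_addr = cl };

		struct line_summary *l = &out[lines - 1];

		l->records++;
		if (s[i].is_store)
			l->stores++;
		else
			l->loads++;
		if (s[i].is_hitm)
			l->hitms++;
	}
	return lines;
}
```

Note that `qsort()` reorders the caller's array; the per-line totals do not
depend on the order of samples within a line.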

In general, the perf c2c report output consists of two basic views:
  1) most expensive cachelines list
  2) offset details for each cacheline

For each cacheline in the 1) list, we display the following data
(both stdio and TUI modes output the same fields):

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Rmt/Lcl Hitm (Display with HITM types)
  - cacheline percentage of all Remote/Local HITM accesses

  Peer Snoop (Display with peer type)
  - cacheline percentage of all peer accesses

  LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
  - count of Total/Local/Remote load HITMs

  Load Peer - Total, Local, Remote (For display with peer type)
  - count of Total/Local/Remote loads from peer cache or DRAM

  Total records
  - sum of all accesses to the cacheline

  Total loads
  - sum of all load accesses

  Total stores
  - sum of all store accesses

  Store Reference - L1Hit, L1Miss, N/A
    L1Hit - store accesses that hit L1
    L1Miss - store accesses that missed L1
    N/A - store accesses for which the memory level is not available

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - LlcHit, LclHitm
  - count of LLC load accesses, includes LLC hits and LLC HITMs

  RMT Load Hit - RmtHit, RmtHitm
  - count of remote load accesses, includes remote hits and remote HITMs;
    on Arm Neoverse cores, RmtHit is used to account for remote accesses,
    including remote DRAM or any upward cache level in the remote node

  Load Dram - Lcl, Rmt
  - count of local and remote DRAM accesses
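
The percentage columns relate one line's count to the total over all lines.
A sketch, with hypothetical names, of computing that share and hiding lines
below a cutoff, roughly the kind of filtering that --show-all turns off;
`min_percent` is an illustrative parameter and does not reproduce perf's
built-in limit or its units.

```c
#include <assert.h>
#include <stddef.h>

/* Share of all HITMs that one cacheline accounts for, in percent. */
double hitm_percent(unsigned long line_hitms, unsigned long total_hitms)
{
	if (total_hitms == 0)
		return 0.0;
	return 100.0 * (double)line_hitms / (double)total_hitms;
}

/* Counts the lines that would be displayed: those whose share reaches
 * min_percent, or every line when show_all is set. min_percent is a
 * hypothetical parameter, not perf's built-in limit. */
size_t count_displayed(const unsigned long *hitms, size_t n,
		       double min_percent, int show_all)
{
	unsigned long total = 0;
	size_t shown = 0;

	assert(hitms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += hitms[i];
	for (size_t i = 0; i < n; i++)
		if (show_all || hitm_percent(hitms[i], total) >= min_percent)
			shown++;
	return shown;
}
```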

For each offset in the 2) list, we display the following data:

  HITM - Rmt, Lcl (Display with HITM types)
  - % of Remote/Local HITM accesses for the given offset within cacheline

  Peer Snoop - Rmt, Lcl (Display with peer type)
  - % of Remote/Local peer accesses for the given offset within cacheline

  Store Refs - L1 Hit, L1 Miss, N/A
  - % of store accesses that hit L1, missed L1 and N/A (not available) memory
    level for the given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the thread responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load (Display with HITM types)
    - sum of cycles for given accesses - Remote/Local HITM and generic load

  cycles - rmt peer, lcl peer, load (Display with peer type)
    - sum of cycles for given accesses - Remote/Local peer load and generic load

  cpu cnt
    - number of CPUs that participated in the access

  Symbol
    - code symbol related to the 'Code address' value

  Shared Object
    - shared object name related to the 'Code address' value

  Source:Line
    - source information related to the 'Code address' value

  Node
    - nodes participating in the access (see NODE INFO section)
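
Two of the per-offset columns can be derived from a sample's data address and
CPU. A sketch assuming 64-byte lines and, for brevity, CPU numbers below 64
so that a single bit mask can track them; `cl_offset()` and `cpu_count()` are
hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64u /* assumption: 64-byte cachelines */

/* Offset of a data address inside its cacheline ("Data address - Offset"). */
unsigned cl_offset(uint64_t addr)
{
	return (unsigned)(addr & (CACHELINE_SIZE - 1));
}

/* Number of distinct CPUs among the samples for one offset ("cpu cnt").
 * Simplification: CPU numbers must be below 64 so one mask suffices. */
unsigned cpu_count(const int *cpus, size_t n)
{
	uint64_t seen = 0;
	unsigned count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cpus[i] >= 0 && cpus[i] < 64);

		uint64_t bit = (uint64_t)1 << cpus[i];

		if (!(seen & bit)) {
			seen |= bit;
			count++;
		}
	}
	return count;
}
```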

NODE INFO
---------
The 'Node' field displays the nodes that access a given cacheline
offset. Its output comes in three flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores} (Display with HITM types)
      Node{cpus %peers %stores} (Display with peer type)
  - node IDs with a list of affected CPUs in the following format:
      Node{cpu list}
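
The stats flavor can be read as per-node shares of the offset's totals. The
sketch below formats one entry in a 'Node{cpus %hitms %stores}' shape; the
struct, the function and the exact number formatting are hypothetical and
need not match perf's output byte for byte.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-node tallies for one cacheline offset. */
struct node_stats {
	int node;
	unsigned cpus;         /* CPUs of this node that accessed the offset */
	unsigned long hitms;   /* HITMs attributed to this node */
	unsigned long stores;  /* stores attributed to this node */
};

/* Writes "Node{cpus %hitms %stores}" with percentages relative to the
 * offset's totals; returns what snprintf() returns. */
int format_node(char *buf, size_t len, const struct node_stats *ns,
		unsigned long total_hitms, unsigned long total_stores)
{
	assert(buf != NULL && ns != NULL);

	double ph = total_hitms ? 100.0 * ns->hitms / total_hitms : 0.0;
	double ps = total_stores ? 100.0 * ns->stores / total_stores : 0.0;

	return snprintf(buf, len, "%d{%u %.1f%% %.1f%%}",
			ns->node, ns->cpus, ph, ps);
}
```

For example, node 1 with 2 CPUs, 3 of 4 HITMs and 1 of 4 stores is rendered
as `1{2 75.0% 25.0%}` by this sketch.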

The user can switch between the above flavors with the -N option or
use the 'n' key to switch interactively in TUI mode.

COALESCE
--------
The user can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for cacheline offsets:

  tid   - coalesced by thread IDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, the following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default, coalescing is set up with 'pid,iaddr'.
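
Coalescing by 'pid,iaddr' can be pictured as sorting offset entries by those
keys and merging entries whose keys match. A sketch with a hypothetical entry
type (`struct offset_entry`) that holds only the key fields and a hit count;
`coalesce()` is likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical offset entry; only the coalesce keys are shown. */
struct offset_entry {
	int pid;
	uint64_t iaddr;     /* code address of the access */
	unsigned long hits;
};

/* Order by pid, then by code address, like the default 'pid,iaddr'. */
static int cmp_pid_iaddr(const void *x, const void *y)
{
	const struct offset_entry *a = x, *b = y;

	if (a->pid != b->pid)
		return (a->pid > b->pid) - (a->pid < b->pid);
	return (a->iaddr > b->iaddr) - (a->iaddr < b->iaddr);
}

/* Sorts in place, then merges neighbours with equal keys by summing their
 * hits. Returns the new number of entries. */
size_t coalesce(struct offset_entry *e, size_t n)
{
	size_t out = 0;

	assert(e != NULL || n == 0);
	if (n == 0)
		return 0;
	qsort(e, n, sizeof(*e), cmp_pid_iaddr);
	for (size_t i = 0; i < n; i++) {
		if (out > 0 && cmp_pid_iaddr(&e[out - 1], &e[i]) == 0)
			e[out - 1].hits += e[i].hits;
		else
			e[out++] = e[i];
	}
	return out;
}
```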

STDIO OUTPUT
------------
The stdio output displays data on standard output.

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details, please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on the c2c tool for a detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]