162306a36Sopenharmony_ci=========================================================
262306a36Sopenharmony_ciNotes on Analysing Behaviour Using Events and Tracepoints
362306a36Sopenharmony_ci=========================================================
462306a36Sopenharmony_ci:Author: Mel Gorman (PCL information heavily based on email from Ingo Molnar)
562306a36Sopenharmony_ci
662306a36Sopenharmony_ci1. Introduction
762306a36Sopenharmony_ci===============
862306a36Sopenharmony_ci
962306a36Sopenharmony_ciTracepoints (see Documentation/trace/tracepoints.rst) can be used without
1062306a36Sopenharmony_cicreating custom kernel modules to register probe functions using the event
1162306a36Sopenharmony_citracing infrastructure.
1262306a36Sopenharmony_ci
1362306a36Sopenharmony_ciSimplistically, tracepoints represent important events that can be
1462306a36Sopenharmony_citaken in conjunction with other tracepoints to build a "Big Picture" of
1562306a36Sopenharmony_ciwhat is going on within the system. There are a large number of methods for
1662306a36Sopenharmony_cigathering and interpreting these events. Lacking any current Best Practises,
1762306a36Sopenharmony_cithis document describes some of the methods that can be used.
1862306a36Sopenharmony_ci
1962306a36Sopenharmony_ciThis document assumes that debugfs is mounted on /sys/kernel/debug and that
2062306a36Sopenharmony_cithe appropriate tracing options have been configured into the kernel. It is
2162306a36Sopenharmony_ciassumed that the PCL tool tools/perf has been installed and is in your path.
2262306a36Sopenharmony_ci
2362306a36Sopenharmony_ci2. Listing Available Events
2462306a36Sopenharmony_ci===========================
2562306a36Sopenharmony_ci
2662306a36Sopenharmony_ci2.1 Standard Utilities
2762306a36Sopenharmony_ci----------------------
2862306a36Sopenharmony_ci
2962306a36Sopenharmony_ciAll possible events are visible from /sys/kernel/tracing/events. Simply
3062306a36Sopenharmony_cicalling::
3162306a36Sopenharmony_ci
3262306a36Sopenharmony_ci  $ find /sys/kernel/tracing/events -type d
3362306a36Sopenharmony_ci
3462306a36Sopenharmony_ciwill give a fair indication of the number of events available.
3562306a36Sopenharmony_ci
3662306a36Sopenharmony_ci2.2 PCL (Performance Counters for Linux)
3762306a36Sopenharmony_ci----------------------------------------
3862306a36Sopenharmony_ci
3962306a36Sopenharmony_ciDiscovery and enumeration of all counters and events, including tracepoints,
4062306a36Sopenharmony_ciare available with the perf tool. Getting a list of available events is a
4162306a36Sopenharmony_cisimple case of::
4262306a36Sopenharmony_ci
4362306a36Sopenharmony_ci  $ perf list 2>&1 | grep Tracepoint
4462306a36Sopenharmony_ci  ext4:ext4_free_inode                     [Tracepoint event]
4562306a36Sopenharmony_ci  ext4:ext4_request_inode                  [Tracepoint event]
4662306a36Sopenharmony_ci  ext4:ext4_allocate_inode                 [Tracepoint event]
4762306a36Sopenharmony_ci  ext4:ext4_write_begin                    [Tracepoint event]
4862306a36Sopenharmony_ci  ext4:ext4_ordered_write_end              [Tracepoint event]
4962306a36Sopenharmony_ci  [ .... remaining output snipped .... ]
5062306a36Sopenharmony_ci
5162306a36Sopenharmony_ci
5262306a36Sopenharmony_ci3. Enabling Events
5362306a36Sopenharmony_ci==================
5462306a36Sopenharmony_ci
5562306a36Sopenharmony_ci3.1 System-Wide Event Enabling
5662306a36Sopenharmony_ci------------------------------
5762306a36Sopenharmony_ci
5862306a36Sopenharmony_ciSee Documentation/trace/events.rst for a proper description on how events
5962306a36Sopenharmony_cican be enabled system-wide. A short example of enabling all events related
6062306a36Sopenharmony_cito page allocation would look something like::
6162306a36Sopenharmony_ci
6262306a36Sopenharmony_ci  $ for i in `find /sys/kernel/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done
6362306a36Sopenharmony_ci
6462306a36Sopenharmony_ci3.2 System-Wide Event Enabling with SystemTap
6562306a36Sopenharmony_ci---------------------------------------------
6662306a36Sopenharmony_ci
6762306a36Sopenharmony_ciIn SystemTap, tracepoints are accessible using the kernel.trace() function
6862306a36Sopenharmony_cicall. The following is an example that reports every 5 seconds what processes
6962306a36Sopenharmony_ciwere allocating the pages.
7062306a36Sopenharmony_ci::
7162306a36Sopenharmony_ci
7262306a36Sopenharmony_ci  global page_allocs
7362306a36Sopenharmony_ci
7462306a36Sopenharmony_ci  probe kernel.trace("mm_page_alloc") {
7562306a36Sopenharmony_ci  	page_allocs[execname()]++
7662306a36Sopenharmony_ci  }
7762306a36Sopenharmony_ci
7862306a36Sopenharmony_ci  function print_count() {
7962306a36Sopenharmony_ci  	printf ("%-25s %-s\n", "#Pages Allocated", "Process Name")
8062306a36Sopenharmony_ci  	foreach (proc in page_allocs-)
8162306a36Sopenharmony_ci  		printf("%-25d %s\n", page_allocs[proc], proc)
8262306a36Sopenharmony_ci  	printf ("\n")
8362306a36Sopenharmony_ci  	delete page_allocs
8462306a36Sopenharmony_ci  }
8562306a36Sopenharmony_ci
8662306a36Sopenharmony_ci  probe timer.s(5) {
8762306a36Sopenharmony_ci          print_count()
8862306a36Sopenharmony_ci  }
8962306a36Sopenharmony_ci
9062306a36Sopenharmony_ci3.3 System-Wide Event Enabling with PCL
9162306a36Sopenharmony_ci---------------------------------------
9262306a36Sopenharmony_ci
9362306a36Sopenharmony_ciBy specifying the -a switch and analysing sleep, the system-wide events
9462306a36Sopenharmony_cifor a duration of time can be examined.
9562306a36Sopenharmony_ci::
9662306a36Sopenharmony_ci
9762306a36Sopenharmony_ci $ perf stat -a \
9862306a36Sopenharmony_ci	-e kmem:mm_page_alloc -e kmem:mm_page_free \
9962306a36Sopenharmony_ci	-e kmem:mm_page_free_batched \
10062306a36Sopenharmony_ci	sleep 10
10162306a36Sopenharmony_ci Performance counter stats for 'sleep 10':
10262306a36Sopenharmony_ci
10362306a36Sopenharmony_ci           9630  kmem:mm_page_alloc
10462306a36Sopenharmony_ci           2143  kmem:mm_page_free
10562306a36Sopenharmony_ci           7424  kmem:mm_page_free_batched
10662306a36Sopenharmony_ci
10762306a36Sopenharmony_ci   10.002577764  seconds time elapsed
10862306a36Sopenharmony_ci
10962306a36Sopenharmony_ciSimilarly, one could execute a shell and exit it as desired to get a report
11062306a36Sopenharmony_ciat that point.
11162306a36Sopenharmony_ci
11262306a36Sopenharmony_ci3.4 Local Event Enabling
11362306a36Sopenharmony_ci------------------------
11462306a36Sopenharmony_ci
11562306a36Sopenharmony_ciDocumentation/trace/ftrace.rst describes how to enable events on a per-thread
11662306a36Sopenharmony_cibasis using set_ftrace_pid.
11762306a36Sopenharmony_ci
11862306a36Sopenharmony_ci3.5 Local Event Enablement with PCL
11962306a36Sopenharmony_ci-----------------------------------
12062306a36Sopenharmony_ci
12162306a36Sopenharmony_ciEvents can be activated and tracked for the duration of a process on a local
12262306a36Sopenharmony_cibasis using PCL such as follows.
12362306a36Sopenharmony_ci::
12462306a36Sopenharmony_ci
12562306a36Sopenharmony_ci  $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
12662306a36Sopenharmony_ci		 -e kmem:mm_page_free_batched ./hackbench 10
12762306a36Sopenharmony_ci  Time: 0.909
12862306a36Sopenharmony_ci
12962306a36Sopenharmony_ci    Performance counter stats for './hackbench 10':
13062306a36Sopenharmony_ci
13162306a36Sopenharmony_ci          17803  kmem:mm_page_alloc
13262306a36Sopenharmony_ci          12398  kmem:mm_page_free
13362306a36Sopenharmony_ci           4827  kmem:mm_page_free_batched
13462306a36Sopenharmony_ci
13562306a36Sopenharmony_ci    0.973913387  seconds time elapsed
13662306a36Sopenharmony_ci
13762306a36Sopenharmony_ci4. Event Filtering
13862306a36Sopenharmony_ci==================
13962306a36Sopenharmony_ci
14062306a36Sopenharmony_ciDocumentation/trace/ftrace.rst covers in-depth how to filter events in
14162306a36Sopenharmony_ciftrace.  Obviously using grep and awk of trace_pipe is an option as well
14262306a36Sopenharmony_cias any script reading trace_pipe.
14362306a36Sopenharmony_ci
14462306a36Sopenharmony_ci5. Analysing Event Variances with PCL
14562306a36Sopenharmony_ci=====================================
14662306a36Sopenharmony_ci
14762306a36Sopenharmony_ciAny workload can exhibit variances between runs and it can be important
14862306a36Sopenharmony_cito know what the standard deviation is. By and large, this is left to the
14962306a36Sopenharmony_ciperformance analyst to do it by hand. In the event that the discrete event
15062306a36Sopenharmony_cioccurrences are useful to the performance analyst, then perf can be used.
15162306a36Sopenharmony_ci::
15262306a36Sopenharmony_ci
15362306a36Sopenharmony_ci  $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free
15462306a36Sopenharmony_ci			-e kmem:mm_page_free_batched ./hackbench 10
15562306a36Sopenharmony_ci  Time: 0.890
15662306a36Sopenharmony_ci  Time: 0.895
15762306a36Sopenharmony_ci  Time: 0.915
15862306a36Sopenharmony_ci  Time: 1.001
15962306a36Sopenharmony_ci  Time: 0.899
16062306a36Sopenharmony_ci
16162306a36Sopenharmony_ci   Performance counter stats for './hackbench 10' (5 runs):
16262306a36Sopenharmony_ci
16362306a36Sopenharmony_ci          16630  kmem:mm_page_alloc         ( +-   3.542% )
16462306a36Sopenharmony_ci          11486  kmem:mm_page_free	    ( +-   4.771% )
16562306a36Sopenharmony_ci           4730  kmem:mm_page_free_batched  ( +-   2.325% )
16662306a36Sopenharmony_ci
16762306a36Sopenharmony_ci    0.982653002  seconds time elapsed   ( +-   1.448% )
16862306a36Sopenharmony_ci
16962306a36Sopenharmony_ciIn the event that some higher-level event is required that depends on some
17062306a36Sopenharmony_ciaggregation of discrete events, then a script would need to be developed.
17162306a36Sopenharmony_ci
17262306a36Sopenharmony_ciUsing --repeat, it is also possible to view how events are fluctuating over
17362306a36Sopenharmony_citime on a system-wide basis using -a and sleep.
17462306a36Sopenharmony_ci::
17562306a36Sopenharmony_ci
17662306a36Sopenharmony_ci  $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
17762306a36Sopenharmony_ci		-e kmem:mm_page_free_batched \
17862306a36Sopenharmony_ci		-a --repeat 10 \
17962306a36Sopenharmony_ci		sleep 1
18062306a36Sopenharmony_ci  Performance counter stats for 'sleep 1' (10 runs):
18162306a36Sopenharmony_ci
18262306a36Sopenharmony_ci           1066  kmem:mm_page_alloc         ( +-  26.148% )
18362306a36Sopenharmony_ci            182  kmem:mm_page_free          ( +-   5.464% )
18462306a36Sopenharmony_ci            890  kmem:mm_page_free_batched  ( +-  30.079% )
18562306a36Sopenharmony_ci
18662306a36Sopenharmony_ci    1.002251757  seconds time elapsed   ( +-   0.005% )
18762306a36Sopenharmony_ci
18862306a36Sopenharmony_ci6. Higher-Level Analysis with Helper Scripts
18962306a36Sopenharmony_ci============================================
19062306a36Sopenharmony_ci
19162306a36Sopenharmony_ciWhen events are enabled the events that are triggering can be read from
19262306a36Sopenharmony_ci/sys/kernel/tracing/trace_pipe in human-readable format although binary
19362306a36Sopenharmony_cioptions exist as well. By post-processing the output, further information can
19462306a36Sopenharmony_cibe gathered on-line as appropriate. Examples of post-processing might include
19562306a36Sopenharmony_ci
19662306a36Sopenharmony_ci  - Reading information from /proc for the PID that triggered the event
19762306a36Sopenharmony_ci  - Deriving a higher-level event from a series of lower-level events.
19862306a36Sopenharmony_ci  - Calculating latencies between two events
19962306a36Sopenharmony_ci
20062306a36Sopenharmony_ciDocumentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
20162306a36Sopenharmony_ciscript that can read trace_pipe from STDIN or a copy of a trace. When used
20262306a36Sopenharmony_cion-line, it can be interrupted once to generate a report without exiting
20362306a36Sopenharmony_ciand twice to exit.
20462306a36Sopenharmony_ci
20562306a36Sopenharmony_ciSimplistically, the script just reads STDIN and counts up events but it
20662306a36Sopenharmony_cialso can do more such as
20762306a36Sopenharmony_ci
20862306a36Sopenharmony_ci  - Derive high-level events from many low-level events. If a number of pages
20962306a36Sopenharmony_ci    are freed to the main allocator from the per-CPU lists, it recognises
21062306a36Sopenharmony_ci    that as one per-CPU drain even though there is no specific tracepoint
21162306a36Sopenharmony_ci    for that event
21262306a36Sopenharmony_ci  - It can aggregate based on PID or individual process number
21362306a36Sopenharmony_ci  - In the event memory is getting externally fragmented, it reports
21462306a36Sopenharmony_ci    on whether the fragmentation event was severe or moderate.
21562306a36Sopenharmony_ci  - When receiving an event about a PID, it can record who the parent was so
21662306a36Sopenharmony_ci    that if large numbers of events are coming from very short-lived
21762306a36Sopenharmony_ci    processes, the parent process responsible for creating all the helpers
21862306a36Sopenharmony_ci    can be identified
21962306a36Sopenharmony_ci
22062306a36Sopenharmony_ci7. Lower-Level Analysis with PCL
22162306a36Sopenharmony_ci================================
22262306a36Sopenharmony_ci
22362306a36Sopenharmony_ciThere may also be a requirement to identify what functions within a program
22462306a36Sopenharmony_ciwere generating events within the kernel. To begin this sort of analysis, the
22562306a36Sopenharmony_cidata must be recorded. At the time of writing, this required root:
22662306a36Sopenharmony_ci::
22762306a36Sopenharmony_ci
22862306a36Sopenharmony_ci  $ perf record -c 1 \
22962306a36Sopenharmony_ci	-e kmem:mm_page_alloc -e kmem:mm_page_free \
23062306a36Sopenharmony_ci	-e kmem:mm_page_free_batched \
23162306a36Sopenharmony_ci	./hackbench 10
23262306a36Sopenharmony_ci  Time: 0.894
23362306a36Sopenharmony_ci  [ perf record: Captured and wrote 0.733 MB perf.data (~32010 samples) ]
23462306a36Sopenharmony_ci
23562306a36Sopenharmony_ciNote the use of '-c 1' to set the event period to sample. The default sample
23662306a36Sopenharmony_ciperiod is quite high to minimise overhead but the information collected can be
23762306a36Sopenharmony_civery coarse as a result.
23862306a36Sopenharmony_ci
23962306a36Sopenharmony_ciThis record outputted a file called perf.data which can be analysed using
24062306a36Sopenharmony_ciperf report.
24162306a36Sopenharmony_ci::
24262306a36Sopenharmony_ci
24362306a36Sopenharmony_ci  $ perf report
24462306a36Sopenharmony_ci  # Samples: 30922
24562306a36Sopenharmony_ci  #
24662306a36Sopenharmony_ci  # Overhead    Command                     Shared Object
24762306a36Sopenharmony_ci  # ........  .........  ................................
24862306a36Sopenharmony_ci  #
24962306a36Sopenharmony_ci      87.27%  hackbench  [vdso]
25062306a36Sopenharmony_ci       6.85%  hackbench  /lib/i686/cmov/libc-2.9.so
25162306a36Sopenharmony_ci       2.62%  hackbench  /lib/ld-2.9.so
25262306a36Sopenharmony_ci       1.52%       perf  [vdso]
25362306a36Sopenharmony_ci       1.22%  hackbench  ./hackbench
25462306a36Sopenharmony_ci       0.48%  hackbench  [kernel]
25562306a36Sopenharmony_ci       0.02%       perf  /lib/i686/cmov/libc-2.9.so
25662306a36Sopenharmony_ci       0.01%       perf  /usr/bin/perf
25762306a36Sopenharmony_ci       0.01%       perf  /lib/ld-2.9.so
25862306a36Sopenharmony_ci       0.00%  hackbench  /lib/i686/cmov/libpthread-2.9.so
25962306a36Sopenharmony_ci  #
26062306a36Sopenharmony_ci  # (For more details, try: perf report --sort comm,dso,symbol)
26162306a36Sopenharmony_ci  #
26262306a36Sopenharmony_ci
26362306a36Sopenharmony_ciAccording to this, the vast majority of events triggered on events
26462306a36Sopenharmony_ciwithin the VDSO. With simple binaries, this will often be the case so let's
26562306a36Sopenharmony_citake a slightly different example. In the course of writing this, it was
26662306a36Sopenharmony_cinoticed that X was generating an insane amount of page allocations so let's look
26762306a36Sopenharmony_ciat it:
26862306a36Sopenharmony_ci::
26962306a36Sopenharmony_ci
27062306a36Sopenharmony_ci  $ perf record -c 1 -f \
27162306a36Sopenharmony_ci		-e kmem:mm_page_alloc -e kmem:mm_page_free \
27262306a36Sopenharmony_ci		-e kmem:mm_page_free_batched \
27362306a36Sopenharmony_ci		-p `pidof X`
27462306a36Sopenharmony_ci
27562306a36Sopenharmony_ciThis was interrupted after a few seconds and
27662306a36Sopenharmony_ci::
27762306a36Sopenharmony_ci
27862306a36Sopenharmony_ci  $ perf report
27962306a36Sopenharmony_ci  # Samples: 27666
28062306a36Sopenharmony_ci  #
28162306a36Sopenharmony_ci  # Overhead  Command                            Shared Object
28262306a36Sopenharmony_ci  # ........  .......  .......................................
28362306a36Sopenharmony_ci  #
28462306a36Sopenharmony_ci      51.95%     Xorg  [vdso]
28562306a36Sopenharmony_ci      47.95%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1
28662306a36Sopenharmony_ci       0.09%     Xorg  /lib/i686/cmov/libc-2.9.so
28762306a36Sopenharmony_ci       0.01%     Xorg  [kernel]
28862306a36Sopenharmony_ci  #
28962306a36Sopenharmony_ci  # (For more details, try: perf report --sort comm,dso,symbol)
29062306a36Sopenharmony_ci  #
29162306a36Sopenharmony_ci
29262306a36Sopenharmony_ciSo, almost half of the events are occurring in a library. To get an idea which
29362306a36Sopenharmony_cisymbol:
29462306a36Sopenharmony_ci::
29562306a36Sopenharmony_ci
29662306a36Sopenharmony_ci  $ perf report --sort comm,dso,symbol
29762306a36Sopenharmony_ci  # Samples: 27666
29862306a36Sopenharmony_ci  #
29962306a36Sopenharmony_ci  # Overhead  Command                            Shared Object  Symbol
30062306a36Sopenharmony_ci  # ........  .......  .......................................  ......
30162306a36Sopenharmony_ci  #
30262306a36Sopenharmony_ci      51.95%     Xorg  [vdso]                                   [.] 0x000000ffffe424
30362306a36Sopenharmony_ci      47.93%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixmanFillsse2
30462306a36Sopenharmony_ci       0.09%     Xorg  /lib/i686/cmov/libc-2.9.so               [.] _int_malloc
30562306a36Sopenharmony_ci       0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixman_region32_copy_f
30662306a36Sopenharmony_ci       0.01%     Xorg  [kernel]                                 [k] read_hpet
30762306a36Sopenharmony_ci       0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] get_fast_path
30862306a36Sopenharmony_ci       0.00%     Xorg  [kernel]                                 [k] ftrace_trace_userstack
30962306a36Sopenharmony_ci
31062306a36Sopenharmony_ciTo see where within the function pixmanFillsse2 things are going wrong:
31162306a36Sopenharmony_ci::
31262306a36Sopenharmony_ci
31362306a36Sopenharmony_ci  $ perf annotate pixmanFillsse2
31462306a36Sopenharmony_ci  [ ... ]
31562306a36Sopenharmony_ci    0.00 :         34eeb:       0f 18 08                prefetcht0 (%eax)
31662306a36Sopenharmony_ci         :      }
31762306a36Sopenharmony_ci         :
31862306a36Sopenharmony_ci         :      extern __inline void __attribute__((__gnu_inline__, __always_inline__, _
31962306a36Sopenharmony_ci         :      _mm_store_si128 (__m128i *__P, __m128i __B) :      {
32062306a36Sopenharmony_ci         :        *__P = __B;
32162306a36Sopenharmony_ci   12.40 :         34eee:       66 0f 7f 80 40 ff ff    movdqa %xmm0,-0xc0(%eax)
32262306a36Sopenharmony_ci    0.00 :         34ef5:       ff
32362306a36Sopenharmony_ci   12.40 :         34ef6:       66 0f 7f 80 50 ff ff    movdqa %xmm0,-0xb0(%eax)
32462306a36Sopenharmony_ci    0.00 :         34efd:       ff
32562306a36Sopenharmony_ci   12.39 :         34efe:       66 0f 7f 80 60 ff ff    movdqa %xmm0,-0xa0(%eax)
32662306a36Sopenharmony_ci    0.00 :         34f05:       ff
32762306a36Sopenharmony_ci   12.67 :         34f06:       66 0f 7f 80 70 ff ff    movdqa %xmm0,-0x90(%eax)
32862306a36Sopenharmony_ci    0.00 :         34f0d:       ff
32962306a36Sopenharmony_ci   12.58 :         34f0e:       66 0f 7f 40 80          movdqa %xmm0,-0x80(%eax)
33062306a36Sopenharmony_ci   12.31 :         34f13:       66 0f 7f 40 90          movdqa %xmm0,-0x70(%eax)
33162306a36Sopenharmony_ci   12.40 :         34f18:       66 0f 7f 40 a0          movdqa %xmm0,-0x60(%eax)
33262306a36Sopenharmony_ci   12.31 :         34f1d:       66 0f 7f 40 b0          movdqa %xmm0,-0x50(%eax)
33362306a36Sopenharmony_ci
33462306a36Sopenharmony_ciAt a glance, it looks like the time is being spent copying pixmaps to
33562306a36Sopenharmony_cithe card.  Further investigation would be needed to determine why pixmaps
33662306a36Sopenharmony_ciare being copied around so much but a starting point would be to take an
33762306a36Sopenharmony_ciancient build of libpixmap out of the library path where it was totally
33862306a36Sopenharmony_ciforgotten about from months ago!
339