162306a36Sopenharmony_ci========================================================= 262306a36Sopenharmony_ciNotes on Analysing Behaviour Using Events and Tracepoints 362306a36Sopenharmony_ci========================================================= 462306a36Sopenharmony_ci:Author: Mel Gorman (PCL information heavily based on email from Ingo Molnar) 562306a36Sopenharmony_ci 662306a36Sopenharmony_ci1. Introduction 762306a36Sopenharmony_ci=============== 862306a36Sopenharmony_ci 962306a36Sopenharmony_ciTracepoints (see Documentation/trace/tracepoints.rst) can be used without 1062306a36Sopenharmony_cicreating custom kernel modules to register probe functions using the event 1162306a36Sopenharmony_citracing infrastructure. 1262306a36Sopenharmony_ci 1362306a36Sopenharmony_ciSimplistically, tracepoints represent important events that can be 1462306a36Sopenharmony_citaken in conjunction with other tracepoints to build a "Big Picture" of 1562306a36Sopenharmony_ciwhat is going on within the system. There are a large number of methods for 1662306a36Sopenharmony_cigathering and interpreting these events. Lacking any current Best Practises, 1762306a36Sopenharmony_cithis document describes some of the methods that can be used. 1862306a36Sopenharmony_ci 1962306a36Sopenharmony_ciThis document assumes that debugfs is mounted on /sys/kernel/debug and that 2062306a36Sopenharmony_cithe appropriate tracing options have been configured into the kernel. It is 2162306a36Sopenharmony_ciassumed that the PCL tool tools/perf has been installed and is in your path. 2262306a36Sopenharmony_ci 2362306a36Sopenharmony_ci2. Listing Available Events 2462306a36Sopenharmony_ci=========================== 2562306a36Sopenharmony_ci 2662306a36Sopenharmony_ci2.1 Standard Utilities 2762306a36Sopenharmony_ci---------------------- 2862306a36Sopenharmony_ci 2962306a36Sopenharmony_ciAll possible events are visible from /sys/kernel/tracing/events. Simply 3062306a36Sopenharmony_cicalling:: 3162306a36Sopenharmony_ci 3262306a36Sopenharmony_ci $ find /sys/kernel/tracing/events -type d 3362306a36Sopenharmony_ci 3462306a36Sopenharmony_ciwill give a fair indication of the number of events available. 3562306a36Sopenharmony_ci 3662306a36Sopenharmony_ci2.2 PCL (Performance Counters for Linux) 3762306a36Sopenharmony_ci---------------------------------------- 3862306a36Sopenharmony_ci 3962306a36Sopenharmony_ciDiscovery and enumeration of all counters and events, including tracepoints, 4062306a36Sopenharmony_ciare available with the perf tool. Getting a list of available events is a 4162306a36Sopenharmony_cisimple case of:: 4262306a36Sopenharmony_ci 4362306a36Sopenharmony_ci $ perf list 2>&1 | grep Tracepoint 4462306a36Sopenharmony_ci ext4:ext4_free_inode [Tracepoint event] 4562306a36Sopenharmony_ci ext4:ext4_request_inode [Tracepoint event] 4662306a36Sopenharmony_ci ext4:ext4_allocate_inode [Tracepoint event] 4762306a36Sopenharmony_ci ext4:ext4_write_begin [Tracepoint event] 4862306a36Sopenharmony_ci ext4:ext4_ordered_write_end [Tracepoint event] 4962306a36Sopenharmony_ci [ .... remaining output snipped .... ] 5062306a36Sopenharmony_ci 5162306a36Sopenharmony_ci 5262306a36Sopenharmony_ci3. Enabling Events 5362306a36Sopenharmony_ci================== 5462306a36Sopenharmony_ci 5562306a36Sopenharmony_ci3.1 System-Wide Event Enabling 5662306a36Sopenharmony_ci------------------------------ 5762306a36Sopenharmony_ci 5862306a36Sopenharmony_ciSee Documentation/trace/events.rst for a proper description on how events 5962306a36Sopenharmony_cican be enabled system-wide. A short example of enabling all events related 6062306a36Sopenharmony_cito page allocation would look something like:: 6162306a36Sopenharmony_ci 6262306a36Sopenharmony_ci $ for i in `find /sys/kernel/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done 6362306a36Sopenharmony_ci 6462306a36Sopenharmony_ci3.2 System-Wide Event Enabling with SystemTap 6562306a36Sopenharmony_ci--------------------------------------------- 6662306a36Sopenharmony_ci 6762306a36Sopenharmony_ciIn SystemTap, tracepoints are accessible using the kernel.trace() function 6862306a36Sopenharmony_cicall. The following is an example that reports every 5 seconds what processes 6962306a36Sopenharmony_ciwere allocating the pages. 7062306a36Sopenharmony_ci:: 7162306a36Sopenharmony_ci 7262306a36Sopenharmony_ci global page_allocs 7362306a36Sopenharmony_ci 7462306a36Sopenharmony_ci probe kernel.trace("mm_page_alloc") { 7562306a36Sopenharmony_ci page_allocs[execname()]++ 7662306a36Sopenharmony_ci } 7762306a36Sopenharmony_ci 7862306a36Sopenharmony_ci function print_count() { 7962306a36Sopenharmony_ci printf ("%-25s %-s\n", "#Pages Allocated", "Process Name") 8062306a36Sopenharmony_ci foreach (proc in page_allocs-) 8162306a36Sopenharmony_ci printf("%-25d %s\n", page_allocs[proc], proc) 8262306a36Sopenharmony_ci printf ("\n") 8362306a36Sopenharmony_ci delete page_allocs 8462306a36Sopenharmony_ci } 8562306a36Sopenharmony_ci 8662306a36Sopenharmony_ci probe timer.s(5) { 8762306a36Sopenharmony_ci print_count() 8862306a36Sopenharmony_ci } 8962306a36Sopenharmony_ci 9062306a36Sopenharmony_ci3.3 System-Wide Event Enabling with PCL 9162306a36Sopenharmony_ci--------------------------------------- 9262306a36Sopenharmony_ci 9362306a36Sopenharmony_ciBy specifying the -a switch and analysing sleep, the system-wide events 9462306a36Sopenharmony_cifor a duration of time can be examined. 9562306a36Sopenharmony_ci:: 9662306a36Sopenharmony_ci 9762306a36Sopenharmony_ci $ perf stat -a \ 9862306a36Sopenharmony_ci -e kmem:mm_page_alloc -e kmem:mm_page_free \ 9962306a36Sopenharmony_ci -e kmem:mm_page_free_batched \ 10062306a36Sopenharmony_ci sleep 10 10162306a36Sopenharmony_ci Performance counter stats for 'sleep 10': 10262306a36Sopenharmony_ci 10362306a36Sopenharmony_ci 9630 kmem:mm_page_alloc 10462306a36Sopenharmony_ci 2143 kmem:mm_page_free 10562306a36Sopenharmony_ci 7424 kmem:mm_page_free_batched 10662306a36Sopenharmony_ci 10762306a36Sopenharmony_ci 10.002577764 seconds time elapsed 10862306a36Sopenharmony_ci 10962306a36Sopenharmony_ciSimilarly, one could execute a shell and exit it as desired to get a report 11062306a36Sopenharmony_ciat that point. 11162306a36Sopenharmony_ci 11262306a36Sopenharmony_ci3.4 Local Event Enabling 11362306a36Sopenharmony_ci------------------------ 11462306a36Sopenharmony_ci 11562306a36Sopenharmony_ciDocumentation/trace/ftrace.rst describes how to enable events on a per-thread 11662306a36Sopenharmony_cibasis using set_ftrace_pid. 11762306a36Sopenharmony_ci 11862306a36Sopenharmony_ci3.5 Local Event Enablement with PCL 11962306a36Sopenharmony_ci----------------------------------- 12062306a36Sopenharmony_ci 12162306a36Sopenharmony_ciEvents can be activated and tracked for the duration of a process on a local 12262306a36Sopenharmony_cibasis using PCL such as follows. 12362306a36Sopenharmony_ci:: 12462306a36Sopenharmony_ci 12562306a36Sopenharmony_ci $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \ 12662306a36Sopenharmony_ci -e kmem:mm_page_free_batched ./hackbench 10 12762306a36Sopenharmony_ci Time: 0.909 12862306a36Sopenharmony_ci 12962306a36Sopenharmony_ci Performance counter stats for './hackbench 10': 13062306a36Sopenharmony_ci 13162306a36Sopenharmony_ci 17803 kmem:mm_page_alloc 13262306a36Sopenharmony_ci 12398 kmem:mm_page_free 13362306a36Sopenharmony_ci 4827 kmem:mm_page_free_batched 13462306a36Sopenharmony_ci 13562306a36Sopenharmony_ci 0.973913387 seconds time elapsed 13662306a36Sopenharmony_ci 13762306a36Sopenharmony_ci4. Event Filtering 13862306a36Sopenharmony_ci================== 13962306a36Sopenharmony_ci 14062306a36Sopenharmony_ciDocumentation/trace/ftrace.rst covers in-depth how to filter events in 14162306a36Sopenharmony_ciftrace. Obviously using grep and awk of trace_pipe is an option as well 14262306a36Sopenharmony_cias any script reading trace_pipe. 14362306a36Sopenharmony_ci 14462306a36Sopenharmony_ci5. Analysing Event Variances with PCL 14562306a36Sopenharmony_ci===================================== 14662306a36Sopenharmony_ci 14762306a36Sopenharmony_ciAny workload can exhibit variances between runs and it can be important 14862306a36Sopenharmony_cito know what the standard deviation is. By and large, this is left to the 14962306a36Sopenharmony_ciperformance analyst to do it by hand. In the event that the discrete event 15062306a36Sopenharmony_cioccurrences are useful to the performance analyst, then perf can be used. 15162306a36Sopenharmony_ci:: 15262306a36Sopenharmony_ci 15362306a36Sopenharmony_ci $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free 15462306a36Sopenharmony_ci -e kmem:mm_page_free_batched ./hackbench 10 15562306a36Sopenharmony_ci Time: 0.890 15662306a36Sopenharmony_ci Time: 0.895 15762306a36Sopenharmony_ci Time: 0.915 15862306a36Sopenharmony_ci Time: 1.001 15962306a36Sopenharmony_ci Time: 0.899 16062306a36Sopenharmony_ci 16162306a36Sopenharmony_ci Performance counter stats for './hackbench 10' (5 runs): 16262306a36Sopenharmony_ci 16362306a36Sopenharmony_ci 16630 kmem:mm_page_alloc ( +- 3.542% ) 16462306a36Sopenharmony_ci 11486 kmem:mm_page_free ( +- 4.771% ) 16562306a36Sopenharmony_ci 4730 kmem:mm_page_free_batched ( +- 2.325% ) 16662306a36Sopenharmony_ci 16762306a36Sopenharmony_ci 0.982653002 seconds time elapsed ( +- 1.448% ) 16862306a36Sopenharmony_ci 16962306a36Sopenharmony_ciIn the event that some higher-level event is required that depends on some 17062306a36Sopenharmony_ciaggregation of discrete events, then a script would need to be developed. 17162306a36Sopenharmony_ci 17262306a36Sopenharmony_ciUsing --repeat, it is also possible to view how events are fluctuating over 17362306a36Sopenharmony_citime on a system-wide basis using -a and sleep. 17462306a36Sopenharmony_ci:: 17562306a36Sopenharmony_ci 17662306a36Sopenharmony_ci $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \ 17762306a36Sopenharmony_ci -e kmem:mm_page_free_batched \ 17862306a36Sopenharmony_ci -a --repeat 10 \ 17962306a36Sopenharmony_ci sleep 1 18062306a36Sopenharmony_ci Performance counter stats for 'sleep 1' (10 runs): 18162306a36Sopenharmony_ci 18262306a36Sopenharmony_ci 1066 kmem:mm_page_alloc ( +- 26.148% ) 18362306a36Sopenharmony_ci 182 kmem:mm_page_free ( +- 5.464% ) 18462306a36Sopenharmony_ci 890 kmem:mm_page_free_batched ( +- 30.079% ) 18562306a36Sopenharmony_ci 18662306a36Sopenharmony_ci 1.002251757 seconds time elapsed ( +- 0.005% ) 18762306a36Sopenharmony_ci 18862306a36Sopenharmony_ci6. Higher-Level Analysis with Helper Scripts 18962306a36Sopenharmony_ci============================================ 19062306a36Sopenharmony_ci 19162306a36Sopenharmony_ciWhen events are enabled the events that are triggering can be read from 19262306a36Sopenharmony_ci/sys/kernel/tracing/trace_pipe in human-readable format although binary 19362306a36Sopenharmony_cioptions exist as well. By post-processing the output, further information can 19462306a36Sopenharmony_cibe gathered on-line as appropriate. Examples of post-processing might include 19562306a36Sopenharmony_ci 19662306a36Sopenharmony_ci - Reading information from /proc for the PID that triggered the event 19762306a36Sopenharmony_ci - Deriving a higher-level event from a series of lower-level events. 19862306a36Sopenharmony_ci - Calculating latencies between two events 19962306a36Sopenharmony_ci 20062306a36Sopenharmony_ciDocumentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example 20162306a36Sopenharmony_ciscript that can read trace_pipe from STDIN or a copy of a trace. When used 20262306a36Sopenharmony_cion-line, it can be interrupted once to generate a report without exiting 20362306a36Sopenharmony_ciand twice to exit. 20462306a36Sopenharmony_ci 20562306a36Sopenharmony_ciSimplistically, the script just reads STDIN and counts up events but it 20662306a36Sopenharmony_cialso can do more such as 20762306a36Sopenharmony_ci 20862306a36Sopenharmony_ci - Derive high-level events from many low-level events. If a number of pages 20962306a36Sopenharmony_ci are freed to the main allocator from the per-CPU lists, it recognises 21062306a36Sopenharmony_ci that as one per-CPU drain even though there is no specific tracepoint 21162306a36Sopenharmony_ci for that event 21262306a36Sopenharmony_ci - It can aggregate based on PID or individual process number 21362306a36Sopenharmony_ci - In the event memory is getting externally fragmented, it reports 21462306a36Sopenharmony_ci on whether the fragmentation event was severe or moderate. 21562306a36Sopenharmony_ci - When receiving an event about a PID, it can record who the parent was so 21662306a36Sopenharmony_ci that if large numbers of events are coming from very short-lived 21762306a36Sopenharmony_ci processes, the parent process responsible for creating all the helpers 21862306a36Sopenharmony_ci can be identified 21962306a36Sopenharmony_ci 22062306a36Sopenharmony_ci7. Lower-Level Analysis with PCL 22162306a36Sopenharmony_ci================================ 22262306a36Sopenharmony_ci 22362306a36Sopenharmony_ciThere may also be a requirement to identify what functions within a program 22462306a36Sopenharmony_ciwere generating events within the kernel. To begin this sort of analysis, the 22562306a36Sopenharmony_cidata must be recorded. At the time of writing, this required root: 22662306a36Sopenharmony_ci:: 22762306a36Sopenharmony_ci 22862306a36Sopenharmony_ci $ perf record -c 1 \ 22962306a36Sopenharmony_ci -e kmem:mm_page_alloc -e kmem:mm_page_free \ 23062306a36Sopenharmony_ci -e kmem:mm_page_free_batched \ 23162306a36Sopenharmony_ci ./hackbench 10 23262306a36Sopenharmony_ci Time: 0.894 23362306a36Sopenharmony_ci [ perf record: Captured and wrote 0.733 MB perf.data (~32010 samples) ] 23462306a36Sopenharmony_ci 23562306a36Sopenharmony_ciNote the use of '-c 1' to set the event period to sample. The default sample 23662306a36Sopenharmony_ciperiod is quite high to minimise overhead but the information collected can be 23762306a36Sopenharmony_civery coarse as a result. 23862306a36Sopenharmony_ci 23962306a36Sopenharmony_ciThis record outputted a file called perf.data which can be analysed using 24062306a36Sopenharmony_ciperf report. 24162306a36Sopenharmony_ci:: 24262306a36Sopenharmony_ci 24362306a36Sopenharmony_ci $ perf report 24462306a36Sopenharmony_ci # Samples: 30922 24562306a36Sopenharmony_ci # 24662306a36Sopenharmony_ci # Overhead Command Shared Object 24762306a36Sopenharmony_ci # ........ ......... ................................ 24862306a36Sopenharmony_ci # 24962306a36Sopenharmony_ci 87.27% hackbench [vdso] 25062306a36Sopenharmony_ci 6.85% hackbench /lib/i686/cmov/libc-2.9.so 25162306a36Sopenharmony_ci 2.62% hackbench /lib/ld-2.9.so 25262306a36Sopenharmony_ci 1.52% perf [vdso] 25362306a36Sopenharmony_ci 1.22% hackbench ./hackbench 25462306a36Sopenharmony_ci 0.48% hackbench [kernel] 25562306a36Sopenharmony_ci 0.02% perf /lib/i686/cmov/libc-2.9.so 25662306a36Sopenharmony_ci 0.01% perf /usr/bin/perf 25762306a36Sopenharmony_ci 0.01% perf /lib/ld-2.9.so 25862306a36Sopenharmony_ci 0.00% hackbench /lib/i686/cmov/libpthread-2.9.so 25962306a36Sopenharmony_ci # 26062306a36Sopenharmony_ci # (For more details, try: perf report --sort comm,dso,symbol) 26162306a36Sopenharmony_ci # 26262306a36Sopenharmony_ci 26362306a36Sopenharmony_ciAccording to this, the vast majority of events triggered on events 26462306a36Sopenharmony_ciwithin the VDSO. With simple binaries, this will often be the case so let's 26562306a36Sopenharmony_citake a slightly different example. In the course of writing this, it was 26662306a36Sopenharmony_cinoticed that X was generating an insane amount of page allocations so let's look 26762306a36Sopenharmony_ciat it: 26862306a36Sopenharmony_ci:: 26962306a36Sopenharmony_ci 27062306a36Sopenharmony_ci $ perf record -c 1 -f \ 27162306a36Sopenharmony_ci -e kmem:mm_page_alloc -e kmem:mm_page_free \ 27262306a36Sopenharmony_ci -e kmem:mm_page_free_batched \ 27362306a36Sopenharmony_ci -p `pidof X` 27462306a36Sopenharmony_ci 27562306a36Sopenharmony_ciThis was interrupted after a few seconds and 27662306a36Sopenharmony_ci:: 27762306a36Sopenharmony_ci 27862306a36Sopenharmony_ci $ perf report 27962306a36Sopenharmony_ci # Samples: 27666 28062306a36Sopenharmony_ci # 28162306a36Sopenharmony_ci # Overhead Command Shared Object 28262306a36Sopenharmony_ci # ........ ....... ....................................... 28362306a36Sopenharmony_ci # 28462306a36Sopenharmony_ci 51.95% Xorg [vdso] 28562306a36Sopenharmony_ci 47.95% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 28662306a36Sopenharmony_ci 0.09% Xorg /lib/i686/cmov/libc-2.9.so 28762306a36Sopenharmony_ci 0.01% Xorg [kernel] 28862306a36Sopenharmony_ci # 28962306a36Sopenharmony_ci # (For more details, try: perf report --sort comm,dso,symbol) 29062306a36Sopenharmony_ci # 29162306a36Sopenharmony_ci 29262306a36Sopenharmony_ciSo, almost half of the events are occurring in a library. To get an idea which 29362306a36Sopenharmony_cisymbol: 29462306a36Sopenharmony_ci:: 29562306a36Sopenharmony_ci 29662306a36Sopenharmony_ci $ perf report --sort comm,dso,symbol 29762306a36Sopenharmony_ci # Samples: 27666 29862306a36Sopenharmony_ci # 29962306a36Sopenharmony_ci # Overhead Command Shared Object Symbol 30062306a36Sopenharmony_ci # ........ ....... ....................................... ...... 30162306a36Sopenharmony_ci # 30262306a36Sopenharmony_ci 51.95% Xorg [vdso] [.] 0x000000ffffe424 30362306a36Sopenharmony_ci 47.93% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] pixmanFillsse2 30462306a36Sopenharmony_ci 0.09% Xorg /lib/i686/cmov/libc-2.9.so [.] _int_malloc 30562306a36Sopenharmony_ci 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] pixman_region32_copy_f 30662306a36Sopenharmony_ci 0.01% Xorg [kernel] [k] read_hpet 30762306a36Sopenharmony_ci 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path 30862306a36Sopenharmony_ci 0.00% Xorg [kernel] [k] ftrace_trace_userstack 30962306a36Sopenharmony_ci 31062306a36Sopenharmony_ciTo see where within the function pixmanFillsse2 things are going wrong: 31162306a36Sopenharmony_ci:: 31262306a36Sopenharmony_ci 31362306a36Sopenharmony_ci $ perf annotate pixmanFillsse2 31462306a36Sopenharmony_ci [ ... ] 31562306a36Sopenharmony_ci 0.00 : 34eeb: 0f 18 08 prefetcht0 (%eax) 31662306a36Sopenharmony_ci : } 31762306a36Sopenharmony_ci : 31862306a36Sopenharmony_ci : extern __inline void __attribute__((__gnu_inline__, __always_inline__, _ 31962306a36Sopenharmony_ci : _mm_store_si128 (__m128i *__P, __m128i __B) : { 32062306a36Sopenharmony_ci : *__P = __B; 32162306a36Sopenharmony_ci 12.40 : 34eee: 66 0f 7f 80 40 ff ff movdqa %xmm0,-0xc0(%eax) 32262306a36Sopenharmony_ci 0.00 : 34ef5: ff 32362306a36Sopenharmony_ci 12.40 : 34ef6: 66 0f 7f 80 50 ff ff movdqa %xmm0,-0xb0(%eax) 32462306a36Sopenharmony_ci 0.00 : 34efd: ff 32562306a36Sopenharmony_ci 12.39 : 34efe: 66 0f 7f 80 60 ff ff movdqa %xmm0,-0xa0(%eax) 32662306a36Sopenharmony_ci 0.00 : 34f05: ff 32762306a36Sopenharmony_ci 12.67 : 34f06: 66 0f 7f 80 70 ff ff movdqa %xmm0,-0x90(%eax) 32862306a36Sopenharmony_ci 0.00 : 34f0d: ff 32962306a36Sopenharmony_ci 12.58 : 34f0e: 66 0f 7f 40 80 movdqa %xmm0,-0x80(%eax) 33062306a36Sopenharmony_ci 12.31 : 34f13: 66 0f 7f 40 90 movdqa %xmm0,-0x70(%eax) 33162306a36Sopenharmony_ci 12.40 : 34f18: 66 0f 7f 40 a0 movdqa %xmm0,-0x60(%eax) 33262306a36Sopenharmony_ci 12.31 : 34f1d: 66 0f 7f 40 b0 movdqa %xmm0,-0x50(%eax) 33362306a36Sopenharmony_ci 33462306a36Sopenharmony_ciAt a glance, it looks like the time is being spent copying pixmaps to 33562306a36Sopenharmony_cithe card. Further investigation would be needed to determine why pixmaps 33662306a36Sopenharmony_ciare being copied around so much but a starting point would be to take an 33762306a36Sopenharmony_ciancient build of libpixmap out of the library path where it was totally 33862306a36Sopenharmony_ciforgotten about from months ago! 339