.. _perf_security:

Perf events and tool security
=============================

Overview
--------

Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_
can pose a considerable risk of leaking sensitive data accessed by
monitored processes. Data can leak both through direct use of the
perf_events system call API [2]_ and through data files generated by the
Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk depends on the
nature of the data that perf_events performance monitoring units (PMU)
[2]_ and Perf collect and expose for performance analysis. Collected
system and performance data may be split into several categories:

1. System hardware and software configuration data, for example: a CPU
   model and its cache configuration, the amount of available memory and
   its topology, kernel and Perf versions in use, performance monitoring
   setup including experiment time, events configuration, Perf command
   line parameters, etc.

2. User and kernel module paths and their load addresses with sizes,
   process and thread names with their PIDs and TIDs, timestamps for
   captured hardware and software events.

3. Content of kernel software counters (e.g., for context switches, page
   faults, CPU migrations), architectural hardware performance counters
   (PMC) [8]_ and model-specific registers (MSR) [9]_ that provide
   execution metrics for various monitored parts of the system (e.g.,
   memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe)
   uncore counters) without direct attribution to any execution context
   state.

4. Content of architectural execution context registers (e.g., RIP, RSP,
   RBP on x86_64), process user and kernel space memory addresses and
   data, content of various architectural MSRs that capture data from
   this category.

Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory, then access
to such monitoring modes must be properly ordered and secured.
So, perf_events performance monitoring and observability operations are
subject to security access control management [5]_ .


perf_events access control
--------------------------

To perform security checks, the Linux implementation splits processes
into two categories [6]_ : a) privileged processes (whose effective user
ID is 0, referred to as superuser or root), and b) unprivileged
processes (whose effective UID is nonzero). Privileged processes bypass
all kernel security permission checks so perf_events performance
monitoring is fully available to privileged processes without access,
scope and resource restrictions.

Unprivileged processes are subject to a full security permission check
based on the process's credentials [5]_ (usually: effective UID,
effective GID, and supplementary group list).

Linux divides the privileges traditionally associated with superuser
into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on a per-thread basis for processes
and files of unprivileged users.

Unprivileged processes with the CAP_PERFMON capability enabled are
treated as privileged processes with respect to perf_events performance
monitoring and observability operations, and thus bypass *scope*
permission checks in the kernel. CAP_PERFMON implements the principle of
least privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance
monitoring and observability operations in the kernel and provides a
secure approach to performance monitoring and observability in the
system.

For backward compatibility reasons, access to perf_events monitoring and
observability operations is also open to CAP_SYS_ADMIN privileged
processes, but CAP_SYS_ADMIN usage for secure monitoring and
observability use cases is discouraged in favor of the CAP_PERFMON
capability. If system audit records [14]_ for a process using the
perf_events system call API contain denial records of acquiring both
CAP_PERFMON and CAP_SYS_ADMIN capabilities, then providing the process
with the CAP_PERFMON capability alone is recommended as the preferred
secure approach to resolve double access denial logging related to usage
of performance monitoring and observability.

Unprivileged processes using the perf_events system call are also
subject to the PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ ,
whose outcome determines whether monitoring is permitted. So
unprivileged processes provided with the CAP_SYS_PTRACE capability are
effectively permitted to pass the check.

Other capabilities granted to unprivileged processes can effectively
enable capturing of additional data required for later performance
analysis of monitored processes or a system. For example, the CAP_SYSLOG
capability permits reading kernel space memory addresses from the
/proc/kallsyms file.

Privileged Perf users groups
----------------------------

Mechanisms of capabilities, privileged capability-dumb files [6]_ and
file system ACLs [10]_ can be used to create dedicated groups of
privileged Perf users who are permitted to execute performance monitoring
and observability without scope limits. The following steps can be
taken to create such groups of privileged Perf users.

1. Create perf_users group of privileged Perf users, assign perf_users
   group to Perf tool executable and limit access to the executable for
   other users in the system who are not in the perf_users group:

::

   # groupadd perf_users
   # ls -alhF
   -rwxr-xr-x  2 root root  11M Oct 19 15:12 perf
   # chgrp perf_users perf
   # ls -alhF
   -rwxr-xr-x  2 root perf_users  11M Oct 19 15:12 perf
   # chmod o-rwx perf
   # ls -alhF
   -rwxr-x---  2 root perf_users  11M Oct 19 15:12 perf

2. Assign the required capabilities to the Perf tool executable file and
   enable members of perf_users group with monitoring and observability
   privileges [6]_ :

::

   # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   perf: OK
   # getcap perf
   perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

If the installed libcap doesn't yet support "cap_perfmon", use "38"
instead, i.e.:

::

   # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf

Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
'perf top'. Alternatively, use 'perf top -m N' to reduce the memory that
it uses for the perf ring buffer; see the memory allocation section below.

Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
CAP_EFFECTIVE, &val) fail, which makes the default event 'cycles:u'.
As a workaround, explicitly ask for the 'cycles' event, i.e.:

::

   # perf top -e cycles

This gets kernel and user samples with a perf binary that has just
CAP_PERFMON.
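
Whether a running process actually holds CAP_PERFMON can also be checked
without libcap, by reading the CapEff bit mask that the kernel exposes in
/proc/<pid>/status. The following is a minimal sketch and not part of Perf:
the helper names capeff_of and mask_has_cap are hypothetical, and the
capability number 38 is the one quoted above.

```shell
#!/bin/sh
# Illustrative sketch; capeff_of and mask_has_cap are hypothetical helper names.

# Print the effective capability mask (hex digits, no 0x prefix) of process $1.
capeff_of() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Succeed if bit number $2 is set in the hex capability mask $1.
mask_has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Example: inspect the current shell. Substitute the PID of a running perf
# process to check the capabilities it gained from the file capabilities.
mask=$(capeff_of "$$")
if [ -n "$mask" ] && mask_has_cap "$mask" 38; then
    echo "CAP_PERFMON (38) is effective"
else
    echo "CAP_PERFMON (38) is not effective"
fi
```

The check only reports the effective set of an existing process; it does not
replace the setcap -v verification of the executable shown above.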

As a result, members of perf_users group are capable of conducting
performance monitoring and observability by using functionality of the
configured Perf tool executable that, when executed, passes perf_events
subsystem scope checks.

This specific access control management is only available to superuser
or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
capabilities.

Unprivileged users
------------------

perf_events *scope* and *access* control for unprivileged processes
is governed by the perf_event_paranoid [2]_ setting:

-1:
     Impose no *scope* and *access* restrictions on using perf_events
     performance monitoring. Per-user per-CPU perf_event_mlock_kb [2]_
     locking limit is ignored when allocating memory buffers for storing
     performance data. This is the least secure mode since allowed
     monitored *scope* is maximized and no perf_events specific limits
     are imposed on *resources* allocated for performance monitoring.

>=0:
     *scope* includes per-process and system wide performance monitoring
     but excludes raw tracepoints and ftrace function tracepoints
     monitoring. CPU and system events that happen while executing
     either in user or in kernel space can be monitored and captured for
     later analysis. Per-user per-CPU perf_event_mlock_kb locking limit
     is imposed but ignored for unprivileged processes with CAP_IPC_LOCK
     [6]_ capability.

>=1:
     *scope* includes per-process performance monitoring only and
     excludes system wide performance monitoring. CPU and system events
     that happen while executing either in user or in kernel space can
     be monitored and captured for later analysis. Per-user per-CPU
     perf_event_mlock_kb locking limit is imposed but ignored for
     unprivileged processes with CAP_IPC_LOCK capability.

>=2:
     *scope* includes per-process performance monitoring only. CPU and
     system events that happen while executing in user space only can be
     monitored and captured for later analysis. Per-user per-CPU
     perf_event_mlock_kb locking limit is imposed but ignored for
     unprivileged processes with CAP_IPC_LOCK capability.

Resource control
----------------

Open file descriptors
+++++++++++++++++++++

The perf_events system call API [2]_ allocates file descriptors for
every configured PMU event. Open file descriptors are a per-process
accountable resource governed by the RLIMIT_NOFILE [11]_ limit
(ulimit -n), which is usually derived from the login shell process. When
configuring Perf collection for a long list of events on a large server
system, this limit can easily be hit, preventing the required monitoring
configuration. The RLIMIT_NOFILE limit can be increased on a per-user
basis by modifying the content of the limits.conf file [12]_ .
Ordinarily, a Perf sampling session (perf record) requires a number of
open perf_event file descriptors that is not less than the number of
monitored events multiplied by the number of monitored CPUs.

Memory allocation
+++++++++++++++++

The amount of memory available to user processes for capturing
performance monitoring data is governed by the perf_event_mlock_kb [2]_
setting. This perf_event specific resource setting defines overall
per-CPU limits of memory allowed for mapping by the user processes to
execute performance monitoring. The setting essentially extends the
RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped
specifically for capturing monitored performance events and related data.
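
The size of this extension can be estimated on a running system. The
sketch below is illustrative only: it assumes that the per-CPU value read
from /proc/sys/kernel/perf_event_mlock_kb scales with the number of online
CPUs, and perf_mlock_budget_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative sketch; perf_mlock_budget_kb is a hypothetical helper name.

# Print the total budget in KiB for a per-CPU limit of $1 KiB on $2 CPUs.
perf_mlock_budget_kb() {
    echo $(( $1 * $2 ))
}

mlock_file=/proc/sys/kernel/perf_event_mlock_kb
if [ -r "$mlock_file" ]; then
    per_cpu_kb=$(cat "$mlock_file")
    # Assumption: the limit is scaled by the number of online CPUs.
    ncpus=$(getconf _NPROCESSORS_ONLN)
    total_kb=$(perf_mlock_budget_kb "$per_cpu_kb" "$ncpus")
    echo "perf mmap budget above RLIMIT_MEMLOCK: ${total_kb} KiB"
fi
```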

For example, if a machine has eight cores and perf_event_mlock_kb limit
is set to 516 KiB, then a user process is provided with 516 KiB * 8 =
4128 KiB of memory above the RLIMIT_MEMLOCK limit (ulimit -l) for
perf_event mmap buffers. In particular, this means that, if the user
wants to start two or more performance monitoring processes, the user is
required to manually distribute the available 4128 KiB between the
monitoring processes, for example, using the --mmap-pages Perf record
mode option. Otherwise, the first started performance monitoring process
allocates all available 4128 KiB and the other processes will fail to
proceed due to the lack of memory.

RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored
for processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf
privileged users can be provided with memory above the constraints for
perf_events/Perf performance monitoring purposes by providing the Perf
executable with CAP_IPC_LOCK capability.

Bibliography
------------

.. [1] `<https://lwn.net/Articles/337493/>`_
.. [2] `<http://man7.org/linux/man-pages/man2/perf_event_open.2.html>`_
.. [3] `<http://web.eece.maine.edu/~vweaver/projects/perf_events/>`_
.. [4] `<https://perf.wiki.kernel.org/index.php/Main_Page>`_
.. [5] `<https://www.kernel.org/doc/html/latest/security/credentials.html>`_
.. [6] `<http://man7.org/linux/man-pages/man7/capabilities.7.html>`_
.. [7] `<http://man7.org/linux/man-pages/man2/ptrace.2.html>`_
.. [8] `<https://en.wikipedia.org/wiki/Hardware_performance_counter>`_
.. [9] `<https://en.wikipedia.org/wiki/Model-specific_register>`_
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
.. [13] `<https://sites.google.com/site/fullycapable>`_
.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_