.. _perf_security:

Perf events and tool security
=============================

Overview
--------

Using Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_
can pose a considerable risk of leaking sensitive data accessed by
monitored processes. Data can leak both through direct use of the
perf_events system call API [2]_ and through data files generated by the
Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk depends on the
nature of the data that perf_events performance monitoring units (PMU)
[2]_ and Perf collect and expose for performance analysis. Collected
system and performance data can be split into several categories:

1. System hardware and software configuration data, for example: the CPU
   model and its cache configuration, the amount of available memory and
   its topology, the kernel and Perf versions in use, and the performance
   monitoring setup, including experiment time, events configuration,
   Perf command line parameters, etc.

2. User and kernel module paths and their load addresses with sizes,
   process and thread names with their PIDs and TIDs, timestamps for
   captured hardware and software events.

3. Content of kernel software counters (e.g., for context switches, page
   faults, CPU migrations), architectural hardware performance counters
   (PMC) [8]_ and model-specific registers (MSR) [9]_ that provide
   execution metrics for various monitored parts of the system (e.g.,
   memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe)
   uncore counters) without direct attribution to any execution context
   state.

4. Content of architectural execution context registers (e.g., RIP, RSP,
   RBP on x86_64), process user and kernel space memory addresses and
   data, content of various architectural MSRs that capture data from
   this category.

Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory, then access
to such monitoring modes must be ordered and secured properly. For this
reason, perf_events performance monitoring and observability operations
are subject to security access control management [5]_ .

perf_events access control
--------------------------

To perform security checks, the Linux implementation splits processes
into two categories [6]_ : a) privileged processes (whose effective user
ID is 0, referred to as superuser or root), and b) unprivileged
processes (whose effective UID is nonzero). Privileged processes bypass
all kernel security permission checks, so perf_events performance
monitoring is fully available to privileged processes without access,
scope and resource restrictions.

Unprivileged processes are subject to a full security permission check
based on the process's credentials [5]_ (usually: effective UID,
effective GID, and supplementary group list).

Linux divides the privileges traditionally associated with superuser
into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on a per-thread basis for processes
and files of unprivileged users.

Unprivileged processes with the CAP_PERFMON capability enabled are
treated as privileged processes with respect to perf_events performance
monitoring and observability operations, and thus bypass *scope*
permission checks in the kernel. CAP_PERFMON implements the principle of
least privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance
monitoring and observability operations in the kernel and provides a
secure approach to performance monitoring and observability in the
system.

For backward compatibility reasons, access to perf_events monitoring and
observability operations is also open to CAP_SYS_ADMIN privileged
processes, but using CAP_SYS_ADMIN for secure monitoring and
observability use cases is discouraged in favor of the CAP_PERFMON
capability. If system audit records [14]_ for a process using the
perf_events system call API contain denial records for acquiring both the
CAP_PERFMON and CAP_SYS_ADMIN capabilities, then providing the process
with the CAP_PERFMON capability alone is recommended as the preferred
secure approach to resolve double access denial logging related to usage
of performance monitoring and observability.

Unprivileged processes using the perf_events system call are also
subject to a PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ ,
whose outcome determines whether monitoring is permitted. Unprivileged
processes provided with the CAP_SYS_PTRACE capability are therefore
effectively permitted to pass the check.

Other capabilities granted to unprivileged processes can effectively
enable capturing of additional data required for later performance
analysis of monitored processes or a system. For example, the CAP_SYSLOG
capability permits reading kernel space memory addresses from the
/proc/kallsyms file.
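
As a rough illustration of the capabilities named in this section, the
sketch below tests whether a capability bit is set in a hexadecimal
capability mask, such as the CapEff field of /proc/<pid>/status. The
helper name has_cap is hypothetical, the bit numbers are assumed to match
include/uapi/linux/capability.h (CAP_PERFMON is 38), and the sketch only
inspects a mask; it is not how the kernel itself performs the check.

```shell
# Hypothetical helper: succeed if bit $2 is set in the hex capability mask $1.
# The mask has the format of the CapEff field in /proc/<pid>/status.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Capability bit numbers (assumed from include/uapi/linux/capability.h).
CAP_IPC_LOCK=14
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21
CAP_SYSLOG=34
CAP_PERFMON=38

# Report on the current shell process where /proc is available.
pid_status=/proc/$$/status
if [ -r "$pid_status" ]; then
    cap_eff=$(awk '/^CapEff:/ { print $2 }' "$pid_status")
    if [ -n "$cap_eff" ] && has_cap "$cap_eff" "$CAP_PERFMON"; then
        echo "CAP_PERFMON is in the effective set"
    else
        echo "CAP_PERFMON is not in the effective set"
    fi
fi
```

The same helper can be pointed at the other bit numbers to see whether a
process also holds CAP_SYS_PTRACE, CAP_SYSLOG or CAP_IPC_LOCK.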

Privileged Perf users groups
----------------------------

Mechanisms of capabilities, privileged capability-dumb files [6]_ and
file system ACLs [10]_ can be used to create dedicated groups of
privileged Perf users who are permitted to execute performance monitoring
and observability without scope limits. The following steps can be
taken to create such groups of privileged Perf users.

1. Create perf_users group of privileged Perf users, assign perf_users
   group to Perf tool executable and limit access to the executable for
   other users in the system who are not in the perf_users group:

::

   # groupadd perf_users
   # ls -alhF
   -rwxr-xr-x  2 root root  11M Oct 19 15:12 perf
   # chgrp perf_users perf
   # ls -alhF
   -rwxr-xr-x  2 root perf_users  11M Oct 19 15:12 perf
   # chmod o-rwx perf
   # ls -alhF
   -rwxr-x---  2 root perf_users  11M Oct 19 15:12 perf

2. Assign the required capabilities to the Perf tool executable file and
   enable members of perf_users group with monitoring and observability
   privileges [6]_ :

::

   # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
   perf: OK
   # getcap perf
   perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

If the installed libcap doesn't yet support "cap_perfmon", use "38" instead,
i.e.:

::

   # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf

Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
'perf top'. Alternatively, use 'perf top -m N' to reduce the memory that
it uses for the perf ring buffer; see the memory allocation section below.

Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
CAP_EFFECTIVE, &val) fail, which makes the default event 'cycles:u'. As a
workaround, to get kernel and user samples with a perf binary that has just
CAP_PERFMON, explicitly ask for the 'cycles' event, i.e.:

::

   # perf top -e cycles

As a result, members of perf_users group are capable of conducting
performance monitoring and observability by using functionality of the
configured Perf tool executable that, when executed, passes perf_events
subsystem scope checks.

This specific access control management is only available to superuser
or root running processes with the CAP_SETPCAP and CAP_SETFCAP [6]_
capabilities.
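
To double-check the result of step 2 from a script, something like the
following sketch could be used. The helper name getcap_lists is
hypothetical, and the parsing is written only against the getcap output
shown in step 2 (it also accepts "=" in place of "+" before the flags,
which is an assumption about other libcap versions, not a documented
guarantee).

```shell
# Hypothetical helper: succeed if capability name $2 appears in a line of
# getcap output $1, e.g. "perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep".
getcap_lists() {
    caps=${1##* }        # keep the text after the last space (the comma list)
    caps=${caps%[+=]*}   # strip the trailing flag suffix such as "+ep"
    case ",$caps," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Intended use against a configured executable (not run here):
#   getcap_lists "$(getcap perf)" cap_perfmon && echo "perf has CAP_PERFMON"
sample="perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep"
if getcap_lists "$sample" cap_perfmon; then
    echo "cap_perfmon is listed"
fi
```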

Unprivileged users
------------------

perf_events *scope* and *access* control for unprivileged processes
is governed by the perf_event_paranoid [2]_ setting:

-1:
     Impose no *scope* and *access* restrictions on using perf_events
     performance monitoring. The per-user per-cpu perf_event_mlock_kb
     [2]_ locking limit is ignored when allocating memory buffers for
     storing performance data. This is the least secure mode, since the
     allowed monitored *scope* is maximized and no perf_events specific
     limits are imposed on *resources* allocated for performance
     monitoring.

>=0:
     *scope* includes per-process and system wide performance monitoring
     but excludes raw tracepoints and ftrace function tracepoints
     monitoring. CPU and system events that occur while executing in
     either user or kernel space can be monitored and captured for later
     analysis. The per-user per-cpu perf_event_mlock_kb locking limit is
     imposed but ignored for unprivileged processes with the CAP_IPC_LOCK
     [6]_ capability.

>=1:
     *scope* includes per-process performance monitoring only and
     excludes system wide performance monitoring. CPU and system events
     that occur while executing in either user or kernel space can be
     monitored and captured for later analysis. The per-user per-cpu
     perf_event_mlock_kb locking limit is imposed but ignored for
     unprivileged processes with the CAP_IPC_LOCK capability.

>=2:
     *scope* includes per-process performance monitoring only. Only CPU
     and system events that occur while executing in user space can be
     monitored and captured for later analysis. The per-user per-cpu
     perf_event_mlock_kb locking limit is imposed but ignored for
     unprivileged processes with the CAP_IPC_LOCK capability.
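
The levels above can be summarized in a small lookup, sketched below. The
helper name paranoid_scope and its wording are hypothetical; the sketch
only restates the list above and assumes the setting is exposed at
/proc/sys/kernel/perf_event_paranoid.

```shell
# Hypothetical helper: summarize what an unprivileged process may monitor
# for a given perf_event_paranoid value, following the levels listed above.
paranoid_scope() {
    if [ "$1" -le -1 ]; then
        echo "no scope or access restrictions; perf_event_mlock_kb ignored"
    elif [ "$1" -eq 0 ]; then
        echo "per-process and system wide; user and kernel space; no raw or ftrace function tracepoints"
    elif [ "$1" -eq 1 ]; then
        echo "per-process only; user and kernel space"
    else
        echo "per-process only; user space only"
    fi
}

# Report the current setting where the sysctl file exists (assumed path).
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$level: $(paranoid_scope "$level")"
fi
```

Because the levels are cumulative, any value of 2 or more falls into the
last branch.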

Resource control
----------------

Open file descriptors
+++++++++++++++++++++

The perf_events system call API [2]_ allocates file descriptors for
every configured PMU event. Open file descriptors are a per-process
accountable resource governed by the RLIMIT_NOFILE [11]_ limit
(ulimit -n), which is usually derived from the login shell process. When
configuring Perf collection for a long list of events on a large server
system, this limit can easily be hit, preventing the required monitoring
configuration. The RLIMIT_NOFILE limit can be increased on a per-user
basis by modifying the content of the limits.conf file [12]_ .
Ordinarily, a Perf sampling session (perf record) requires a number of
open perf_event file descriptors that is not less than the number of
monitored events multiplied by the number of monitored CPUs.
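
The estimate in the last sentence can be checked against the current
limit before starting a session. The sketch below is only illustrative:
the helper names and the event count of 6 are hypothetical, and the
comparison is strict because the process needs descriptors for other
purposes as well.

```shell
# Hypothetical helper: lower bound on perf_event file descriptors for a
# sampling session, i.e. monitored events multiplied by monitored CPUs.
perf_fds_needed() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: succeed if estimate $1 stays below the limit $2,
# where $2 is a number or the word "unlimited" as printed by ulimit -n.
fits_nofile() {
    [ "$2" = "unlimited" ] || [ "$1" -lt "$2" ]
}

events=6                               # illustrative number of events
cpus=$(getconf _NPROCESSORS_ONLN)      # CPUs currently online
needed=$(perf_fds_needed "$events" "$cpus")
limit=$(ulimit -n)
if fits_nofile "$needed" "$limit"; then
    echo "$needed descriptors fit under RLIMIT_NOFILE ($limit)"
else
    echo "$needed descriptors exceed RLIMIT_NOFILE ($limit)"
fi
```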

Memory allocation
+++++++++++++++++

The amount of memory available to user processes for capturing
performance monitoring data is governed by the perf_event_mlock_kb [2]_
setting. This perf_event specific resource setting defines the overall
per-cpu limit of memory that user processes are allowed to map in order
to execute performance monitoring. The setting essentially extends the
RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped
specifically for capturing monitored performance events and related data.

For example, if a machine has eight cores and the perf_event_mlock_kb
limit is set to 516 KiB, then a user process is provided with
516 KiB * 8 = 4128 KiB of memory above the RLIMIT_MEMLOCK limit
(ulimit -l) for perf_event mmap buffers. In particular, this means that a
user who wants to start two or more performance monitoring processes has
to manually distribute the available 4128 KiB between the monitoring
processes, for example, using the --mmap-pages Perf record mode option.
Otherwise, the first started performance monitoring process allocates
all available 4128 KiB and the other processes will fail to proceed due
to the lack of memory.
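
The arithmetic of this example, and an even split between concurrent
sessions, can be sketched as follows. The helper names are hypothetical,
and translating a share into a concrete --mmap-pages value is
deliberately left out.

```shell
# Hypothetical helper: total KiB a user may map for perf buffers above
# RLIMIT_MEMLOCK, i.e. perf_event_mlock_kb ($1) multiplied by CPUs ($2).
perf_mlock_budget_kb() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: equal share of that budget, in KiB, for each of
# $3 concurrently running monitoring processes.
perf_mlock_share_kb() {
    echo $(( $1 * $2 / $3 ))
}

perf_mlock_budget_kb 516 8      # the example above: prints 4128
perf_mlock_share_kb 516 8 2     # two sessions: prints 2064
```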

RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored
for processes with the CAP_IPC_LOCK capability. Thus, privileged
perf_events/Perf users can be given memory above these constraints for
performance monitoring purposes by providing the Perf executable with
the CAP_IPC_LOCK capability.

Bibliography
------------

.. [1] `<https://lwn.net/Articles/337493/>`_
.. [2] `<http://man7.org/linux/man-pages/man2/perf_event_open.2.html>`_
.. [3] `<http://web.eece.maine.edu/~vweaver/projects/perf_events/>`_
.. [4] `<https://perf.wiki.kernel.org/index.php/Main_Page>`_
.. [5] `<https://www.kernel.org/doc/html/latest/security/credentials.html>`_
.. [6] `<http://man7.org/linux/man-pages/man7/capabilities.7.html>`_
.. [7] `<http://man7.org/linux/man-pages/man2/ptrace.2.html>`_
.. [8] `<https://en.wikipedia.org/wiki/Hardware_performance_counter>`_
.. [9] `<https://en.wikipedia.org/wiki/Model-specific_register>`_
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
.. [13] `<https://sites.google.com/site/fullycapable>`_
.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_