.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, which
provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or
  written (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, whether read or written. This field is read-write. Writing to the
  field sets the starting position in the sysctl proc file that ``read(2)``
  will read from or ``write(2)`` will write to.
  Writing zero to the field can be used, e.g., to override the whole sysctl
  value with ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when user
  space calls it with ``file_pos > 0``. Writing a non-zero value to the field
  can be used to access part of the sysctl value starting from the specified
  ``file_pos``. Not all sysctls support access with ``file_pos != 0``, e.g.
  writes to numeric sysctl entries must always be at file position ``0``. See
  also the ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how context fields can be accessed.

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name as it is visible in
  ``/proc/sys`` into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
  the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value being written to
  the sysctl before the actual write happens. This helper can be used only
  when ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value being
  written to the sysctl before the actual write happens. The sysctl value is
  overridden starting from the current ``ctx->file_pos``. If the whole value
  has to be overridden, the BPF program can set ``file_pos`` to zero before
  calling the helper. This helper can be used only when ``ctx->write == 1``.
  The new string value set by the helper is treated and verified by the kernel
  the same way as an equivalent string passed by user space.

A BPF program sees a sysctl value the same way user space does in the proc
filesystem, i.e. as a string.
Since many sysctl values represent an integer or a vector
of integers, the following helpers can be used to get a numeric value from the
string:

* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
  similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_;

See `linux/bpf.h`_ for more details on the helpers described here.

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or catch unreasonable values
that an application, running as root in a separate cgroup, is trying to set.

Since `task_dfl_cgroup(current)` is called at `sys_read` / `sys_write` time, it
may return results different from those at `sys_open` time, i.e.
the process that opened the sysctl file in the proc filesystem may differ from
the process that is trying to read from / write to it, and two such processes
may run in different cgroups, which means ``BPF_PROG_TYPE_CGROUP_SYSCTL``
should not be used as a security mechanism to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to
detach/replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c
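
The step of parsing a string value into a vector of integers and turning the
result into an allow/deny decision can be illustrated in user space. The
following is only a minimal sketch and is not BPF code: it uses plain
``strtol(3)`` where a BPF program would call ``bpf_strtol()``, and the function
name ``sysctl_vector_ok()``, the three-integer policy and the ``max_allowed``
parameter are hypothetical, chosen purely for illustration. It returns ``1``
("proceed with access") or ``0`` ("reject access") in the same sense as the
return codes of the program type:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical policy (an assumption, not taken from the kernel): accept a
 * sysctl string only if it holds exactly three integers, each in the range
 * [0, max_allowed]. Returns 1 to allow and 0 to reject.
 */
static int sysctl_vector_ok(const char *value, long max_allowed)
{
	const char *p = value;
	char *end;
	int count = 0;

	for (;;) {
		long v;

		errno = 0;
		v = strtol(p, &end, 10);	/* skips leading whitespace */
		if (end == p)			/* no digits were consumed */
			break;
		if (errno == ERANGE || v < 0 || v > max_allowed)
			return 0;
		count++;
		p = end;
	}

	/* Only trailing whitespace may remain after the last integer. */
	while (*p == ' ' || *p == '\t' || *p == '\n')
		p++;
	if (*p != '\0')
		return 0;

	return count == 3;
}
```

For instance, ``sysctl_vector_ok("4096 87380 6291456\n", 8388608)`` accepts the
value, while a string with a non-numeric token or the wrong number of fields is
rejected.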