.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) holds the associated socket (``sk``) and
all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
socket option, and its context is writable: the program can modify the
supplied arguments before they are passed down to the kernel. This hook has
access to the cgroup and socket local storage.

If a BPF program sets ``optlen`` to -1, control is returned to
userspace after all other BPF programs in the cgroup chain finish
(i.e. the kernel ``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value
triggers ``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success; continue with the next BPF program in the cgroup chain.

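The ``optlen`` and return-value rules above can be sketched as a short
program. This is a minimal illustration under stated assumptions, not code
from the kernel tree: ``EXAMPLE_SOL``, ``EXAMPLE_OPT_HANDLED`` and
``EXAMPLE_OPT_DENIED`` are hypothetical values, and ``SEC()`` is defined
locally only so that the snippet stands alone (it is normally supplied by
libbpf's ``bpf_helpers.h``).

.. code-block:: c

    #include <linux/bpf.h>

    /* Normally provided by libbpf's <bpf/bpf_helpers.h>; defined here
     * only to keep the sketch self-contained (assumption).
     */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    /* Hypothetical option level and names, for illustration only. */
    #define EXAMPLE_SOL          0x7e57
    #define EXAMPLE_OPT_HANDLED  1
    #define EXAMPLE_OPT_DENIED   2

    SEC("cgroup/setsockopt")
    int example_setsockopt(struct bpf_sockopt *ctx)
    {
        if (ctx->level != EXAMPLE_SOL)
            return 1;   /* not ours: pass through unchanged */

        if (ctx->optname == EXAMPLE_OPT_DENIED)
            return 0;   /* reject: userspace sees EPERM */

        if (ctx->optname == EXAMPLE_OPT_HANDLED)
            ctx->optlen = -1;   /* skip the kernel's setsockopt handling */

        return 1;
    }

With such a program attached, a ``setsockopt`` call for
``EXAMPLE_OPT_DENIED`` would be expected to fail with ``EPERM``, while a call
for ``EXAMPLE_OPT_HANDLED`` would return to userspace without reaching the
kernel handler.
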
BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
socket option. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it is interested in what the kernel has returned. The hook can also
override those values: rewrite ``optval``, adjust ``optlen`` and reset
``retval`` to 0. If ``optlen`` has been increased above the initial
``getsockopt`` value (i.e. the userspace buffer is too small), ``EFAULT``
is returned.

This hook has access to the cgroup and socket local storage.

Note that the only values a program may store in ``retval`` are 0 and the
original value that the kernel returned. Any other value triggers
``EFAULT``.

Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

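As a rough sketch of the override path described above, the program below
answers a custom option entirely from BPF. ``EXAMPLE_SOL``, ``EXAMPLE_OPT``
and the value ``0x55`` are hypothetical, ``SEC()`` is defined locally as a
stand-in for libbpf's definition, and the sketch assumes the kernel itself
does not recognize the option. It also uses ``optval_end``, a field of
``struct bpf_sockopt`` that marks the end of the accessible buffer.

.. code-block:: c

    #include <linux/bpf.h>

    /* Stand-in for the definition in libbpf's <bpf/bpf_helpers.h>. */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    /* Hypothetical option level and name, for illustration only. */
    #define EXAMPLE_SOL  0x7e57
    #define EXAMPLE_OPT  1

    SEC("cgroup/getsockopt")
    int example_getsockopt(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != EXAMPLE_SOL || ctx->optname != EXAMPLE_OPT)
            return 1;   /* keep whatever the kernel returned */

        /* The BPF verifier expects a bounds check against optval_end
         * before optval is dereferenced.
         */
        if (optval + 1 > optval_end)
            return 0;   /* no room for the value: reject with EPERM */

        optval[0] = 0x55;   /* hypothetical one-byte option value */
        ctx->optlen = 1;    /* must not exceed the user-supplied length */
        ctx->retval = 0;    /* clear the kernel's error */
        return 1;
    }

Resetting ``retval`` to 0 is what turns the kernel's failure into a
successful ``getsockopt`` from the caller's point of view; leaving it
untouched would hand the original error back to userspace.
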
Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where a
``BPF_CGROUP_GETSOCKOPT`` program is attached at each level with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When the application calls the ``getsockopt`` syscall from cgroup B,
the programs are executed from the bottom up: B, then A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval`` and ``optlen`` and reset ``retval`` to 0. Control is then
passed to the second program (A), which sees the same context as B,
including any modifications B made.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached to
A and B, the trigger order is B, then A. If B changes any of the
input arguments (``level``, ``optname``, ``optval``, ``optlen``),
the next program in the chain (A) sees those changes,
*not* the original ``setsockopt`` arguments. The potentially
modified values are then passed down to the kernel.

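The B-then-A ordering for ``setsockopt`` can be illustrated with a pair of
programs, one per cgroup. This is only a sketch: the option constants,
``EXAMPLE_MAX`` and the function names ``child_b`` and ``parent_a`` are
hypothetical, ``SEC()`` is a local stand-in for libbpf's definition, and
what the kernel does with the option afterwards is outside the example.

.. code-block:: c

    #include <linux/bpf.h>

    /* Stand-in for the definition in libbpf's <bpf/bpf_helpers.h>. */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    /* Hypothetical option and limit, for illustration only. */
    #define EXAMPLE_SOL  0x7e57
    #define EXAMPLE_OPT  1
    #define EXAMPLE_MAX  10

    /* Attached to child cgroup B: runs first and clamps the value. */
    SEC("cgroup/setsockopt")
    int child_b(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != EXAMPLE_SOL || ctx->optname != EXAMPLE_OPT)
            return 1;
        if (optval + 1 > optval_end)
            return 0;
        if (optval[0] > EXAMPLE_MAX)
            optval[0] = EXAMPLE_MAX;
        return 1;
    }

    /* Attached to parent cgroup A: runs second, on B's modified context. */
    SEC("cgroup/setsockopt")
    int parent_a(struct bpf_sockopt *ctx)
    {
        __u8 *optval = ctx->optval;
        __u8 *optval_end = ctx->optval_end;

        if (ctx->level != EXAMPLE_SOL || ctx->optname != EXAMPLE_OPT)
            return 1;
        if (optval + 1 > optval_end)
            return 0;
        /* Reject out-of-range values; B's clamped values always pass. */
        return optval[0] <= EXAMPLE_MAX;
    }

Because A sees the value after B has clamped it, a request from cgroup B
with an oversized value would pass A's check, whereas the same request seen
by A alone would be rejected.
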
Large optval
============

When ``optval`` is larger than ``PAGE_SIZE``, the BPF program
can access only the first ``PAGE_SIZE`` bytes of that data. It therefore has
two options:

* Set ``optlen`` to zero, which indicates that the kernel should
  use the original buffer from userspace. Any modifications
  made by the BPF program to ``optval`` are ignored.
* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
  indicates that the kernel should use the BPF program's trimmed ``optval``.

When the BPF program returns with ``optlen`` greater than
``PAGE_SIZE``, userspace receives the ``EFAULT`` errno.

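The first option can be sketched as follows. ``EXAMPLE_PAGE_SIZE`` is an
assumed constant standing in for the kernel's ``PAGE_SIZE`` (4096 is only a
common value, not a guarantee), and ``SEC()`` is again a local stand-in for
libbpf's definition.

.. code-block:: c

    #include <linux/bpf.h>

    /* Stand-in for the definition in libbpf's <bpf/bpf_helpers.h>. */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    /* Assumption: the page size of the target system is 4096 bytes. */
    #define EXAMPLE_PAGE_SIZE  4096

    SEC("cgroup/setsockopt")
    int example_large_optval(struct bpf_sockopt *ctx)
    {
        /* Only the first page of optval is visible to the program, so
         * ask the kernel to use the original userspace buffer instead.
         */
        if (ctx->optlen > EXAMPLE_PAGE_SIZE)
            ctx->optlen = 0;

        return 1;
    }

A program that never needs to inspect large options can apply this check
unconditionally, so that it does not return with ``optlen`` above the page
size and cause ``EFAULT``.
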
Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.