.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects.
The command was previously named ``BPF_PROG_TEST_RUN``, and both constants
continue to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode. In particular, packets
will not actually be redirected or dropped; the program's return code is simply
returned to userspace. A separate mode for live execution of XDP programs is
provided, documented separately below.
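
A minimal sketch of a test run from C, assuming a program of one of the
packet-based types above has already been loaded and its file descriptor is
available. The helper names ``fill_test_run_attr()`` and ``prog_test_run()``
are hypothetical; the ``test`` member of ``union bpf_attr`` and its fields come
from the UAPI header ``linux/bpf.h``. The sketch uses the older
``BPF_PROG_TEST_RUN`` name so that it also builds against headers that predate
the rename.

.. code-block:: c

    #include <linux/bpf.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical helper: describe one test run of the program behind
     * prog_fd, with pkt as input packet data and out as output buffer. */
    void fill_test_run_attr(union bpf_attr *attr, int prog_fd,
                            const void *pkt, __u32 pkt_len,
                            void *out, __u32 out_len, __u32 repeat)
    {
        memset(attr, 0, sizeof(*attr));     /* fields not set below stay zero */
        attr->test.prog_fd = prog_fd;
        attr->test.data_in = (__u64)(uintptr_t)pkt;
        attr->test.data_size_in = pkt_len;
        attr->test.data_out = (__u64)(uintptr_t)out;
        attr->test.data_size_out = out_len;
        attr->test.repeat = repeat;         /* number of executions */
    }

    /* Hypothetical helper: issue the command. Returns 0 on success and
     * stores the program's return code in *retval; returns -1 with errno
     * set on failure. */
    int prog_test_run(union bpf_attr *attr, __u32 *retval)
    {
        if (syscall(__NR_bpf, BPF_PROG_TEST_RUN, attr, sizeof(*attr)) < 0)
            return -1;
        *retval = attr->test.retval;
        return 0;
    }

A caller would fill the structure, call ``prog_test_run()``, and then inspect
the returned code together with the contents of the output buffer.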

Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running XDP programs live,
in which packets are actually processed by the kernel after the XDP program has
run, as if they had arrived on a physical interface. This mode is activated by
setting the ``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.
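
A minimal sketch of how a live frame mode request could be filled in, assuming
an XDP program is already loaded. The helper name ``fill_live_frames_attr()``
is hypothetical, and the fallback definition of the flag is an assumption that
only matters for UAPI headers that predate it. The output fields are left at
zero on purpose.

.. code-block:: c

    #include <linux/bpf.h>
    #include <stdint.h>
    #include <string.h>

    #ifndef BPF_F_TEST_XDP_LIVE_FRAMES
    /* Assumed fallback for older headers; recent headers define the flag. */
    #define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
    #endif

    /* Hypothetical helper: request `repeat` live executions of the XDP
     * program behind prog_fd, each starting from the packet in pkt. */
    void fill_live_frames_attr(union bpf_attr *attr, int prog_fd,
                               const void *pkt, __u32 pkt_len, __u32 repeat)
    {
        memset(attr, 0, sizeof(*attr));
        attr->test.prog_fd = prog_fd;
        attr->test.data_in = (__u64)(uintptr_t)pkt;
        attr->test.data_size_in = pkt_len;
        attr->test.repeat = repeat;
        attr->test.flags = BPF_F_TEST_XDP_LIVE_FRAMES;
        /* data_out and ctx_out stay zero: no results are copied back. */
    }

The filled-in structure is then passed to ``bpf()`` with the ``BPF_PROG_RUN``
command, in the same way as for a regular test run.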

Live frame mode is optimised for high-performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as in the regular
test run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same tracepoints as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool containing as
  many memory pages as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
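
The batching rules in the list above can be sketched as plain arithmetic. This
is a userspace model for illustration only, not the kernel's code: the names
``effective_batch_size()`` and ``count_batches()`` are hypothetical, and both
treating a ``batch_size`` of zero as "use the default" and returning 0 for an
out-of-range value are assumptions of the sketch.

.. code-block:: c

    #include <stdint.h>

    #define LIVE_DEFAULT_BATCH 64u   /* default batch size stated above */
    #define LIVE_MAX_BATCH     256u  /* maximum batch size stated above */

    /* Returns the batch size this model would use, or 0 if the request
     * exceeds the documented maximum. A request of 0 selects the default
     * (an assumption of this sketch). */
    uint32_t effective_batch_size(uint32_t requested)
    {
        if (requested == 0)
            return LIVE_DEFAULT_BATCH;
        if (requested > LIVE_MAX_BATCH)
            return 0;
        return requested;
    }

    /* Number of batches needed for `repeat` executions in this model; the
     * last batch may be shorter than the others. Returns 0 if the
     * requested batch size is out of range. */
    uint32_t count_batches(uint32_t repeat, uint32_t requested)
    {
        uint32_t batch = effective_batch_size(requested);

        if (batch == 0)
            return 0;
        return repeat / batch + (repeat % batch != 0);
    }

In this model, 1000 repetitions with the default batch size come to 16 batches:
fifteen of 64 packets and a final one of 40.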