.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` points to a ``struct bpf_flow_keys``, which contains the flow
dissector input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
  * ``n_proto`` - L3 protocol type, parsed out of the L2 header
  * ``flags`` - optional flags

The flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff``, ``thoff`` and
``n_proto`` should also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
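
As a rough illustration of this contract, the following user-space model
parses an IPv4 header located at ``nhoff``, fills a few output fields, moves
``thoff`` past the header and reports success or failure. It is only a sketch:
``struct model_flow_keys``, ``model_dissect_ipv4`` and the ``MODEL_*``
constants are hypothetical names invented here, the struct mirrors just a
handful of ``struct bpf_flow_keys`` fields, and a real program would read the
packet through ``__sk_buff`` under the verifier's rules.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical stand-ins that mirror the values of BPF_OK and BPF_DROP. */
  enum { MODEL_OK = 0, MODEL_DROP = 2 };

  /* Hypothetical subset of struct bpf_flow_keys, for illustration only. */
  struct model_flow_keys {
      uint16_t nhoff;    /* in/out: offset of the network header */
      uint16_t thoff;    /* in/out: offset of the transport header */
      uint8_t  ip_proto; /* out: L4 protocol number */
      uint32_t ipv4_src; /* out: source address, network byte order */
      uint32_t ipv4_dst; /* out: destination address, network byte order */
  };

  /* Parse an IPv4 header found at keys->nhoff inside [data, data + len). */
  static int model_dissect_ipv4(const uint8_t *data, size_t len,
                                struct model_flow_keys *keys)
  {
      const uint8_t *iph;
      size_t ihl;

      /* Bounds check, analogous to comparing against data_end. */
      if (keys->nhoff > len || len - keys->nhoff < 20)
          return MODEL_DROP;
      iph = data + keys->nhoff;

      /* Version must be 4; IHL counts 32-bit words and is at least 5. */
      if ((iph[0] >> 4) != 4)
          return MODEL_DROP;
      ihl = (size_t)(iph[0] & 0x0f) * 4;
      if (ihl < 20 || len - keys->nhoff < ihl)
          return MODEL_DROP;

      /* Fill the outputs and move thoff past the IPv4 header. */
      keys->ip_proto = iph[9];
      memcpy(&keys->ipv4_src, iph + 12, 4);
      memcpy(&keys->ipv4_dst, iph + 16, 4);
      keys->thoff = (uint16_t)(keys->nhoff + ihl);
      return MODEL_OK;
  }

The point of the sketch is the division of labour: the caller supplies
``nhoff`` and ``thoff``, and the dissector returns with ``thoff`` adjusted and
the remaining fields filled in, or signals a parsing error.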

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In the case of VLAN, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case, VLAN information has been processed before the flow dissector
runs, and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: a BPF flow dissector program can be called
with or without a VLAN header and should gracefully handle both cases: when a
single or double VLAN tag is present and when none is present. The same
program can be called in either case and has to be written carefully to
handle both.
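
One way to cover all three entry states with a single code path is sketched
below as a user-space model. It is not the reference implementation:
``struct model_vlan_keys``, ``model_be16`` and ``model_skip_vlan`` are
hypothetical names, ``n_proto`` is kept in host byte order for readability,
and rejecting a third tag is a choice made for this sketch.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MODEL_ETH_P_8021Q  0x8100 /* same value as ETH_P_8021Q */
  #define MODEL_ETH_P_8021AD 0x88A8 /* same value as ETH_P_8021AD */

  /* Hypothetical subset of the flow keys; n_proto is host byte order here. */
  struct model_vlan_keys {
      uint16_t nhoff;
      uint16_t thoff;
      uint16_t n_proto;
  };

  /* Read a big-endian 16-bit value. */
  static uint16_t model_be16(const uint8_t *p)
  {
      return (uint16_t)((p[0] << 8) | p[1]);
  }

  /*
   * Skip up to two VLAN tags. In the pre-VLAN state n_proto holds the TPID
   * and nhoff points at the TCI, so each tag seen from nhoff is 4 bytes:
   * TCI followed by the encapsulated EtherType. In the VLAN-less and
   * post-VLAN states n_proto is not a VLAN TPID and the loop body never runs.
   * Returns 0 on success, -1 on a truncated or over-nested tag stack.
   */
  static int model_skip_vlan(const uint8_t *data, size_t len,
                             struct model_vlan_keys *keys)
  {
      int depth;

      for (depth = 0; depth < 2; depth++) {
          if (keys->n_proto != MODEL_ETH_P_8021Q &&
              keys->n_proto != MODEL_ETH_P_8021AD)
              break;
          if (keys->nhoff > len || len - keys->nhoff < 4)
              return -1;
          keys->n_proto = model_be16(data + keys->nhoff + 2);
          keys->nhoff += 4;
          keys->thoff = keys->nhoff;
      }

      /* A third tag is more than this sketch is willing to handle. */
      if (keys->n_proto == MODEL_ETH_P_8021Q ||
          keys->n_proto == MODEL_ETH_P_8021AD)
          return -1;
      return 0;
  }

After a successful call, ``nhoff`` points at the L3 header and ``n_proto``
holds the real EtherType regardless of which state the program was entered in.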


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.
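
The decisions these flags drive can be modelled in user space roughly as
follows. This is a sketch under stated assumptions: the ``MODEL_*`` names and
the three helper functions are hypothetical, the flag values are assumed to
match those in ``include/uapi/linux/bpf.h``, and treating a zero flow label as
"no label" is an assumption of this sketch rather than something stated above.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed to match the flag values in include/uapi/linux/bpf.h. */
  #define MODEL_F_PARSE_1ST_FRAG     (1U << 0)
  #define MODEL_F_STOP_AT_FLOW_LABEL (1U << 1)
  #define MODEL_F_STOP_AT_ENCAP      (1U << 2)

  #define MODEL_IP_MF     0x2000 /* IPv4 "more fragments" bit */
  #define MODEL_IP_OFFSET 0x1fff /* IPv4 fragment offset mask */

  /*
   * Decide whether to keep parsing past an IPv4 header. frag_off is the
   * flags/fragment-offset field in host byte order.
   */
  static bool model_ipv4_keep_parsing(uint16_t frag_off, uint32_t flags)
  {
      if (!(frag_off & (MODEL_IP_MF | MODEL_IP_OFFSET)))
          return true;  /* not fragmented */
      if (frag_off & MODEL_IP_OFFSET)
          return false; /* later fragment: no transport header here */
      /* First fragment: continue only when explicitly asked to. */
      return (flags & MODEL_F_PARSE_1ST_FRAG) != 0;
  }

  /* Decide whether to stop once a non-zero IPv6 flow label has been seen. */
  static bool model_stop_at_flow_label(uint32_t flow_label, uint32_t flags)
  {
      return flow_label != 0 && (flags & MODEL_F_STOP_AT_FLOW_LABEL) != 0;
  }

  /* Decide whether to descend into an encapsulated (tunnelled) header. */
  static bool model_enter_encap(uint32_t flags)
  {
      return (flags & MODEL_F_STOP_AT_ENCAP) == 0;
  }

Keeping each decision in its own small predicate makes it easy to see that
the flags only ever shorten or extend parsing; they do not change what is
written for the headers that are parsed.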


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
    does a ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).
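
The dispatch structure described above can be pictured with a plain C
function table. This is a user-space analogy only, not the code in
``bpf_flow.c``: the ``model_*`` names and slot numbers are hypothetical, and
an ordinary function call stands in for ``bpf_tail_call``.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical slot numbers; the real jmp_table uses its own indices. */
  enum model_slot { MODEL_SLOT_IPV4, MODEL_SLOT_IPV6, MODEL_SLOT_MAX };

  struct model_ctx {
      uint16_t n_proto; /* host byte order in this sketch */
      int handled_by;   /* records which handler ran, for demonstration */
  };

  typedef int (*model_handler)(struct model_ctx *ctx);

  static int model_handle_ipv4(struct model_ctx *ctx)
  {
      ctx->handled_by = MODEL_SLOT_IPV4;
      return 0; /* stands in for BPF_OK */
  }

  static int model_handle_ipv6(struct model_ctx *ctx)
  {
      ctx->handled_by = MODEL_SLOT_IPV6;
      return 0; /* stands in for BPF_OK */
  }

  /* Plays the role of jmp_table: one entry per supported L3 protocol. */
  static const model_handler model_jmp_table[MODEL_SLOT_MAX] = {
      [MODEL_SLOT_IPV4] = model_handle_ipv4,
      [MODEL_SLOT_IPV6] = model_handle_ipv6,
  };

  /* Entry point: map n_proto to a slot and hand control to that handler. */
  static int model_dissect(struct model_ctx *ctx)
  {
      size_t slot;

      switch (ctx->n_proto) {
      case 0x0800: /* ETH_P_IP */
          slot = MODEL_SLOT_IPV4;
          break;
      case 0x86DD: /* ETH_P_IPV6 */
          slot = MODEL_SLOT_IPV6;
          break;
      default:
          return 2; /* stands in for BPF_DROP: unsupported protocol */
      }
      /* A successful tail call does not return; "return f()" mimics that. */
      return model_jmp_table[slot](ctx);
  }

A handler that meets another layer of encapsulation can dispatch through the
same table again, which is how nested headers are handled without a backward
jump.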


Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. Notable examples are single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported from
the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.