.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

Flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc).

BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is a ``struct bpf_flow_keys`` and contains the flow dissector
input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header
  * ``flags`` - optional flags

A flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.
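
The input/output contract above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel API:
``struct flow_keys_model``, ``model_dissect_ipv4`` and ``MODEL_ETH_P_IP`` are
hypothetical names, the struct mirrors only a few ``struct bpf_flow_keys``
fields, ``n_proto`` is kept in host byte order for simplicity, and ``len``
stands for ``data_end - data``.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for a subset of struct bpf_flow_keys. */
struct flow_keys_model {
	uint16_t nhoff;    /* in/out: offset of the network header */
	uint16_t thoff;    /* in/out: offset of the transport header */
	uint16_t n_proto;  /* in: L3 protocol, host byte order in this model */
	uint8_t  ip_proto; /* out: L4 protocol number */
	uint32_t ipv4_src; /* out: source address, network byte order */
	uint32_t ipv4_dst; /* out: destination address, network byte order */
};

#define MODEL_ETH_P_IP 0x0800 /* EtherType for IPv4 */

/* Returns 0 on success and -1 on a parsing error. */
static int model_dissect_ipv4(const uint8_t *data, size_t len,
			      struct flow_keys_model *keys)
{
	const uint8_t *iph;
	size_t ihl_bytes;

	if (keys->n_proto != MODEL_ETH_P_IP)
		return -1;
	/* Bounds check before touching the fixed 20-byte header. */
	if ((size_t)keys->nhoff + 20 > len)
		return -1;
	iph = data + keys->nhoff;
	if ((iph[0] >> 4) != 4)
		return -1;
	/* The header length field counts 32-bit words. */
	ihl_bytes = (size_t)(iph[0] & 0x0f) * 4;
	if (ihl_bytes < 20 || (size_t)keys->nhoff + ihl_bytes > len)
		return -1;

	/* Fill the outputs and move thoff past the IPv4 header. */
	keys->ip_proto = iph[9];
	memcpy(&keys->ipv4_src, iph + 12, 4);
	memcpy(&keys->ipv4_dst, iph + 16, 4);
	keys->thoff = (uint16_t)(keys->nhoff + ihl_bytes);
	return 0;
}
```

In an actual BPF program the same steps would operate on ``skb->data``,
``skb->data_end`` and ``skb->flow_keys``, with every access bounds-checked in
a form the verifier accepts.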

The return code of the BPF program is either BPF_OK to indicate successful
dissection, or BPF_DROP to indicate a parsing error.

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In the VLAN case, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: a BPF flow dissector program can be called
with the optional VLAN header and should gracefully handle both cases: when a
single or double VLAN tag is present and when it is not. The same program can
be called in both cases and has to be written carefully to handle both.
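
The three entry states can be handled by one routine that skips zero, one or
two VLAN tags before L3 parsing starts. The following is a hedged userspace
sketch, not code from the kernel tree: ``model_skip_vlan``, ``model_is_vlan``
and the ``MODEL_*`` constants are hypothetical names, ``n_proto`` is held in
host byte order, and rejecting a third tag is a policy chosen for this
example.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_8021Q  0x8100 /* single-tag TPID */
#define MODEL_ETH_P_8021AD 0x88A8 /* outer TPID of a double-tagged frame */
#define MODEL_VLAN_HLEN    4      /* TCI (2 bytes) + encapsulated EtherType (2 bytes) */

static int model_is_vlan(uint16_t proto)
{
	return proto == MODEL_ETH_P_8021Q || proto == MODEL_ETH_P_8021AD;
}

/*
 * On entry, *nhoff points at the first TCI when *n_proto is a VLAN TPID
 * (pre-VLAN state), or at the L3 header otherwise (VLAN-less and post-VLAN
 * states). On success, *nhoff points at the L3 header and *n_proto holds
 * its EtherType. Returns 0 on success and -1 on a parsing error.
 */
static int model_skip_vlan(const uint8_t *data, size_t len,
			   uint16_t *nhoff, uint16_t *n_proto)
{
	int tags;

	for (tags = 0; tags < 2 && model_is_vlan(*n_proto); tags++) {
		if ((size_t)*nhoff + MODEL_VLAN_HLEN > len)
			return -1; /* truncated VLAN tag */
		/* The encapsulated EtherType follows the TCI, big-endian. */
		*n_proto = (uint16_t)((data[*nhoff + 2] << 8) | data[*nhoff + 3]);
		*nhoff = (uint16_t)(*nhoff + MODEL_VLAN_HLEN);
	}
	/* More than two tags is treated as an error in this sketch. */
	return model_is_vlan(*n_proto) ? -1 : 0;
}
```

A real program would advance ``thoff`` together with ``nhoff``, and on
kernels without loop support the two iterations would have to be written out
explicitly.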


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.
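
As an illustration of how a program might honor the
``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` flag described in the Flags section,
the fragment decision for IPv4 can be modeled in plain C. This is a sketch
under assumptions: all ``model_*`` and ``MODEL_*`` names are hypothetical,
``MODEL_F_PARSE_1ST_FRAG`` is assumed to mirror the UAPI flag value (check
``linux/bpf.h`` for the authoritative definition), and ``frag_off`` is the
IPv4 fragment field already converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG. */
#define MODEL_F_PARSE_1ST_FRAG (1u << 0)

#define MODEL_IP_MF     0x2000 /* "more fragments" bit */
#define MODEL_IP_OFFSET 0x1fff /* fragment offset mask */

/* Hypothetical stand-in for the fragment-related output fields. */
struct frag_state_model {
	bool is_frag;
	bool is_first_frag;
};

/* Returns true if parsing should continue into the transport header. */
static bool model_ipv4_continue(uint16_t frag_off, uint32_t flags,
				struct frag_state_model *st)
{
	st->is_frag = false;
	st->is_first_frag = false;

	if (!(frag_off & (MODEL_IP_MF | MODEL_IP_OFFSET)))
		return true; /* not fragmented */

	st->is_frag = true;
	if (frag_off & MODEL_IP_OFFSET)
		return false; /* later fragment: no transport header to parse */

	/* First fragment: keep parsing only when explicitly asked to. */
	st->is_first_frag = true;
	return (flags & MODEL_F_PARSE_1ST_FRAG) != 0;
}
```

When the function returns false, a dissector following the default behavior
would stop and report success with the fields collected so far.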

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).


Current Limitations
===================
BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported
from the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.
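
The ``jmp_table`` dispatch described under Reference Implementation can be
pictured with a userspace analogy: a table of per-protocol handlers and an
entry point that selects a handler from ``n_proto``. This is only a sketch
and not how tail calls are implemented. Every ``model_*`` name is
hypothetical, the handlers do no real parsing, the driver loop stands in for
a chain of ``bpf_tail_call`` invocations, and ``MODEL_MAX_HOPS`` is an
arbitrary bound standing in for the kernel's tail call limit.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_IP   0x0800
#define MODEL_ETH_P_IPV6 0x86DD
#define MODEL_MAX_HOPS   8 /* arbitrary bound, not the kernel's limit */

enum model_slot { MODEL_SLOT_IP, MODEL_SLOT_IPV6, MODEL_SLOT_COUNT };
enum model_result { MODEL_OK = 0, MODEL_DROP = -1, MODEL_AGAIN = 1 };

struct model_ctx {
	uint16_t n_proto; /* protocol of the header at the current offset */
	int hops;         /* number of handlers run so far */
};

typedef enum model_result (*model_handler)(struct model_ctx *ctx);

/* IPv4 handler: a real one would parse the header; this one just finishes. */
static enum model_result model_parse_ip(struct model_ctx *ctx)
{
	(void)ctx;
	return MODEL_OK;
}

/* IPv6 handler: pretends the payload is IPv4-in-IPv6 to force another hop. */
static enum model_result model_parse_ipv6(struct model_ctx *ctx)
{
	ctx->n_proto = MODEL_ETH_P_IP;
	return MODEL_AGAIN;
}

static const model_handler model_jmp_table[MODEL_SLOT_COUNT] = {
	[MODEL_SLOT_IP]   = model_parse_ip,
	[MODEL_SLOT_IPV6] = model_parse_ipv6,
};

/* Entry point: maps n_proto to a slot and runs handlers until one finishes. */
static enum model_result model_dissect(struct model_ctx *ctx)
{
	while (ctx->hops < MODEL_MAX_HOPS) {
		enum model_result res;
		int slot;

		switch (ctx->n_proto) {
		case MODEL_ETH_P_IP:   slot = MODEL_SLOT_IP;   break;
		case MODEL_ETH_P_IPV6: slot = MODEL_SLOT_IPV6; break;
		default:               return MODEL_DROP; /* unsupported protocol */
		}
		ctx->hops++;
		res = model_jmp_table[slot](ctx);
		if (res != MODEL_AGAIN)
			return res;
	}
	return MODEL_DROP; /* too many levels of encapsulation */
}
```

Bounding the number of hops keeps deeply nested encapsulation from running
indefinitely, which is the property the tail call limit provides in BPF.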