.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload.  For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header.  In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment.  If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID.  If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.
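
The two IPv4 ID behaviors described above can be illustrated with a small
userspace model.  This is only a sketch: the enum, its constants and
tso_segment_ipid() are hypothetical names invented for this example and are
not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two IPv4 ID policies; not a kernel API. */
enum tso_ipid_mode {
	TSO_IPID_INCREMENT,	/* default: the ID advances by one per segment */
	TSO_IPID_FIXED,		/* models SKB_GSO_TCP_FIXEDID: the ID is repeated */
};

/*
 * Return the IPv4 ID carried by segment number seg (0-based) when the
 * first segment carries first_id.  The ID is a 16-bit field, so the
 * incrementing mode wraps around at 0xFFFF.
 */
static uint16_t tso_segment_ipid(uint16_t first_id, size_t seg,
				 enum tso_ipid_mode mode)
{
	assert(mode == TSO_IPID_INCREMENT || mode == TSO_IPID_FIXED);

	if (mode == TSO_IPID_FIXED)
		return first_id;

	return (uint16_t)(first_id + seg);
}
```

A device with NETIF_F_TSO_MANGLEID set would be free to behave like either
mode in this model, which is why the stack cannot rely on a particular ID
sequence for such devices.
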


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments.  Many of the requirements for UDP
fragmentation offload are the same as TSO.  However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.
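
The fragmentation rule described above, where every fragment of one datagram
shares a single IPv4 ID, can be sketched in userspace as follows.  The
struct and function names are hypothetical and exist only for this example;
total_len is assumed to be the IPv4 payload (UDP header plus data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IPv4 fragment; not a kernel structure. */
struct ufo_frag {
	uint16_t id;		/* IPv4 ID, identical for every fragment */
	uint16_t offset_units;	/* fragment offset in 8-byte units */
	uint16_t len;		/* payload bytes carried by this fragment */
	int more_fragments;	/* MF flag: set on all but the last fragment */
};

/*
 * Describe fragment number idx (0-based) of a datagram whose IPv4 payload
 * is total_len bytes, cut into pieces of frag_size bytes.  frag_size must
 * be a non-zero multiple of 8 because the IPv4 fragment offset field
 * counts 8-byte units.
 */
static struct ufo_frag ufo_describe(uint16_t id, size_t total_len,
				    size_t frag_size, size_t idx)
{
	struct ufo_frag f;
	size_t start = idx * frag_size;
	size_t remaining;

	assert(frag_size != 0 && frag_size % 8 == 0);
	assert(start < total_len);

	remaining = total_len - start;
	f.id = id;				/* never incremented */
	f.offset_units = (uint16_t)(start / 8);
	f.len = (uint16_t)(remaining < frag_size ? remaining : frag_size);
	f.more_fragments = remaining > frag_size;
	return f;
}
```

Contrast this with TSO, where each output frame is a complete IPv4 packet
and the ID normally increments per segment.
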


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel.  In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL.  These extra segmentation types are used to identify
cases where there is more than one set of headers.  For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported.  The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.  Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM.  These two additional tunnel types reflect the
fact that a non-zero checksum is also requested in the outer header.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload.  In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.  What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO.  Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.
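
The splitting step that GSO performs in software can be modeled in userspace
as cutting a payload into pieces of at most gso_size bytes.  This is a
minimal sketch of the arithmetic only: struct gso_seg and gso_split() are
hypothetical names, and the real implementation also has to replicate and
fix up headers and checksums for every resulting skbuff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one output segment; not a kernel structure. */
struct gso_seg {
	size_t offset;	/* where this segment's payload starts */
	size_t len;	/* payload bytes in this segment */
};

/*
 * Split payload_len bytes into segments of at most gso_size bytes and
 * record them in segs[].  At most max_segs entries are written.  Returns
 * the number of entries filled in.
 */
static size_t gso_split(size_t payload_len, size_t gso_size,
			struct gso_seg *segs, size_t max_segs)
{
	size_t n = 0;
	size_t off = 0;

	assert(gso_size != 0);	/* a zero gso_size means "not a GSO frame" */
	assert(segs != NULL);

	while (off < payload_len && n < max_segs) {
		size_t len = payload_len - off;

		if (len > gso_size)
			len = gso_size;

		segs[n].offset = off;
		segs[n].len = len;
		off += len;
		n++;
	}
	return n;
}
```

Only the final segment may be shorter than gso_size; all others carry
exactly gso_size bytes of payload.
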


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO.  Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO.  The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO.  What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated.  This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the correct
values for the case where the header is simply duplicated.  The one
exception to this is the outer IPv4 ID field.  It is up to the device
drivers to guarantee that the IPv4 ID field is incremented in the case that
a given header does not have the DF bit set.


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.
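
To illustrate why size checks must treat GSO_BY_FRAGS specially, the
following userspace sketch models a length validation in the spirit of the
skb_gso_validate_*_len helpers.  Everything here is hypothetical and written
only for this example: struct model_skb, model_gso_fits() and the
MODEL_GSO_BY_FRAGS constant (assumed here to be 0xFFFF) are not kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's GSO_BY_FRAGS marker; the value is an assumption. */
#define MODEL_GSO_BY_FRAGS 0xFFFF

/* Hypothetical, heavily simplified view of a GSO skb. */
struct model_skb {
	unsigned int gso_size;		/* payload size, or MODEL_GSO_BY_FRAGS */
	unsigned int hdr_len;		/* header bytes added to each segment */
	const unsigned int *frag_len;	/* full length of each chained skb */
	size_t nr_frags;		/* number of entries in frag_len */
};

/* Return 1 if every segment produced from skb fits within limit bytes. */
static int model_gso_fits(const struct model_skb *skb, unsigned int limit)
{
	size_t i;

	assert(skb != NULL);

	/* Regular GSO: all segments are at most hdr_len + gso_size long. */
	if (skb->gso_size != MODEL_GSO_BY_FRAGS)
		return skb->hdr_len + skb->gso_size <= limit;

	/* GSO_BY_FRAGS: gso_size is a marker, so inspect each chained skb. */
	for (i = 0; i < skb->nr_frags; i++)
		if (skb->frag_len[i] > limit)
			return 0;

	return 1;
}
```

Code that used gso_size directly as a length in the GSO_BY_FRAGS case would
compare the marker value against the limit and reach the wrong conclusion,
which is the mistake the helpers listed above are meant to prevent.
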