.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload.
For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.
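
As a rough illustration of the fragmentation behavior described above, the
following userspace sketch plans the IPv4 fragments for one oversized UDP
datagram. It is not kernel code: struct frag_desc and plan_fragments() are
hypothetical names introduced here, and the sketch assumes a 20-byte IPv4
header without options. It only shows that every fragment carries the same
IP ID, that offsets are expressed in 8-byte units, and that the
more-fragments flag is set on all but the last fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20u /* assumption: IPv4 header without options */

/* Hypothetical descriptor for one planned fragment (not a kernel type). */
struct frag_desc {
	uint16_t ip_id;       /* identical for every fragment of a datagram */
	uint16_t frag_off;    /* offset in 8-byte units, as in the IPv4 header */
	uint16_t payload_len; /* bytes of the UDP datagram in this fragment */
	bool more_frags;      /* MF flag: set on all but the last fragment */
};

/*
 * Plan the fragments for l4_len bytes (UDP header plus data) on a link
 * with the given MTU. Returns the number of fragments written to out,
 * or 0 if out has fewer than the required number of entries.
 */
size_t plan_fragments(size_t l4_len, unsigned int mtu, uint16_t ip_id,
		      struct frag_desc *out, size_t max_out)
{
	size_t per_frag, off = 0, n = 0;

	assert(mtu >= IPV4_HDR_LEN + 8);
	/* Every fragment except the last must carry a multiple of 8 bytes. */
	per_frag = (mtu - IPV4_HDR_LEN) & ~7u;

	while (off < l4_len) {
		size_t len = l4_len - off;

		if (len > per_frag)
			len = per_frag;
		if (n == max_out)
			return 0;

		out[n].ip_id = ip_id; /* never incremented */
		out[n].frag_off = (uint16_t)(off / 8);
		out[n].payload_len = (uint16_t)len;
		out[n].more_frags = (off + len < l4_len);
		n++;
		off += len;
	}
	return n;
}
```

For a 3008-byte datagram (8-byte UDP header plus 3000 bytes of data) and an
MTU of 1500, this plans three fragments carrying 1480, 1480 and 48 bytes,
all with the same IP ID. Under default TSO, by contrast, each resulting
segment is a separate IP datagram and gets its own incremented ID.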

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.
Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in the outer header.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload. In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.
What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO.
What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the values
they would have if the header were simply duplicated. The one exception to
this is the outer IPv4 ID field. It is up to the device drivers to guarantee
that the IPv4 ID field is incremented in the case that a given header does
not have the DF bit set.


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach to other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.
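
To illustrate why a fixed cut point does not work, the following userspace
sketch packs whole chunks into segments. It is a simplified model and not
the kernel's SCTP code: sctp_padded_len() and pack_chunks() are
hypothetical names, max_payload stands in for the room left after headers,
and real bundling has further rules that are ignored here. The only
protocol fact relied on is that chunks are padded to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* SCTP chunks are padded to a multiple of 4 bytes on the wire. */
size_t sctp_padded_len(size_t chunk_len)
{
	return (chunk_len + 3) & ~(size_t)3;
}

/*
 * Greedily pack whole chunks, in order, into segments of at most
 * max_payload bytes. seg_len[i] receives the payload size of segment i.
 * Returns the number of segments, or 0 if a single chunk does not fit
 * or seg_len has too few entries.
 */
size_t pack_chunks(const size_t *chunk_len, size_t n_chunks,
		   size_t max_payload, size_t *seg_len, size_t max_segs)
{
	size_t n_segs = 0, cur = 0;

	for (size_t i = 0; i < n_chunks; i++) {
		size_t padded = sctp_padded_len(chunk_len[i]);

		if (padded > max_payload)
			return 0;

		/* A chunk is never split: close the segment instead. */
		if (cur + padded > max_payload) {
			if (n_segs == max_segs)
				return 0;
			seg_len[n_segs++] = cur;
			cur = 0;
		}
		cur += padded;
	}

	if (cur > 0) {
		if (n_segs == max_segs)
			return 0;
		seg_len[n_segs++] = cur;
	}
	return n_segs;
}
```

With chunks of 700, 701, 300 and 1000 bytes and 1440 bytes of room per
segment, this yields two segments of 1404 and 1300 bytes. The segment sizes
are unequal and fall on chunk boundaries, so no single gso_size value can
describe them.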

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
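
The kind of special-casing that GSO_BY_FRAGS requires in a size check can
be sketched in userspace as follows. This is a model only, not the
implementation of the skb_gso_validate_*_len helpers: struct model_skb and
model_gso_validate_len() are hypothetical, the chained skbs are reduced to
an array of lengths, and the value 0xFFFF for GSO_BY_FRAGS is assumed from
the kernel headers at the time of writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sentinel value; check the kernel headers before relying on it. */
#define GSO_BY_FRAGS 0xFFFF

/* Hypothetical stand-in for the few skb fields the check needs. */
struct model_skb {
	unsigned int hdr_len;         /* headers placed before each segment */
	unsigned int gso_size;        /* payload per segment, or GSO_BY_FRAGS */
	const unsigned int *frag_len; /* per-fragment payload sizes */
	size_t n_frags;               /* used only when gso_size is GSO_BY_FRAGS */
};

/* Return true if every resulting segment fits within max_len bytes. */
bool model_gso_validate_len(const struct model_skb *skb, unsigned int max_len)
{
	/* Regular GSO: all segments are bounded by hdr_len + gso_size. */
	if (skb->gso_size != GSO_BY_FRAGS)
		return skb->hdr_len + skb->gso_size <= max_len;

	/*
	 * GSO_BY_FRAGS: gso_size is a sentinel, not a length, so each
	 * pre-built fragment has to be checked individually.
	 */
	for (size_t i = 0; i < skb->n_frags; i++) {
		if (skb->hdr_len + skb->frag_len[i] > max_len)
			return false;
	}
	return true;
}
```

Code that used gso_size directly as a length in the GSO_BY_FRAGS case would
compare against 0xFFFF plus the header length and reject every such skb,
which is why the sentinel has to be tested first.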