.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of the checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset. The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the checksum field lies inside the
summed range, so its previous value is included in the checksum computation.
This means it can be used to supply any needed corrections to the checksum
(such as the sum of the pseudo-header for UDP or TCP).

This interface only allows a single checksum to be offloaded. Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by filling in
skb->csum_start and skb->csum_offset as described above and setting
skb->csum_not_inet: see the skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software. This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive. It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment. See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.
Note that a device which only advertises NETIF_F_IP[V6]_CSUM must still obey
the csum_start and csum_offset given in the SKB; if it tries to deduce these
itself in hardware (as some NICs do), the driver should check that the values
in the SKB match those which the hardware will deduce, and if not, fall back
to checksumming in software instead (with skb_csum_hwoffload_help() or one of
the skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

The stack should, for the most part, assume that checksum offload is supported
by the underlying device. The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly. That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) with those supported and enabled on
the device (as given by netdev->features), and performs any that are missing in
software. In the case of TX Checksum Offload, that means calling
skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo-header, because everything else gets
'cancelled out' by the checksum field. This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus for any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand. This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(), and likewise for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here, as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function, lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded. It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use. For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.