.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the checksum field always lies within
the summed region, so its previous value is included in the checksum
computation.  It can therefore be used to supply any needed corrections to the
checksum (such as the sum of the pseudo-header for UDP or TCP).
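
The following user-space sketch illustrates the arithmetic described above.  It
is not kernel code: csum16(), model_tx_csum() and udp_offload_demo() are
hypothetical names, offsets are measured from the start of a plain byte buffer
rather than from skb->head, and the addresses, ports and payload are arbitrary.
It seeds the UDP checksum field with the uncomplemented pseudo-header sum, lets
a model of the device fill in the checksum, and then verifies it the way a
receiver would.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes as big-endian 16-bit words, added to an
 * initial value and folded to 16 bits. */
static uint16_t csum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad an odd last byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Model of the device's job: sum [csum_start, len) and store the complement
 * of that sum at csum_start + csum_offset. */
static void model_tx_csum(uint8_t *pkt, size_t len, size_t csum_start,
			  size_t csum_offset)
{
	uint16_t c = (uint16_t)~csum16(pkt + csum_start, len - csum_start, 0);

	pkt[csum_start + csum_offset] = c >> 8;
	pkt[csum_start + csum_offset + 1] = c & 0xff;
}

/* Returns 1 if the filled-in checksum verifies against the pseudo-header. */
static int udp_offload_demo(void)
{
	/* IPv4 pseudo-header: 192.0.2.1 -> 192.0.2.2, zero, proto 17, len 12 */
	const uint8_t pseudo[12] = { 192, 0, 2, 1, 192, 0, 2, 2, 0, 17, 0, 12 };
	/* 14 bytes standing in for earlier headers, then UDP header + payload */
	uint8_t pkt[14 + 12] = { 0 };
	const size_t csum_start = 14, csum_offset = 6;	/* UDP check field */
	uint8_t *udp = pkt + csum_start;
	uint16_t seed;

	udp[0] = 0x30; udp[1] = 0x39;	/* source port 12345 */
	udp[2] = 0x00; udp[3] = 0x35;	/* destination port 53 */
	udp[4] = 0x00; udp[5] = 12;	/* UDP length */
	udp[8] = 'd'; udp[9] = 'a'; udp[10] = 't'; udp[11] = 'a';

	/* Seed the checksum field with the uncomplemented pseudo-header sum. */
	seed = csum16(pseudo, sizeof(pseudo), 0);
	udp[6] = seed >> 8;
	udp[7] = seed & 0xff;

	model_tx_csum(pkt, sizeof(pkt), csum_start, csum_offset);

	/* A receiver sums pseudo-header plus datagram and expects 0xffff. */
	return csum16(udp, 12, csum16(pseudo, sizeof(pseudo), 0)) == 0xffff;
}
```

Because the seed sits inside the summed region, the model device never needs to
know that a pseudo-header exists; it only follows the start/offset pair.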

This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.  Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).
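
As a rough illustration of the check described above, the sketch below models
the comparison a driver might make for hardware that locates the TCP or UDP
checksum field by itself.  It is a user-space model, not driver code: struct
tx_meta and hw_would_match() are hypothetical, and all offsets are assumed to
be measured from the start of the packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the values a driver would compare. */
struct tx_meta {
	uint16_t csum_start;	/* where the stack wants summing to begin */
	uint16_t csum_offset;	/* checksum field, relative to csum_start */
	uint16_t l4_offset;	/* where the hardware parser finds the L4 header */
	uint8_t  l4_proto;	/* protocol the parser sees: 6 = TCP, 17 = UDP */
};

/* True if the hardware's own deduction agrees with the SKB's request.  When
 * this returns false, a real driver would fall back to software, for example
 * with skb_checksum_help(). */
static bool hw_would_match(const struct tx_meta *m)
{
	uint16_t hw_offset;

	switch (m->l4_proto) {
	case 6:
		hw_offset = 16;	/* checksum field offset in the TCP header */
		break;
	case 17:
		hw_offset = 6;	/* checksum field offset in the UDP header */
		break;
	default:
		return false;	/* parser cannot handle this protocol */
	}
	return m->csum_start == m->l4_offset && m->csum_offset == hw_offset;
}
```

For a plain UDP packet the two views agree.  For a tunnelled packet in which
the stack asks for the inner TCP checksum while the parser only sees the outer
UDP header, they do not, and the model reports a mismatch.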

The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software.  In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.
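
The arithmetic can be checked with a small user-space model.  The sketch below
is not the kernel's lco_csum(): csum16(), lco_sum() and lco_demo() are
hypothetical names, the packet bytes and the two seed values are arbitrary, and
offsets are measured from the start of the outer L4 header.  The outer checksum
is computed first, from the headers alone; only afterwards is the inner
checksum filled in, and the outer checksum is then verified over the whole
packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes as big-endian 16-bit words, added to an
 * initial value and folded to 16 bits. */
static uint16_t csum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad an odd last byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Model of LCO: sum only the bytes before csum_start, then add the complement
 * of the value currently seeded in the inner checksum field. */
static uint16_t lco_sum(const uint8_t *l4_hdr, size_t csum_start,
			size_t csum_offset)
{
	const uint8_t *field = l4_hdr + csum_start + csum_offset;
	uint16_t seed = (uint16_t)(field[0] << 8 | field[1]);

	return csum16(l4_hdr, csum_start, (uint16_t)~seed);
}

/* Returns 1 if the outer checksum verifies once the inner one is filled in. */
static int lco_demo(void)
{
	/* 16 bytes of outer UDP + tunnel headers, then inner UDP + 5 bytes */
	enum { CSUM_START = 16, CSUM_OFFSET = 6, LEN = 16 + 8 + 5 };
	const uint16_t outer_pseudo = 0x1c2e;	/* arbitrary pseudo-header sum */
	const uint16_t inner_seed = 0xa1b2;	/* arbitrary inner seed value */
	uint8_t pkt[LEN];
	uint16_t outer, inner;
	size_t i;

	for (i = 0; i < LEN; i++)
		pkt[i] = (uint8_t)(i * 37 + 11);	/* arbitrary contents */
	pkt[6] = 0;				/* outer checksum field zeroed */
	pkt[7] = 0;
	pkt[CSUM_START + CSUM_OFFSET] = inner_seed >> 8;
	pkt[CSUM_START + CSUM_OFFSET + 1] = inner_seed & 0xff;

	/* Outer checksum from headers only; the payload is never read. */
	outer = (uint16_t)~csum16(pkt, 0,
		(uint32_t)lco_sum(pkt, CSUM_START, CSUM_OFFSET) + outer_pseudo);
	pkt[6] = outer >> 8;
	pkt[7] = outer & 0xff;

	/* Later, the inner checksum is filled in (by hardware or software). */
	inner = (uint16_t)~csum16(pkt + CSUM_START, LEN - CSUM_START, 0);
	pkt[CSUM_START + CSUM_OFFSET] = inner >> 8;
	pkt[CSUM_START + CSUM_OFFSET + 1] = inner & 0xff;

	/* Receiver's check of the outer checksum over the entire packet. */
	return csum16(pkt, LEN, outer_pseudo) == 0xffff;
}
```

The outer checksum is written while the inner field still holds only its seed,
yet it verifies after the inner checksum replaces that seed, which is the
property LCO relies on.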

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.