.. SPDX-License-Identifier: GPL-2.0
.. _xfrm_device:

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to offer the stack access to the
hardware offload.

Right now, the kernel supports two types of hardware offload:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC have SA and policy in-sync
   * NIC handles the SA and policy states
   * The Kernel talks to the keymanager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
        /* Crypto and Packet offload callbacks */
	int	(*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
	void	(*xdo_dev_state_free) (struct xfrm_state *x);
	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
				       struct xfrm_state *x);
	void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);

        /* Solely packet offload callbacks */
	void    (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
	int	(*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

The NIC driver offering IPsec offload will need to implement the callbacks
relevant to the offload types it supports to make the offload available to
the network stack's XFRM subsystem. Additionally, the feature bits
NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM will signal the availability of
the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

		adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
		adapter->netdev->features |= NETIF_F_HW_ESP;
		adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should

	- verify the algorithm is supported for offloads
	- store the SA information (key, salt, target-ip, protocol, etc)
	- enable the HW offload of the SA
	- return status value:

		===========   ===================================
		0             success
		-EOPNOTSUPP   offload not supported, try SW IPsec,
		              not applicable for packet offload mode
		other         fail the request
		===========   ===================================
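
The choice between the three classes of return value can be modelled outside
the kernel. The following is a minimal userspace sketch, not driver code:
``struct model_sa``, ``model_state_add()``, the two-entry SA table and the
single supported algorithm are all hypothetical stand-ins for
``struct xfrm_state`` and a real device's capabilities. Returning -EINVAL for
an unsupported algorithm in packet offload mode is one possible reading of
"not applicable for packet offload mode", not a rule taken from the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct xfrm_state a driver reads. */
struct model_sa {
	const char *aead_name;	/* e.g. "rfc4106(gcm(aes))" */
	bool packet_offload;	/* false: crypto offload, true: packet offload */
};

/* Hypothetical capacity of the hardware SA table. */
#define MODEL_MAX_SA 2

static struct model_sa model_sa_table[MODEL_MAX_SA];
static size_t model_sa_count;

/* Returns 0, -EOPNOTSUPP (caller may fall back to SW IPsec) or another error. */
static int model_state_add(const struct model_sa *sa)
{
	/* Verify the algorithm is one the (imaginary) hardware supports. */
	if (strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0) {
		/* Assumption: packet offload has no SW fallback, so fail hard. */
		return sa->packet_offload ? -EINVAL : -EOPNOTSUPP;
	}

	/* No free hardware slot: fail the request. */
	if (model_sa_count >= MODEL_MAX_SA)
		return -ENOSPC;

	/* Store the SA; a real driver would also program the hardware here. */
	model_sa_table[model_sa_count++] = *sa;
	return 0;
}
```

In this model only the -EOPNOTSUPP case invites a software fallback; every
other non-zero value simply fails the request.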

The driver can also set an offload_handle in the SA, an opaque handle
that can be used to convey context into the fast-path offload requests::

		xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  The driver can check the packet information to be
sure the offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc)
and return true or false to signify its support.
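
As an illustration of the kind of per-packet check described above, here is a
small userspace sketch. It works on a raw L3 header buffer instead of a
``struct sk_buff``, and both the function name ``model_offload_ok()`` and the
policy it applies (IPv4 only without options, IPv6 accepted on header length
alone) are assumptions made for the example, not the behaviour of any
particular driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of an offload_ok-style check. The name and the exact
 * acceptance policy are hypothetical.
 */
static bool model_offload_ok(const uint8_t *l3_hdr, size_t len)
{
	uint8_t version, ihl;

	if (len < 1)
		return false;

	/* The IP version is the high nibble of the first header byte. */
	version = l3_hdr[0] >> 4;

	if (version == 4) {
		/* IHL counts 32-bit words; 5 means 20 bytes, i.e. no options. */
		ihl = l3_hdr[0] & 0x0f;
		return ihl == 5 && len >= 20;
	}

	if (version == 6) {
		/* Only require that the fixed 40-byte IPv6 header is present. */
		return len >= 40;
	}

	return false;
}
```

A packet rejected by such a check is not an error; it only tells the stack
that this particular packet cannot be handled by the offload.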

Crypto offload mode:

When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

		xs = xfrm_input_state(skb);
		context = xs->xso.offload_handle;
		set up HW for send

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

	find and hold the SA that was used for the Rx skb::

		get spi, protocol, and destination IP from packet headers
		xs = find xs from (spi, protocol, dest_IP)
		xfrm_state_hold(xs);

	store the state information into the skb::

		sp = secpath_set(skb);
		if (!sp) return;
		sp->xvec[sp->len++] = xs;
		sp->olen++;

	indicate the success and/or error status of the offload::

		xo = xfrm_offload(skb);
		xo->flags = CRYPTO_DONE;
		xo->status = crypto_status;

	hand the packet to napi_gro_receive() as usual
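
The bookkeeping in the "store the state information" step can be modelled in
plain C. The sketch below uses hypothetical types (``struct model_state``,
``struct model_sec_path``) and a hypothetical depth limit in place of the
kernel's structures; it only shows that ``len`` and ``olen`` advance together
for an offloaded SA and that, in this model, the reference is taken only once
it is certain the SA will be stored.

```c
#include <stddef.h>

/* Hypothetical depth limit; the kernel's sec_path has its own bound. */
#define MODEL_MAX_DEPTH 6

/* Hypothetical stand-in for struct xfrm_state. */
struct model_state {
	int refcnt;
};

/* Hypothetical stand-in for struct sec_path. */
struct model_sec_path {
	int len;	/* states recorded so far */
	int olen;	/* how many of those were offloaded */
	struct model_state *xvec[MODEL_MAX_DEPTH];
};

/* Record an offloaded SA; returns 0 on success, -1 if nothing was stored. */
static int model_record_offloaded_sa(struct model_sec_path *sp,
				     struct model_state *xs)
{
	if (!sp || !xs || sp->len >= MODEL_MAX_DEPTH)
		return -1;

	xs->refcnt++;			/* models xfrm_state_hold() */
	sp->xvec[sp->len++] = xs;
	sp->olen++;
	return 0;
}
```

Taking the reference after the checks means that an early return leaves no
reference that would need to be dropped again.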

In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

Packet offload mode:

The HW adds and deletes XFRM headers, so in the Rx path the XFRM stack is
bypassed if the HW reported success. In the Tx path, the packet leaves the
kernel without the extra header and unencrypted; the HW is responsible for
adding the header and encrypting the packet.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

As a netdev is set to DOWN the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states.

Because the HW handles the packets, the XFRM core can't count the hard and
soft limits itself. The HW/driver are responsible for counting them and for
providing accurate data when xdo_dev_state_update_curlft() is called. When
one of these limits is reached, the driver needs to call
xfrm_state_check_expire() to make sure that XFRM performs the rekeying
sequence.
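
The limit handling can be pictured with a small userspace model. Everything
in it is hypothetical: ``struct model_lifetime`` stands in for the counters a
device would report and the limits configured on the SA, and
``model_check_expire()`` only shows the ordering of the comparison (hard
before soft); it is not the logic of xfrm_state_check_expire().

```c
#include <stdint.h>

/* Hypothetical result of comparing hardware counters with the SA limits. */
enum model_expiry {
	MODEL_LIFETIME_OK,
	MODEL_LIFETIME_SOFT,	/* rekeying should start */
	MODEL_LIFETIME_HARD,	/* SA must no longer be used */
};

/* Hypothetical view of one SA's packet counter and configured limits. */
struct model_lifetime {
	uint64_t packets;		/* current count reported by the HW */
	uint64_t soft_packet_limit;
	uint64_t hard_packet_limit;
};

/* Check the hard limit first so it wins when both limits are exceeded. */
static enum model_expiry model_check_expire(const struct model_lifetime *lt)
{
	if (lt->packets >= lt->hard_packet_limit)
		return MODEL_LIFETIME_HARD;
	if (lt->packets >= lt->soft_packet_limit)
		return MODEL_LIFETIME_SOFT;
	return MODEL_LIFETIME_OK;
}
```

A driver following this pattern would refresh the counters from the hardware
first and only then ask the XFRM core to evaluate expiry.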