.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel.  After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. There are separate socket options for moving
the transmit and the receive paths into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };

  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.
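
Because the setup is identical for both directions, it can be wrapped in one
helper that takes the direction as a parameter. The following is only a
sketch: ``struct ktls_keys``, ``ktls_fill_aes_gcm_128()`` and
``ktls_install_keys()`` are hypothetical names, and the key material is
assumed to come from a handshake done in userspace. The ``SOL_TLS`` fallback
is only used if the libc headers do not already define it.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* fallback, value from the kernel's linux/socket.h */
  #endif

  /* Hypothetical container for the key material of one direction. */
  struct ktls_keys {
          const unsigned char *key;       /* TLS_CIPHER_AES_GCM_128_KEY_SIZE bytes */
          const unsigned char *salt;      /* TLS_CIPHER_AES_GCM_128_SALT_SIZE bytes */
          const unsigned char *iv;        /* TLS_CIPHER_AES_GCM_128_IV_SIZE bytes */
          const unsigned char *rec_seq;   /* TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE bytes */
  };

  /* Fill the TLS 1.2 AES-GCM-128 parameters from the given key material. */
  static void ktls_fill_aes_gcm_128(struct tls12_crypto_info_aes_gcm_128 *ci,
                                    const struct ktls_keys *k)
  {
          memset(ci, 0, sizeof(*ci));
          ci->info.version = TLS_1_2_VERSION;
          ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
          memcpy(ci->key, k->key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
          memcpy(ci->salt, k->salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
          memcpy(ci->iv, k->iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
          memcpy(ci->rec_seq, k->rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  }

  /* direction is TLS_TX or TLS_RX; returns the setsockopt() result. */
  static int ktls_install_keys(int sock, int direction,
                               const struct ktls_keys *k)
  {
          struct tls12_crypto_info_aes_gcm_128 ci;

          ktls_fill_aes_gcm_128(&ci, k);
          return setsockopt(sock, SOL_TLS, direction, &ci, sizeof(ci));
  }

A caller would typically invoke ``ktls_install_keys()`` once with TLS_TX and
the write keys, and once with TLS_RX and the read keys.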

Sending TLS application data
----------------------------

After setting the TLS_TX socket option all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

If possible, send() data is encrypted directly from the provided userspace
buffer into the kernel send buffer.

The sendfile system call sends the file's data in TLS records of maximum
length (2^14).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

TLS records are created and sent after each send() call, unless
MSG_MORE is passed.  MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.
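
As an illustration of the MSG_MORE behaviour, the sketch below sends a header
and a body with two send() calls. On a socket with TLS_TX configured, the
first call should not close the record, so both parts can end up in a single
record. The helper name ``send_header_and_body()`` is hypothetical, and short
writes are not retried to keep the example small.

.. code-block:: c

  #include <string.h>
  #include <sys/types.h>
  #include <sys/socket.h>

  /* Returns the total number of bytes queued, or -1 on error. */
  static ssize_t send_header_and_body(int sock,
                                      const void *hdr, size_t hdr_len,
                                      const void *body, size_t body_len)
  {
          ssize_t n, total;

          /* MSG_MORE: more data follows, do not finish the record yet. */
          n = send(sock, hdr, hdr_len, MSG_MORE);
          if (n < 0)
                  return -1;
          total = n;

          /* No MSG_MORE: the pending data is turned into a record and sent. */
          n = send(sock, body, body_len, 0);
          if (n < 0)
                  return -1;

          return total + n;
  }

The same pattern works on any stream socket; only the record boundaries are
specific to the TLS ULP.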

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed.  If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided.  A full TLS
record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, 16384, 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur.  If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.
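
A receive loop may want to tell these three cases apart. The sketch below maps
the ``errno`` values listed above to a hypothetical ``enum ktls_rx_error``;
the enum and the function name are illustrative only. Note that ``EINVAL`` can
also be caused by unrelated problems such as invalid arguments, so the mapping
is a hint rather than a definitive diagnosis.

.. code-block:: c

  #include <errno.h>

  /* Hypothetical classification of the receive errors listed above. */
  enum ktls_rx_error {
          KTLS_RX_VERSION_MISMATCH,       /* EINVAL */
          KTLS_RX_RECORD_TOO_BIG,         /* EMSGSIZE */
          KTLS_RX_DECRYPT_FAILED,         /* EBADMSG */
          KTLS_RX_OTHER,                  /* anything else, e.g. EAGAIN */
  };

  /* Map the errno value of a failed recv() call to one of the cases above. */
  static enum ktls_rx_error ktls_classify_rx_errno(int err)
  {
          switch (err) {
          case EINVAL:
                  return KTLS_RX_VERSION_MISMATCH;
          case EMSGSIZE:
                  return KTLS_RX_RECORD_TOO_BIG;
          case EBADMSG:
                  return KTLS_RX_DECRYPT_FAILED;
          default:
                  return KTLS_RX_OTHER;
          }
  }

A caller would pass ``errno`` right after recv() returned -1, before any other
call can overwrite it.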

Sending TLS control messages
----------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
        struct msghdr msg = {0};
        int cmsg_len = sizeof(record_type);
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(cmsg_len)];
        struct iovec msg_iov;   /* Vector of data to send/receive into.  */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.
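
As an example of such unencrypted control message data, a TLS alert consists
of two bytes: a level and a description. The sketch below builds that payload.
The constant and function names are hypothetical; the numeric values are the
ones assigned by the TLS specification (alert record type 21, warning level 1,
fatal level 2, close_notify description 0).

.. code-block:: c

  #include <stddef.h>

  /* Names are illustrative; values are taken from the TLS specification. */
  enum { TLS_RECORD_TYPE_ALERT = 21 };
  enum { TLS_ALERT_LEVEL_WARNING = 1, TLS_ALERT_LEVEL_FATAL = 2 };
  enum { TLS_ALERT_CLOSE_NOTIFY = 0 };

  /*
   * Write the plaintext alert body into out.
   * Returns the number of bytes written, or 0 if out is too small.
   */
  static size_t tls_build_alert(unsigned char *out, size_t out_len,
                                unsigned char level,
                                unsigned char description)
  {
          if (out_len < 2)
                  return 0;

          out[0] = level;
          out[1] = description;
          return 2;
  }

The resulting two bytes would then be passed as @data, with
``TLS_RECORD_TYPE_ALERT`` as @record_type, to the control message function
shown above.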

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg.  If a control message is received and no cmsg
buffer is provided, an error is returned.  Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = 16384;

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

recv will never return data from mixed types of TLS records.
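
The cmsg inspection can be factored into a small helper that also copes with
a message carrying no control data at all. This is a sketch only:
``ktls_get_record_type()`` is a hypothetical name, and the ``SOL_TLS``
fallback is used only if the libc headers do not define it.

.. code-block:: c

  #include <stddef.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* fallback, value from the kernel's linux/socket.h */
  #endif

  /*
   * Return the TLS record type attached to a message filled in by
   * recvmsg(), or -1 if no TLS_GET_RECORD_TYPE cmsg is present.
   */
  static int ktls_get_record_type(struct msghdr *msg)
  {
          struct cmsghdr *cmsg;

          for (cmsg = CMSG_FIRSTHDR(msg); cmsg;
               cmsg = CMSG_NXTHDR(msg, cmsg)) {
                  if (cmsg->cmsg_level == SOL_TLS &&
                      cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
                          return *(unsigned char *)CMSG_DATA(cmsg);
          }

          return -1;
  }

A return value of -1 means the buffer holds application data; any other value
is the record type of the data in the buffer.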

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls is also available.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).
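
A minimal sketch of setting and querying such an option follows. It assumes
the option value is passed as an ``unsigned int`` holding 0 or 1; the helper
names ``ktls_set_opt()`` and ``ktls_get_opt()`` are hypothetical. The
``#ifndef`` fallbacks only matter when building against older headers that
lack these definitions.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282             /* fallback for older libc headers */
  #endif
  #ifndef TLS_TX_ZEROCOPY_RO
  #define TLS_TX_ZEROCOPY_RO 3    /* fallback for older kernel headers */
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4  /* fallback for older kernel headers */
  #endif

  /* Enable (1) or disable (0) an optional TLS optimization on sock. */
  static int ktls_set_opt(int sock, int optname, unsigned int enable)
  {
          return setsockopt(sock, SOL_TLS, optname, &enable, sizeof(enable));
  }

  /* Read back the current state of an optional TLS optimization. */
  static int ktls_get_opt(int sock, int optname, unsigned int *value)
  {
          socklen_t len = sizeof(*value);

          return getsockopt(sock, SOL_TLS, optname, value, &len);
  }

For instance, ``ktls_set_opt(sock, TLS_TX_ZEROCOPY_RO, 1)`` would request the
zero-copy sendfile() behaviour on a socket that already has TLS_TX configured;
both helpers return -1 with ``errno`` set on failure.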

TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like TLS records have been tampered with and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted,
otherwise an attacker can use it to double the TLS processing cost.

If the decrypted record turns out to have been padded or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

Statistics
==========

The TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.
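
The counters listed above can be read programmatically. The sketch below
assumes that each line of the file consists of a counter name followed by
whitespace and an unsigned decimal value; that layout is an assumption of this
example, and ``tls_stat_lookup()`` is a hypothetical helper name.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /*
   * Scan a tls_stat-style stream for the counter called name.
   * Returns 0 and stores the value on success, -1 if it is not found.
   */
  static int tls_stat_lookup(FILE *f, const char *name, unsigned long *value)
  {
          char key[64];
          unsigned long v;

          /* Each iteration consumes one "name value" pair. */
          while (fscanf(f, "%63s %lu", key, &v) == 2) {
                  if (strcmp(key, name) == 0) {
                          *value = v;
                          return 0;
                  }
          }

          return -1;
  }

A caller could open the file with ``fopen("/proc/net/tls_stat", "r")`` and
look up, for example, ``TlsDecryptRetry`` to monitor how often
``TLS_RX_EXPECT_NO_PAD`` mis-predicts.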