.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel. After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. There are separate socket options for moving
the transmit and the receive paths into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
                                        TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.
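
Since the setup is the same for both directions, it can be wrapped in a pair
of helpers. The following is a minimal sketch, not part of the kernel API: the
names ``ktls_fill_aes_gcm_128`` and ``ktls_install`` are hypothetical, the key
material is assumed to come from the userspace handshake, and the ``SOL_TLS``
fallback define is only an assumption for C libraries that do not export it.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* assumed fallback for libcs that lack it */
  #endif

  /* Fill a TLS 1.2 AES-GCM-128 descriptor from handshake output. */
  static void ktls_fill_aes_gcm_128(struct tls12_crypto_info_aes_gcm_128 *ci,
                                    const unsigned char *key,
                                    const unsigned char *salt,
                                    const unsigned char *iv,
                                    const unsigned char *rec_seq)
  {
          memset(ci, 0, sizeof(*ci));
          ci->info.version = TLS_1_2_VERSION;
          ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
          memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
          memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
          memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
          memcpy(ci->rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  }

  /* direction is TLS_TX or TLS_RX; returns 0, or -1 with errno set. */
  static int ktls_install(int sock, int direction,
                          const struct tls12_crypto_info_aes_gcm_128 *ci)
  {
          return setsockopt(sock, SOL_TLS, direction, ci, sizeof(*ci));
  }

A caller would fill one descriptor with the write keys and install it with
TLS_TX, then fill a second one with the read keys and install it with TLS_RX.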

Sending TLS application data
----------------------------

After setting the TLS_TX socket option, all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

send() data is directly encrypted from the userspace buffer provided
to the encrypted kernel send buffer if possible.

The sendfile system call will send the file's data over TLS records of maximum
length (2^14).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

TLS records are created and sent after each send() call, unless
MSG_MORE is passed. MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed.
If send() returns -ENOMEM and some data was left on the socket buffer
from a previous call using MSG_MORE, the MSG_MORE data is left on the
socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided. A full TLS
record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, sizeof(buffer), 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur. If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

Sending TLS control messages
----------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
          struct msghdr msg = {0};
          int cmsg_len = sizeof(record_type);
          struct cmsghdr *cmsg;
          char buf[CMSG_SPACE(cmsg_len)];
          struct iovec msg_iov;   /* Vector of data to send/receive into.  */

          msg.msg_control = buf;
          msg.msg_controllen = sizeof(buf);
          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_TLS;
          cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
          cmsg->cmsg_len = CMSG_LEN(cmsg_len);
          *CMSG_DATA(cmsg) = record_type;
          msg.msg_controllen = cmsg->cmsg_len;

          msg_iov.iov_base = data;
          msg_iov.iov_len = length;
          msg.msg_iov = &msg_iov;
          msg.msg_iovlen = 1;

          return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.
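
As a concrete case of the mechanism above, the sketch below sends a two-byte
TLS alert as a record of type 21. It is an illustration under stated
assumptions rather than a reference implementation: the helper name
``ktls_send_alert``, the ``TLS_RECORD_ALERT`` macro and the ``SOL_TLS``
fallback define are hypothetical, and the control buffer is declared as a
union (as in the cmsg(3) manual page) to guarantee ``struct cmsghdr``
alignment.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* assumed fallback for libcs that lack it */
  #endif

  #define TLS_RECORD_ALERT 21     /* hypothetical name for record type 21 */

  /* Control buffer aligned for struct cmsghdr, holding one data byte. */
  union record_type_cmsg {
          struct cmsghdr hdr;
          char buf[CMSG_SPACE(sizeof(unsigned char))];
  };

  /* Send an alert (level, description); returns 0, or -1 with errno set. */
  static int ktls_send_alert(int sock, unsigned char level,
                             unsigned char description)
  {
          unsigned char alert[2] = { level, description };
          struct iovec iov = { .iov_base = alert, .iov_len = sizeof(alert) };
          union record_type_cmsg ctrl;
          struct msghdr msg = {0};
          struct cmsghdr *cmsg;

          memset(&ctrl, 0, sizeof(ctrl));
          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          msg.msg_control = ctrl.buf;
          msg.msg_controllen = sizeof(ctrl.buf);

          /* Tell the kernel which record type to use for this payload. */
          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_TLS;
          cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
          cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned char));
          *CMSG_DATA(cmsg) = TLS_RECORD_ALERT;

          return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
  }

For instance, ``ktls_send_alert(sock, 1, 0)`` would send a warning-level
close_notify alert; the payload is passed in plaintext and encrypted by the
kernel like any other record.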

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg. If no cmsg buffer is provided, an error is
returned if a control message is received. Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  /* CMSG_FIRSTHDR() returns NULL when no control data was delivered. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

recv will never return data from mixed types of TLS records.

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).

TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy.
This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words, this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look as if TLS records had been tampered with and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted,
otherwise it is an attack vector for doubling the TLS processing cost.

If the decrypted record turns out to have been padded or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.
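
Both options above are boolean and are set per-socket with setsockopt(), so
enabling and querying them can be sketched as below. The helper names
``ktls_set_opt`` and ``ktls_get_opt`` are hypothetical, and the ``#ifndef``
fallback values are assumptions intended only for builds against older
headers; the values from the installed ``linux/tls.h`` take precedence.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282             /* assumed fallback */
  #endif
  #ifndef TLS_TX_ZEROCOPY_RO
  #define TLS_TX_ZEROCOPY_RO 3    /* assumed fallback for older headers */
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4  /* assumed fallback for older headers */
  #endif

  /* Enable (1) or disable (0) an opt-in optimization on a TLS socket. */
  static int ktls_set_opt(int sock, int optname, int enable)
  {
          int val = enable ? 1 : 0;

          return setsockopt(sock, SOL_TLS, optname, &val, sizeof(val));
  }

  /* Return the current state (0 or 1) of an option, or -1 on error. */
  static int ktls_get_opt(int sock, int optname)
  {
          int val = 0;
          socklen_t len = sizeof(val);

          if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
                  return -1;
          return val;
  }

A receiver that trusts its peer might call
``ktls_set_opt(sock, TLS_RX_EXPECT_NO_PAD, 1)`` after installing TLS_RX and
then watch ``TlsDecryptRetry`` to see whether the expectation holds.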

Statistics
==========

TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.
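
The counters listed above can be read programmatically. The sketch below
assumes the file consists of one whitespace-separated ``<name> <value>`` pair
per line; the helper name ``tls_stat_read`` is hypothetical, and it works on
any ``FILE *`` so that it can be exercised without a TLS-enabled kernel.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /*
   * Scan "<name> <value>" pairs from f and store the value of the
   * counter called name. Returns 0 if found, -1 otherwise.
   */
  static int tls_stat_read(FILE *f, const char *name, unsigned long *value)
  {
          char key[64];
          unsigned long val;

          while (fscanf(f, "%63s %lu", key, &val) == 2) {
                  if (strcmp(key, name) == 0) {
                          *value = val;
                          return 0;
                  }
          }
          return -1;
  }

A monitoring tool could open ``/proc/net/tls_stat`` with fopen() and call
``tls_stat_read(f, "TlsDecryptError", &count)`` periodically to detect
decryption failures.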