.. SPDX-License-Identifier: GPL-2.0

=============================
Kernel Connection Multiplexor
=============================

Kernel Connection Multiplexor (KCM) is a mechanism that provides a message-based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.

KCM implements an NxM multiplexor in the kernel as diagrammed below::

    +------------+   +------------+   +------------+   +------------+
    | KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
    +------------+   +------------+   +------------+   +------------+
        |                 |               |                |
        +-----------+     |               |     +----------+
                    |     |               |     |
                 +----------------------------------+
                 |           Multiplexor            |
                 +----------------------------------+
                   |   |           |           |  |
         +---------+   |           |           |  ------------+
         |             |           |           |              |
    +----------+  +----------+  +----------+  +----------+ +----------+
    |  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
    +----------+  +----------+  +----------+  +----------+ +----------+
         |             |           |           |              |
    +----------+  +----------+  +----------+  +----------+ +----------+
    | TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
    +----------+  +----------+  +----------+  +----------+ +----------+

KCM sockets
===========

The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.

Multiplexor
===========

The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.

TCP sockets & Psocks
====================

TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket; this structure holds the state for constructing
messages on receive as well as other connection specific information for KCM.

Connected mode semantics
========================

Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
transmitting.
The normal send and recv calls (including sendmmsg and recvmmsg)
can be used to send and receive messages on the KCM socket.

Socket types
============

KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.

Message delineation
-------------------

Messages are sent over a TCP stream with some application protocol message
format that typically includes a header which frames the messages. The length
of a received message can be deduced from the application protocol header
(often just a simple length field).

A TCP stream must be parsed to determine message boundaries. Berkeley Packet
Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a
BPF program must be specified. The program is called at the start of receiving
a new message and is given an skbuff that contains the bytes received so far.
It parses the message header and returns the length of the message. Given this
information, KCM will construct the message of the stated length and deliver it
to a KCM socket.

TCP socket management
---------------------

When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and
write space available (POLLOUT) events are handled by the multiplexor.
If there is a state change (disconnection) or other error on a TCP socket, an
error is posted on the TCP socket so that a POLLERR event happens and KCM
discontinues using the socket. When the application gets the error notification
for a TCP socket, it should unattach the socket from KCM and then handle the
error condition (the typical response is to close the socket and create a new
connection if necessary).

KCM limits the maximum receive message size to the size of the receive
socket buffer on the attached TCP socket (the socket buffer size can be set by
SO_RCVBUF). If the length of a new message reported by the BPF program is
greater than this limit, a corresponding error (EMSGSIZE) is posted on the TCP
socket. The BPF program may also enforce a maximum message size and report an
error when it is exceeded.

A timeout may be set for assembling messages on a receive socket. The timeout
value is taken from the receive timeout of the attached TCP socket (this is set
by SO_RCVTIMEO). If the timer expires before assembly is complete, an error
(ETIMEDOUT) is posted on the socket.
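
Since both limits are taken from options on the TCP socket, an application
would normally set them before attaching the socket to a multiplexor. The
following is a minimal sketch of that step, not part of the KCM interface:
the helper name ``prepare_tcp_for_kcm`` and its parameters are hypothetical,
and only the standard SO_RCVBUF and SO_RCVTIMEO socket options are used. The
kernel may adjust the requested buffer size, so the sketch reads the effective
value back.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (illustration only, not provided by KCM).
 * Sets the receive buffer size and receive timeout on a TCP socket that
 * is about to be attached to a KCM multiplexor.
 * Returns the effective receive buffer size in bytes, or -1 on error.
 */
int prepare_tcp_for_kcm(int tcpfd, int rcvbuf_bytes, int timeout_sec)
{
	struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
	int effective = 0;
	socklen_t len = sizeof(effective);

	/* Receive buffer size; KCM uses it as the maximum message size. */
	if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF,
		       &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
		return -1;

	/* Receive timeout; KCM uses it as the message assembly timeout. */
	if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;

	/* Read back the size actually in effect after kernel adjustment. */
	if (getsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
		return -1;

	return effective;
}
```

A caller might invoke ``prepare_tcp_for_kcm(tcpfd, 65536, 5)`` and compare the
returned size against the largest message the application protocol allows.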

User interface
==============

Creating a multiplexor
----------------------

A new multiplexor and initial KCM socket are created by a socket call::

  socket(AF_KCM, type, protocol)

- type is either SOCK_DGRAM or SOCK_SEQPACKET
- protocol is KCMPROTO_CONNECTED

Cloning KCM sockets
-------------------

After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning
a KCM socket. This is accomplished by an ioctl on a KCM socket::

  /* From linux/kcm.h */
  struct kcm_clone {
        int fd;
  };

  struct kcm_clone info;

  memset(&info, 0, sizeof(info));

  err = ioctl(kcmfd, SIOCKCMCLONE, &info);

  if (!err)
        newkcmfd = info.fd;

Attach transport sockets
------------------------

Attaching of transport sockets to a multiplexor is performed by calling an
ioctl on a KCM socket for the multiplexor.
For example::

  /* From linux/kcm.h */
  struct kcm_attach {
        int fd;
        int bpf_fd;
  };

  struct kcm_attach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;
  info.bpf_fd = bpf_prog_fd;

  ioctl(kcmfd, SIOCKCMATTACH, &info);

The kcm_attach structure contains:

  - fd: file descriptor for the TCP socket being attached
  - bpf_fd: file descriptor for the compiled BPF program that has been loaded

Unattach transport sockets
--------------------------

Unattaching a transport socket from a multiplexor is straightforward.
An "unattach" ioctl is done with the kcm_unattach structure as the argument::

  /* From linux/kcm.h */
  struct kcm_unattach {
        int fd;
  };

  struct kcm_unattach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;      /* TCP socket to unattach */

  ioctl(kcmfd, SIOCKCMUNATTACH, &info);

Disabling receive on KCM socket
-------------------------------

A setsockopt is used to disable or enable receiving on a KCM socket.
When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
while. Example use::

  int val = 1;

  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));

BPF programs for message delineation
------------------------------------

BPF programs can be compiled using the BPF LLVM backend.
For example,
the BPF program for parsing Thrift is::

  #include "bpf.h" /* for __sk_buff */
  #include "bpf_helpers.h" /* for load_word intrinsic */

  SEC("socket_kcm")
  int bpf_prog1(struct __sk_buff *skb)
  {
       /* 32-bit length field at offset 0, plus the 4 bytes of the field */
       return load_word(skb, 0) + 4;
  }

  char _license[] SEC("license") = "GPL";

Use in applications
===================

KCM accelerates application layer protocols. Specifically, it allows
applications to use a message-based interface for sending and receiving
messages. The kernel provides necessary assurances that messages are sent
and received atomically. This relieves much of the burden applications have
in mapping a message-based protocol onto the TCP stream. KCM also makes
application layer messages a unit of work in the kernel for the purposes of
steering and scheduling, which in turn allows a simpler networking model in
multithreaded applications.

Configurations
--------------

In an Nx1 configuration, KCM logically provides multiple socket handles
to the same TCP connection. This allows parallelism between I/O
operations on the TCP socket (for instance, copyin and copyout of data are
parallelized).
In an application, a KCM socket can be opened for each
processing thread and inserted into the epoll set (similar to how SO_REUSEPORT
is used to allow multiple listener sockets on the same port).

In an NxM configuration, multiple connections are established to the
same destination. These are used for simple load balancing.

Message batching
----------------

The primary purpose of KCM is load balancing between KCM sockets and hence
threads in a nominal use case. Perfect load balancing, that is steering
each received message to a different KCM socket or steering each sent
message to a different TCP socket, can negatively impact performance
since this doesn't allow for affinities to be established. Balancing
based on groups, or batches of messages, can be beneficial for performance.

On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket.

  1) Send multiple messages in a single sendmmsg.
  2) Send a group of messages each with a sendmsg call, where all messages
     except the last have MSG_BATCH in the flags of the sendmsg call.
  3) Create a "super message" composed of multiple messages and send this
     with a single sendmsg.

On receive, the KCM module attempts to queue messages received on the
same KCM socket during each TCP ready callback.
The targeted KCM socket
changes at each receive ready callback on the KCM socket. The application
does not need to configure this.

Error handling
--------------

An application should include a thread to monitor errors raised on
the TCP connection. Normally, this will be done by placing each
TCP socket attached to a KCM multiplexor in an epoll set for POLLERR
events. If an error occurs on an attached TCP socket, KCM sets an EPIPE
on the socket, thus waking up the application thread. When the application
sees the error (which may just be a disconnect) it should unattach the
socket from KCM and then close it. It is assumed that once an error is
posted on the TCP socket the data stream is unrecoverable (i.e. an error
may have occurred in the middle of receiving a message).

TCP connection monitoring
-------------------------

In KCM there is no means to correlate a message to the TCP socket that
was used to send or receive the message (except in the case there is
only one attached TCP socket). However, the application does retain
an open file descriptor to the socket so it will be able to get statistics
from the socket which can be used in detecting issues (such as high
retransmissions on the socket).