.. SPDX-License-Identifier: GPL-2.0

===========
Packet MMAP
===========

Abstract
========

This file documents the mmap() facility available with the PACKET
socket interface. This type of socket is used to

i) capture network traffic with utilities like tcpdump,
ii) transmit network traffic, or perform any other task that needs raw
    access to a network interface.

A howto can be found at:

    https://sites.google.com/site/packetmmap/

Please send your comments to
    - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
    - Johann Baudy

Why use PACKET_MMAP
===================

A capture process that does not use PACKET_MMAP (plain AF_PACKET) is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet; it requires two if you want to get the packet's timestamp
(like libpcap always does).

On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a
size-configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way, reading packets just requires waiting for them;
most of the time there is no need to issue a single system call.
Concerning transmission, multiple packets can be sent through one system call to
get the highest bandwidth. Using a shared buffer between the kernel and the user
also has the benefit of minimizing packet copies.

It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. At the very least, if you are
capturing at high speeds (this is relative to the CPU speed), you should check
whether the device driver of your network interface card supports some sort of
interrupt load mitigation or (even better) NAPI, and make sure it is
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
supported by the devices of your network. CPU IRQ pinning of your network interface
card can also be an advantage.

How to use mmap() to improve capture process
============================================

From the user standpoint, you should use the higher level libpcap library, which
is a de facto standard, portable across nearly all operating systems
including Win32.

Packet MMAP support was integrated into libpcap around the time of version 1.3.0;
TPACKET_V3 support was added in version 1.5.0.

How to use mmap() directly to improve capture process
=====================================================

From the system call standpoint, the use of PACKET_MMAP involves
the following process::


    [setup]     socket() -------> creation of the capture socket
                setsockopt() ---> allocation of the circular buffer (ring)
                                  option: PACKET_RX_RING
                mmap() ---------> mapping of the allocated buffer to the
                                  user process

    [capture]   poll() ---------> to wait for incoming packets

    [shutdown]  close() --------> destruction of the capture socket and
                                  deallocation of all associated
                                  resources.


Socket creation and destruction is straightforward, and is done
the same way with or without PACKET_MMAP::

 int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));

where mode is SOCK_RAW for the raw interface, where link level
information can be captured, or SOCK_DGRAM for the cooked
interface, where link level information capture is not
supported and a link level pseudo-header is provided
by the kernel.
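
As a minimal sketch of the socket creation step described above, the helper
below accepts only the two modes named in the text and otherwise forwards to
socket(). The name open_packet_socket() is hypothetical and not part of any
API. Opening a packet socket normally requires the CAP_NET_RAW capability, so
the call may fail for unprivileged users.

```c
#include <arpa/inet.h>      /* htons() */
#include <errno.h>
#include <linux/if_ether.h> /* ETH_P_ALL */
#include <sys/socket.h>

/* Hypothetical helper: open a capture socket in raw or cooked mode.
 * Returns the descriptor, or -1 with errno set. */
static int open_packet_socket(int mode)
{
    if (mode != SOCK_RAW && mode != SOCK_DGRAM) {
        errno = EINVAL;     /* only the two modes described above */
        return -1;
    }
    return socket(PF_PACKET, mode, htons(ETH_P_ALL));
}
```

The returned descriptor is released with close(fd), exactly as for a socket
that does not use PACKET_MMAP.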

The destruction of the socket and all associated resources
is done by a simple call to close(fd).

As without PACKET_MMAP, it is possible to use one socket
for capture and transmission. This can be done by mapping the
allocated RX and TX buffer rings with a single mmap() call.
See "Mapping and use of the circular buffer (ring)".

Next I will describe the PACKET_MMAP settings and their constraints,
as well as the mapping of the circular buffer in the user process and
the use of this buffer.

How to use mmap() directly to improve transmission process
==========================================================

The transmission process is similar to capture, as shown below::

    [setup]         socket() -------> creation of the transmission socket
                    setsockopt() ---> allocation of the circular buffer (ring)
                                      option: PACKET_TX_RING
                    bind() ---------> bind transmission socket with a network interface
                    mmap() ---------> mapping of the allocated buffer to the
                                      user process

    [transmission]  poll() ---------> wait for free packets (optional)
                    send() ---------> send all packets that are set as ready in
                                      the ring
                                      The flag MSG_DONTWAIT can be used to return
                                      before end of transfer.

    [shutdown]      close() --------> destruction of the transmission socket and
                                      deallocation of all associated resources.

Socket creation and destruction is also straightforward, and is done
the same way as for capture, described in the previous section::

 int fd = socket(PF_PACKET, mode, 0);

The protocol can optionally be 0 if we only want to transmit
via this socket, which avoids an expensive call to packet_rcv().
In this case, you also need to bind(2) the TX_RING with sll_protocol = 0
set. Otherwise, use htons(ETH_P_ALL) or any other protocol, for example.

Binding the socket to your network interface is mandatory (with zero copy) to
know the header size of frames used in the circular buffer.

As in capture, each frame contains two parts::

 --------------------
 | struct tpacket_hdr | Header. It contains the status of
 |                    | this frame
 |--------------------|
 | data buffer        |
 .                    .  Data that will be sent over the network interface.
 .                    .
 --------------------

 bind() associates the socket to your network interface thanks to
 sll_ifindex parameter of struct sockaddr_ll.

 Initialization example::

    struct sockaddr_ll my_addr;
    struct ifreq s_ifr;
    ...

    /* copy the interface name; snprintf() always NUL-terminates */
    snprintf(s_ifr.ifr_name, sizeof(s_ifr.ifr_name), "%s", "eth0");

    /* get interface index of eth0 */
    ioctl(this->socket, SIOCGIFINDEX, &s_ifr);

    /* fill sockaddr_ll struct to prepare binding */
    my_addr.sll_family = AF_PACKET;
    my_addr.sll_protocol = htons(ETH_P_ALL);
    my_addr.sll_ifindex = s_ifr.ifr_ifindex;

    /* bind socket to eth0 */
    bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));

 A complete tutorial is available at: https://sites.google.com/site/packetmmap/

By default, the user should put data at::

 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)

So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
the beginning of the user data will be at::

 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

If you wish to put user data at a custom offset from the beginning of
the frame (for payload alignment with SOCK_RAW mode for instance) you
can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW).
In order to make this work, it must be enabled beforehand with setsockopt()
and the PACKET_TX_HAS_OFF option.

PACKET_MMAP settings
====================

Setting up PACKET_MMAP from user level code is done with a call like

 - Capture process::

     setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))

 - Transmission process::

     setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))

The most significant argument in the previous call is the req parameter,
which must have the following structure::

    struct tpacket_req
    {
        unsigned int    tp_block_size;  /* Minimal size of contiguous block */
        unsigned int    tp_block_nr;    /* Number of blocks */
        unsigned int    tp_frame_size;  /* Size of frame */
        unsigned int    tp_frame_nr;    /* Total number of frames */
    };

This structure is defined in /usr/include/linux/if_packet.h and establishes a
circular buffer (ring) of unswappable memory.
Because the ring is mapped into the capture process, the captured frames and
related meta-information like timestamps can be read without requiring a system
call.

Frames are grouped in blocks.
Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because::

    frames_per_block = tp_block_size/tp_frame_size

Indeed, packet_set_ring checks that the following condition is true::

    frames_per_block * tp_block_nr == tp_frame_nr

Let's see an example, with the following values::

     tp_block_size= 4096
     tp_frame_size= 2048
     tp_block_nr  = 4
     tp_frame_nr  = 8

We will get the following buffer structure::

            block #1                 block #2
    +---------+---------+    +---------+---------+
    | frame 1 | frame 2 |    | frame 3 | frame 4 |
    +---------+---------+    +---------+---------+

            block #3                 block #4
    +---------+---------+    +---------+---------+
    | frame 5 | frame 6 |    | frame 7 | frame 8 |
    +---------+---------+    +---------+---------+

A frame can be of any size, with the only condition that it fits in a block. A block
can only hold an integer number of frames, or in other words, a frame cannot
span two blocks, so there are some details you have to take into
account when choosing the frame_size.
See "Mapping and use of the circular
buffer (ring)".

PACKET_MMAP setting constraints
===============================

In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
16384 in a 64 bit architecture.

Block size limit
----------------

As stated earlier, each block is a contiguous physical region of memory. These
memory regions are allocated with calls to the __get_free_pages() function. As
the name indicates, this function allocates pages of memory, and the second
argument is "order", which selects a power-of-two number of pages, that is
(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
order=2 ==> 16384 bytes, etc. The maximum size of a
region allocated by __get_free_pages is determined by the MAX_ORDER macro. More
precisely the limit can be calculated as::

   PAGE_SIZE << MAX_ORDER

   In a i386 architecture PAGE_SIZE is 4096 bytes
   In a 2.4/i386 kernel MAX_ORDER is 10
   In a 2.6/i386 kernel MAX_ORDER is 11

So get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
respectively, with an i386 architecture.
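
To make the order arithmetic above concrete, this sketch computes the smallest
order whose region (page_size << order) can hold a requested block size.
block_order() and region_bytes() are illustrative names, not kernel functions,
and the page size is passed in explicitly rather than taken from the running
system.

```c
#include <stddef.h>

/* Bytes covered by an allocation of the given order. */
static size_t region_bytes(size_t page_size, unsigned int order)
{
    return page_size << order;
}

/* Smallest order such that page_size << order >= block_size.
 * Assumes block_size is small enough that the shift cannot overflow. */
static unsigned int block_order(size_t block_size, size_t page_size)
{
    unsigned int order = 0;

    while (region_bytes(page_size, order) < block_size)
        order++;
    return order;
}
```

With a 4096-byte page, a 12288-byte block needs order 2, that is a 16384-byte
region, so 4096 bytes of the region go unused.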

User space programs can include /usr/include/sys/user.h and
/usr/include/linux/mmzone.h to get the PAGE_SIZE and MAX_ORDER declarations.

The page size can also be determined dynamically with the getpagesize (2)
system call.

Block number limit
------------------

To understand the constraints of PACKET_MMAP, we have to look at the structure
used to hold the pointers to each block.

Currently, this structure is a vector called pg_vec, dynamically allocated with
kmalloc; its size limits the number of blocks that can be allocated::

    +---+---+---+---+
    | x | x | x | x |
    +---+---+---+---+
      |   |   |   |
      |   |   |   v
      |   |   v  block #4
      |   v  block #3
      v  block #2
     block #1

kmalloc allocates any number of bytes of physically contiguous memory from
a pool of pre-determined sizes. This pool of memory is maintained by the slab
allocator, which is ultimately responsible for doing the allocation and
hence imposes the maximum amount of memory that kmalloc can allocate.

In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes.
The
predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
entries of /proc/slabinfo.

In a 32 bit architecture, pointers are 4 bytes long, so the total number of
pointers to blocks is::

     131072/4 = 32768 blocks

PACKET_MMAP buffer size calculator
==================================

Definitions:

==============  ================================================================
<size-max>      is the maximum size allocable with kmalloc
                (see /proc/slabinfo)
<pointer size>  depends on the architecture -- ``sizeof(void *)``
<page size>     depends on the architecture -- PAGE_SIZE or getpagesize (2)
<max-order>     is the value defined with MAX_ORDER
<frame size>    it's an upper bound of frame's capture size (more on this later)
==============  ================================================================

From these definitions we will derive::

        <block number> = <size-max>/<pointer size>
        <block size> = <page size> << <max-order>

so the max buffer size is::

        <block number> * <block size>

and the number of frames is::

        <block number> * <block size> / <frame size>
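
The derivation above can be sketched as a few small functions. The struct and
function names are hypothetical, chosen to mirror the definitions in the
table; 64-bit arithmetic is used because the theoretical maximum easily
exceeds 32 bits.

```c
#include <stdint.h>

/* Inputs named after the definitions above. */
struct ring_limits {
    uint64_t size_max;      /* largest kmalloc allocation, in bytes */
    uint64_t pointer_size;  /* sizeof(void *) on the target */
    uint64_t page_size;     /* PAGE_SIZE on the target */
    unsigned int max_order; /* MAX_ORDER on the target */
};

/* <block number> = <size-max> / <pointer size> */
static uint64_t max_block_number(const struct ring_limits *l)
{
    return l->size_max / l->pointer_size;
}

/* <block size> = <page size> << <max-order> */
static uint64_t max_block_size(const struct ring_limits *l)
{
    return l->page_size << l->max_order;
}

/* Maximum buffer size = <block number> * <block size> */
static uint64_t max_buffer_size(const struct ring_limits *l)
{
    return max_block_number(l) * max_block_size(l);
}

/* Number of frames = maximum buffer size / <frame size> */
static uint64_t max_frames(const struct ring_limits *l, uint64_t frame_size)
{
    return max_buffer_size(l) / frame_size;
}
```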
Suppose the following parameters, which apply for a 2.6 kernel and an
i386 architecture::

        <size-max> = 131072 bytes
        <pointer size> = 4 bytes
        <page size> = 4096 bytes
        <max-order> = 11

and a value for <frame size> of 2048 bytes. These parameters will yield::

        <block number> = 131072/4 = 32768 blocks
        <block size> = 4096 << 11 = 8 MiB.

and hence the buffer will have a size of 262144 MiB. So it can hold
262144 MiB / 2048 bytes = 134217728 frames.

Actually, this buffer size is not possible with an i386 architecture.
Remember that the memory is allocated in kernel space; in the case of
an i386 kernel, the kernel's memory size is limited to 1GiB.

No memory allocation is freed until the socket is closed. The memory
allocations are done with GFP_KERNEL priority, which basically means that
the allocation can wait and swap other processes' memory in order to allocate
the necessary memory, so normally limits can be reached.

Other constraints
-----------------

If you check the source code you will see that what I draw here as a frame
is not only the link level frame.
At the beginning of each frame there is a
header called struct tpacket_hdr, used in PACKET_MMAP to hold the link level
frame's meta information, like the timestamp. So what we draw here as a frame is
really the following (from include/linux/if_packet.h)::

 /*
   Frame structure:

   - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
   - struct tpacket_hdr
   - pad to TPACKET_ALIGNMENT=16
   - struct sockaddr_ll
   - Gap, chosen so that packet data (Start+tp_net) aligns to
     TPACKET_ALIGNMENT=16
   - Start+tp_mac: [ Optional MAC header ]
   - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
   - Pad to align to TPACKET_ALIGNMENT=16
 */

The following are conditions that are checked in packet_set_ring

   - tp_block_size must be a multiple of PAGE_SIZE (1)
   - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
   - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
   - tp_frame_nr   must be exactly frames_per_block*tp_block_nr

Note that tp_block_size should be chosen to be a power of two or there will
be a waste of memory.
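
The conditions listed above can be checked in user space before calling
setsockopt(). The following is only a sketch of those four conditions as
stated in this text, not the kernel's own code; ring_request_ok() is a
hypothetical name, and the page size is passed as a parameter (a real program
could obtain it with getpagesize()).

```c
#include <linux/if_packet.h> /* struct tpacket_req, TPACKET_HDRLEN, TPACKET_ALIGNMENT */

/* Returns 1 if req satisfies the four conditions listed above, else 0. */
static int ring_request_ok(const struct tpacket_req *req, unsigned int page_size)
{
    unsigned int frames_per_block;

    /* block size: non-zero multiple of the page size */
    if (req->tp_block_size == 0 || req->tp_block_size % page_size != 0)
        return 0;
    /* frame size: larger than the header, multiple of the alignment */
    if (req->tp_frame_size <= TPACKET_HDRLEN)
        return 0;
    if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
        return 0;

    /* frame count must match blocks times frames per block */
    frames_per_block = req->tp_block_size / req->tp_frame_size;
    if (frames_per_block == 0)
        return 0;
    return frames_per_block * req->tp_block_nr == req->tp_frame_nr;
}
```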

Mapping and use of the circular buffer (ring)
---------------------------------------------

The mapping of the buffer in the user process is done with the conventional
mmap function. Even though the circular buffer is composed of several physically
discontiguous blocks of memory, they are contiguous in user space, hence
just one call to mmap is needed::

    mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

If tp_frame_size is a divisor of tp_block_size, frames will be
contiguously spaced by tp_frame_size bytes. If not, after every
tp_block_size/tp_frame_size frames there will be a gap between
the frames. This is because a frame cannot span two
blocks.

To use one socket for capture and transmission, the mapping of both the
RX and TX buffer rings has to be done with one call to mmap::

    ...
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
    ...
    rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    tx_ring = rx_ring + size;

RX must be the first as the kernel maps the TX ring memory right
after the RX one.
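
Because a frame never spans two blocks, locating frame number idx in the
mapped ring has to go through the block it belongs to. A minimal sketch,
assuming the same struct tpacket_req that was passed to setsockopt();
frame_offset() is a hypothetical helper name.

```c
#include <stddef.h>
#include <linux/if_packet.h> /* struct tpacket_req */

/* Byte offset of frame idx from the start of the mapped ring. */
static size_t frame_offset(const struct tpacket_req *req, unsigned int idx)
{
    unsigned int frames_per_block = req->tp_block_size / req->tp_frame_size;
    size_t block = idx / frames_per_block;  /* which block holds the frame */
    size_t slot = idx % frames_per_block;   /* position inside that block */

    return block * req->tp_block_size + slot * req->tp_frame_size;
}
```

The frame header is then found at ``(char *)ring + frame_offset(&req, idx)``,
where ring stands for the pointer returned by mmap(). When tp_frame_size does
not divide tp_block_size, this formula skips the unused tail of each block.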

At the beginning of each frame there is a status field (see
struct tpacket_hdr). If this field is 0, the frame is ready
to be used by the kernel. If not, there is a frame the user can read
and the following flags apply:

Capture process
^^^^^^^^^^^^^^^

From include/linux/if_packet.h::

     #define TP_STATUS_COPY          (1 << 1)
     #define TP_STATUS_LOSING        (1 << 2)
     #define TP_STATUS_CSUMNOTREADY  (1 << 3)
     #define TP_STATUS_CSUM_VALID    (1 << 7)

======================  =======================================================
TP_STATUS_COPY          This flag indicates that the frame (and associated
                        meta information) has been truncated because it's
                        larger than tp_frame_size. This packet can be
                        read entirely with recvfrom().

                        In order to make this work it must be
                        enabled previously with setsockopt() and
                        the PACKET_COPY_THRESH option.

                        The number of frames that can be buffered to
                        be read with recvfrom is limited like a normal socket.
                        See the SO_RCVBUF option in the socket (7) man page.

TP_STATUS_LOSING        Indicates there were packet drops since the last time
                        statistics were checked with getsockopt() and
                        the PACKET_STATISTICS option.

TP_STATUS_CSUMNOTREADY  Currently used for outgoing IP packets whose
                        checksum will be computed in hardware. So while
                        reading the packet we should not try to check the
                        checksum.

TP_STATUS_CSUM_VALID    This flag indicates that at least the transport
                        header checksum of the packet has already been
                        validated on the kernel side. If the flag is not set
                        then we are free to check the checksum by ourselves
                        provided that TP_STATUS_CSUMNOTREADY is also not set.
======================  =======================================================

For convenience there are also the following defines::

     #define TP_STATUS_KERNEL        0
     #define TP_STATUS_USER          1

The kernel initializes all frames to TP_STATUS_KERNEL. When the kernel
receives a packet, it puts it in the buffer and updates the status with
at least the TP_STATUS_USER flag. Then the user can read the packet;
once the packet is read, the user must zero the status field so the kernel
can use that frame buffer again.
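
The ownership handshake described above can be sketched as follows for a
TPACKET_V1 frame. consume_frame() is a hypothetical helper; the memory barrier
uses the GCC/Clang builtin __sync_synchronize(), which is an assumption of
this sketch rather than something this document prescribes.

```c
#include <linux/if_packet.h> /* struct tpacket_hdr, TP_STATUS_* */

/* Hypothetical helper: returns 1 and hands the frame back to the kernel if
 * it held a packet, or 0 if the kernel still owns it. *lost is set only in
 * the first case, to whether TP_STATUS_LOSING was reported. */
static int consume_frame(volatile struct tpacket_hdr *hdr, int *lost)
{
    unsigned long status = hdr->tp_status;

    if (!(status & TP_STATUS_USER))
        return 0;               /* still TP_STATUS_KERNEL */

    *lost = (status & TP_STATUS_LOSING) != 0;

    /* ... read the packet data here, e.g. at (char *)hdr + hdr->tp_mac ... */

    __sync_synchronize();       /* finish all reads before releasing */
    hdr->tp_status = TP_STATUS_KERNEL;
    return 1;
}
```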

The user can use poll (any other variant should apply too) to check if new
packets are in the ring::

    struct pollfd pfd;

    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLIN|POLLRDNORM|POLLERR;

    if (status == TP_STATUS_KERNEL)
        retval = poll(&pfd, 1, timeout);

Checking the status value first and then polling for frames does not
introduce a race condition.

Transmission process
^^^^^^^^^^^^^^^^^^^^

These defines are also used for transmission::

     #define TP_STATUS_AVAILABLE        0 // Frame is available
     #define TP_STATUS_SEND_REQUEST     1 // Frame will be sent on next send()
     #define TP_STATUS_SENDING          2 // Frame is currently in transmission
     #define TP_STATUS_WRONG_FORMAT     4 // Frame format is not correct

First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
packet, the user fills a data buffer of an available frame, sets tp_len to
the current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
This can be done on multiple frames. Once the user is ready to transmit, it
calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
forwarded to the network device.
The kernel updates the status of each sent
frame to TP_STATUS_SENDING until the end of transfer.

At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.

::

    header->tp_len = in_i_size;
    header->tp_status = TP_STATUS_SEND_REQUEST;
    retval = send(this->socket, NULL, 0, 0);

The user can also use poll() to check if a buffer is available:

(status == TP_STATUS_SENDING)

::

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLOUT;
    retval = poll(&pfd, 1, timeout);

What TPACKET versions are available and when to use them?
=========================================================

::

 int val = tpacket_version;
 socklen_t len = sizeof(val);   /* getsockopt() takes a pointer to the length */
 setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
 getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, &len);

where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
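
Selecting a version is usually paired with querying the matching frame header
length. The sketch below combines both; select_tpacket_version() is a
hypothetical helper, and it relies on the PACKET_HDRLEN socket option from
linux/if_packet.h, which takes the version as input and returns the header
length in the same variable.

```c
#include <errno.h>           /* errno values reported on failure */
#include <linux/if_packet.h> /* TPACKET_V*, PACKET_VERSION, PACKET_HDRLEN */
#include <sys/socket.h>

/* Hypothetical helper: select a TPACKET version on fd and report the
 * matching frame header length in *hdrlen. Returns 0, or -1 with errno set. */
static int select_tpacket_version(int fd, int version, int *hdrlen)
{
    int val = version;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val)) < 0)
        return -1;

    /* val carries the version in and the header length out */
    if (getsockopt(fd, SOL_PACKET, PACKET_HDRLEN, &val, &len) < 0)
        return -1;

    *hdrlen = val;
    return 0;
}
```

The version has to be selected before the ring is set up with PACKET_RX_RING
or PACKET_TX_RING.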

TPACKET_V1:
    - Default if not otherwise specified by setsockopt(2)
    - RX_RING, TX_RING available

TPACKET_V1 --> TPACKET_V2:
    - Made 64 bit clean due to unsigned long usage in TPACKET_V1
      structures, thus this also works on 64 bit kernel with 32 bit
      userspace and the like
    - Timestamp resolution in nanoseconds instead of microseconds
    - RX_RING, TX_RING available
    - VLAN metadata information available for packets
      (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
      in the tpacket2_hdr structure:

        - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
          that the tp_vlan_tci field has valid VLAN TCI value
        - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
          indicates that the tp_vlan_tpid field has valid VLAN TPID value

    - How to switch to TPACKET_V2:

        1. Replace struct tpacket_hdr by struct tpacket2_hdr
        2. Query header len and save
        3. Set protocol version to 2, set up ring as usual
        4. For getting the sockaddr_ll,
           use ``(void *)hdr + TPACKET_ALIGN(hdrlen)`` instead of
           ``(void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))``

TPACKET_V2 --> TPACKET_V3:
    - Flexible buffer implementation for RX_RING:
        1. Blocks can be configured with non-static frame-size
        2. Read/poll is at a block-level (as opposed to packet-level)
        3. Added poll timeout to avoid indefinite user-space wait
           on idle links
        4. Added user-configurable knobs:

            4.1 block::timeout
            4.2 tpkt_hdr::sk_rxhash

    - RX Hash data available in user space
    - TX_RING semantics are conceptually similar to TPACKET_V2;
      use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
      instead of TPACKET2_HDRLEN. In the current implementation,
      the tp_next_offset field in the tpacket3_hdr MUST be set to
      zero, indicating that the ring does not hold variable sized frames.
      Packets with non-zero values of tp_next_offset will be dropped.

AF_PACKET fanout mode
=====================

In the AF_PACKET fanout mode, packet reception can be load balanced among
processes. This also works in combination with mmap(2) on packet sockets.
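
The fanout group a socket joins and the policy it uses are passed to
setsockopt() with PACKET_FANOUT as one integer; the example program in this
section builds it as ``fanout_id | (fanout_type << 16)``. The sketch below
shows that packing and its inverse. The function names are hypothetical, and
the sketch treats the upper 16 bits as the policy only, ignoring any fanout
flags.

```c
#include <stdint.h>
#include <linux/if_packet.h>

/* Pack a fanout group id (low 16 bits) and a policy (high 16 bits) into
 * the integer passed to setsockopt(fd, SOL_PACKET, PACKET_FANOUT, ...).
 * Hypothetical helper names, for illustration only. */
int make_fanout_arg(uint16_t group_id, uint16_t policy)
{
    return (int)((uint32_t)group_id | ((uint32_t)policy << 16));
}

/* Recover the group id from a packed argument. */
uint16_t fanout_group_id(int arg)
{
    return (uint16_t)((uint32_t)arg & 0xffff);
}

/* Recover the policy from a packed argument. */
uint16_t fanout_policy(int arg)
{
    return (uint16_t)((uint32_t)arg >> 16);
}
```

All sockets that pass the same group id join the same fanout group, which is
why the example program derives the id once in the parent, before fork().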

Currently implemented fanout policies are:

  - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
  - PACKET_FANOUT_LB: schedule to socket by round-robin
  - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
  - PACKET_FANOUT_RND: schedule to socket by random selection
  - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
  - PACKET_FANOUT_QM: schedule to socket by skb's recorded queue_mapping

Minimal example code by David S. Miller (try things like "./test eth0 hash",
"./test eth0 lb", etc.)::

    #include <stddef.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>

    #include <unistd.h>

    #include <arpa/inet.h>

    #include <linux/if_ether.h>
    #include <linux/if_packet.h>

    #include <net/if.h>

    static const char *device_name;
    static int fanout_type;
    static int fanout_id;

    #ifndef PACKET_FANOUT
    # define PACKET_FANOUT          18
    # define PACKET_FANOUT_HASH     0
    # define PACKET_FANOUT_LB       1
    #endif

    /* Returns a bound socket that joined the fanout group, or -1 on error. */
    static int setup_socket(void)
    {
        int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
        struct sockaddr_ll ll;
        struct ifreq ifr;
        int fanout_arg;

        if (fd < 0) {
            perror("socket");
            return -1;
        }

        memset(&ifr, 0, sizeof(ifr));
        /* ifr is zeroed, so the name stays NUL-terminated */
        strncpy(ifr.ifr_name, device_name, IFNAMSIZ - 1);
        err = ioctl(fd, SIOCGIFINDEX, &ifr);
        if (err < 0) {
            perror("SIOCGIFINDEX");
            return -1;
        }

        memset(&ll, 0, sizeof(ll));
        ll.sll_family = AF_PACKET;
        ll.sll_ifindex = ifr.ifr_ifindex;
        err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
        if (err < 0) {
            perror("bind");
            return -1;
        }

        /* group id in the low 16 bits, policy in the high 16 bits */
        fanout_arg = (fanout_id | (fanout_type << 16));
        err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
                         &fanout_arg, sizeof(fanout_arg));
        if (err) {
            perror("setsockopt");
            return -1;
        }

        return fd;
    }

    static void fanout_thread(void)
    {
        int fd = setup_socket();
        int limit = 10000;

        if (fd < 0)
            exit(EXIT_FAILURE);

        while (limit-- > 0) {
            char buf[1600];
            int err;

            err = read(fd, buf, sizeof(buf));
            if (err < 0) {
                perror("read");
                exit(EXIT_FAILURE);
            }
            if ((limit % 10) == 0)
                fprintf(stdout, "(%d) \n", getpid());
        }

        fprintf(stdout, "%d: Received 10000 packets\n", getpid());

        close(fd);
        exit(0);
    }

    int main(int argc, char **argp)
    {
        int i;

        if (argc != 3) {
            fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
            return EXIT_FAILURE;
        }

        if (!strcmp(argp[2], "hash"))
            fanout_type = PACKET_FANOUT_HASH;
        else if (!strcmp(argp[2], "lb"))
            fanout_type = PACKET_FANOUT_LB;
        else {
            fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
            exit(EXIT_FAILURE);
        }

        device_name = argp[1];
        fanout_id = getpid() & 0xffff;

        for (i = 0; i < 4; i++) {
            pid_t pid = fork();

            switch (pid) {
            case 0:
                /* child: fanout_thread() never returns */
                fanout_thread();

            case -1:
                perror("fork");
                exit(EXIT_FAILURE);
            }
        }

        for (i = 0; i < 4; i++) {
            int status;

            wait(&status);
        }

        return 0;
    }

AF_PACKET TPACKET_V3 example
============================

AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
sizes by doing its own memory management. It is based on blocks where polling
works on a per block basis instead of per ring as in TPACKET_V2 and its
predecessor.

It is said that TPACKET_V3 brings the following benefits:

 * ~15% - 20% reduction in CPU-usage
 * ~20% increase in packet capture rate
 * ~2x increase in packet density
 * Port aggregation analysis
 * Non static frame size to capture entire packet payload

So it seems to be a good candidate to be used with packet fanout.
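
The ring request used by the example in this section has to describe a
consistent geometry before it is passed to setsockopt() with PACKET_RX_RING.
The sketch below fills a ``struct tpacket_req3`` and rejects obviously
inconsistent values. ``fill_req3()`` is a hypothetical helper, and the checks
it performs (block size a multiple of the page size, frame size a multiple of
TPACKET_ALIGNMENT and at least TPACKET3_HDRLEN) are assumptions about what
the kernel accepts rather than a complete copy of its validation.

```c
#include <string.h>
#include <unistd.h>
#include <linux/if_packet.h>

/* Fill a TPACKET_V3 ring request. Returns 0 on success, -1 if the
 * geometry looks inconsistent. Hypothetical helper: the checks are a
 * sketch, the kernel remains the final authority. */
int fill_req3(struct tpacket_req3 *req, unsigned int block_size,
              unsigned int block_nr, unsigned int frame_size,
              unsigned int retire_ms)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || block_size == 0 || block_nr == 0)
        return -1;
    /* blocks are built from whole pages */
    if (block_size % (unsigned long)page != 0)
        return -1;
    /* a frame must hold at least its header and stay aligned */
    if (frame_size < TPACKET3_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    if (frame_size > block_size)
        return -1;

    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    /* frames per block times number of blocks */
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    /* block retire timeout, in milliseconds */
    req->tp_retire_blk_tov = retire_ms;
    req->tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
    return 0;
}
```

Calling ``fill_req3(&req, 1 << 22, 64, 1 << 11, 60)`` reproduces the values
used by the example: 64 blocks of 4 MiB each and 2048 frame slots per block.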

Minimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::

    /* Written from scratch, but kernel-to-user space API usage
    * dissected from lolpcap:
    *  Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
    *  License: GPL, version 2.0
    */

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <assert.h>
    #include <net/if.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <poll.h>
    #include <unistd.h>
    #include <signal.h>
    #include <inttypes.h>
    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>

    #ifndef likely
    # define likely(x)          __builtin_expect(!!(x), 1)
    #endif
    #ifndef unlikely
    # define unlikely(x)        __builtin_expect(!!(x), 0)
    #endif

    struct block_desc {
        uint32_t version;
        uint32_t offset_to_priv;
        struct tpacket_hdr_v1 h1;
    };

    struct ring {
        struct iovec *rd;
        uint8_t *map;
        struct tpacket_req3 req;
    };

    static unsigned long packets_total = 0, bytes_total = 0;
    /* written from the signal handler, hence volatile */
    static volatile sig_atomic_t sigint = 0;

    static void sighandler(int num)
    {
        sigint = 1;
    }

    static int setup_socket(struct ring *ring, char *netdev)
    {
        int err, i, fd, v = TPACKET_V3;
        struct sockaddr_ll ll;
        unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
        unsigned int blocknum = 64;

        fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) {
            perror("socket");
            exit(1);
        }

        err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
        if (err < 0) {
            perror("setsockopt");
            exit(1);
        }

        memset(&ring->req, 0, sizeof(ring->req));
        ring->req.tp_block_size = blocksiz;
        ring->req.tp_frame_size = framesiz;
        ring->req.tp_block_nr = blocknum;
        ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
        ring->req.tp_retire_blk_tov = 60;
        ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;

        err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
                         sizeof(ring->req));
        if (err < 0) {
            perror("setsockopt");
            exit(1);
        }

        ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
        if (ring->map == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }

        /* one iovec per block, pointing into the mapped ring */
        ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
        assert(ring->rd);
        for (i = 0; i < ring->req.tp_block_nr; ++i) {
            ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
            ring->rd[i].iov_len = ring->req.tp_block_size;
        }

        memset(&ll, 0, sizeof(ll));
        ll.sll_family = PF_PACKET;
        ll.sll_protocol = htons(ETH_P_ALL);
        ll.sll_ifindex = if_nametoindex(netdev);
        ll.sll_hatype = 0;
        ll.sll_pkttype = 0;
        ll.sll_halen = 0;

        err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
        if (err < 0) {
            perror("bind");
            exit(1);
        }

        return fd;
    }

    static void display(struct tpacket3_hdr *ppd)
    {
        struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
        struct iphdr *ip = (struct iphdr *) ((uint8_t *) eth + ETH_HLEN);

        if (eth->h_proto == htons(ETH_P_IP)) {
            struct sockaddr_in ss, sd;
            char sbuff[NI_MAXHOST], dbuff[NI_MAXHOST];

            memset(&ss, 0, sizeof(ss));
            ss.sin_family = PF_INET;
            ss.sin_addr.s_addr = ip->saddr;
            getnameinfo((struct sockaddr *) &ss, sizeof(ss),
                        sbuff, sizeof(sbuff), NULL, 0, NI_NUMERICHOST);

            memset(&sd, 0, sizeof(sd));
            sd.sin_family = PF_INET;
            sd.sin_addr.s_addr = ip->daddr;
            getnameinfo((struct sockaddr *) &sd, sizeof(sd),
                        dbuff, sizeof(dbuff), NULL, 0, NI_NUMERICHOST);

            printf("%s -> %s, ", sbuff, dbuff);
        }

        printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
    }

    static void walk_block(struct block_desc *pbd, const int block_num)
    {
        int num_pkts = pbd->h1.num_pkts, i;
        unsigned long bytes = 0;
        struct tpacket3_hdr *ppd;

        ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
                                       pbd->h1.offset_to_first_pkt);
        for (i = 0; i < num_pkts; ++i) {
            bytes += ppd->tp_snaplen;
            display(ppd);

            /* frames in a block are chained through tp_next_offset */
            ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
                                           ppd->tp_next_offset);
        }

        packets_total += num_pkts;
        bytes_total += bytes;
    }

    /* hand the block back to the kernel */
    static void flush_block(struct block_desc *pbd)
    {
        pbd->h1.block_status = TP_STATUS_KERNEL;
    }

    static void teardown_socket(struct ring *ring, int fd)
    {
        munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
        free(ring->rd);
        close(fd);
    }

    int main(int argc, char **argp)
    {
        int fd, err;
        socklen_t len;
        struct ring ring;
        struct pollfd pfd;
        unsigned int block_num = 0, blocks = 64;
        struct block_desc *pbd;
        struct tpacket_stats_v3 stats;

        if (argc != 2) {
            fprintf(stderr, "Usage: %s INTERFACE\n", argp[0]);
            return EXIT_FAILURE;
        }

        signal(SIGINT, sighandler);

        memset(&ring, 0, sizeof(ring));
        fd = setup_socket(&ring, argp[argc - 1]);
        assert(fd >= 0);

        memset(&pfd, 0, sizeof(pfd));
        pfd.fd = fd;
        pfd.events = POLLIN | POLLERR;
        pfd.revents = 0;

        while (likely(!sigint)) {
            pbd = (struct block_desc *) ring.rd[block_num].iov_base;

            /* block still owned by the kernel: wait for it to be retired */
            if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
                poll(&pfd, 1, -1);
                continue;
            }

            walk_block(pbd, block_num);
            flush_block(pbd);
            block_num = (block_num + 1) % blocks;
        }

        len = sizeof(stats);
        err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
        if (err < 0) {
            perror("getsockopt");
            exit(1);
        }

        fflush(stdout);
        printf("\nReceived %u packets, %lu bytes, %u dropped, freeze_q_cnt: %u\n",
               stats.tp_packets, bytes_total, stats.tp_drops,
               stats.tp_freeze_q_cnt);

        teardown_socket(&ring, fd);
        return 0;
    }

PACKET_QDISC_BYPASS
===================

If there is a requirement to load the network with many packets in a similar
fashion as pktgen does, you might set the following option after socket
creation::

    int one = 1;
    setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));

This has the side effect that packets sent through PF_PACKET bypass the
kernel's qdisc layer and are pushed to the driver directly. This means that
packets are not buffered, tc disciplines are ignored, increased loss can occur
and such packets are no longer visible to other PF_PACKET sockets. So, you
have been warned; generally, this can be useful for stress testing various
components of a system.

By default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
on PF_PACKET sockets.

PACKET_TIMESTAMP
================

The PACKET_TIMESTAMP setting determines the source of the timestamp in
the packet meta information for mmap(2)ed RX_RING and TX_RINGs.  If your
NIC is capable of timestamping packets in hardware, you can request those
hardware timestamps to be used. Note: you may need to enable the generation
of hardware timestamps with SIOCSHWTSTAMP (see related information from
Documentation/networking/timestamping.rst).

PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING::

    int req = SOF_TIMESTAMPING_RAW_HARDWARE;
    setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req));

For the mmap(2)ed ring buffers, such timestamps are stored in the
``tpacket{,2,3}_hdr`` structure's tp_sec and ``tp_{n,u}sec`` members.
To determine what kind of timestamp has been reported, the tp_status field
is binary or'ed with the following possible bits ...

::

    TP_STATUS_TS_RAW_HARDWARE
    TP_STATUS_TS_SOFTWARE

... that are equivalent to their ``SOF_TIMESTAMPING_*`` counterparts. For the
RX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
software fallback was invoked *within* PF_PACKET's processing code (less
precise).

Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
ii) call sendto() e.g. in blocking mode, iii) wait for the status of the
relevant frames to be updated, i.e. for the frames to be handed back to the
application, iv) walk through the frames to pick up the individual hw/sw
timestamps.

These bits are or'ed with TP_STATUS_AVAILABLE only (!) if transmit
timestamping is enabled, so your application must account for them: check
``!(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))`` in a first
step to see if the frame belongs to the application, and then extract the
type of timestamp from tp_status in a second step.

If you do not need transmit timestamps and leave them disabled, checking for
TP_STATUS_AVAILABLE or TP_STATUS_WRONG_FORMAT is sufficient. If in the
TX_RING part only TP_STATUS_AVAILABLE is set, then the tp_sec and
``tp_{n,u}sec`` members do not contain a valid value. For TX_RINGs, by default
no timestamp is generated!

See include/linux/net_tstamp.h and Documentation/networking/timestamping.rst
for more information on hardware timestamps.

Miscellaneous bits
==================

- Packet sockets work well together with Linux socket filters, thus you also
  might want to have a look at Documentation/networking/filter.rst

THANKS
======

   Jesse Brandeburg, for fixing my grammatical/spelling errors