.. SPDX-License-Identifier: GPL-2.0

============
Timestamping
============


1. Control Interfaces
=====================

The interfaces for receiving network packet timestamps are:

SO_TIMESTAMP
  Generates a timestamp for each incoming packet in (not necessarily
  monotonic) system time. Reports the timestamp via recvmsg() in a
  control message in usec resolution.
  SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct __kernel_old_timeval for
  SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
  SO_TIMESTAMP_NEW options respectively.

SO_TIMESTAMPNS
  Same timestamping mechanism as SO_TIMESTAMP, but reports the
  timestamp as struct timespec in nsec resolution.
  SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
  and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
  respectively.

IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
  Only for multicast: approximate transmit timestamp obtained by
  reading the looped packet receive timestamp.

SO_TIMESTAMPING
  Generates timestamps on reception, transmission or both. Supports
  multiple timestamp sources, including hardware. Supports generating
  timestamps for stream sockets.


1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW)
-------------------------------------------------------------

This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
the network stack, the feature has to be enabled for all packets. The
same is true for all early receive timestamp options.

For interface details, see `man 7 socket`.

Always use SO_TIMESTAMP_NEW to get the timestamp in
struct __kernel_sock_timeval format.

SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW)
----------------------------------------------------------------------

This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (us).

Always use SO_TIMESTAMPNS_NEW to get the timestamp in
struct __kernel_timespec format.

SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW)
----------------------------------------------------------------------

Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In::

    err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

val is an integer with any of the following bits set. Setting other
bits returns EINVAL and does not change the current state.

The socket option configures timestamp generation for individual
sk_buffs (1.3.1), timestamp reporting to the socket's error
queue (1.3.2) and options (1.3.3). Timestamp generation can also
be enabled for individual sendmsg calls using cmsg (1.3.4).


1.3.1 Timestamp Generation
^^^^^^^^^^^^^^^^^^^^^^^^^^

Some bits are requests to the stack to try to generate timestamps. Any
combination of them is valid.
Changes to these bits apply to newly
created packets, not to packets already in the stack. As a result, it
is possible to selectively request timestamps for a subset of packets
(e.g., for sampling) by embedding a send() call within two setsockopt
calls, one to enable timestamp generation and one to disable it.
Timestamps may also be generated for reasons other than being
requested by a particular socket, such as when receive timestamping is
enabled system-wide, as explained earlier.

SOF_TIMESTAMPING_RX_HARDWARE:
  Request rx timestamps generated by the network adapter.

SOF_TIMESTAMPING_RX_SOFTWARE:
  Request rx timestamps when data enters the kernel. These timestamps
  are generated just after a device driver hands a packet to the
  kernel receive stack.

SOF_TIMESTAMPING_TX_HARDWARE:
  Request tx timestamps generated by the network adapter. This flag
  can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SOFTWARE:
  Request tx timestamps when data leaves the kernel. These timestamps
  are generated in the device driver as close as possible, but always
  prior to, passing the packet to the network interface. Hence, they
  require driver support and may not be available for all devices.
  This flag can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SCHED:
  Request tx timestamps prior to entering the packet scheduler. Kernel
  transmit latency is, if long, often dominated by queuing delay. The
  difference between this timestamp and one taken at
  SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent
  of protocol processing. The latency incurred in protocol
  processing, if any, can be computed by subtracting a userspace
  timestamp taken immediately before send() from this timestamp. On
  machines with virtual devices where a transmitted packet travels
  through multiple devices and, hence, multiple packet schedulers,
  a timestamp is generated at each layer. This allows for fine
  grained measurement of queuing delay. This flag can be enabled
  via both socket options and control messages.

SOF_TIMESTAMPING_TX_ACK:
  Request tx timestamps when all data in the send buffer has been
  acknowledged. This only makes sense for reliable protocols. It is
  currently only implemented for TCP. For that protocol, it may
  over-report measurement, because the timestamp is generated when all
  data up to and including the buffer at send() was acknowledged: the
  cumulative acknowledgment. The mechanism ignores SACK and FACK.
  This flag can be enabled via both socket options and control messages.
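
The sampling pattern described at the start of this section (enable
generation, send, disable generation) could look roughly like the
sketch below. It assumes a Linux system whose libc exposes
SO_TIMESTAMPING through <sys/socket.h>. The helper names
tstamp_flags() and tstamp_set_sampling() are hypothetical and only
serve the illustration. Keeping the reporting bit
SOF_TIMESTAMPING_SOFTWARE (see 1.3.2) set in both states is a design
choice of this sketch, so that a timestamp requested earlier can still
be reported after generation has been switched off again.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: build the SO_TIMESTAMPING bitmap. The reporting
 * bit stays set in both states; only the generation bit is toggled.
 */
static int tstamp_flags(int sample)
{
    int val = SOF_TIMESTAMPING_SOFTWARE;

    if (sample)
        val |= SOF_TIMESTAMPING_TX_SOFTWARE;
    return val;
}

/*
 * Hypothetical helper: apply the bitmap to a socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int tstamp_set_sampling(int fd, int sample)
{
    int val = tstamp_flags(sample);

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
```

A caller would bracket the send() of interest with
tstamp_set_sampling(fd, 1) and tstamp_set_sampling(fd, 0).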


1.3.2 Timestamp Reporting
^^^^^^^^^^^^^^^^^^^^^^^^^

The other three bits control which timestamps will be reported in a
generated control message. Changes to the bits take immediate
effect at the timestamp reporting locations in the stack. Timestamps
are only reported for packets that also have the relevant timestamp
generation request set.

SOF_TIMESTAMPING_SOFTWARE:
  Report any software timestamps when available.

SOF_TIMESTAMPING_SYS_HARDWARE:
  This option is deprecated and ignored.

SOF_TIMESTAMPING_RAW_HARDWARE:
  Report hardware timestamps as generated by
  SOF_TIMESTAMPING_TX_HARDWARE when available.


1.3.3 Timestamp Options
^^^^^^^^^^^^^^^^^^^^^^^

The interface supports the following options:

SOF_TIMESTAMPING_OPT_ID:
  Generate a unique identifier along with each packet. A process can
  have multiple concurrent timestamping requests outstanding. Packets
  can be reordered in the transmit path, for instance in the packet
  scheduler. In that case timestamps will be queued onto the error
  queue out of order from the original send() calls.
  It is then not always
  possible to uniquely match timestamps to the original send() calls
  based on timestamp order or payload inspection alone.

  This option associates each packet at send() with a unique
  identifier and returns that along with the timestamp. The identifier
  is derived from a per-socket u32 counter (that wraps). For datagram
  sockets, the counter increments with each sent packet. For stream
  sockets, it increments with every byte.

  The counter starts at zero. It is initialized the first time that
  the socket option is enabled. It is reset each time the option is
  enabled after having been disabled. Resetting the counter does not
  change the identifiers of existing packets in the system.

  This option is implemented only for transmit timestamps. There, the
  timestamp is always looped along with a struct sock_extended_err.
  The option modifies field ee_data to pass an id that is unique
  among all possibly concurrently outstanding timestamp requests for
  that socket.


SOF_TIMESTAMPING_OPT_CMSG:
  Support recv() cmsg for all timestamped packets. Control messages
  are already supported unconditionally on all packets with receive
  timestamps and on IPv6 packets with transmit timestamp. This option
  extends them to IPv4 packets with transmit timestamp.
  One use case
  is to correlate packets with their egress device, by enabling socket
  option IP_PKTINFO simultaneously.


SOF_TIMESTAMPING_OPT_TSONLY:
  Applies to transmit timestamps only. Makes the kernel return the
  timestamp as a cmsg alongside an empty packet, as opposed to
  alongside the original packet. This reduces the amount of memory
  charged to the socket's receive budget (SO_RCVBUF) and delivers
  the timestamp even if sysctl net.core.tstamp_allow_data is 0.
  This option disables SOF_TIMESTAMPING_OPT_CMSG.

SOF_TIMESTAMPING_OPT_STATS:
  Optional stats that are obtained along with the transmit timestamps.
  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
  transmit timestamp is available, the stats are available in a
  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
  list of typed TLVs (struct nlattr). These stats allow the
  application to associate various transport layer stats with
  the transmit timestamps, such as how long a certain block of
  data was limited by the peer's receiver window.

SOF_TIMESTAMPING_OPT_PKTINFO:
  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
  packets with hardware timestamps.
  The message contains struct
  scm_ts_pktinfo, which supplies the index of the real interface which
  received the packet and its length at layer 2. A valid (non-zero)
  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
  enabled and the driver is using NAPI. The struct also contains two
  other fields, but they are reserved and undefined.

SOF_TIMESTAMPING_OPT_TX_SWHW:
  Request both hardware and software timestamps for outgoing packets
  when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE
  are enabled at the same time. If both timestamps are generated,
  two separate messages will be looped to the socket's error queue,
  each containing just one timestamp.

New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
regardless of the setting of sysctl net.core.tstamp_allow_data.

An exception is when a process needs additional cmsg data, for
instance SOL_IP/IP_PKTINFO to detect the egress network interface.
Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on
having access to the contents of the original packet, so cannot be
combined with SOF_TIMESTAMPING_OPT_TSONLY.


1.3.4 Enabling timestamps via control messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to socket options, timestamp generation can be requested
per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
Using this feature, applications can sample timestamps per sendmsg()
without paying the overhead of enabling and disabling timestamps via
setsockopt::

    struct msghdr *msg;
    struct cmsghdr *cmsg;
    ...
    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SO_TIMESTAMPING;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
    *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
                                   SOF_TIMESTAMPING_TX_SOFTWARE |
                                   SOF_TIMESTAMPING_TX_ACK;
    err = sendmsg(fd, msg, 0);

The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
the SOF_TIMESTAMPING_TX_* flags set via setsockopt.
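
For readers who want to see the surrounding setup that the fragment
above elides, here is one possible self-contained sketch. The helper
names set_tx_tstamp_cmsg() and send_with_tx_tstamp() are hypothetical,
and the union used to align the control buffer is simply the pattern
shown in `man 3 cmsg`. This is an illustration under those
assumptions, not the only way to build the message.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/types.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: attach a SO_TIMESTAMPING control message to msg.
 * msg->msg_control must point to at least CMSG_SPACE(sizeof(__u32))
 * zeroed bytes. tx_flags should only hold SOF_TIMESTAMPING_TX_* bits.
 */
static int set_tx_tstamp_cmsg(struct msghdr *msg, __u32 tx_flags)
{
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);

    if (!cmsg || msg->msg_controllen < CMSG_SPACE(sizeof(__u32)))
        return -1;
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SO_TIMESTAMPING;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
    memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));
    return 0;
}

/* Hypothetical helper: send one buffer, requesting tx timestamps for it. */
static ssize_t send_with_tx_tstamp(int fd, const void *buf, size_t len,
                                   __u32 tx_flags)
{
    /* The union guarantees cmsghdr alignment for the control buffer. */
    union {
        char buf[CMSG_SPACE(sizeof(__u32))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (set_tx_tstamp_cmsg(&msg, tx_flags) < 0)
        return -1;
    return sendmsg(fd, &msg, 0);
}
```

A call such as send_with_tx_tstamp(fd, buf, len,
SOF_TIMESTAMPING_TX_SOFTWARE) then requests a timestamp for that one
write only.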

Moreover, applications must still enable timestamp reporting via
setsockopt to receive timestamps::

    __u32 val = SOF_TIMESTAMPING_SOFTWARE |
                SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
    err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));


1.4 Bytestream Timestamps
-------------------------

The SO_TIMESTAMPING interface supports timestamping of bytes in a
bytestream. Each request is interpreted as a request for when the
entire contents of the buffer have passed a timestamping point. That
is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record
when all bytes have reached the device driver, regardless of how
many packets the data has been converted into.

In general, bytestreams have no natural delimiters and therefore
correlating a timestamp with data is non-trivial. A range of bytes
may be split across segments, any segments may be merged (possibly
coalescing sections of previously segmented buffers associated with
independent send() calls). Segments can be reordered and the same
byte range can coexist in multiple segments for protocols that
implement retransmissions.

It is essential that all timestamps implement the same semantics,
regardless of these possible transformations, as otherwise they are
incomparable. Handling "rare" corner cases differently from the
simple case (a 1:1 mapping from buffer to skb) is insufficient
because performance debugging often needs to focus on such outliers.

In practice, timestamps can be correlated with segments of a
bytestream consistently, if both semantics of the timestamp and the
timing of measurement are chosen correctly. This challenge is no
different from deciding on a strategy for IP fragmentation. There, the
definition is that only the first fragment is timestamped. For
bytestreams, we chose that a timestamp is generated only when all
bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to
implement and reason about. An implementation that has to take into
account SACK would be more complex due to possible transmission holes
and out of order arrival.

On the host, TCP can also break the simple 1:1 mapping from buffer to
skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The
implementation ensures correctness in all cases by tracking the
individual last byte passed to send(), even if it is no longer the
last byte after an skbuff extend or merge operation. It stores the
relevant sequence number in skb_shinfo(skb)->tskey.
Because an skbuff
has only one such field, only one timestamp can be generated.

In rare cases, a timestamp request can be missed if two requests are
collapsed onto the same skb. A process can detect this situation by
enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at
send time with the value returned for each timestamp. It can prevent
the situation by always flushing the TCP stack in between requests,
for instance by enabling TCP_NODELAY and disabling TCP_CORK and
autocork.

These precautions ensure that the timestamp is generated only when all
bytes have passed a timestamp point, assuming that the network stack
itself does not reorder the segments. The stack indeed tries to avoid
reordering. The one exception is under administrator control: it is
possible to construct a packet scheduler configuration that delays
segments from the same stream differently. Such a setup would be
unusual.


2. Data Interfaces
==================

Timestamps are read using the ancillary data feature of recvmsg().
See `man 3 cmsg` for details of this interface. The socket manual
page (`man 7 socket`) describes how timestamps generated with
SO_TIMESTAMP and SO_TIMESTAMPNS can be retrieved.


2.1 SCM_TIMESTAMPING records
----------------------------

These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and a payload whose type
depends on the socket option in use.

For SO_TIMESTAMPING_OLD::

    struct scm_timestamping {
        struct timespec ts[3];
    };

For SO_TIMESTAMPING_NEW::

    struct scm_timestamping64 {
        struct __kernel_timespec ts[3];
    };

Always use SO_TIMESTAMPING_NEW to get the timestamp in
struct scm_timestamping64 format.

SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2].

ts[1] used to hold hardware timestamps converted to system time.
Instead, expose the hardware clock device on the NIC directly as
a HW PTP clock source, to allow time conversion in userspace and
optionally synchronize system time with a userspace PTP stack such
as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.
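
The slot layout described above can be captured in a small helper.
The following is only a sketch: the name scm_pick_timestamp() is
hypothetical, it uses the SO_TIMESTAMPING_OLD layout (struct
scm_timestamping from <linux/errqueue.h>), and preferring the hardware
slot over the software slot when both are set is a choice made for
this example rather than a rule of the interface.

```c
#include <stddef.h>
#include <time.h>
#include <linux/errqueue.h>

/* A slot counts as unused when both of its fields are zero. */
static int ts_is_set(const struct timespec *ts)
{
    return ts->tv_sec != 0 || ts->tv_nsec != 0;
}

/*
 * Hypothetical helper: return the timestamp to use from a record.
 * ts[2] (hardware) is preferred, ts[0] (software) is the fallback,
 * and the deprecated ts[1] is never consulted. *is_hw reports which
 * slot was chosen. Returns NULL when neither slot is set.
 */
static const struct timespec *
scm_pick_timestamp(const struct scm_timestamping *tss, int *is_hw)
{
    if (ts_is_set(&tss->ts[2])) {
        *is_hw = 1;
        return &tss->ts[2];
    }
    if (ts_is_set(&tss->ts[0])) {
        *is_hw = 0;
        return &tss->ts[0];
    }
    return NULL;
}
```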

Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
software timestamp will be generated in the recvmsg() call and passed
in ts[0] when a real software timestamp is missing. This also happens
for hardware transmit timestamps.

2.1.1 Transmit timestamps with MSG_ERRQUEUE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For transmit timestamps the outgoing packet is looped back to the
socket's error queue with the send timestamp(s) attached. A process
receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE
set and with a msg_control buffer sufficiently large to receive the
relevant metadata structures. The recvmsg call returns the original
outgoing data packet with two ancillary messages attached.

A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
embeds a struct sock_extended_err. This defines the error type. For
timestamps, the ee_errno field is ENOMSG. The other ancillary message
will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
embeds the struct scm_timestamping.


2.1.1.2 Timestamp types
~~~~~~~~~~~~~~~~~~~~~~~

The semantics of the three struct timespec are defined by field
ee_info in the extended error structure. It contains a value of
type SCM_TSTAMP_* to define the actual timestamp passed in
scm_timestamping.

The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_*
control fields discussed previously, with one exception. For legacy
reasons, SCM_TSTAMP_SND is equal to zero and can be set for both
SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It
is the first if ts[2] is non-zero, the second otherwise, in which
case the timestamp is stored in ts[0].


2.1.1.3 Fragmentation
~~~~~~~~~~~~~~~~~~~~~

Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
then only the first fragment is timestamped and returned to the sending
socket.


2.1.1.4 Packet Payload
~~~~~~~~~~~~~~~~~~~~~~

The calling application is often not interested in receiving the whole
packet payload that it passed to the stack originally: the socket
error queue mechanism is just a method to piggyback the timestamp on.
In this case, the application can choose to read datagrams with a
smaller buffer, possibly even of length 0. The payload is truncated
accordingly. Until the process calls recvmsg() on the error queue,
however, the full packet is queued, taking up budget from SO_RCVBUF.


2.1.1.5 Blocking Read
~~~~~~~~~~~~~~~~~~~~~

Reading from the error queue is always a non-blocking operation. To
block waiting on a timestamp, use poll or select. poll() will return
POLLERR in pollfd.revents if any data is ready on the error queue.
There is no need to pass this flag in pollfd.events. This flag is
ignored on request. See also `man 2 poll`.


2.1.2 Receive timestamps
^^^^^^^^^^^^^^^^^^^^^^^^

On reception, there is no reason to read from the socket error queue.
The SCM_TIMESTAMPING ancillary data is sent along with the packet data
on a normal recvmsg().
Since this is not a socket error, it is not
accompanied by a message SOL_IP(V6)/IP(V6)_RECVERR. In this case,
the meaning of the three fields in struct scm_timestamping is
implicitly defined. ts[0] holds a software timestamp if set, ts[1]
is again deprecated and ts[2] holds a hardware timestamp if set.


3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
=======================================================================

Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
include/uapi/linux/net_tstamp.h as::

    struct hwtstamp_config {
        int flags;      /* no flags defined right now, must be zero */
        int tx_type;    /* HWTSTAMP_TX_* */
        int rx_filter;  /* HWTSTAMP_FILTER_* */
    };

Desired behavior is passed into the kernel and to a specific device by
calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
ifr_data points to a struct hwtstamp_config. The tx_type and
rx_filter are hints to the driver what it is expected to do. If
the requested fine-grained filtering for incoming packets is not
supported, the driver may time stamp more than just the requested types
of packets.

Drivers are free to use a more permissive configuration than the requested
configuration. It is expected that drivers should only implement directly the
most generic mode that can be supported. For example, if the hardware can
support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale
HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT
is more generic (and more useful to applications).

A driver which supports hardware time stamping shall update the struct
with the actual, possibly more permissive configuration. If the
requested packets cannot be time stamped, then nothing should be
changed and ERANGE shall be returned (in contrast to EINVAL, which
indicates that SIOCSHWTSTAMP is not supported at all).

Only a process with admin rights may change the configuration. User
space is responsible for ensuring that multiple processes don't interfere
with each other and that the settings are reset.

Any process can read the actual configuration by passing this
structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has
not been implemented in all drivers.

::

    /* possible values for hwtstamp_config->tx_type */
    enum {
        /*
         * no outgoing packet will need hardware time stamping;
         * should a packet arrive which asks for it, no hardware
         * time stamping will be done
         */
        HWTSTAMP_TX_OFF,

        /*
         * enables hardware time stamping for outgoing packets;
         * the sender of the packet decides which are to be
         * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
         * before sending the packet
         */
        HWTSTAMP_TX_ON,
    };

    /* possible values for hwtstamp_config->rx_filter */
    enum {
        /* time stamp no incoming packet at all */
        HWTSTAMP_FILTER_NONE,

        /* time stamp any incoming packet */
        HWTSTAMP_FILTER_ALL,

        /* return value: time stamp all packets requested plus some others */
        HWTSTAMP_FILTER_SOME,

        /* PTP v1, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V1_L4_EVENT,

        /* for the complete list of values, please check
         * the include file include/uapi/linux/net_tstamp.h
         */
    };

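The request described in this section can be sketched from user space roughly
as follows. This is a minimal illustration, not a reference implementation: the
helper names (``hwtstamp_prepare``, ``hwtstamp_apply``, ``hwtstamp_classify``)
and the ``enum cfg_result`` values are hypothetical, and the interpretation of
ERANGE and EINVAL simply follows the text above. EPERM is assumed to be what a
process without admin rights sees.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>

/* Hypothetical result codes used only by this sketch. */
enum cfg_result {
    CFG_OK,            /* accepted; cfg now holds the actual configuration */
    CFG_CANNOT_STAMP,  /* ERANGE: requested packets cannot be time stamped */
    CFG_NO_IOCTL,      /* EINVAL: SIOCSHWTSTAMP not supported at all */
    CFG_NO_PERMISSION, /* EPERM: assumed errno for missing admin rights */
    CFG_OTHER,         /* anything else, e.g. EBADF or ENODEV */
};

/* Fill in the ifreq so that ifr_data points at the hwtstamp_config. */
static void hwtstamp_prepare(struct ifreq *ifr, struct hwtstamp_config *cfg,
                             const char *ifname, int tx_type, int rx_filter)
{
    memset(cfg, 0, sizeof(*cfg));       /* flags must be zero */
    cfg->tx_type = tx_type;             /* HWTSTAMP_TX_* */
    cfg->rx_filter = rx_filter;         /* HWTSTAMP_FILTER_* */

    memset(ifr, 0, sizeof(*ifr));
    strncpy(ifr->ifr_name, ifname, IFNAMSIZ - 1);
    ifr->ifr_data = (char *)cfg;
}

/*
 * Issue the ioctl on any open socket, for example one obtained with
 * socket(AF_INET, SOCK_DGRAM, 0). Returns 0 on success, otherwise errno.
 * On success the driver may have rewritten the config to a more
 * permissive one, so the caller should re-read tx_type and rx_filter.
 */
static int hwtstamp_apply(int fd, struct ifreq *ifr)
{
    if (ioctl(fd, SIOCSHWTSTAMP, ifr) < 0)
        return errno;
    return 0;
}

/* Map the errno value returned by hwtstamp_apply() to a result code. */
static enum cfg_result hwtstamp_classify(int err)
{
    switch (err) {
    case 0:      return CFG_OK;
    case ERANGE: return CFG_CANNOT_STAMP;
    case EINVAL: return CFG_NO_IOCTL;
    case EPERM:  return CFG_NO_PERMISSION;
    default:     return CFG_OTHER;
    }
}
```

A caller would typically prepare the request with ``HWTSTAMP_TX_ON`` and the
narrowest ``rx_filter`` it needs, apply it, and then inspect ``cfg.rx_filter``
to learn which filter the driver actually selected.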

3.1 Hardware Timestamping Implementation: Device Drivers
--------------------------------------------------------

A driver which supports hardware time stamping must support the
SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
the actual values as described in the section on SIOCSHWTSTAMP. It
should also support SIOCGHWTSTAMP.

Time stamps for received packets must be stored in the skb. To get a pointer
to the shared time stamp structure of the skb call skb_hwtstamps(). Then
set the time stamps in the structure::

    struct skb_shared_hwtstamps {
        /* hardware time stamp transformed into duration
         * since arbitrary point in time
         */
        ktime_t hwtstamp;
    };

Time stamps for outgoing packets are to be generated as follows:

- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
  is non-zero. If yes, then the driver is expected to do hardware time
  stamping.
- If this is possible for the skb and requested, then declare
  that the driver is doing the time stamping by setting the flag
  SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g. with::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

  You might want to keep a pointer to the associated skb for the next step
  and not free the skb. A driver not supporting hardware time stamping doesn't
  do that. A driver must never touch sk_buff::tstamp! It is used to store
  software generated time stamps by the network subsystem.
- The driver should call skb_tx_timestamp() as close to passing sk_buff to
  hardware as possible. skb_tx_timestamp() provides a software time stamp if
  requested and hardware timestamping is not possible (SKBTX_IN_PROGRESS not
  set).
- As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_tstamp_tx() with the original skb and the raw
  hardware time stamp. skb_tstamp_tx() clones the original skb and
  adds the timestamps, therefore the original skb has to be freed now.
  If obtaining the hardware time stamp somehow fails, then the driver
  should not fall back to software time stamping. The rationale is that
  this would occur at a later time in the processing pipeline than other
  software time stamping and therefore could lead to unexpected deltas
  between time stamps.
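
The flag handling in the steps above can be modelled in plain user-space C.
This is only a sketch of the decision logic, not driver code: ``struct
mock_skb`` and both ``mock_*`` functions are hypothetical stand-ins, the flag
values are illustrative copies (the real definitions live in
include/linux/skbuff.h), and ``hw_can_stamp`` is an assumed parameter standing
for "hardware time stamping is possible for this skb".

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's tx_flags bits. */
enum {
    SKBTX_HW_TSTAMP   = 1 << 0, /* sender requested a hardware time stamp */
    SKBTX_SW_TSTAMP   = 1 << 1, /* sender requested a software time stamp */
    SKBTX_IN_PROGRESS = 1 << 2, /* a driver has claimed hardware stamping */
};

/* Hypothetical model of the part of an skb that matters here. */
struct mock_skb {
    uint8_t tx_flags;           /* models skb_shinfo(skb)->tx_flags */
};

/*
 * Models the first two steps: if a hardware time stamp was requested
 * and the device can provide one, declare that stamping is in progress.
 * Returns true when the driver has claimed the skb.
 */
static bool mock_xmit_claim_hwtstamp(struct mock_skb *skb, bool hw_can_stamp)
{
    if (!(skb->tx_flags & SKBTX_HW_TSTAMP) || !hw_can_stamp)
        return false;

    skb->tx_flags |= SKBTX_IN_PROGRESS;
    return true;
}

/*
 * Models the rule stated for skb_tx_timestamp(): a software time stamp
 * is produced only if one was requested and no hardware time stamping
 * is in progress.
 */
static bool mock_sw_tstamp_generated(const struct mock_skb *skb)
{
    return (skb->tx_flags & SKBTX_SW_TSTAMP) &&
           !(skb->tx_flags & SKBTX_IN_PROGRESS);
}
```

In this model, an skb that requests both kinds of time stamp gets only the
hardware one when the device can provide it, and only the software one
otherwise, which matches the "no fallback" rule in the last step.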

3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------

There are situations when there may be more than one PHC (PTP Hardware Clock)
in the data path of a packet. The kernel has no explicit mechanism to allow the
user to select which PHC to use for timestamping Ethernet frames. Instead, the
assumption is that the outermost PHC is always the most preferable, and that
kernel drivers collaborate towards achieving that goal. Currently there are 3
cases of stacked PHCs, detailed below:

3.2.1 DSA (Distributed Switch Architecture) switches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are Ethernet switches which have one of their ports connected to an
(otherwise completely unaware) host Ethernet interface, and perform the role of
a port multiplier with optional forwarding acceleration features. Each DSA
switch port is visible to the user as a standalone (virtual) network interface,
and its network I/O is performed, under the hood, indirectly through the host
interface (redirecting to the host port on TX, and intercepting frames on RX).

When a DSA switch is attached to a host port, PTP synchronization has to
suffer, since the switch's variable queuing delay introduces a path delay
jitter between the host port and its PTP partner.
For this reason, some DSA
switches include a timestamping clock of their own, and have the ability to
perform network timestamping on their own MAC, such that path delays only
measure wire and PHY propagation latencies. Timestamping DSA switches are
supported in Linux and expose the same ABI as any other network interface
(although the DSA interfaces are in fact virtual in terms of network I/O, they
do have their own PHC). It is typical, but not mandatory, for all
interfaces of a DSA switch to share the same PHC.

By design, PTP timestamping with a DSA switch does not need any special
handling in the driver for the host port it is attached to. However, when the
host port also supports PTP timestamping, DSA will take care of intercepting
the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
allow the delivery of multiple hardware timestamps for the same packet, so
anybody other than the DSA switch port must be prevented from doing so.

In code, DSA already provides most of the infrastructure for timestamping
in generic code: a BPF classifier (``ptp_classify_raw``) is used to identify
PTP event messages (any other packets, including PTP general messages, are not
timestamped), and two hooks are provided to drivers:

- ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb
  to be transmitted, before actually transmitting it. Typically, a switch will
  have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp
  becomes available. There may be an IRQ that is raised upon this timestamp's
  availability, or the driver might have to poll after invoking
  ``dev_queue_xmit()`` towards the host interface. Either way, in the
  ``.port_txtstamp()`` method, the driver only needs to save the clone for
  later use (when the timestamp becomes available). Each skb is annotated with
  a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
  job of keeping track of which clone belongs to which skb.

- ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
  to the driver, for it to annotate it with a timestamp, if that is immediately
  available, or defer to later. On reception, timestamps might either be
  available in-band (through metadata in the DSA header, or attached in other
  ways to the packet), or out-of-band (through another RX timestamping FIFO).
  Deferral on RX is typically necessary when retrieving the timestamp needs a
  sleepable context. In that case, it is the responsibility of the DSA driver
  to call ``netif_rx_ni()`` on the freshly timestamped skb.

3.2.2 Ethernet PHYs
^^^^^^^^^^^^^^^^^^^

These are devices that typically fulfill a Layer 1 role in the network stack,
hence they do not have a representation in terms of a network interface as DSA
switches do. However, PHYs may be able to detect and timestamp PTP packets, for
performance reasons: timestamps taken as close as possible to the wire have the
potential to yield a more stable and precise synchronization.

A PHY driver that supports PTP timestamping must create a ``struct
mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
of this pointer will be checked by the networking stack.

Since PHYs do not have network interface representations, the timestamping and
ethtool ioctl operations for them need to be mediated by their respective MAC
driver. Therefore, as opposed to DSA switches, modifications need to be done
to each individual MAC driver for PHY timestamping support. This entails:

- Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
  is true or not. If it is, then the MAC driver should not process this request
  but instead pass it on to the PHY using ``phy_mii_ioctl()``.

- On RX, special intervention may or may not be needed, depending on the
  function used to deliver skb's up the network stack. In the case of plain
  ``netif_rx()`` and similar, MAC drivers must check whether
  ``skb_defer_rx_timestamp(skb)`` is necessary or not - and if it is, don't
  call ``netif_rx()`` at all. If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
  enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
  will be called now, to determine, using logic very similar to DSA, whether
  deferral for RX timestamping is necessary. Again like DSA, it becomes the
  responsibility of the PHY driver to send the packet up the stack when the
  timestamp is available.

  For other skb receive functions, such as ``napi_gro_receive`` and
  ``netif_receive_skb``, the stack automatically checks whether
  ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
  the driver.

- On TX, again, special intervention might or might not be needed. The
  function that calls the ``mii_ts->txtstamp()`` hook is named
  ``skb_clone_tx_timestamp()``. This function can either be called directly
  (in which case explicit MAC driver support is indeed needed), or it can be
  reached through the ``skb_tx_timestamp()`` call, which many MAC
  drivers already perform for software timestamping purposes. Therefore, if a
  MAC supports software timestamping, it does not need to do anything further
  at this stage.

3.2.3 MII bus snooping devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These perform the same role as timestamping Ethernet PHYs, save for the fact
that they are discrete devices and can therefore be used in conjunction with
any PHY even if it doesn't support timestamping. In Linux, they are
discoverable and attachable to a ``struct phy_device`` through Device Tree, and
otherwise they use the same mii_ts infrastructure as PHYs. See
Documentation/devicetree/bindings/ptp/timestamper.txt for more details.

3.2.4 Other caveats for MAC drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A caveat of stacked PHCs, especially DSA (but not only), is that they uncover
bugs which were impossible to trigger before the existence of stacked PTP
clocks. Since DSA doesn't require any modification to MAC drivers, it is more
difficult to ensure the correctness of all possible code paths.
One example has to do with
this line of code, already presented earlier::

    skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
driver or an MII bus snooping device driver, should set this flag.
But a MAC driver that is unaware of PHC stacking might get tripped up by
somebody other than itself setting this flag, and deliver a duplicate
timestamp.
For example, a typical driver design for TX timestamping might be to split the
transmission part into 2 portions:

1. "TX": checks whether PTP timestamping has been previously enabled through
   the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
   SKBTX_HW_TSTAMP``"). If this is true, it sets the
   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
   described above, in the case of a stacked PHC system, this condition should
   never trigger, as this MAC is certainly not the outermost PHC. But this is
   not where the typical issue is. Transmission proceeds with this packet.

2. "TX confirmation": Transmission has finished. The driver checks whether it
   is necessary to collect any TX timestamp for it. Here is where the typical
   issues are: the MAC driver takes a shortcut and only checks whether
   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
   PHC system, this is incorrect because this MAC driver is not the only entity
   in the TX data path that could have enabled SKBTX_IN_PROGRESS in the first
   place.

The correct solution for this problem is for MAC drivers to have a compound
check in their "TX confirmation" portion, not only for
"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
that PTP timestamping is not enabled for anything other than the outermost PHC,
this enhanced check will avoid delivering a duplicated TX timestamp to user
space.
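
The difference between the shortcut and the compound check can be shown with a
small user-space model. It is a sketch under stated assumptions rather than
driver code: ``struct mock_skb``, ``struct mock_priv`` and both
``tx_confirm_*`` functions are hypothetical, and the flag values are
illustrative copies of the kernel's (see include/linux/skbuff.h for the real
ones).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's tx_flags bits. */
enum {
    SKBTX_HW_TSTAMP   = 1 << 0, /* sender requested a hardware time stamp */
    SKBTX_IN_PROGRESS = 1 << 2, /* some entity in the TX path claimed it */
};

/* Hypothetical model of skb_shinfo(skb)->tx_flags. */
struct mock_skb {
    uint8_t tx_flags;
};

/* Hypothetical MAC driver private data. */
struct mock_priv {
    bool hwtstamp_tx_enabled;   /* set when the ioctl enabled TX stamping */
};

/*
 * The shortcut: looks only at SKBTX_IN_PROGRESS. With stacked PHCs this
 * also fires when a DSA switch or PHY driver set the flag, so the MAC
 * would deliver a duplicate time stamp.
 */
static bool tx_confirm_shortcut(const struct mock_priv *priv,
                                const struct mock_skb *skb)
{
    (void)priv;                 /* deliberately ignored by the shortcut */
    return (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}

/*
 * The compound check: the MAC collects a time stamp only if it was the
 * one configured for TX time stamping and the flag is set.
 */
static bool tx_confirm_compound(const struct mock_priv *priv,
                                const struct mock_skb *skb)
{
    return priv->hwtstamp_tx_enabled &&
           (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

With a DSA switch in front of a MAC whose ``hwtstamp_tx_enabled`` is false, an
skb carrying SKBTX_IN_PROGRESS passes the shortcut but not the compound check,
so only the switch's time stamp reaches user space.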