.. SPDX-License-Identifier: GPL-2.0

============
Timestamping
============


1. Control Interfaces
=====================

The interfaces for receiving network packet timestamps are:

SO_TIMESTAMP
  Generates a timestamp for each incoming packet in (not necessarily
  monotonic) system time. Reports the timestamp via recvmsg() in a
  control message in usec resolution.
  SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct __kernel_old_timeval for
  SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
  SO_TIMESTAMP_NEW options respectively.

SO_TIMESTAMPNS
  Same timestamping mechanism as SO_TIMESTAMP, but reports the
  timestamp as struct timespec in nsec resolution.
  SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
  and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
  respectively.

IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
  Only for multicast: approximate transmit timestamp obtained by
  reading the looped packet receive timestamp.

SO_TIMESTAMPING
  Generates timestamps on reception, transmission or both. Supports
  multiple timestamp sources, including hardware. Supports generating
  timestamps for stream sockets.


1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW)
-------------------------------------------------------------

This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
the network stack, the feature has to be enabled for all packets. The
same is true for all early receive timestamp options.

For interface details, see `man 7 socket`.

Use SO_TIMESTAMP_NEW to always receive the timestamp in
struct __kernel_sock_timeval format.

SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.

1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW)
-------------------------------------------------------------------

This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (us).

Use SO_TIMESTAMPNS_NEW to always receive the timestamp in
struct __kernel_timespec format.

SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.

1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW)
----------------------------------------------------------------------

Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In::

  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

val is an integer with any of the following bits set. Setting any other
bit returns EINVAL and does not change the current state.

The socket option configures timestamp generation for individual
sk_buffs (1.3.1), timestamp reporting to the socket's error
queue (1.3.2) and options (1.3.3). Timestamp generation can also
be enabled for individual sendmsg calls using cmsg (1.3.4).


1.3.1 Timestamp Generation
^^^^^^^^^^^^^^^^^^^^^^^^^^

Some bits are requests to the stack to try to generate timestamps. Any
combination of them is valid.
Changes to these bits apply to newly
created packets, not to packets already in the stack. As a result, it
is possible to selectively request timestamps for a subset of packets
(e.g., for sampling) by embedding a send() call within two setsockopt
calls, one to enable timestamp generation and one to disable it.
Timestamps may also be generated for reasons other than being
requested by a particular socket, such as when receive timestamping is
enabled system wide, as explained earlier.

SOF_TIMESTAMPING_RX_HARDWARE:
  Request rx timestamps generated by the network adapter.

SOF_TIMESTAMPING_RX_SOFTWARE:
  Request rx timestamps when data enters the kernel. These timestamps
  are generated just after a device driver hands a packet to the
  kernel receive stack.

SOF_TIMESTAMPING_TX_HARDWARE:
  Request tx timestamps generated by the network adapter. This flag
  can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SOFTWARE:
  Request tx timestamps when data leaves the kernel. These timestamps
  are generated in the device driver as close as possible, but always
  prior to, passing the packet to the network interface. Hence, they
  require driver support and may not be available for all devices.
  This flag can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SCHED:
  Request tx timestamps prior to entering the packet scheduler. Kernel
  transmit latency is, if long, often dominated by queuing delay. The
  difference between this timestamp and one taken at
  SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent
  of protocol processing. The latency incurred in protocol
  processing, if any, can be computed by subtracting a userspace
  timestamp taken immediately before send() from this timestamp. On
  machines with virtual devices where a transmitted packet travels
  through multiple devices and, hence, multiple packet schedulers,
  a timestamp is generated at each layer. This allows for fine
  grained measurement of queuing delay. This flag can be enabled
  via both socket options and control messages.

SOF_TIMESTAMPING_TX_ACK:
  Request tx timestamps when all data in the send buffer has been
  acknowledged. This only makes sense for reliable protocols. It is
  currently only implemented for TCP. For that protocol, it may
  over-report measurement, because the timestamp is generated when all
  data up to and including the buffer at send() was acknowledged: the
  cumulative acknowledgment. The mechanism ignores SACK and FACK.
  This flag can be enabled via both socket options and control messages.
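
As a sketch of the sampling approach mentioned at the start of this
section, the following brackets a single send() with two setsockopt
calls. It is an illustration under stated assumptions, not a reference
implementation: the helper names set_ts_flags() and send_sampled() are
hypothetical, and SOF_TIMESTAMPING_TX_SOFTWARE is picked arbitrarily as
the generation bit to toggle::

  #include <linux/net_tstamp.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Hypothetical helper: replace the SO_TIMESTAMPING flags of fd. */
  int set_ts_flags(int fd, unsigned int flags)
  {
      return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                        &flags, sizeof(flags));
  }

  /*
   * Hypothetical helper: request a software tx timestamp for this one
   * send() only. base_flags holds the bits that stay set between
   * samples, such as reporting bits and options.
   */
  ssize_t send_sampled(int fd, const void *buf, size_t len,
                       unsigned int base_flags)
  {
      ssize_t ret;

      /* Enable generation: applies to packets created from now on. */
      if (set_ts_flags(fd, base_flags | SOF_TIMESTAMPING_TX_SOFTWARE))
          return -1;

      ret = send(fd, buf, len, 0);

      /* Disable generation: the packet just created is unaffected. */
      if (set_ts_flags(fd, base_flags))
          return -1;

      return ret;
  }

A caller would typically pass reporting bits such as
SOF_TIMESTAMPING_SOFTWARE in base_flags and keep them constant, so that
only timestamp generation is toggled around the sampled call.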


1.3.2 Timestamp Reporting
^^^^^^^^^^^^^^^^^^^^^^^^^

The other three bits control which timestamps will be reported in a
generated control message. Changes to the bits take immediate
effect at the timestamp reporting locations in the stack. Timestamps
are only reported for packets that also have the relevant timestamp
generation request set.

SOF_TIMESTAMPING_SOFTWARE:
  Report any software timestamps when available.

SOF_TIMESTAMPING_SYS_HARDWARE:
  This option is deprecated and ignored.

SOF_TIMESTAMPING_RAW_HARDWARE:
  Report hardware timestamps as generated by
  SOF_TIMESTAMPING_TX_HARDWARE when available.


1.3.3 Timestamp Options
^^^^^^^^^^^^^^^^^^^^^^^

The interface supports the following options:

SOF_TIMESTAMPING_OPT_ID:
  Generate a unique identifier along with each packet. A process can
  have multiple concurrent timestamping requests outstanding. Packets
  can be reordered in the transmit path, for instance in the packet
  scheduler. In that case timestamps will be queued onto the error
  queue out of order from the original send() calls. It is then not
  always possible to uniquely match timestamps to the original send()
  calls based on timestamp order or payload inspection alone.

  This option associates each packet at send() with a unique
  identifier and returns that along with the timestamp. The identifier
  is derived from a per-socket u32 counter (that wraps). For datagram
  sockets, the counter increments with each sent packet. For stream
  sockets, it increments with every byte. For stream sockets, also set
  SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.

  The counter starts at zero. It is initialized the first time that
  the socket option is enabled. It is reset each time the option is
  enabled after having been disabled. Resetting the counter does not
  change the identifiers of existing packets in the system.

  This option is implemented only for transmit timestamps. There, the
  timestamp is always looped along with a struct sock_extended_err.
  The option modifies field ee_data to pass an id that is unique
  among all possibly concurrently outstanding timestamp requests for
  that socket.

SOF_TIMESTAMPING_OPT_ID_TCP:
  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
  counter increments for stream sockets, but its starting point is
  not entirely trivial. This option fixes that.

  For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
  always be set too. On datagram sockets the option has no effect.

  A reasonable expectation is that the counter is reset to zero with
  the system call, so that a subsequent write() of N bytes generates
  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
  implements this behavior under all conditions.

  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
  especially when the socket option is set when no data is in
  transmission. If data is being transmitted, it may be off by the
  length of the output queue (SIOCOUTQ).

  The difference is due to being based on snd_una versus write_seq.
  snd_una is the offset in the stream acknowledged by the peer. This
  depends on factors outside of process control, such as network RTT.
  write_seq is the last byte written by the process. This offset is
  not affected by external inputs.

  The difference is subtle and unlikely to be noticed when configured
  at initial socket creation, when no data is queued or sent. But
  SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
  when the socket option is set.

SOF_TIMESTAMPING_OPT_CMSG:
  Support recv() cmsg for all timestamped packets. Control messages
  are already supported unconditionally on all packets with receive
  timestamps and on IPv6 packets with transmit timestamp. This option
  extends them to IPv4 packets with transmit timestamp. One use case
  is to correlate packets with their egress device, by enabling socket
  option IP_PKTINFO simultaneously.

SOF_TIMESTAMPING_OPT_TSONLY:
  Applies to transmit timestamps only. Makes the kernel return the
  timestamp as a cmsg alongside an empty packet, as opposed to
  alongside the original packet. This reduces the amount of memory
  charged to the socket's receive budget (SO_RCVBUF) and delivers
  the timestamp even if sysctl net.core.tstamp_allow_data is 0.
  This option disables SOF_TIMESTAMPING_OPT_CMSG.

SOF_TIMESTAMPING_OPT_STATS:
  Optional stats that are obtained along with the transmit timestamps.
  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
  transmit timestamp is available, the stats are available in a
  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
  list of TLVs (struct nlattr). These stats allow the
  application to associate various transport layer stats with
  the transmit timestamps, such as how long a certain block of
  data was limited by peer's receiver window.

SOF_TIMESTAMPING_OPT_PKTINFO:
  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
  packets with hardware timestamps. The message contains struct
  scm_ts_pktinfo, which supplies the index of the real interface which
  received the packet and its length at layer 2. A valid (non-zero)
  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
  enabled and the driver is using NAPI. The struct also contains two
  other fields, but they are reserved and undefined.

SOF_TIMESTAMPING_OPT_TX_SWHW:
  Request both hardware and software timestamps for outgoing packets
  when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE
  are enabled at the same time. If both timestamps are generated,
  two separate messages will be looped to the socket's error queue,
  each containing just one timestamp.

New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
regardless of the setting of sysctl net.core.tstamp_allow_data.
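
As a sketch of this recommendation, a new application that wants
software transmit timestamps might build its flag word as shown below.
The helper names tx_ts_flags() and enable_tx_ts() are hypothetical, and
the choice of SOF_TIMESTAMPING_TX_SOFTWARE as the generation bit is an
assumption made for the example::

  #include <linux/net_tstamp.h>
  #include <sys/socket.h>

  /* Hypothetical helper: flag word following the recommendation. */
  unsigned int tx_ts_flags(void)
  {
      return SOF_TIMESTAMPING_TX_SOFTWARE |  /* generation (1.3.1) */
             SOF_TIMESTAMPING_SOFTWARE |     /* reporting (1.3.2) */
             SOF_TIMESTAMPING_OPT_ID |       /* disambiguate requests */
             SOF_TIMESTAMPING_OPT_TSONLY;    /* no payload looped back */
  }

  /* Hypothetical helper: apply the flag word to socket fd. */
  int enable_tx_ts(int fd)
  {
      unsigned int flags = tx_ts_flags();

      return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                        &flags, sizeof(flags));
  }

For stream sockets, SOF_TIMESTAMPING_OPT_ID_TCP would be added to the
same flag word, as described for that option.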

An exception is when a process needs additional cmsg data, for
instance SOL_IP/IP_PKTINFO to detect the egress network interface.
Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on
having access to the contents of the original packet, so cannot be
combined with SOF_TIMESTAMPING_OPT_TSONLY.


1.3.4. Enabling timestamps via control messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to socket options, timestamp generation can be requested
per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
Using this feature, applications can sample timestamps per sendmsg()
without paying the overhead of enabling and disabling timestamps via
setsockopt::

  struct msghdr *msg;
  struct cmsghdr *cmsg;
  ...
  cmsg = CMSG_FIRSTHDR(msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SO_TIMESTAMPING;
  cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
  *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
                                 SOF_TIMESTAMPING_TX_SOFTWARE |
                                 SOF_TIMESTAMPING_TX_ACK;
  err = sendmsg(fd, msg, 0);

The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
the SOF_TIMESTAMPING_TX_* flags set via setsockopt.
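
The fragment above leaves the setup of msg elided. One possible way to
fill it in is sketched below. The helper name prepare_ts_msg() and its
parameters are hypothetical; the caller is assumed to supply a control
buffer that is suitably aligned for struct cmsghdr::

  #include <linux/net_tstamp.h>
  #include <linux/types.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /*
   * Hypothetical helper: set up msg so that sendmsg() carries data and
   * one SO_TIMESTAMPING control message holding tx_flags. Returns 0,
   * or -1 if cbuf is smaller than CMSG_SPACE(sizeof(__u32)).
   */
  int prepare_ts_msg(struct msghdr *msg, struct iovec *iov,
                     void *data, size_t len,
                     void *cbuf, size_t cbuf_len, __u32 tx_flags)
  {
      struct cmsghdr *cmsg;

      if (cbuf_len < CMSG_SPACE(sizeof(__u32)))
          return -1;

      iov->iov_base = data;
      iov->iov_len = len;

      memset(msg, 0, sizeof(*msg));
      msg->msg_iov = iov;
      msg->msg_iovlen = 1;

      memset(cbuf, 0, CMSG_SPACE(sizeof(__u32)));
      msg->msg_control = cbuf;
      msg->msg_controllen = CMSG_SPACE(sizeof(__u32));

      cmsg = CMSG_FIRSTHDR(msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SO_TIMESTAMPING;
      cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
      /* memcpy avoids assumptions about payload alignment. */
      memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));

      return 0;
  }

After a successful call, the prepared msg can be passed to
sendmsg(fd, msg, 0) as in the fragment above.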

Moreover, applications must still enable timestamp reporting via
setsockopt to receive timestamps::

  __u32 val = SOF_TIMESTAMPING_SOFTWARE |
              SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));


1.4 Bytestream Timestamps
-------------------------

The SO_TIMESTAMPING interface supports timestamping of bytes in a
bytestream. Each request is interpreted as a request for when the
entire contents of the buffer have passed a timestamping point. That
is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record
when all bytes have reached the device driver, regardless of how
many packets the data has been converted into.

In general, bytestreams have no natural delimiters and therefore
correlating a timestamp with data is non-trivial. A range of bytes
may be split across segments, any segments may be merged (possibly
coalescing sections of previously segmented buffers associated with
independent send() calls). Segments can be reordered and the same
byte range can coexist in multiple segments for protocols that
implement retransmissions.

It is essential that all timestamps implement the same semantics,
regardless of these possible transformations, as otherwise they are
incomparable. Handling "rare" corner cases differently from the
simple case (a 1:1 mapping from buffer to skb) is insufficient
because performance debugging often needs to focus on such outliers.

In practice, timestamps can be correlated with segments of a
bytestream consistently, if both semantics of the timestamp and the
timing of measurement are chosen correctly. This challenge is no
different from deciding on a strategy for IP fragmentation. There, the
definition is that only the first fragment is timestamped. For
bytestreams, we chose that a timestamp is generated only when all
bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to
implement and reason about. An implementation that has to take into
account SACK would be more complex due to possible transmission holes
and out of order arrival.

On the host, TCP can also break the simple 1:1 mapping from buffer to
skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The
implementation ensures correctness in all cases by tracking the
individual last byte passed to send(), even if it is no longer the
last byte after an skbuff extend or merge operation. It stores the
relevant sequence number in skb_shinfo(skb)->tskey.
Because an skbuff has only one such field, only one timestamp can be
generated.

In rare cases, a timestamp request can be missed if two requests are
collapsed onto the same skb. A process can detect this situation by
enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at
send time with the value returned for each timestamp. It can prevent
the situation by always flushing the TCP stack in between requests,
for instance by enabling TCP_NODELAY and disabling TCP_CORK and
autocork.

These precautions ensure that the timestamp is generated only when all
bytes have passed a timestamp point, assuming that the network stack
itself does not reorder the segments. The stack indeed tries to avoid
reordering. The one exception is under administrator control: it is
possible to construct a packet scheduler configuration that delays
segments from the same stream differently. Such a setup would be
unusual.


2. Data Interfaces
==================

Timestamps are read using the ancillary data feature of recvmsg().
See `man 3 cmsg` for details of this interface. The socket manual
page (`man 7 socket`) describes how timestamps generated with
SO_TIMESTAMP and SO_TIMESTAMPNS records can be retrieved.
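
As an illustration of the ancillary data interface, the sketch below
extracts an SO_TIMESTAMPNS receive timestamp from a struct msghdr that
recvmsg() has filled in. It assumes that the socket option was enabled
beforehand and that struct timespec as seen by the process matches the
variant selected by the SO_TIMESTAMPNS definition. The helper name
get_rx_timestampns() is hypothetical::

  #include <string.h>
  #include <sys/socket.h>
  #include <time.h>

  /*
   * Hypothetical helper: scan the control messages in msg and copy
   * out the SCM_TIMESTAMPNS payload. Returns 0 on success, or -1 if
   * no such control message is present.
   */
  int get_rx_timestampns(struct msghdr *msg, struct timespec *ts)
  {
      struct cmsghdr *cmsg;

      for (cmsg = CMSG_FIRSTHDR(msg); cmsg;
           cmsg = CMSG_NXTHDR(msg, cmsg)) {
          if (cmsg->cmsg_level != SOL_SOCKET ||
              cmsg->cmsg_type != SCM_TIMESTAMPNS)
              continue;
          if (cmsg->cmsg_len < CMSG_LEN(sizeof(*ts)))
              continue;
          /* memcpy avoids assumptions about payload alignment. */
          memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
          return 0;
      }

      return -1;
  }

The caller is expected to set msg_control and msg_controllen to a
buffer of at least CMSG_SPACE(sizeof(struct timespec)) bytes before
calling recvmsg().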


2.1 SCM_TIMESTAMPING records
----------------------------

These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and a payload whose type
depends on the socket option.

For SO_TIMESTAMPING_OLD::

  struct scm_timestamping {
      struct timespec ts[3];
  };

For SO_TIMESTAMPING_NEW::

  struct scm_timestamping64 {
      struct __kernel_timespec ts[3];
  };

Use SO_TIMESTAMPING_NEW to always receive the timestamp in
struct scm_timestamping64 format.

SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.

The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2].

ts[1] used to hold hardware timestamps converted to system time.
Instead, expose the hardware clock device on the NIC directly as
a HW PTP clock source, to allow time conversion in userspace and
optionally synchronize system time with a userspace PTP stack such
as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.

Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
software timestamp will be generated in the recvmsg() call and passed
in ts[0] when a real software timestamp is missing. This also happens
for hardware transmit timestamps.

2.1.1 Transmit timestamps with MSG_ERRQUEUE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For transmit timestamps the outgoing packet is looped back to the
socket's error queue with the send timestamp(s) attached. A process
receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE
set and with a msg_control buffer sufficiently large to receive the
relevant metadata structures. The recvmsg call returns the original
outgoing data packet with two ancillary messages attached.

A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
embeds a struct sock_extended_err. This defines the error type. For
timestamps, the ee_errno field is ENOMSG. The other ancillary message
will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
embeds the struct scm_timestamping.
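
The two ancillary messages can be picked apart as sketched below. The
sketch assumes an IPv4 or IPv6 socket and the struct scm_timestamping
payload of SO_TIMESTAMPING_OLD. The names struct tx_tstamp and
parse_tx_tstamp() are hypothetical. The additional check of ee_origin
against SO_EE_ORIGIN_TIMESTAMPING (from linux/errqueue.h) is not
described in this document and is included as a defensive assumption::

  #include <errno.h>
  #include <netinet/in.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <time.h>               /* struct timespec, used below */
  #include <linux/errqueue.h>

  /* Hypothetical result type for one looped transmit timestamp. */
  struct tx_tstamp {
      struct scm_timestamping tss;   /* ts[0] software, ts[2] hardware */
      struct sock_extended_err err;  /* ee_info: SCM_TSTAMP_* type */
  };

  /*
   * Hypothetical helper: parse the control messages of a msghdr
   * filled in by recvmsg(fd, msg, MSG_ERRQUEUE). Returns 0 if both a
   * timestamp and a matching error record were found, else -1.
   */
  int parse_tx_tstamp(struct msghdr *msg, struct tx_tstamp *out)
  {
      struct cmsghdr *cmsg;
      int have_ts = 0, have_err = 0;

      for (cmsg = CMSG_FIRSTHDR(msg); cmsg;
           cmsg = CMSG_NXTHDR(msg, cmsg)) {
          if (cmsg->cmsg_level == SOL_SOCKET &&
              cmsg->cmsg_type == SCM_TIMESTAMPING &&
              cmsg->cmsg_len >= CMSG_LEN(sizeof(out->tss))) {
              memcpy(&out->tss, CMSG_DATA(cmsg), sizeof(out->tss));
              have_ts = 1;
          } else if (((cmsg->cmsg_level == SOL_IP &&
                       cmsg->cmsg_type == IP_RECVERR) ||
                      (cmsg->cmsg_level == SOL_IPV6 &&
                       cmsg->cmsg_type == IPV6_RECVERR)) &&
                     cmsg->cmsg_len >= CMSG_LEN(sizeof(out->err))) {
              memcpy(&out->err, CMSG_DATA(cmsg), sizeof(out->err));
              /* Only accept records that describe a timestamp. */
              have_err = out->err.ee_errno == ENOMSG &&
                         out->err.ee_origin == SO_EE_ORIGIN_TIMESTAMPING;
          }
      }

      return (have_ts && have_err) ? 0 : -1;
  }

With SOF_TIMESTAMPING_OPT_ID enabled, the identifier of the request is
then available in out->err.ee_data.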


2.1.1.2 Timestamp types
~~~~~~~~~~~~~~~~~~~~~~~

The semantics of the three struct timespec are defined by field
ee_info in the extended error structure. It contains a value of
type SCM_TSTAMP_* to define the actual timestamp passed in
scm_timestamping.

The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_*
control fields discussed previously, with one exception. For legacy
reasons, SCM_TSTAMP_SND is equal to zero and can be set for both
SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It
is the first if ts[2] is non-zero, the second otherwise, in which
case the timestamp is stored in ts[0].


2.1.1.3 Fragmentation
~~~~~~~~~~~~~~~~~~~~~

Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
then only the first fragment is timestamped and returned to the sending
socket.


2.1.1.4 Packet Payload
~~~~~~~~~~~~~~~~~~~~~~

The calling application is often not interested in receiving the whole
packet payload that it passed to the stack originally: the socket
error queue mechanism is just a method to piggyback the timestamp on.
In this case, the application can choose to read datagrams with a
smaller buffer, possibly even of length 0. The payload is truncated
accordingly. Until the process calls recvmsg() on the error queue,
however, the full packet is queued, taking up budget from SO_RCVBUF.


2.1.1.5 Blocking Read
~~~~~~~~~~~~~~~~~~~~~

Reading from the error queue is always a non-blocking operation. To
block waiting on a timestamp, use poll or select. poll() will return
POLLERR in pollfd.revents if any data is ready on the error queue.
There is no need to pass this flag in pollfd.events. This flag is
ignored on request. See also `man 2 poll`.


2.1.2 Receive timestamps
^^^^^^^^^^^^^^^^^^^^^^^^

On reception, there is no reason to read from the socket error queue.
The SCM_TIMESTAMPING ancillary data is sent along with the packet data
on a normal recvmsg().
Since this is not a socket error, it is not
accompanied by a message SOL_IP(V6)/IP(V6)_RECVERROR. In this case,
the meaning of the three fields in struct scm_timestamping is
implicitly defined. ts[0] holds a software timestamp if set, ts[1]
is again deprecated and ts[2] holds a hardware timestamp if set.


3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
=======================================================================

Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
include/uapi/linux/net_tstamp.h as::

    struct hwtstamp_config {
        int flags;      /* no flags defined right now, must be zero */
        int tx_type;    /* HWTSTAMP_TX_* */
        int rx_filter;  /* HWTSTAMP_FILTER_* */
    };

Desired behavior is passed into the kernel and to a specific device by
calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
ifr_data points to a struct hwtstamp_config. The tx_type and
rx_filter are hints to the driver about what it is expected to do. If
the requested fine-grained filtering for incoming packets is not
supported, the driver may time stamp more than just the requested types
of packets.

Drivers are free to use a more permissive configuration than the requested
configuration. It is expected that drivers should only implement directly the
most generic mode that can be supported. For example, if the hardware can
support HWTSTAMP_FILTER_PTP_V2_EVENT, then it should generally always upscale
HWTSTAMP_FILTER_PTP_V2_L2_SYNC, and so forth, as HWTSTAMP_FILTER_PTP_V2_EVENT
is more generic (and more useful to applications).

A driver which supports hardware time stamping shall update the struct
with the actual, possibly more permissive configuration. If the
requested packets cannot be time stamped, then nothing should be
changed and ERANGE shall be returned (in contrast to EINVAL, which
indicates that SIOCSHWTSTAMP is not supported at all).

Only a process with admin rights may change the configuration. User
space is responsible for ensuring that multiple processes don't interfere
with each other and that the settings are reset.

Any process can read the actual configuration by passing this
structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has
not been implemented in all drivers.

::

    /* possible values for hwtstamp_config->tx_type */
    enum {
        /*
         * no outgoing packet will need hardware time stamping;
         * should a packet arrive which asks for it, no hardware
         * time stamping will be done
         */
        HWTSTAMP_TX_OFF,

        /*
         * enables hardware time stamping for outgoing packets;
         * the sender of the packet decides which are to be
         * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
         * before sending the packet
         */
        HWTSTAMP_TX_ON,
    };

    /* possible values for hwtstamp_config->rx_filter */
    enum {
        /* time stamp no incoming packet at all */
        HWTSTAMP_FILTER_NONE,

        /* time stamp any incoming packet */
        HWTSTAMP_FILTER_ALL,

        /* return value: time stamp all packets requested plus some others */
        HWTSTAMP_FILTER_SOME,

        /* PTP v1, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V1_L4_EVENT,

        /* for the complete list of values, please check
         * the include file include/uapi/linux/net_tstamp.h
         */
    };


3.1 Hardware Timestamping Implementation: Device Drivers
--------------------------------------------------------

A driver which supports hardware time stamping must support the
SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
the actual values as described in the section on SIOCSHWTSTAMP. It
should also support SIOCGHWTSTAMP.

Time stamps for received packets must be stored in the skb. To get a pointer
to the shared time stamp structure of the skb, call skb_hwtstamps(). Then
set the time stamps in the structure::

    struct skb_shared_hwtstamps {
        /* hardware time stamp transformed into duration
         * since arbitrary point in time
         */
        ktime_t hwtstamp;
    };

Time stamps for outgoing packets are to be generated as follows:

- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
  is non-zero. If yes, then the driver is expected to do hardware time
  stamping.
- If this is possible for the skb and requested, then declare
  that the driver is doing the time stamping by setting the flag
  SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g.
  with::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

  You might want to keep a pointer to the associated skb for the next step
  and not free the skb. A driver not supporting hardware time stamping doesn't
  do that. A driver must never touch sk_buff::tstamp! It is used to store
  software generated time stamps by the network subsystem.
- The driver should call skb_tx_timestamp() as close to passing sk_buff to
  hardware as possible. skb_tx_timestamp() provides a software time stamp if
  requested and hardware timestamping is not possible (SKBTX_IN_PROGRESS not
  set).
- As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_tstamp_tx() with the original skb and the raw
  hardware time stamp. skb_tstamp_tx() clones the original skb and
  adds the timestamps, therefore the original skb has to be freed now.
  If obtaining the hardware time stamp somehow fails, then the driver
  should not fall back to software time stamping. The rationale is that
  this would occur at a later time in the processing pipeline than other
  software time stamping and therefore could lead to unexpected deltas
  between time stamps.

3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------

There are situations when there may be more than one PHC (PTP Hardware Clock)
in the data path of a packet. The kernel has no explicit mechanism to allow the
user to select which PHC to use for timestamping Ethernet frames. Instead, the
assumption is that the outermost PHC is always the most preferable, and that
kernel drivers collaborate towards achieving that goal. Currently there are 3
cases of stacked PHCs, detailed below:

3.2.1 DSA (Distributed Switch Architecture) switches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are Ethernet switches which have one of their ports connected to an
(otherwise completely unaware) host Ethernet interface, and perform the role of
a port multiplier with optional forwarding acceleration features. Each DSA
switch port is visible to the user as a standalone (virtual) network interface,
and its network I/O is performed, under the hood, indirectly through the host
interface (redirecting to the host port on TX, and intercepting frames on RX).

When a DSA switch is attached to a host port, PTP synchronization has to
suffer, since the switch's variable queuing delay introduces a path delay
jitter between the host port and its PTP partner.
For this reason, some DSA
switches include a timestamping clock of their own, and have the ability to
perform network timestamping on their own MAC, such that path delays only
measure wire and PHY propagation latencies. Timestamping DSA switches are
supported in Linux and expose the same ABI as any other network interface
(although the DSA interfaces are in fact virtual in terms of network I/O,
they do have their own PHC). It is typical, but not mandatory, for all
interfaces of a DSA switch to share the same PHC.

By design, PTP timestamping with a DSA switch does not need any special
handling in the driver for the host port it is attached to. However, when the
host port also supports PTP timestamping, DSA will take care of intercepting
the ``.ndo_eth_ioctl`` calls towards the host port, and block attempts to enable
hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
allow the delivery of multiple hardware timestamps for the same packet, so
anybody else except for the DSA switch port must be prevented from doing so.

In the generic layer, DSA provides the following infrastructure for PTP
timestamping:

- ``.port_txtstamp()``: a hook called prior to the transmission of
  packets with a hardware TX timestamping request from user space.
  This is required for two-step timestamping, since the hardware
  timestamp becomes available after the actual MAC transmission, so the
  driver must be prepared to correlate the timestamp with the original
  packet so that it can re-enqueue the packet back into the socket's
  error queue. To save the packet for when the timestamp becomes
  available, the driver can call ``skb_clone_sk``, save the clone pointer
  in skb->cb and enqueue it in a TX skb queue. Typically, a switch will have a
  PTP TX timestamp register (or sometimes a FIFO) where the timestamp
  becomes available. In case of a FIFO, the hardware might store
  key-value pairs of PTP sequence ID/message type/domain number and the
  actual timestamp. To perform the correlation correctly between the
  packets in a queue waiting for timestamping and the actual timestamps,
  drivers can use a BPF classifier (``ptp_classify_raw``) to identify
  the PTP transport type, and ``ptp_parse_header`` to interpret the PTP
  header fields. There may be an IRQ that is raised upon this
  timestamp's availability, or the driver might have to poll after
  invoking ``dev_queue_xmit()`` towards the host interface.
  One-step TX timestamping does not require packet cloning, since there is
  no follow-up message required by the PTP protocol (because the
  TX timestamp is embedded into the packet by the MAC), and therefore
  user space does not expect the packet annotated with the TX timestamp
  to be re-enqueued into its socket's error queue.

- ``.port_rxtstamp()``: On RX, the BPF classifier is run by DSA to
  identify PTP event messages (any other packets, including PTP general
  messages, are not timestamped). The original (and only) timestampable
  skb is provided to the driver, for it to annotate it with a timestamp,
  if that is immediately available, or defer to later. On reception,
  timestamps might either be available in-band (through metadata in the
  DSA header, or attached in other ways to the packet), or out-of-band
  (through another RX timestamping FIFO). Deferral on RX is typically
  necessary when retrieving the timestamp needs a sleepable context. In
  that case, it is the responsibility of the DSA driver to call
  ``netif_rx()`` on the freshly timestamped skb.

3.2.2 Ethernet PHYs
^^^^^^^^^^^^^^^^^^^

These are devices that typically fulfill a Layer 1 role in the network stack,
hence they do not have a representation in terms of a network interface as DSA
switches do.
However, PHYs may be able to detect and timestamp PTP packets, for
performance reasons: timestamps taken as close as possible to the wire have the
potential to yield a more stable and precise synchronization.

A PHY driver that supports PTP timestamping must create a ``struct
mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
of this pointer will be checked by the networking stack.

Since PHYs do not have network interface representations, the timestamping and
ethtool ioctl operations for them need to be mediated by their respective MAC
driver. Therefore, as opposed to DSA switches, modifications need to be done
to each individual MAC driver for PHY timestamping support. This entails:

- Checking, in ``.ndo_eth_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
  is true or not. If it is, then the MAC driver should not process this request
  but instead pass it on to the PHY using ``phy_mii_ioctl()``.

- On RX, special intervention may or may not be needed, depending on the
  function used to deliver skb's up the network stack. In the case of plain
  ``netif_rx()`` and similar, MAC drivers must check whether
  ``skb_defer_rx_timestamp(skb)`` is necessary or not - and if it is, don't
  call ``netif_rx()`` at all.
  If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
  enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
  will be called now, to determine, using logic very similar to DSA, whether
  deferral for RX timestamping is necessary. Again like DSA, it becomes the
  responsibility of the PHY driver to send the packet up the stack when the
  timestamp is available.

  For other skb receive functions, such as ``napi_gro_receive`` and
  ``netif_receive_skb``, the stack automatically checks whether
  ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
  the driver.

- On TX, again, special intervention might or might not be needed. The
  function that calls the ``mii_ts->txtstamp()`` hook is named
  ``skb_clone_tx_timestamp()``. This function can be called directly
  (in which case explicit MAC driver support is indeed needed), but it
  also piggybacks on the ``skb_tx_timestamp()`` call, which many MAC
  drivers already perform for software timestamping purposes. Therefore, if a
  MAC supports software timestamping, it does not need to do anything further
  at this stage.

3.2.3 MII bus snooping devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These perform the same role as timestamping Ethernet PHYs, save for the fact
that they are discrete devices and can therefore be used in conjunction with
any PHY even if it doesn't support timestamping. In Linux, they are
discoverable and attachable to a ``struct phy_device`` through Device Tree, and
otherwise they use the same mii_ts infrastructure as timestamping PHYs. See
Documentation/devicetree/bindings/ptp/timestamper.txt for more details.

3.2.4 Other caveats for MAC drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A caveat of stacked PHCs, especially DSA (but not only), is that they uncover
bugs which were impossible to trigger before the existence of stacked PTP
clocks. DSA in particular does not require any modification to MAC drivers, so
it is more difficult to ensure correctness of all possible code paths. One
example has to do with this line of code, already presented earlier::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
driver or a MII bus snooping device driver, should set this flag.
But a MAC driver that is unaware of PHC stacking might get tripped up by
somebody other than itself setting this flag, and deliver a duplicate
timestamp.
For example, a typical driver design for TX timestamping might be to split the
transmission part into 2 portions:

1. "TX": checks whether PTP timestamping has been previously enabled through
   the ``.ndo_eth_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
   SKBTX_HW_TSTAMP``"). If this is true, it sets the
   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
   described above, in the case of a stacked PHC system, this condition should
   never trigger, as this MAC is certainly not the outermost PHC. But this is
   not where the typical issue is. Transmission proceeds with this packet.

2. "TX confirmation": Transmission has finished. The driver checks whether it
   is necessary to collect any TX timestamp for it. Here is where the typical
   issues are: the MAC driver takes a shortcut and only checks whether
   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
   PHC system, this is incorrect because this MAC driver is not the only entity
   in the TX data path who could have enabled SKBTX_IN_PROGRESS in the first
   place.

The correct solution for this problem is for MAC drivers to have a compound
check in their "TX confirmation" portion, not only for
"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
that PTP timestamping is not enabled for anything other than the outermost PHC,
this enhanced check will avoid delivering a duplicated TX timestamp to user
space.
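
The difference between the shortcut and the compound check can be modelled in
plain user-space C. The sketch below is not kernel code: struct model_skb,
struct model_priv and the model_*() functions are hypothetical stand-ins, and
the flag values are chosen only for this model.

```c
#include <stdbool.h>

/* Stand-in flag values for this model only; the kernel defines its own. */
#define SKBTX_HW_TSTAMP    (1u << 0)
#define SKBTX_IN_PROGRESS  (1u << 2)

/* Hypothetical stand-ins for skb_shinfo(skb)->tx_flags and driver state. */
struct model_skb  { unsigned int tx_flags; };
struct model_priv { bool hwtstamp_tx_enabled; };

/* "TX" portion: claim the timestamp only if this MAC was asked to do it. */
void model_tx(const struct model_priv *priv, struct model_skb *skb)
{
    if (priv->hwtstamp_tx_enabled && (skb->tx_flags & SKBTX_HW_TSTAMP))
        skb->tx_flags |= SKBTX_IN_PROGRESS;
}

/* "TX confirmation", shortcut: trusts the flag regardless of who set it. */
bool model_txconf_shortcut(const struct model_skb *skb)
{
    return (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}

/* "TX confirmation", compound check: the flag AND this MAC's own state. */
bool model_txconf_compound(const struct model_priv *priv,
                           const struct model_skb *skb)
{
    return priv->hwtstamp_tx_enabled &&
           (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

In this model, if an outer PHC (for instance a DSA switch port) has already
set SKBTX_IN_PROGRESS while the MAC has hwtstamp_tx_enabled == false, the
shortcut check would make the MAC collect a timestamp as well, whereas the
compound check would not.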