.. SPDX-License-Identifier: GPL-2.0

============
Timestamping
============


1. Control Interfaces
=====================

The interfaces for receiving network packet timestamps are:

SO_TIMESTAMP
  Generates a timestamp for each incoming packet in (not necessarily
  monotonic) system time. Reports the timestamp via recvmsg() in a
  control message in usec resolution.
  SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct __kernel_old_timeval for
  SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
  SO_TIMESTAMP_NEW options respectively.

SO_TIMESTAMPNS
  Same timestamping mechanism as SO_TIMESTAMP, but reports the
  timestamp as struct timespec in nsec resolution.
  SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
  based on the architecture type and time_t representation of libc.
  Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
  and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
  respectively.

IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
  Only for multicast: approximate transmit timestamp obtained by
  reading the looped packet receive timestamp.

SO_TIMESTAMPING
  Generates timestamps on reception, transmission or both. Supports
  multiple timestamp sources, including hardware. Supports generating
  timestamps for stream sockets.


1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW)
-------------------------------------------------------------

This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
the network stack, the feature has to be enabled for all packets. The
same is true for all early receive timestamp options.

For interface details, see `man 7 socket`.

Always use SO_TIMESTAMP_NEW to get the timestamp in
struct __kernel_sock_timeval format.

SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW)
-------------------------------------------------------------------

This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (us).

Always use SO_TIMESTAMPNS_NEW to get the timestamp in
struct __kernel_timespec format.

SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW)
----------------------------------------------------------------------

Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In::

  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

val is an integer with any of the following bits set. Setting any other
bit returns EINVAL and does not change the current state.

The socket option configures timestamp generation for individual
sk_buffs (1.3.1), timestamp reporting to the socket's error
queue (1.3.2) and options (1.3.3). Timestamp generation can also
be enabled for individual sendmsg calls using cmsg (1.3.4).


1.3.1 Timestamp Generation
^^^^^^^^^^^^^^^^^^^^^^^^^^

Some bits are requests to the stack to try to generate timestamps. Any
combination of them is valid. Changes to these bits apply to newly
created packets, not to packets already in the stack. As a result, it
is possible to selectively request timestamps for a subset of packets
(e.g., for sampling) by embedding a send() call within two setsockopt
calls, one to enable timestamp generation and one to disable it.
Timestamps may also be generated for reasons other than being
requested by a particular socket, such as when receive timestamping is
enabled system wide, as explained earlier.

SOF_TIMESTAMPING_RX_HARDWARE:
  Request rx timestamps generated by the network adapter.

SOF_TIMESTAMPING_RX_SOFTWARE:
  Request rx timestamps when data enters the kernel. These timestamps
  are generated just after a device driver hands a packet to the
  kernel receive stack.

SOF_TIMESTAMPING_TX_HARDWARE:
  Request tx timestamps generated by the network adapter. This flag
  can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SOFTWARE:
  Request tx timestamps when data leaves the kernel. These timestamps
  are generated in the device driver as close as possible, but always
  prior to, passing the packet to the network interface. Hence, they
  require driver support and may not be available for all devices.
  This flag can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SCHED:
  Request tx timestamps prior to entering the packet scheduler. Kernel
  transmit latency is, if long, often dominated by queuing delay. The
  difference between this timestamp and one taken at
  SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent
  of protocol processing. The latency incurred in protocol
  processing, if any, can be computed by subtracting a userspace
  timestamp taken immediately before send() from this timestamp. On
  machines with virtual devices where a transmitted packet travels
  through multiple devices and, hence, multiple packet schedulers,
  a timestamp is generated at each layer. This allows for fine
  grained measurement of queuing delay. This flag can be enabled
  via both socket options and control messages.

SOF_TIMESTAMPING_TX_ACK:
  Request tx timestamps when all data in the send buffer has been
  acknowledged. This only makes sense for reliable protocols. It is
  currently only implemented for TCP. For that protocol, it may
  over-report measurement, because the timestamp is generated when all
  data up to and including the buffer at send() was acknowledged: the
  cumulative acknowledgment. The mechanism ignores SACK and FACK.
  This flag can be enabled via both socket options and control messages.

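The latency arithmetic described for SOF_TIMESTAMPING_TX_SCHED can be
sketched as a plain timespec subtraction. The helper name
timespec_diff_ns() is an assumption of this example, and the result is
only meaningful when both timestamps come from the same clock (for
instance, two software timestamps, or a software timestamp and a
userspace reading of the system clock).

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: signed difference (later - earlier) in
 * nanoseconds between two timestamps taken from the same clock.
 */
static int64_t timespec_diff_ns(const struct timespec *later,
				const struct timespec *earlier)
{
	return (int64_t)(later->tv_sec - earlier->tv_sec) * 1000000000LL +
	       (later->tv_nsec - earlier->tv_nsec);
}
```

With this helper, queuing delay would be the SOF_TIMESTAMPING_TX_SOFTWARE
timestamp minus the SOF_TIMESTAMPING_TX_SCHED timestamp, and protocol
processing latency would be the SOF_TIMESTAMPING_TX_SCHED timestamp minus
the userspace timestamp taken just before send().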

1.3.2 Timestamp Reporting
^^^^^^^^^^^^^^^^^^^^^^^^^

The other three bits control which timestamps will be reported in a
generated control message. Changes to the bits take immediate
effect at the timestamp reporting locations in the stack. Timestamps
are only reported for packets that also have the relevant timestamp
generation request set.

SOF_TIMESTAMPING_SOFTWARE:
  Report any software timestamps when available.

SOF_TIMESTAMPING_SYS_HARDWARE:
  This option is deprecated and ignored.

SOF_TIMESTAMPING_RAW_HARDWARE:
  Report hardware timestamps as generated by
  SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE when
  available.

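Because generation and reporting are separate bits, a request usually
combines at least one of each. The following sketch, with the made-up
helper names tx_sw_flags() and enable_tx_sw_timestamping(), asks for
software transmit timestamps at the scheduler and driver and for those
software timestamps to be reported. It is one possible combination, not
a recommended default.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hypothetical helper: one generation + reporting combination. */
static unsigned int tx_sw_flags(void)
{
	return SOF_TIMESTAMPING_TX_SCHED |	/* generation (1.3.1) */
	       SOF_TIMESTAMPING_TX_SOFTWARE |	/* generation (1.3.1) */
	       SOF_TIMESTAMPING_SOFTWARE;	/* reporting (1.3.2) */
}

/* Hypothetical helper: apply the combination above to a socket. */
static int enable_tx_sw_timestamping(int fd)
{
	unsigned int val = tx_sw_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
```

Dropping SOF_TIMESTAMPING_SOFTWARE from the mask would leave generation
requested but nothing reported to the socket.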

1.3.3 Timestamp Options
^^^^^^^^^^^^^^^^^^^^^^^

The interface supports the following options:

SOF_TIMESTAMPING_OPT_ID:
  Generate a unique identifier along with each packet. A process can
  have multiple concurrent timestamping requests outstanding. Packets
  can be reordered in the transmit path, for instance in the packet
  scheduler. In that case timestamps will be queued onto the error
  queue out of order from the original send() calls. It is then not
  always possible to uniquely match timestamps to the original send()
  calls based on timestamp order or payload inspection alone.

  This option associates each packet at send() with a unique
  identifier and returns that along with the timestamp. The identifier
  is derived from a per-socket u32 counter (that wraps). For datagram
  sockets, the counter increments with each sent packet. For stream
  sockets, it increments with every byte. For stream sockets, also set
  SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.

  The counter starts at zero. It is initialized the first time that
  the socket option is enabled. It is reset each time the option is
  enabled after having been disabled. Resetting the counter does not
  change the identifiers of existing packets in the system.

  This option is implemented only for transmit timestamps. There, the
  timestamp is always looped along with a struct sock_extended_err.
  The option modifies field ee_data to pass an id that is unique
  among all possibly concurrently outstanding timestamp requests for
  that socket.

SOF_TIMESTAMPING_OPT_ID_TCP:
  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
  counter increments for stream sockets, but its starting point is
  not entirely trivial. This option fixes that.

  For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
  always be set too. On datagram sockets the option has no effect.

  A reasonable expectation is that the counter is reset to zero with
  the system call, so that a subsequent write() of N bytes generates
  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
  implements this behavior under all conditions.

  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
  especially when the socket option is set when no data is in
  transmission. If data is being transmitted, it may be off by the
  length of the output queue (SIOCOUTQ).

  The difference is due to being based on snd_una versus write_seq.
  snd_una is the offset in the stream acknowledged by the peer. This
  depends on factors outside of process control, such as network RTT.
  write_seq is the last byte written by the process. This offset is
  not affected by external inputs.

  The difference is subtle and unlikely to be noticed when configured
  at initial socket creation, when no data is queued or sent. But
  SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
  when the socket option is set.

SOF_TIMESTAMPING_OPT_CMSG:
  Support recv() cmsg for all timestamped packets. Control messages
  are already supported unconditionally on all packets with receive
  timestamps and on IPv6 packets with transmit timestamp. This option
  extends them to IPv4 packets with transmit timestamp. One use case
  is to correlate packets with their egress device, by enabling socket
  option IP_PKTINFO simultaneously.

SOF_TIMESTAMPING_OPT_TSONLY:
  Applies to transmit timestamps only. Makes the kernel return the
  timestamp as a cmsg alongside an empty packet, as opposed to
  alongside the original packet. This reduces the amount of memory
  charged to the socket's receive budget (SO_RCVBUF) and delivers
  the timestamp even if sysctl net.core.tstamp_allow_data is 0.
  This option disables SOF_TIMESTAMPING_OPT_CMSG.

SOF_TIMESTAMPING_OPT_STATS:
  Optional stats that are obtained along with the transmit timestamps.
  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
  transmit timestamp is available, the stats are available in a
  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
  list of typed TLVs (struct nlattr). These stats allow the
  application to associate various transport layer stats with
  the transmit timestamps, such as how long a certain block of
  data was limited by the peer's receiver window.

SOF_TIMESTAMPING_OPT_PKTINFO:
  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
  packets with hardware timestamps. The message contains struct
  scm_ts_pktinfo, which supplies the index of the real interface which
  received the packet and its length at layer 2. A valid (non-zero)
  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
  enabled and the driver is using NAPI. The struct also contains two
  other fields, but they are reserved and undefined.

SOF_TIMESTAMPING_OPT_TX_SWHW:
  Request both hardware and software timestamps for outgoing packets
  when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE
  are enabled at the same time. If both timestamps are generated,
  two separate messages will be looped to the socket's error queue,
  each containing just one timestamp.

New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
regardless of the setting of sysctl net.core.tstamp_allow_data.

An exception is when a process needs additional cmsg data, for
instance SOL_IP/IP_PKTINFO to detect the egress network interface.
Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on
having access to the contents of the original packet, so cannot be
combined with SOF_TIMESTAMPING_OPT_TSONLY.

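To make the stream-socket identifier arithmetic concrete, the sketch
below computes the id a process might expect back for a given write.
It assumes SOF_TIMESTAMPING_OPT_ID and SOF_TIMESTAMPING_OPT_ID_TCP were
enabled together, that bytes_before counts the bytes written since the
option was enabled, and that len is non-zero. The helper name
expected_stream_id() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: id expected in ee_data for a write of len bytes
 * issued after bytes_before bytes were already written. The counter is
 * a u32 that wraps, which unsigned arithmetic reproduces directly.
 */
static uint32_t expected_stream_id(uint32_t bytes_before, uint32_t len)
{
	return bytes_before + len - 1;
}
```

A process could record this value at send time and compare it with the
ee_data of each timestamp read from the error queue to match timestamps
to writes.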

1.3.4 Enabling timestamps via control messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to socket options, timestamp generation can be requested
per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
Using this feature, applications can sample timestamps per sendmsg()
without paying the overhead of enabling and disabling timestamps via
setsockopt::

  struct msghdr *msg;
  struct cmsghdr *cmsg;
  ...
  cmsg                         = CMSG_FIRSTHDR(msg);
  cmsg->cmsg_level             = SOL_SOCKET;
  cmsg->cmsg_type              = SO_TIMESTAMPING;
  cmsg->cmsg_len               = CMSG_LEN(sizeof(__u32));
  *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
                                 SOF_TIMESTAMPING_TX_SOFTWARE |
                                 SOF_TIMESTAMPING_TX_ACK;
  err = sendmsg(fd, msg, 0);

The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
the SOF_TIMESTAMPING_TX_* flags set via setsockopt.

Moreover, applications must still enable timestamp reporting via
setsockopt to receive timestamps::

  __u32 val = SOF_TIMESTAMPING_SOFTWARE |
              SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
  err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

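A more complete version of the per-write request might look like the
sketch below, which allocates the control buffer and attaches it to a
single-buffer sendmsg() call. The helper names fill_tstamp_cmsg() and
send_with_tstamp() are assumptions of this example, not part of any
kernel or libc API.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: attach one SO_TIMESTAMPING control message that
 * carries tx_flags to msg, using ctrl as backing storage.
 * Returns 0 on success, -1 if ctrl is too small.
 */
static int fill_tstamp_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			    uint32_t tx_flags)
{
	struct cmsghdr *cmsg;

	if (ctrl_len < CMSG_SPACE(sizeof(tx_flags)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(tx_flags));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_flags));
	memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));
	return 0;
}

/* Hypothetical helper: send buf and request tx timestamps for it only. */
static ssize_t send_with_tstamp(int fd, const void *buf, size_t len,
				uint32_t tx_flags)
{
	/* The union keeps the control buffer aligned for struct cmsghdr. */
	union {
		char buf[CMSG_SPACE(sizeof(uint32_t))];
		struct cmsghdr align;
	} ctrl;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	if (fill_tstamp_cmsg(&msg, ctrl.buf, sizeof(ctrl.buf), tx_flags))
		return -1;

	return sendmsg(fd, &msg, 0);
}
```

Only SOF_TIMESTAMPING_TX_* bits belong in tx_flags; the reporting bits
still have to be set once via setsockopt.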

1.4 Bytestream Timestamps
-------------------------

The SO_TIMESTAMPING interface supports timestamping of bytes in a
bytestream. Each request is interpreted as a request for when the
entire contents of the buffer has passed a timestamping point. That
is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record
when all bytes have reached the device driver, regardless of how
many packets the data has been converted into.

In general, bytestreams have no natural delimiters and therefore
correlating a timestamp with data is non-trivial. A range of bytes
may be split across segments, and any segments may be merged (possibly
coalescing sections of previously segmented buffers associated with
independent send() calls). Segments can be reordered and the same
byte range can coexist in multiple segments for protocols that
implement retransmissions.

It is essential that all timestamps implement the same semantics,
regardless of these possible transformations, as otherwise they are
incomparable. Handling "rare" corner cases differently from the
simple case (a 1:1 mapping from buffer to skb) is insufficient
because performance debugging often needs to focus on such outliers.

In practice, timestamps can be correlated with segments of a
bytestream consistently, if both semantics of the timestamp and the
timing of measurement are chosen correctly. This challenge is no
different from deciding on a strategy for IP fragmentation. There, the
definition is that only the first fragment is timestamped. For
bytestreams, we chose that a timestamp is generated only when all
bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to
implement and reason about. An implementation that has to take into
account SACK would be more complex due to possible transmission holes
and out of order arrival.

On the host, TCP can also break the simple 1:1 mapping from buffer to
skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The
implementation ensures correctness in all cases by tracking the
individual last byte passed to send(), even if it is no longer the
last byte after an skbuff extend or merge operation. It stores the
relevant sequence number in skb_shinfo(skb)->tskey. Because an skbuff
has only one such field, only one timestamp can be generated.

In rare cases, a timestamp request can be missed if two requests are
collapsed onto the same skb. A process can detect this situation by
enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at
send time with the value returned for each timestamp. It can prevent
the situation by always flushing the TCP stack in between requests,
for instance by enabling TCP_NODELAY and disabling TCP_CORK and
autocork.

These precautions ensure that the timestamp is generated only when all
bytes have passed a timestamp point, assuming that the network stack
itself does not reorder the segments. The stack indeed tries to avoid
reordering. The one exception is under administrator control: it is
possible to construct a packet scheduler configuration that delays
segments from the same stream differently. Such a setup would be
unusual.


2. Data Interfaces
==================

Timestamps are read using the ancillary data feature of recvmsg().
See `man 3 cmsg` for details of this interface. The socket manual
page (`man 7 socket`) describes how timestamps generated with
SO_TIMESTAMP and SO_TIMESTAMPNS can be retrieved.


2.1 SCM_TIMESTAMPING records
----------------------------

These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and a payload whose type
depends on the option variant.

For SO_TIMESTAMPING_OLD::

	struct scm_timestamping {
		struct timespec ts[3];
	};

For SO_TIMESTAMPING_NEW::

	struct scm_timestamping64 {
		struct __kernel_timespec ts[3];
	};

Always use SO_TIMESTAMPING_NEW to get the timestamp in
struct scm_timestamping64 format.

SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
on 32-bit machines.

The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2].

ts[1] used to hold hardware timestamps converted to system time.
Instead, expose the hardware clock device on the NIC directly as
a HW PTP clock source, to allow time conversion in userspace and
optionally synchronize system time with a userspace PTP stack such
as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.

Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
software timestamp will be generated in the recvmsg() call and passed
in ts[0] when a real software timestamp is missing. This also happens
on hardware transmit timestamps.

2.1.1 Transmit timestamps with MSG_ERRQUEUE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For transmit timestamps the outgoing packet is looped back to the
socket's error queue with the send timestamp(s) attached. A process
receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE
set and with a msg_control buffer sufficiently large to receive the
relevant metadata structures. The recvmsg call returns the original
outgoing data packet with two ancillary messages attached.

A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
embeds a struct sock_extended_err. This defines the error type. For
timestamps, the ee_errno field is ENOMSG. The other ancillary message
will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
embeds the struct scm_timestamping.


2.1.1.2 Timestamp types
~~~~~~~~~~~~~~~~~~~~~~~

The semantics of the three struct timespec fields are defined by field
ee_info in the extended error structure. It contains a value of
type SCM_TSTAMP_* to define the actual timestamp passed in
scm_timestamping.

The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_*
control fields discussed previously, with one exception. For legacy
reasons, SCM_TSTAMP_SND is equal to zero and can be set for both
SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It
is the first if ts[2] is non-zero, the second otherwise, in which
case the timestamp is stored in ts[0].

45162306a36Sopenharmony_ci
45262306a36Sopenharmony_ci2.1.1.3 Fragmentation
45362306a36Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~
45462306a36Sopenharmony_ci
45562306a36Sopenharmony_ciFragmentation of outgoing datagrams is rare, but is possible, e.g., by
45662306a36Sopenharmony_ciexplicitly disabling PMTU discovery. If an outgoing packet is fragmented,
45762306a36Sopenharmony_cithen only the first fragment is timestamped and returned to the sending
45862306a36Sopenharmony_cisocket.
45962306a36Sopenharmony_ci
46062306a36Sopenharmony_ci
46162306a36Sopenharmony_ci2.1.1.4 Packet Payload
46262306a36Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~
46362306a36Sopenharmony_ci
46462306a36Sopenharmony_ciThe calling application is often not interested in receiving the whole
46562306a36Sopenharmony_cipacket payload that it passed to the stack originally: the socket
46662306a36Sopenharmony_cierror queue mechanism is just a method to piggyback the timestamp on.
46762306a36Sopenharmony_ciIn this case, the application can choose to read datagrams with a
46862306a36Sopenharmony_cismaller buffer, possibly even of length 0. The payload is truncated
46962306a36Sopenharmony_ciaccordingly. Until the process calls recvmsg() on the error queue,
47062306a36Sopenharmony_cihowever, the full packet is queued, taking up budget from SO_RCVBUF.
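
A zero-length read of the error queue might look like the following
sketch. The helper name ``errqueue_dequeue_no_payload()`` is hypothetical;
only recvmsg() and MSG_ERRQUEUE are real interfaces. The caller supplies
the control buffer and parses the ancillary data itself.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: dequeue one message from the error queue of fd
 * without copying any payload. Ancillary data is written to ctrl
 * (capacity *ctrl_len) and *ctrl_len is updated to the bytes used.
 * Returns 1 if a message was dequeued, 0 if the queue was empty and
 * -1 on any other error (errno is left as set by recvmsg()).
 */
static int errqueue_dequeue_no_payload(int fd, void *ctrl, size_t *ctrl_len)
{
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = NULL;	/* zero-length read: payload is truncated */
	msg.msg_iovlen = 0;
	msg.msg_control = ctrl;
	msg.msg_controllen = *ctrl_len;

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0) {
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;
		return -1;
	}

	*ctrl_len = msg.msg_controllen;
	return 1;
}
```

Dequeuing promptly, even with no payload buffer, releases the SO_RCVBUF
budget held by the queued packet.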


2.1.1.5 Blocking Read
~~~~~~~~~~~~~~~~~~~~~

Reading from the error queue is always a non-blocking operation. To
block waiting on a timestamp, use poll or select. poll() will return
POLLERR in pollfd.revents if any data is ready on the error queue.
There is no need to pass this flag in pollfd.events. This flag is
ignored on request. See also `man 2 poll`.
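
Blocking on the error queue can be sketched with poll() as follows. The
helper name ``errqueue_wait()`` is hypothetical; the sketch relies only on
the documented behavior that POLLERR is reported without being requested.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for data on the error queue.
 * Returns 1 if POLLERR was reported, 0 on timeout, -1 if poll() failed.
 */
static int errqueue_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = fd,
		.events = 0,	/* POLLERR is reported even if not requested */
	};
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLERR)) ? 1 : 0;
}
```

Once the helper returns 1, a recvmsg() call with MSG_ERRQUEUE should
return immediately with the queued message.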


2.1.2 Receive timestamps
^^^^^^^^^^^^^^^^^^^^^^^^

On reception, there is no reason to read from the socket error queue.
The SCM_TIMESTAMPING ancillary data is sent along with the packet data
on a normal recvmsg(). Since this is not a socket error, it is not
accompanied by a message SOL_IP(V6)/IP(V6)_RECVERROR. In this case,
the meaning of the three fields in struct scm_timestamping is
implicitly defined. ts[0] holds a software timestamp if set, ts[1]
is again deprecated and ts[2] holds a hardware timestamp if set.
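
Extracting a receive timestamp from the ancillary data of a normal
recvmsg() could look like the sketch below. ``rx_timestamp()`` is a
hypothetical helper, and the choice to prefer the hardware timestamp over
the software one is an assumption of this example, not a rule of the API.
It also assumes the struct scm_timestamping layout based on struct timespec.

```c
#include <string.h>
#include <sys/socket.h>
#include <time.h>		/* struct timespec, used by linux/errqueue.h */
#include <linux/errqueue.h>	/* struct scm_timestamping */

#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif

/*
 * Hypothetical helper: find the SCM_TIMESTAMPING control message in msg
 * and copy one timestamp to *out: the hardware one (ts[2]) if set, else
 * the software one (ts[0]).
 * Returns 2 for hardware, 1 for software, 0 if no timestamp was found.
 */
static int rx_timestamp(struct msghdr *msg, struct timespec *out)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		struct scm_timestamping tss;

		if (cm->cmsg_level != SOL_SOCKET ||
		    cm->cmsg_type != SCM_TIMESTAMPING)
			continue;

		/* CMSG_DATA() may be unaligned, so copy instead of casting */
		memcpy(&tss, CMSG_DATA(cm), sizeof(tss));
		if (tss.ts[2].tv_sec || tss.ts[2].tv_nsec) {
			*out = tss.ts[2];
			return 2;
		}
		if (tss.ts[0].tv_sec || tss.ts[0].tv_nsec) {
			*out = tss.ts[0];
			return 1;
		}
	}
	return 0;
}
```

The helper is meant to be called on the struct msghdr right after
recvmsg() returns, before the control buffer is reused.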


3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
=======================================================================

Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
include/uapi/linux/net_tstamp.h as::

	struct hwtstamp_config {
		int flags;	/* no flags defined right now, must be zero */
		int tx_type;	/* HWTSTAMP_TX_* */
		int rx_filter;	/* HWTSTAMP_FILTER_* */
	};

Desired behavior is passed into the kernel and to a specific device by
calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
ifr_data points to a struct hwtstamp_config. The tx_type and
rx_filter are hints to the driver about what it is expected to do. If
the requested fine-grained filtering for incoming packets is not
supported, the driver may time stamp more than just the requested types
of packets.

Drivers are free to use a more permissive configuration than the requested
configuration. It is expected that drivers should only implement directly the
most generic mode that can be supported. For example, if the hardware can
support HWTSTAMP_FILTER_PTP_V2_EVENT, then it should generally upscale
narrower requests such as HWTSTAMP_FILTER_PTP_V2_L2_SYNC, and so forth, to
that filter, as HWTSTAMP_FILTER_PTP_V2_EVENT is more generic (and more useful
to applications).

A driver which supports hardware time stamping shall update the struct
with the actual, possibly more permissive configuration. If the
requested packets cannot be time stamped, then nothing should be
changed and ERANGE shall be returned (in contrast to EINVAL, which
indicates that SIOCSHWTSTAMP is not supported at all).
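
The upscaling and ERANGE rules can be modelled in plain C. The sketch below
is not taken from any real driver: ``example_hwtstamp_set()`` is a
hypothetical function for imaginary hardware that can only time stamp all
PTP v2 event packets, and it compiles in user space against the uapi header.

```c
#include <errno.h>
#include <linux/net_tstamp.h>	/* struct hwtstamp_config, HWTSTAMP_* */

/*
 * Hypothetical driver helper. Narrower PTP v2 requests are upscaled to
 * HWTSTAMP_FILTER_PTP_V2_EVENT; anything the imaginary hardware cannot
 * do is rejected with -ERANGE and *cfg is left untouched.
 */
static int example_hwtstamp_set(struct hwtstamp_config *cfg)
{
	int rx_filter;

	switch (cfg->tx_type) {
	case HWTSTAMP_TX_OFF:
	case HWTSTAMP_TX_ON:
		break;
	default:
		return -ERANGE;
	}

	switch (cfg->rx_filter) {
	case HWTSTAMP_FILTER_NONE:
		rx_filter = HWTSTAMP_FILTER_NONE;
		break;
	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
	case HWTSTAMP_FILTER_PTP_V2_EVENT:
	case HWTSTAMP_FILTER_PTP_V2_SYNC:
	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
		rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
		break;
	default:
		return -ERANGE;
	}

	/* report the actual, possibly more permissive configuration */
	cfg->rx_filter = rx_filter;
	return 0;
}
```

Computing the result in a local variable first keeps the "nothing should
be changed" guarantee on the error paths.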

Only a process with admin rights may change the configuration. User
space is responsible for ensuring that multiple processes don't interfere
with each other and that the settings are reset.

Any process can read the actual configuration by passing this
structure to ioctl(SIOCGHWTSTAMP) in the same way.  However, this has
not been implemented in all drivers.

::

    /* possible values for hwtstamp_config->tx_type */
    enum {
	    /*
	    * no outgoing packet will need hardware time stamping;
	    * should a packet arrive which asks for it, no hardware
	    * time stamping will be done
	    */
	    HWTSTAMP_TX_OFF,

	    /*
	    * enables hardware time stamping for outgoing packets;
	    * the sender of the packet decides which are to be
	    * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
	    * before sending the packet
	    */
	    HWTSTAMP_TX_ON,
    };

    /* possible values for hwtstamp_config->rx_filter */
    enum {
	    /* time stamp no incoming packet at all */
	    HWTSTAMP_FILTER_NONE,

	    /* time stamp any incoming packet */
	    HWTSTAMP_FILTER_ALL,

	    /* return value: time stamp all packets requested plus some others */
	    HWTSTAMP_FILTER_SOME,

	    /* PTP v1, UDP, any kind of event packet */
	    HWTSTAMP_FILTER_PTP_V1_L4_EVENT,

	    /* for the complete list of values, please check
	    * the include file include/uapi/linux/net_tstamp.h
	    */
    };

3.1 Hardware Timestamping Implementation: Device Drivers
--------------------------------------------------------

A driver which supports hardware time stamping must support the
SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
the actual values as described in the section on SIOCSHWTSTAMP.  It
should also support SIOCGHWTSTAMP.

Time stamps for received packets must be stored in the skb. To get a pointer
to the shared time stamp structure of the skb call skb_hwtstamps(). Then
set the time stamps in the structure::

    struct skb_shared_hwtstamps {
	    /* hardware time stamp transformed into duration
	    * since arbitrary point in time
	    */
	    ktime_t	hwtstamp;
    };

Time stamps for outgoing packets are to be generated as follows:

- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
  is non-zero. If yes, then the driver is expected to do hardware time
  stamping.
- If this is possible for the skb and requested, then declare
  that the driver is doing the time stamping by setting the flag
  SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g. with::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

  You might want to keep a pointer to the associated skb for the next step
  and not free the skb. A driver not supporting hardware time stamping doesn't
  do that. A driver must never touch sk_buff::tstamp! It is used to store
  software generated time stamps by the network subsystem.
- The driver should call skb_tx_timestamp() as close to passing the sk_buff to
  hardware as possible. skb_tx_timestamp() provides a software time stamp if
  requested and hardware timestamping is not possible (SKBTX_IN_PROGRESS not
  set).
- As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_tstamp_tx() with the original skb and the raw
  hardware time stamp. skb_tstamp_tx() clones the original skb and
  adds the timestamps, therefore the original skb has to be freed now.
  If obtaining the hardware time stamp somehow fails, then the driver
  should not fall back to software time stamping. The rationale is that
  this would occur at a later time in the processing pipeline than other
  software time stamping and therefore could lead to unexpected deltas
  between time stamps.
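
The interplay of the flags in the first three steps can be illustrated
with a user-space model. This is not kernel code: ``struct model_skb``,
``model_skb_tx_timestamp()`` and ``model_start_xmit()`` are hypothetical
stand-ins, and the SKBTX_* values are placeholders for the definitions in
include/linux/skbuff.h.

```c
#include <stdbool.h>

/* Placeholder values for this user-space model only. */
#define SKBTX_HW_TSTAMP		(1 << 0)
#define SKBTX_SW_TSTAMP		(1 << 1)
#define SKBTX_IN_PROGRESS	(1 << 2)

/* Hypothetical stand-in for the parts of an skb used by this model. */
struct model_skb {
	unsigned int tx_flags;
	bool sw_tstamp_taken;
};

/*
 * Models skb_tx_timestamp() as described above: a software time stamp
 * is provided only if requested and no hardware one is in progress.
 */
static void model_skb_tx_timestamp(struct model_skb *skb)
{
	if ((skb->tx_flags & SKBTX_SW_TSTAMP) &&
	    !(skb->tx_flags & SKBTX_IN_PROGRESS))
		skb->sw_tstamp_taken = true;
}

/*
 * Models a driver transmit routine. hw_tx_enabled stands for the state
 * configured earlier with HWTSTAMP_TX_ON.
 */
static void model_start_xmit(struct model_skb *skb, bool hw_tx_enabled)
{
	if ((skb->tx_flags & SKBTX_HW_TSTAMP) && hw_tx_enabled)
		skb->tx_flags |= SKBTX_IN_PROGRESS;

	/* as close as possible to handing the packet to hardware */
	model_skb_tx_timestamp(skb);
}
```

In the model, a packet that requests both kinds of time stamp gets the
software one only when the driver did not claim hardware time stamping.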

3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------

There are situations when there may be more than one PHC (PTP Hardware Clock)
in the data path of a packet. The kernel has no explicit mechanism to allow the
user to select which PHC to use for timestamping Ethernet frames. Instead, the
assumption is that the outermost PHC is always the most preferable, and that
kernel drivers collaborate towards achieving that goal. Currently there are 3
cases of stacked PHCs, detailed below:

3.2.1 DSA (Distributed Switch Architecture) switches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are Ethernet switches which have one of their ports connected to an
(otherwise completely unaware) host Ethernet interface, and perform the role of
a port multiplier with optional forwarding acceleration features.  Each DSA
switch port is visible to the user as a standalone (virtual) network interface,
and its network I/O is performed, under the hood, indirectly through the host
interface (redirecting to the host port on TX, and intercepting frames on RX).

When a DSA switch is attached to a host port, PTP synchronization has to
suffer, since the switch's variable queuing delay introduces a path delay
jitter between the host port and its PTP partner. For this reason, some DSA
switches include a timestamping clock of their own, and have the ability to
perform network timestamping on their own MAC, such that path delays only
measure wire and PHY propagation latencies. Timestamping DSA switches are
supported in Linux and expose the same ABI as any other network interface
(although the DSA interfaces are virtual in terms of network I/O, they do
have their own PHC).  It is typical, but not mandatory, for all
interfaces of a DSA switch to share the same PHC.

By design, PTP timestamping with a DSA switch does not need any special
handling in the driver for the host port it is attached to.  However, when the
host port also supports PTP timestamping, DSA will take care of intercepting
the ``.ndo_eth_ioctl`` calls towards the host port, and block attempts to enable
hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
allow the delivery of multiple hardware timestamps for the same packet, so
anyone other than the DSA switch port must be prevented from doing so.

In the generic layer, DSA provides the following infrastructure for PTP
timestamping:

- ``.port_txtstamp()``: a hook called prior to the transmission of
  packets with a hardware TX timestamping request from user space.
  This is required for two-step timestamping, since the hardware
  timestamp becomes available after the actual MAC transmission, so the
  driver must be prepared to correlate the timestamp with the original
  packet so that it can re-enqueue the packet back into the socket's
  error queue. To save the packet for when the timestamp becomes
  available, the driver can call ``skb_clone_sk``, save the clone pointer
  in skb->cb and enqueue it on a TX skb queue. Typically, a switch will have a
  PTP TX timestamp register (or sometimes a FIFO) where the timestamp
  becomes available. In case of a FIFO, the hardware might store
  key-value pairs of PTP sequence ID/message type/domain number and the
  actual timestamp. To perform the correlation correctly between the
  packets in a queue waiting for timestamping and the actual timestamps,
  drivers can use a BPF classifier (``ptp_classify_raw``) to identify
  the PTP transport type, and ``ptp_parse_header`` to interpret the PTP
  header fields. There may be an IRQ that is raised upon this
  timestamp's availability, or the driver might have to poll after
  invoking ``dev_queue_xmit()`` towards the host interface.
  One-step TX timestamping does not require packet cloning, since there is
  no follow-up message required by the PTP protocol (because the
  TX timestamp is embedded into the packet by the MAC), and therefore
  user space does not expect the packet annotated with the TX timestamp
  to be re-enqueued into its socket's error queue.

- ``.port_rxtstamp()``: On RX, the BPF classifier is run by DSA to
  identify PTP event messages (any other packets, including PTP general
  messages, are not timestamped). The original (and only) timestampable
  skb is provided to the driver, for it to annotate it with a timestamp,
  if that is immediately available, or defer to later. On reception,
  timestamps might either be available in-band (through metadata in the
  DSA header, or attached in other ways to the packet), or out-of-band
  (through another RX timestamping FIFO). Deferral on RX is typically
  necessary when retrieving the timestamp needs a sleepable context. In
  that case, it is the responsibility of the DSA driver to call
  ``netif_rx()`` on the freshly timestamped skb.
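
The correlation between queued packets and FIFO entries can be sketched as
a user-space model. Every name below (``struct ptp_key``,
``struct pending_tx``, ``match_tx_timestamp()``) is hypothetical, and a
real driver would keep cloned skbs in a queue rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key a switch FIFO might report next to each TX timestamp. */
struct ptp_key {
	uint16_t seqid;
	uint8_t msgtype;
	uint8_t domain;
};

/* Hypothetical entry for a cloned packet that awaits its timestamp. */
struct pending_tx {
	struct ptp_key key;
	bool in_use;
	uint64_t tstamp_ns;	/* filled in once the FIFO entry arrives */
};

/*
 * Find the pending packet matching key, record its timestamp and release
 * the slot. Returns the slot index, or -1 if no packet is waiting for it.
 */
static int match_tx_timestamp(struct pending_tx *q, size_t n,
			      struct ptp_key key, uint64_t tstamp_ns)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!q[i].in_use ||
		    q[i].key.seqid != key.seqid ||
		    q[i].key.msgtype != key.msgtype ||
		    q[i].key.domain != key.domain)
			continue;

		q[i].tstamp_ns = tstamp_ns;
		q[i].in_use = false;
		return (int)i;
	}
	return -1;
}
```

Matching on all three fields rather than on the sequence ID alone avoids
mixing up packets of different message types that share a sequence ID.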

3.2.2 Ethernet PHYs
^^^^^^^^^^^^^^^^^^^

These are devices that typically fulfill a Layer 1 role in the network stack,
hence they do not have a representation in terms of a network interface as DSA
switches do. However, PHYs may be able to detect and timestamp PTP packets, for
performance reasons: timestamps taken as close as possible to the wire have the
potential to yield a more stable and precise synchronization.

A PHY driver that supports PTP timestamping must create a ``struct
mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
of this pointer will be checked by the networking stack.

Since PHYs do not have network interface representations, the timestamping and
ethtool ioctl operations for them need to be mediated by their respective MAC
driver.  Therefore, as opposed to DSA switches, modifications need to be done
to each individual MAC driver for PHY timestamping support. This entails:

- Checking, in ``.ndo_eth_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
  is true or not. If it is, then the MAC driver should not process this request
  but instead pass it on to the PHY using ``phy_mii_ioctl()``.

- On RX, special intervention may or may not be needed, depending on the
  function used to deliver skb's up the network stack. In the case of plain
  ``netif_rx()`` and similar, MAC drivers must check whether
  ``skb_defer_rx_timestamp(skb)`` is necessary or not - and if it is, don't
  call ``netif_rx()`` at all.  If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
  enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
  will be called now, to determine, using logic very similar to DSA, whether
  deferral for RX timestamping is necessary.  Again like DSA, it becomes the
  responsibility of the PHY driver to send the packet up the stack when the
  timestamp is available.

  For other skb receive functions, such as ``napi_gro_receive`` and
  ``netif_receive_skb``, the stack automatically checks whether
  ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
  the driver.

- On TX, again, special intervention might or might not be needed.  The
  function that calls the ``mii_ts->txtstamp()`` hook is named
  ``skb_clone_tx_timestamp()``. This function can either be called directly
  (in which case explicit MAC driver support is indeed needed), or indirectly,
  since it also piggybacks on the ``skb_tx_timestamp()`` call, which many MAC
  drivers already perform for software timestamping purposes. Therefore, if a
  MAC supports software timestamping, it does not need to do anything further
  at this stage.

3.2.3 MII bus snooping devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These perform the same role as timestamping Ethernet PHYs, save for the fact
that they are discrete devices and can therefore be used in conjunction with
any PHY even if it doesn't support timestamping. In Linux, they are
discoverable and attachable to a ``struct phy_device`` through Device Tree, and
for the rest, they use the same mii_ts infrastructure as timestamping PHYs. See
Documentation/devicetree/bindings/ptp/timestamper.txt for more details.

3.2.4 Other caveats for MAC drivers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A caveat of stacked PHCs, especially with DSA (but not only), is that they
uncover bugs which were impossible to trigger before the existence of stacked
PTP clocks. DSA is particularly exposed because it does not require any
modification to MAC drivers, so it is more difficult to ensure correctness of
all possible code paths. One example has to do with this line of code, already
presented earlier::

      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
driver or a MII bus snooping device driver, should set this flag.
But a MAC driver that is unaware of PHC stacking might get tripped up by
somebody other than itself setting this flag, and deliver a duplicate
timestamp.
For example, a typical driver design for TX timestamping might be to split the
transmission part into 2 portions:

1. "TX": checks whether PTP timestamping has been previously enabled through
   the ``.ndo_eth_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
   SKBTX_HW_TSTAMP``"). If this is true, it sets the
   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
   described above, in the case of a stacked PHC system, this condition should
   never trigger, as this MAC is certainly not the outermost PHC. But this is
   not where the typical issue is.  Transmission proceeds with this packet.

2. "TX confirmation": Transmission has finished. The driver checks whether it
   is necessary to collect any TX timestamp for it. Here is where the typical
   issues are: the MAC driver takes a shortcut and only checks whether
   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
   PHC system, this is incorrect because this MAC driver is not the only entity
   in the TX data path that could have enabled SKBTX_IN_PROGRESS in the first
   place.

The correct solution for this problem is for MAC drivers to have a compound
check in their "TX confirmation" portion, not only for
"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
that PTP timestamping is not enabled for anything other than the outermost PHC,
this enhanced check will avoid delivering a duplicated TX timestamp to user
space.
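
The difference between the shortcut and the compound check can be shown
with a user-space model. It is only a sketch: ``struct model_priv`` and
both functions are hypothetical, and the SKBTX_IN_PROGRESS value is a
placeholder for the definition in include/linux/skbuff.h.

```c
#include <stdbool.h>

/* Placeholder value for this user-space model only. */
#define SKBTX_IN_PROGRESS	(1 << 2)

/* Hypothetical per-device state of a MAC driver. */
struct model_priv {
	bool hwtstamp_tx_enabled;	/* set through SIOCSHWTSTAMP */
};

/* Shortcut check: incorrect once another PHC may have set the flag. */
static bool tx_confirm_naive(const struct model_priv *priv,
			     unsigned int tx_flags)
{
	(void)priv;
	return (tx_flags & SKBTX_IN_PROGRESS) != 0;
}

/* Compound check: this MAC must also have timestamping enabled. */
static bool tx_confirm_compound(const struct model_priv *priv,
				unsigned int tx_flags)
{
	return priv->hwtstamp_tx_enabled &&
	       (tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

With a MAC whose hwtstamp_tx_enabled is false and an skb whose flag was
set by a DSA switch driver, the shortcut version would collect a duplicate
timestamp while the compound version would not.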