.. SPDX-License-Identifier: GPL-2.0

=====================================
Network Devices, the Kernel, and You!
=====================================


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device allocation rules
==================================
Network device structures need to persist even after the module is unloaded
and must be allocated with alloc_netdev_mqs() and friends.
If the device has registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathological case cleanly
(example: rmmod mydriver </sys/class/net/myeth/mtu )

alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver
private data, which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)), then it is up to the module exit handler to free that.

MTU
===
Each network device has a Maximum Transfer Unit (MTU). The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet with the standard MTU of
1500 bytes, the actual skb will contain up to 1514 bytes because of
the 14 byte Ethernet header. Devices should allow for the 4 byte VLAN
header as well.
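
The arithmetic above can be checked with a short userspace sketch. The
helper max_tx_frame_len() and the SKETCH_* constants are illustrative
names, not kernel symbols; the kernel's own header sizes are ETH_HLEN and
VLAN_HLEN.

```c
#include <stdbool.h>

/* Illustrative constants; the kernel defines ETH_HLEN (14) and VLAN_HLEN (4). */
#define SKETCH_ETH_HLEN  14u
#define SKETCH_VLAN_HLEN 4u

/*
 * Hypothetical helper: largest frame, in bytes, a device may be asked to
 * transmit for a given MTU. The MTU covers only the payload, so the link
 * layer header (and an optional VLAN tag) is added on top.
 */
static unsigned int max_tx_frame_len(unsigned int mtu, bool vlan_tagged)
{
	unsigned int len = mtu + SKETCH_ETH_HLEN;

	if (vlan_tagged)
		len += SKETCH_VLAN_HLEN;
	return len;
}
```

With an MTU of 1500 this gives 1514 bytes untagged and 1518 bytes with a
VLAN tag.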

Segmentation Offload (GSO, TSO) is an exception to this rule.  The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.
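
As a rough illustration of the size relationship, the sketch below counts
how many packets a large TCP/IPv4 buffer would be split into. It assumes
20 byte IPv4 and 20 byte TCP headers without options, and
gso_segment_count() is a hypothetical name. The real stack records the
per-segment payload size in the skb (gso_size) rather than recomputing it
from the MTU like this.

```c
/* Assumed header sizes for TCP over IPv4 without options. */
#define SKETCH_IPV4_HLEN 20u
#define SKETCH_TCP_HLEN  20u

/*
 * Hypothetical helper: number of packets needed to carry payload_len bytes
 * of TCP data when each packet must fit within the MTU.
 * Returns 0 if the MTU leaves no room for payload.
 */
static unsigned int gso_segment_count(unsigned int payload_len,
				      unsigned int mtu)
{
	unsigned int hdrs = SKETCH_IPV4_HLEN + SKETCH_TCP_HLEN;
	unsigned int mss;

	if (mtu <= hdrs)
		return 0;
	mss = mtu - hdrs;
	/* Round up: a final short segment still needs its own packet. */
	return (payload_len + mss - 1) / mss;
}
```

Under these assumptions a 64000 byte buffer on a 1500 byte MTU link
becomes 44 packets, the last one shorter than the rest.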

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag).  The device may drop,
truncate, or pass up oversize packets, but dropping oversize packets
is preferred.
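
The receive-side limit and the preferred drop policy can be sketched in a
few lines of userspace C. The names max_rx_frame_len() and rx_frame_ok()
are hypothetical, and the sketch assumes an Ethernet device with a single
VLAN tag.

```c
#include <stdbool.h>

/* Illustrative constants; the kernel defines ETH_HLEN (14) and VLAN_HLEN (4). */
#define SKETCH_ETH_HLEN  14u
#define SKETCH_VLAN_HLEN 4u

/* Hypothetical helper: largest frame the receive path must accept. */
static unsigned int max_rx_frame_len(unsigned int mtu)
{
	return mtu + SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN;
}

/* Returns true if the frame should be accepted, false if it should be dropped. */
static bool rx_frame_ok(unsigned int frame_len, unsigned int mtu)
{
	return frame_len <= max_rx_frame_len(mtu);
}
```

A driver that sizes its receive buffers from the MTU could use the same
sum as the minimum buffer length.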


struct net_device synchronization rules
=======================================
ndo_open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note: netif_running() is guaranteed false

ndo_do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

ndo_start_xmit:
	Synchronization: __netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features, this will be
	called without holding netif_tx_lock. In this case the driver
	has to do its own locking when needed.
	That locking should also properly protect against
	set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated.
	Don't use it for new drivers.

	Context: Process with BHs disabled, or BH (timer).
	Will be called with interrupts disabled by netconsole.

	Return codes:

	* NETDEV_TX_OK: everything is OK.
	* NETDEV_TX_BUSY: cannot transmit the packet, try later.
	  Usually a bug; it means queue start/stop flow control is broken in
	  the driver. Note: the driver must NOT put the skb in its DMA ring.

ndo_tx_timeout:
	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
	Synchronization: netif_addr_lock spinlock.
	Context: BHs disabled
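
The NETDEV_TX_BUSY note under ndo_start_xmit is easier to see in a small
userspace model. Everything here is a hypothetical stand-in (model_txq,
model_xmit(), model_tx_complete(), RING_SIZE); a real driver would use
helpers such as netif_stop_queue() and netif_wake_queue() on the actual
TX queue. The sketch stops the queue as soon as the ring fills, so the
busy path is never reached while flow control works.

```c
#include <stdbool.h>

#define RING_SIZE 4u

/* Stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY in this userspace model. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_txq {
	unsigned int in_flight;	/* descriptors owned by the "hardware" */
	bool stopped;		/* models the stopped state of the queue */
};

/* Models ndo_start_xmit: the stack only calls this while !stopped. */
static enum model_tx model_xmit(struct model_txq *q)
{
	if (q->in_flight == RING_SIZE) {
		/* Only reachable if the caller ignored the stopped state. */
		q->stopped = true;
		return MODEL_TX_BUSY;	/* the skb was NOT placed on the ring */
	}
	q->in_flight++;
	/* Stop the queue as soon as the ring fills, before the next call. */
	if (q->in_flight == RING_SIZE)
		q->stopped = true;
	return MODEL_TX_OK;
}

/* Models TX completion: free one descriptor and restart the queue. */
static void model_tx_complete(struct model_txq *q)
{
	if (q->in_flight == 0)
		return;
	q->in_flight--;
	q->stopped = false;
}
```

In this model a caller that respects the stopped flag only ever sees
MODEL_TX_OK; the busy return appears only when the flag is ignored.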

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization:
		NAPI_STATE_SCHED bit in napi->state.  The device
		driver's ndo_stop method will invoke napi_disable() on
		all NAPI instances, which does a sleeping poll on the
		NAPI_STATE_SCHED bit in napi->state, waiting for all
		pending NAPI activity to cease.

	Context:
		softirq. Will be called with interrupts disabled by
		netconsole.
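
The role of the NAPI_STATE_SCHED bit can be modeled in userspace with a
single atomic flag. All names below (model_napi and the model_napi_*()
functions) are hypothetical, and keeping the bit set after the disable
step is an assumption of this sketch about how further polls are held
off. The kernel sleeps while waiting on the bit; the sketch simply spins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the NAPI_STATE_SCHED bit in napi->state. */
struct model_napi {
	atomic_bool sched;
};

/* Models scheduling: only the caller that sets the bit may queue a poll. */
static bool model_napi_schedule_prep(struct model_napi *n)
{
	return !atomic_exchange(&n->sched, true);
}

/* Models poll completion: release the bit so a new poll can be scheduled. */
static void model_napi_complete(struct model_napi *n)
{
	atomic_store(&n->sched, false);
}

/*
 * Models the disable step: wait until the bit can be taken, then keep it
 * set so that no further poll can be scheduled.
 */
static void model_napi_disable(struct model_napi *n)
{
	while (atomic_exchange(&n->sched, true))
		;	/* the kernel would sleep here instead of spinning */
}
```

Once the disable step returns in this model, every later scheduling
attempt fails until the bit is released again.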