.. SPDX-License-Identifier: GPL-2.0

=====================================
Network Devices, the Kernel, and You!
=====================================


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device allocation rules
==================================
Network device structures need to persist even after the module is unloaded
and must be allocated with alloc_netdev_mqs() and friends.
If the device has registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathological case cleanly
(example: rmmod mydriver </sys/class/net/myeth/mtu).

alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver
private data, which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)), then it is up to the module exit handler to free that.

MTU
===
Each network device has a Maximum Transmission Unit (MTU). The MTU does
not include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU.
Because link layer headers are not counted, on Ethernet with the standard
MTU of 1500 bytes the actual skb will contain up to 1514 bytes once the
Ethernet header is included. Devices should allow for the 4 byte VLAN
header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule. The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag). The device may drop,
truncate, or pass up oversize packets, but dropping oversize packets
is preferred.


struct net_device synchronization rules
=======================================
ndo_open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note: netif_running() is guaranteed false

ndo_do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

ndo_start_xmit:
	Synchronization: __netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features this will be
	called without holding netif_tx_lock. In this case the driver
	has to lock by itself when needed.
	The locking there should also properly protect against
	set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated.
	Don't use it for new drivers.

	Context: Process with BHs disabled or BH (timer),
		 will be called with interrupts disabled by netconsole.

	Return codes:

	* NETDEV_TX_OK everything ok.
	* NETDEV_TX_BUSY Cannot transmit packet, try later
	  Usually a bug, means queue start/stop flow control is broken in
	  the driver. Note: the driver must NOT put the skb in its DMA ring.

ndo_tx_timeout:
	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
	Synchronization: netif_addr_lock spinlock.
	Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization:
		NAPI_STATE_SCHED bit in napi->state. Device
		driver's ndo_stop method will invoke napi_disable() on
		all NAPI instances which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.

	Context:
		 softirq
		 will be called with interrupts disabled by netconsole.
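
As a rough illustration of the NAPI_STATE_SCHED handshake described above,
the following is a minimal userspace sketch, not kernel code. It models the
bit with a C11 atomic_flag. struct model_napi and the model_*() helpers are
hypothetical names invented for this example, and the busy-wait in
model_disable() only stands in for the sleeping poll that napi_disable() is
described as doing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the SCHED bit in napi->state. */
struct model_napi {
	/* Set while a poll is scheduled or running, or while disabled. */
	atomic_flag sched;
};

/*
 * Try to claim the instance for polling. Returns true only for the
 * single caller that flipped the bit from clear to set.
 */
static bool model_schedule_prep(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->sched);
}

/* The poll has finished: release the bit so it can be scheduled again. */
static void model_complete(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}

/*
 * Wait until no poll owns the bit, then keep it set so that nothing
 * can schedule a new poll. This model spins; it is an assumption of
 * the sketch, whereas the text above describes a sleeping poll.
 */
static void model_disable(struct model_napi *n)
{
	while (atomic_flag_test_and_set(&n->sched))
		; /* a poll is still pending, try again */
}

/* Re-enable polling by clearing the bit held since model_disable(). */
static void model_enable(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}
```

Under this model only one caller at a time can win model_schedule_prep(),
and once model_disable() has returned no further poll can be scheduled
until model_enable() clears the bit again.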