.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
Ethernet switch device driver model (switchdev)
===============================================

Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>

Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>


The Ethernet switch device driver model (switchdev) is an in-kernel driver
model for switch devices which offload the forwarding (data) plane from the
kernel.

Figure 1 is a block diagram showing the components of the switchdev model for
an example setup using a data-center-class switch ASIC chip. Other setups
with SR-IOV or soft switches, such as OVS, are possible.

::


                             User-space tools

       user space                   |
      +-------------------------------------------------------------------+
       kernel                       | Netlink
                                    |
                     +--------------+-------------------------------+
                     |         Network stack                        |
                     |           (Linux)                            |
                     |                                              |
                     +----------------------------------------------+

                           sw1p2     sw1p4     sw1p6
                      sw1p1  +  sw1p3  +  sw1p5  +          eth1
                        +    |    +    |    +    |            +
                        |    |    |    |    |    |            |
                     +--+----+----+----+----+----+---+  +-----+-----+
                     |         Switch driver         |  |    mgmt   |
                     |        (this document)        |  |   driver  |
                     |                               |  |           |
                     +--------------+----------------+  +-----------+
                                    |
       kernel                       | HW bus (eg PCI)
      +-------------------------------------------------------------------+
       hardware                     |
                     +--------------+----------------+
                     |         Switch device (sw1)   |
                     |  +----+                       +--------+
                     |  |    v offloaded data path   | mgmt port
                     |  |    |                       |
                     +--|----|----+----+----+----+---+
                        |    |    |    |    |    |
                        +    +    +    +    +    +
                       p1   p2   p3   p4   p5   p6

                             front-panel ports


                                    Fig 1.


Include Files
-------------

::

    #include <linux/netdevice.h>
    #include <net/switchdev.h>


Configuration
-------------

Use "depends on NET_SWITCHDEV" in the driver's Kconfig to ensure switchdev
model support is built for the driver.


Switch Ports
------------

On switchdev driver initialization, the driver will allocate and register a
struct net_device (using register_netdev()) for each enumerated physical switch
port, called the port netdev. A port netdev is the software representation of
the physical port and provides a conduit for control traffic to/from the
controller (the kernel) and the network, as well as an anchor point for higher
level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers. Using
standard netdev tools (iproute2, ethtool, etc.), the port netdev can also
give the user access to the physical properties of the switch port, such
as PHY link state and I/O statistics.

There is (currently) no higher-level kernel object for the switch beyond the
port netdevs. All of the switchdev driver ops are netdev ops or switchdev ops.

A switch management port is outside the scope of the switchdev driver model.
Typically, the management port does not participate in the offloaded data
plane and is loaded with a different driver, such as a NIC driver, on the
management port device.

Switch ID
^^^^^^^^^

The switchdev driver must implement the net_device operation
ndo_get_port_parent_id for each port netdev, returning the same physical ID for
each port of a switch. The ID must be unique between switches on the same
system. The ID does not need to be unique between switches on different
systems.

The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.

Port Netdev Naming
^^^^^^^^^^^^^^^^^^

Udev rules should be used for port netdev naming, using some unique attribute
of the port as a key, for example the port MAC address or the port PHYS name.
Hard-coding of kernel netdev names within the driver is discouraged; let the
kernel pick the default netdev name, and let udev set the final name based on a
port attribute.

Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
useful for dynamically-named ports where the device names its ports based on
external configuration.
For example, if a physical 40G port is split logically
into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
name for each port using port PHYS name. The udev rule would be::

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
            ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"

Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
is the port name or ID, and Z is the sub-port name or ID. For example, sw1p1s0
would be sub-port 0 on port 1 on switch 1.

Port Features
^^^^^^^^^^^^^

NETIF_F_NETNS_LOCAL

If the switchdev driver (and device) only supports offloading of the default
network namespace (netns), the driver should set this feature flag to prevent
the port netdev from being moved out of the default netns. A netns-aware
driver/device would not set this flag and would be responsible for partitioning
hardware to preserve netns containment. This means hardware cannot forward
traffic from a port in one namespace to another port in another namespace.

Port Topology
^^^^^^^^^^^^^

The port netdevs representing the physical switch ports can be organized into
higher-level switching constructs. The default construct is a standalone
router port, used to offload L3 forwarding.
Two or more ports can be bonded
together to form a LAG. Two or more ports (or LAGs) can be bridged to bridge
L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
tunnels can be built on ports. These constructs are built using standard Linux
tools such as the bridge driver, the bonding/team drivers, and netlink-based
tools such as iproute2.

The switchdev driver can know a particular port's position in the topology by
monitoring NETDEV_CHANGEUPPER notifications. For example, a port moved into a
bond will see its upper master change. If that bond is moved into a bridge,
the bond's upper master will change. And so on. The driver will track such
movements to know where a port sits in the overall topology by
registering for netdevice events and acting on NETDEV_CHANGEUPPER.

L2 Forwarding Offload
---------------------

The idea is to offload the L2 data forwarding (switching) path from the kernel
to the switchdev device by mirroring bridge FDB entries down to the device. An
FDB entry is the {port, MAC, VLAN} tuple forwarding destination.
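
As a rough illustration of the {port, MAC, VLAN} tuple, the following
user-space sketch models an FDB as a small table keyed on (MAC, VLAN) that
yields an egress port. It is not kernel code: ``struct fdb_entry``,
``fdb_add()``, ``fdb_lookup()`` and ``FDB_MAX`` are hypothetical names chosen
for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_MAX 64  /* hypothetical table size */

/* Hypothetical model of one FDB entry: (MAC, VLAN) -> egress port. */
struct fdb_entry {
    uint8_t  mac[6];
    uint16_t vid;
    int      port;
    int      in_use;
};

static struct fdb_entry fdb[FDB_MAX];

/* Add or update an entry; returns 0 on success, -1 if the table is full. */
static int fdb_add(const uint8_t mac[6], uint16_t vid, int port)
{
    struct fdb_entry *free_slot = NULL;

    for (size_t i = 0; i < FDB_MAX; i++) {
        if (fdb[i].in_use && fdb[i].vid == vid &&
            memcmp(fdb[i].mac, mac, 6) == 0) {
            fdb[i].port = port;  /* same station seen on a new port */
            return 0;
        }
        if (!fdb[i].in_use && !free_slot)
            free_slot = &fdb[i];
    }
    if (!free_slot)
        return -1;

    memcpy(free_slot->mac, mac, 6);
    free_slot->vid = vid;
    free_slot->port = port;
    free_slot->in_use = 1;
    return 0;
}

/* Return the egress port, or -1 when the destination is unknown. */
static int fdb_lookup(const uint8_t mac[6], uint16_t vid)
{
    for (size_t i = 0; i < FDB_MAX; i++)
        if (fdb[i].in_use && fdb[i].vid == vid &&
            memcmp(fdb[i].mac, mac, 6) == 0)
            return fdb[i].port;
    return -1;
}
```

A lookup miss (-1 here) corresponds to the unknown-unicast case, where the
frame is flooded within its VLAN instead of being sent to a single port.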

To offload L2 bridging, the switchdev driver/device should support:

        - Static FDB entries installed on a bridge port
        - Notification of learned/forgotten src mac/vlans from device
        - STP state changes on the port
        - VLAN flooding of multicast/broadcast and unknown unicast packets

Static FDB Entries
^^^^^^^^^^^^^^^^^^

The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device. Static bridge FDB
entries are installed, for example, using the iproute2 bridge command::

        bridge fdb add ADDR dev DEV [vlan VID] [self]

The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
ops, and handle add/delete/dump of SWITCHDEV_OBJ_ID_PORT_FDB object using
switchdev_port_obj_xxx ops.

XXX: what should be done if offloading this rule to hardware fails (for
example, due to full capacity in hardware tables)?

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic.
To enable VLAN support, turn on VLAN filtering::

        echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Notification of Learned/Forgotten Source MAC/VLANs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the mac/vlan/port tuples. The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call::

        err = call_switchdev_notifiers(val, dev, info, extack);

Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On
SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge's FDB and mark the entry as NTF_EXT_LEARNED.
The iproute2 bridge
command will label these entries "offload"::

        $ bridge fdb
        52:54:00:12:35:01 dev sw1p1 master br0 permanent
        00:02:00:00:02:00 dev sw1p1 master br0 offload
        00:02:00:00:02:00 dev sw1p1 self
        52:54:00:12:35:02 dev sw1p2 master br0 permanent
        00:02:00:00:03:00 dev sw1p2 master br0 offload
        00:02:00:00:03:00 dev sw1p2 self
        33:33:00:00:00:01 dev eth0 self permanent
        01:00:5e:00:00:01 dev eth0 self permanent
        33:33:ff:00:00:00 dev eth0 self permanent
        01:80:c2:00:00:0e dev eth0 self permanent
        33:33:00:00:00:01 dev br0 self permanent
        01:00:5e:00:00:01 dev br0 self permanent
        33:33:ff:12:35:01 dev br0 self permanent

Learning on the port should be disabled on the bridge using the bridge command::

        bridge link set dev DEV learning off

Learning on the device port should be enabled, as well as learning_sync::

        bridge link set dev DEV learning on self
        bridge link set dev DEV learning_sync on self

The learning_sync attribute enables syncing of the learned/forgotten FDB entry
to the bridge's FDB. It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.

To support learning, the driver implements switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_{PRE}_BRIDGE_FLAGS.

FDB Ageing
^^^^^^^^^^

The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and it is
the responsibility of the port driver/device to age out these entries. If the
port device supports ageing, when the FDB entry expires, it will notify the
driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL. If the
device does not support ageing, the driver can simulate ageing using a
garbage collection timer to monitor FDB entries. Expired entries will be
notified to the bridge using SWITCHDEV_FDB_DEL. See the rocker driver for an
example of a driver running an ageing timer.

To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the FDB
entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...). The
notification will reset the FDB entry's last-used time to now. The driver
should rate limit refresh notifications, for example, no more than once a
second. (The last-used time is visible using the bridge -s fdb option).

STP State Change on Port
^^^^^^^^^^^^^^^^^^^^^^^^

Internally or with a third-party STP protocol implementation (e.g.
mstpd), the
bridge driver maintains the STP state for ports, and will notify the switch
driver of STP state change on a port using the switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_STP_UPDATE.

State is one of BR_STATE_*. The switch driver can use STP state updates to
update the ingress packet filter list for the port. For example, if the port is
DISABLED, no packets should pass, but if the port moves to BLOCKED, then STP
BPDUs and other IEEE 01:80:c2:xx:xx:xx link-local multicast packets can pass.

Note that STP BPDUs are untagged and STP state applies to all VLANs on the port
so packet filters should be applied consistently across untagged and tagged
VLANs on the port.

Flooding L2 domain
^^^^^^^^^^^^^^^^^^

For a given L2 VLAN domain, the switch device should flood multicast/broadcast
and unknown unicast packets to all ports in the domain, if allowed by the
port's current STP state. The switch driver, knowing which ports are within
which VLAN L2 domain, can program the switch device for flooding. The packet
may be sent to the port netdev for processing by the bridge driver. The
bridge should not reflood the packet to the same ports the device flooded,
otherwise there will be duplicate packets on the wire.
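
The flood-set rule above (same VLAN domain, STP state permitting, never back
out of the ingress port) can be sketched in user space as follows. This is an
illustrative model only: ``struct port``, ``flood_mask()``, ``NPORTS`` and the
``ST_*`` values are hypothetical stand-ins, with ``ST_*`` playing the role of
BR_STATE_*, and each port is assumed to belong to a single VLAN for brevity.

```c
#include <stdint.h>

#define NPORTS 6  /* hypothetical number of front-panel ports */

/* Hypothetical stand-ins for the BR_STATE_* port states. */
enum stp_state {
    ST_DISABLED,
    ST_LISTENING,
    ST_LEARNING,
    ST_FORWARDING,
    ST_BLOCKING,
};

struct port {
    uint16_t       vid;    /* single VLAN membership, for simplicity */
    enum stp_state state;
};

/* Return a bitmask of the ports that should receive a flooded frame. */
static uint32_t flood_mask(const struct port ports[NPORTS], int ingress,
                           uint16_t vid)
{
    uint32_t mask = 0;

    for (int i = 0; i < NPORTS; i++) {
        if (i == ingress)
            continue;  /* never send back out the ingress port */
        if (ports[i].vid != vid)
            continue;  /* outside this L2 VLAN domain */
        if (ports[i].state != ST_FORWARDING)
            continue;  /* STP state does not allow data traffic */
        mask |= 1u << i;
    }
    return mask;
}
```

A device that floods in hardware would be programmed with the equivalent of
this mask per VLAN, so that software does not need to replicate the frame.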

To avoid duplicate packets, the switch driver should mark a packet as already
forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
the skb using the ingress bridge port's mark and prevent it from being forwarded
through any bridge port with the same mark.

It is possible for the switch device to not handle flooding and push the
packets up to the bridge driver for flooding. This is not ideal as the number
of ports in the L2 domain scales, because the device is much more efficient at
flooding packets than software.

If supported by the device, flood control can be offloaded to it, preventing
certain netdevs from flooding unicast traffic for which there is no FDB entry.

IGMP Snooping
^^^^^^^^^^^^^

In order to support IGMP snooping, the port netdevs should trap all IGMP join
and leave messages to the bridge driver.
The bridge multicast module will notify port netdevs on every multicast group
change, whether the group is statically configured or dynamically joined/left.
The hardware implementation should forward all registered multicast
traffic groups only to the configured ports.

L3 Routing Offload
------------------

Offloading L3 routing requires that the device be programmed with FIB entries
from the kernel, with the device doing the FIB lookup and forwarding. The
device does a longest prefix match (LPM) on FIB entries matching the route
prefix and forwards the packet to the matching FIB entry's nexthop(s) egress
ports.

To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier. The following events are available:

===================  ===================================================
FIB_EVENT_ENTRY_ADD  used for both adding a new FIB entry to the device,
                     or modifying an existing entry on the device.
FIB_EVENT_ENTRY_DEL  used for removing a FIB entry
FIB_EVENT_RULE_ADD,
FIB_EVENT_RULE_DEL   used to propagate FIB rule changes
===================  ===================================================

FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::

        struct fib_entry_notifier_info {
                struct fib_notifier_info info; /* must be first */
                u32 dst;
                int dst_len;
                struct fib_info *fi;
                u8 tos;
                u8 type;
                u32 tb_id;
                u32 nlflags;
        };

to add/modify/delete IPv4 dst/dst_len prefix on table tb_id. The ``*fi``
structure holds details on the route and route's nexthops. ``*dev`` is one
of the port netdevs mentioned in the route's next hop list.
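
To make the longest prefix match concrete, here is a small user-space sketch
of an LPM lookup over dst/dst_len pairs. It is a linear-scan model for
illustration, not how a device or the kernel implements the lookup;
``struct route``, ``prefix_mask()`` and ``lpm_lookup()`` are hypothetical
names, and addresses are assumed to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down route: IPv4 prefix in host byte order. */
struct route {
    uint32_t dst;
    int      dst_len;  /* prefix length, 0..32 */
    int      nexthop;  /* opaque nexthop id for this example */
};

/* Build the netmask for a prefix length; a /0 matches everything. */
static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : UINT32_MAX << (32 - len);
}

/* Return the matching route with the longest prefix, or NULL if none. */
static const struct route *lpm_lookup(const struct route *tbl, size_t n,
                                      uint32_t addr)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(tbl[i].dst_len);

        if ((addr & mask) != (tbl[i].dst & mask))
            continue;
        if (!best || tbl[i].dst_len > best->dst_len)
            best = &tbl[i];
    }
    return best;
}
```

With routes for 0.0.0.0/0, 11.0.0.0/30 and 11.0.0.4/30 in the table, a lookup
of 11.0.0.5 selects the 11.0.0.4/30 entry, while an address outside both /30
prefixes falls back to the default route.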

Routes offloaded to the device are labeled with "offload" in the ip route
listing::

        $ ip route show
        default via 192.168.0.2 dev eth0
        11.0.0.0/30 dev sw1p1 proto kernel scope link src 11.0.0.2 offload
        11.0.0.4/30 via 11.0.0.1 dev sw1p1 proto zebra metric 20 offload
        11.0.0.8/30 dev sw1p2 proto kernel scope link src 11.0.0.10 offload
        11.0.0.12/30 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
        12.0.0.2 proto zebra metric 30 offload
                nexthop via 11.0.0.1 dev sw1p1 weight 1
                nexthop via 11.0.0.9 dev sw1p2 weight 1
        12.0.0.3 via 11.0.0.1 dev sw1p1 proto zebra metric 20 offload
        12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
        192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15

The "offload" flag is set if at least one device offloads the FIB entry.

XXX: add/mod/del IPv6 FIB API

Nexthop Resolution
^^^^^^^^^^^^^^^^^^

The FIB entry's nexthop list contains the nexthop tuple (gateway, dev), but for
the switch device to forward the packet with the correct dst mac address, the
nexthop gateways must be resolved to the neighbor's mac address. Neighbor mac
address discovery comes via the ARP (or ND) process and is available via the
arp_tbl neighbor table.
To resolve the route's nexthop gateways, the driver
should trigger the kernel's neighbor resolution process. See the rocker
driver's rocker_port_ipv4_resolve() for an example.

The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops
for the routes as arp_tbl updates. The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
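
The bookkeeping a driver might do on a neighbor update can be sketched in user
space as below. This is a simplified model under stated assumptions, not the
rocker driver's logic: ``struct nexthop`` and ``neigh_update()`` are
hypothetical names, and a NULL ``mac`` is used here to stand for a neighbor
entry that has been purged.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical nexthop record kept by a driver. */
struct nexthop {
    uint32_t gw;        /* gateway IPv4 address */
    uint8_t  mac[6];    /* meaningful only when resolved != 0 */
    int      resolved;
};

/*
 * Apply a neighbor update for address ip. A non-NULL mac resolves every
 * nexthop using that gateway; a NULL mac marks them unresolved again.
 * Returns the number of nexthops whose state or MAC actually changed,
 * i.e. how many entries would need reprogramming in the device.
 */
static int neigh_update(struct nexthop *nh, int n, uint32_t ip,
                        const uint8_t *mac)
{
    int changed = 0;

    for (int i = 0; i < n; i++) {
        if (nh[i].gw != ip)
            continue;
        if (mac) {
            if (nh[i].resolved && memcmp(nh[i].mac, mac, 6) == 0)
                continue;  /* nothing new, skip the hardware write */
            memcpy(nh[i].mac, mac, 6);
            nh[i].resolved = 1;
        } else {
            if (!nh[i].resolved)
                continue;
            nh[i].resolved = 0;
        }
        changed++;
    }
    return changed;
}
```

Returning a change count lets the caller skip device writes when an update
repeats information the driver already holds.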