.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
Ethernet switch device driver model (switchdev)
===============================================

Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>

Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>


The Ethernet switch device driver model (switchdev) is an in-kernel driver
model for switch devices which offload the forwarding (data) plane from the
kernel.

Figure 1 is a block diagram showing the components of the switchdev model for
an example setup using a data-center-class switch ASIC chip.  Other setups
with SR-IOV or soft switches, such as OVS, are possible.

::


			     User-space tools

       user space                   |
      +-------------------------------------------------------------------+
       kernel                       | Netlink
				    |
		     +--------------+-------------------------------+
		     |         Network stack                        |
		     |           (Linux)                            |
		     |                                              |
		     +----------------------------------------------+

			   sw1p2     sw1p4     sw1p6
		      sw1p1  +  sw1p3  +  sw1p5  +          eth1
			+    |    +    |    +    |            +
			|    |    |    |    |    |            |
		     +--+----+----+----+----+----+---+  +-----+-----+
		     |         Switch driver         |  |    mgmt   |
		     |        (this document)        |  |   driver  |
		     |                               |  |           |
		     +--------------+----------------+  +-----------+
				    |
       kernel                       | HW bus (eg PCI)
      +-------------------------------------------------------------------+
       hardware                     |
		     +--------------+----------------+
		     |         Switch device (sw1)   |
		     |  +----+                       +--------+
		     |  |    v offloaded data path   | mgmt port
		     |  |    |                       |
		     +--|----|----+----+----+----+---+
			|    |    |    |    |    |
			+    +    +    +    +    +
		       p1   p2   p3   p4   p5   p6

			     front-panel ports


				    Fig 1.


Include Files
-------------

::

    #include <linux/netdevice.h>
    #include <net/switchdev.h>


Configuration
-------------

Use "depends on NET_SWITCHDEV" in the driver's Kconfig to ensure switchdev
model support is built for the driver.


Switch Ports
------------

On switchdev driver initialization, the driver will allocate and register a
struct net_device (using register_netdev()) for each enumerated physical switch
port, called the port netdev.  A port netdev is the software representation of
the physical port and provides a conduit for control traffic to/from the
controller (the kernel) and the network, as well as an anchor point for higher
level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers.  Using
standard netdev tools (iproute2, ethtool, etc.), the port netdev can also
provide to the user access to the physical properties of the switch port such
as PHY link state and I/O statistics.

There is (currently) no higher-level kernel object for the switch beyond the
port netdevs.  All of the switchdev driver ops are netdev ops or switchdev ops.

A switch management port is outside the scope of the switchdev driver model.
Typically, the management port does not participate in the offloaded data plane
and is driven by a different driver, such as a NIC driver, bound to the
management port device.

Switch ID
^^^^^^^^^

The switchdev driver must implement the net_device operation
ndo_get_port_parent_id for each port netdev, returning the same physical ID for
each port of a switch. The ID must be unique between switches on the same
system. The ID does not need to be unique between switches on different
systems.

The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.
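The "same switch" test described above can be sketched in user space as a
byte-wise comparison of two parent IDs.  This is only an illustration:
``struct phys_item_id``, ``PHYS_ID_MAX`` and ``same_switch()`` are hypothetical
names modelled loosely on the kernel's struct netdev_phys_item_id, not the
kernel API itself.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: sized like the kernel's physical item ID buffer. */
#define PHYS_ID_MAX 32

/* Hypothetical user-space stand-in for a port's parent (switch) ID. */
struct phys_item_id {
	unsigned char id[PHYS_ID_MAX];
	unsigned char id_len;
};

/* Two ports belong to the same switch when their parent IDs have the
 * same length and identical bytes.
 */
static bool same_switch(const struct phys_item_id *a,
			const struct phys_item_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```

A bonding or bridge offload path could use such a check to decide whether all
member ports can be handled by one device.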

Port Netdev Naming
^^^^^^^^^^^^^^^^^^

Udev rules should be used for port netdev naming, using some unique attribute
of the port as a key, for example the port MAC address or the port PHYS name.
Hard-coding of kernel netdev names within the driver is discouraged; let the
kernel pick the default netdev name, and let udev set the final name based on a
port attribute.

Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
useful for dynamically-named ports where the device names its ports based on
external configuration.  For example, if a physical 40G port is split logically
into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
name for each port using port PHYS name.  The udev rule would be::

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
	    ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"

Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
would be sub-port 0 on port 1 on switch 1.
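As a rough illustration of the convention, the "pYsZ" part that a driver might
report as the port PHYS name can be formatted as shown here; udev then prepends
"swX".  ``format_phys_port_name()`` is a hypothetical helper written for this
example, and using a negative sub-port to mean "not split" is an assumption of
the sketch.

```c
#include <stdio.h>

/* Hypothetical helper: build the "pY" or "pYsZ" part of the name.
 * sub_port < 0 means the physical port is not split.
 * Returns the snprintf() result (length that was or would be written).
 */
static int format_phys_port_name(char *buf, size_t len,
				 unsigned int port, int sub_port)
{
	if (sub_port < 0)
		return snprintf(buf, len, "p%u", port);
	return snprintf(buf, len, "p%us%d", port, sub_port);
}
```

With the udev rule above and X set to 1, sub-port 0 of port 1 would end up
named sw1p1s0.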

Port Features
^^^^^^^^^^^^^

NETIF_F_NETNS_LOCAL

If the switchdev driver (and device) only supports offloading of the default
network namespace (netns), the driver should set this feature flag to prevent
the port netdev from being moved out of the default netns.  A netns-aware
driver/device would not set this flag and would be responsible for partitioning
hardware to preserve netns containment.  This means hardware cannot forward
traffic from a port in one namespace to another port in another namespace.

Port Topology
^^^^^^^^^^^^^

The port netdevs representing the physical switch ports can be organized into
higher-level switching constructs.  The default construct is a standalone
router port, used to offload L3 forwarding.  Two or more ports can be bonded
together to form a LAG.  Two or more ports (or LAGs) can be bridged to bridge
L2 networks.  VLANs can be applied to sub-divide L2 networks.  L2-over-L3
tunnels can be built on ports.  These constructs are built using standard Linux
tools such as the bridge driver, the bonding/team drivers, and netlink-based
tools such as iproute2.

The switchdev driver can know a particular port's position in the topology by
monitoring NETDEV_CHANGEUPPER notifications.  For example, a port moved into a
bond will see its upper master change.  If that bond is moved into a bridge,
the bond's upper master will change.  And so on.  The driver will track such
movements to know a port's position in the overall topology by
registering for netdevice events and acting on NETDEV_CHANGEUPPER.
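The bookkeeping described above can be modelled in user space as a chain of
"upper" pointers that is updated on each change and walked to answer topology
questions.  All names in this sketch (``struct upper_dev``, ``struct port``,
``port_changeupper()``, ``port_is_bridged()``) are hypothetical; a real driver
would do the equivalent work in its netdevice notifier.

```c
#include <stdbool.h>
#include <stddef.h>

enum upper_kind { UPPER_BOND, UPPER_BRIDGE };

/* Hypothetical model of an upper device (bond or bridge). */
struct upper_dev {
	enum upper_kind kind;
	struct upper_dev *upper;	/* e.g. the bridge a bond sits in */
};

/* Hypothetical model of a port netdev. */
struct port {
	struct upper_dev *upper;	/* NULL for a standalone port */
};

/* Modelled NETDEV_CHANGEUPPER for a port: linking is true when the
 * port is attached to the upper device and false when it is detached.
 */
static void port_changeupper(struct port *p, struct upper_dev *upper,
			     bool linking)
{
	p->upper = linking ? upper : NULL;
}

/* Walk the chain of uppers to see whether the port ends up in a bridge,
 * either directly or through a bond.
 */
static bool port_is_bridged(const struct port *p)
{
	const struct upper_dev *u;

	for (u = p->upper; u; u = u->upper)
		if (u->kind == UPPER_BRIDGE)
			return true;
	return false;
}
```

Keeping the chain rather than a single flag lets the model answer both "is this
port in a LAG" and "is that LAG bridged" from the same data.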

L2 Forwarding Offload
---------------------

The idea is to offload the L2 data forwarding (switching) path from the kernel
to the switchdev device by mirroring bridge FDB entries down to the device.  An
FDB entry is the {port, MAC, VLAN} tuple forwarding destination.

To offload L2 bridging, the switchdev driver/device should support:

	- Static FDB entries installed on a bridge port
	- Notification of learned/forgotten source MAC/VLANs from the device
	- STP state changes on the port
	- VLAN flooding of multicast/broadcast and unknown unicast packets

Static FDB Entries
^^^^^^^^^^^^^^^^^^

The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device.  Static bridge FDB
entries are installed, for example, using the iproute2 bridge command::

	bridge fdb add ADDR dev DEV [vlan VID] [self]

The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
ops, and handle add/delete/dump of SWITCHDEV_OBJ_ID_PORT_FDB object using
switchdev_port_obj_xxx ops.

XXX: what should be done if offloading this rule to hardware fails (for
example, due to full capacity in hardware tables)?

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic.  To enable VLAN support, turn on VLAN filtering::

	echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Notification of Learned/Forgotten Source MAC/VLANs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the MAC/VLAN/port tuples.  The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call::

	err = call_switchdev_notifiers(val, dev, info, extack);

Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info.  On
SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge's FDB and mark the entry as NTF_EXT_LEARNED.  The iproute2 bridge
command will label these entries "offload"::

	$ bridge fdb
	52:54:00:12:35:01 dev sw1p1 master br0 permanent
	00:02:00:00:02:00 dev sw1p1 master br0 offload
	00:02:00:00:02:00 dev sw1p1 self
	52:54:00:12:35:02 dev sw1p2 master br0 permanent
	00:02:00:00:03:00 dev sw1p2 master br0 offload
	00:02:00:00:03:00 dev sw1p2 self
	33:33:00:00:00:01 dev eth0 self permanent
	01:00:5e:00:00:01 dev eth0 self permanent
	33:33:ff:00:00:00 dev eth0 self permanent
	01:80:c2:00:00:0e dev eth0 self permanent
	33:33:00:00:00:01 dev br0 self permanent
	01:00:5e:00:00:01 dev br0 self permanent
	33:33:ff:12:35:01 dev br0 self permanent

Learning on the port should be disabled on the bridge using the bridge command::

	bridge link set dev DEV learning off

Learning on the device port should be enabled, as well as learning_sync::

	bridge link set dev DEV learning on self
	bridge link set dev DEV learning_sync on self

The learning_sync attribute enables syncing of the learned/forgotten FDB entry
to the bridge's FDB.  It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.

To support learning, the driver implements switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_{PRE}_BRIDGE_FLAGS.

FDB Ageing
^^^^^^^^^^

The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and it is
the responsibility of the port driver/device to age out these entries.  If the
port device supports ageing, when the FDB entry expires, it will notify the
driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL.  If the
device does not support ageing, the driver can simulate ageing using a
garbage collection timer to monitor FDB entries.  Expired entries will be
notified to the bridge using SWITCHDEV_FDB_DEL.  See the rocker driver for an
example of a driver running an ageing timer.

To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the FDB
entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...).  The
notification will reset the FDB entry's last-used time to now.  The driver
should rate limit refresh notifications, for example, no more than once a
second.  (The last-used time is visible using the bridge -s fdb option).
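The two driver-side duties described above, expiring idle entries and rate
limiting refreshes, can be sketched as a small user-space model.  This is not
the rocker implementation: ``struct fdb_entry``, ``fdb_gc()`` and
``fdb_touch()`` are hypothetical names, and time is assumed to be a plain
seconds counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-private shadow of one learned FDB entry. */
struct fdb_entry {
	bool in_use;
	unsigned long last_seen;	/* last traffic hit, in seconds */
	unsigned long last_refresh;	/* last refresh sent to the bridge */
};

/* Garbage-collection pass: expire entries idle for ageing_time seconds.
 * For each expired entry a real driver would notify the bridge with
 * SWITCHDEV_FDB_DEL.  Returns the number of entries expired.
 */
static size_t fdb_gc(struct fdb_entry *tbl, size_t n,
		     unsigned long now, unsigned long ageing_time)
{
	size_t i, expired = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i].in_use && now - tbl[i].last_seen >= ageing_time) {
			tbl[i].in_use = false;
			expired++;
		}
	}
	return expired;
}

/* Record a traffic hit.  Returns true when a refresh notification
 * (SWITCHDEV_FDB_ADD) should be sent, at most once per second.
 */
static bool fdb_touch(struct fdb_entry *e, unsigned long now)
{
	e->last_seen = now;
	if (now - e->last_refresh < 1)
		return false;
	e->last_refresh = now;
	return true;
}
```

A driver would typically run the collection pass from a periodic timer and call
the touch path when the device reports activity for an entry.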

STP State Change on Port
^^^^^^^^^^^^^^^^^^^^^^^^

Internally or with a third-party STP protocol implementation (e.g. mstpd), the
bridge driver maintains the STP state for ports, and will notify the switch
driver of STP state change on a port using the switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_STP_UPDATE.

State is one of BR_STATE_*.  The switch driver can use STP state updates to
update the ingress packet filter list for the port.  For example, if the port
is DISABLED, no packets should pass, but if the port moves to BLOCKED, then STP
BPDUs and other IEEE 01:80:c2:xx:xx:xx link-local multicast packets can pass.

Note that STP BPDUs are untagged and STP state applies to all VLANs on the port
so packet filters should be applied consistently across untagged and tagged
VLANs on the port.
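The filtering rule given above can be written as a small decision function.
This is a user-space sketch: the enum only mirrors the names of the BR_STATE_*
states, and treating the listening and learning states like the blocking state
is an assumption of the example, not something this document specifies.

```c
#include <stdbool.h>
#include <string.h>

/* Local model of the port STP states; names follow BR_STATE_*. */
enum stp_state {
	STP_DISABLED,
	STP_LISTENING,
	STP_LEARNING,
	STP_FORWARDING,
	STP_BLOCKING,
};

/* True for destinations in the IEEE 01:80:c2:xx:xx:xx range. */
static bool is_link_local(const unsigned char mac[6])
{
	static const unsigned char prefix[3] = { 0x01, 0x80, 0xc2 };

	return memcmp(mac, prefix, sizeof(prefix)) == 0;
}

/* Decide whether an ingress frame may pass for the given STP state. */
static bool ingress_allowed(enum stp_state state, const unsigned char dst[6])
{
	switch (state) {
	case STP_DISABLED:
		return false;		/* nothing passes */
	case STP_FORWARDING:
		return true;		/* everything passes */
	default:
		/* Blocking (and, by assumption, listening/learning):
		 * only BPDUs and other link-local multicast pass.
		 */
		return is_link_local(dst);
	}
}
```

Because the decision does not look at the VLAN tag, the same answer applies to
tagged and untagged frames on the port.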

Flooding L2 domain
^^^^^^^^^^^^^^^^^^

For a given L2 VLAN domain, the switch device should flood multicast/broadcast
and unknown unicast packets to all ports in the domain, if allowed by the
port's current STP state.  The switch driver, knowing which ports are within
which VLAN L2 domain, can program the switch device for flooding.  The packet
may be sent to the port netdev for processing by the bridge driver.  The
bridge should not reflood the packet to the same ports the device flooded,
otherwise there will be duplicate packets on the wire.

To avoid duplicate packets, the switch driver should mark a packet as already
forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
the skb using the ingress bridge port's mark and prevent it from being forwarded
through any bridge port with the same mark.
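The egress check described above can be modelled in a few lines.  The types and
the function name are hypothetical stand-ins for the bridge's internal state;
only the rule itself (skip ports that share the ingress port's mark when the
frame was already forwarded in hardware) comes from the text.

```c
#include <stdbool.h>

/* Hypothetical model of a bridge port and its forwarding mark. */
struct bridge_port {
	int mark;		/* same value for ports of one switch */
};

/* Hypothetical model of a received frame. */
struct frame {
	bool offload_fwd_mark;	/* set by the switch driver */
	int src_mark;		/* mark of the ingress bridge port */
};

/* The software bridge forwards to an egress port unless the frame was
 * already forwarded by hardware and the egress port carries the same
 * mark as the ingress port.
 */
static bool bridge_should_forward(const struct frame *f,
				  const struct bridge_port *out)
{
	return !f->offload_fwd_mark || f->src_mark != out->mark;
}
```

In the setup of Fig 1, a frame flooded by sw1 would still be forwarded by the
bridge to a port of another device such as eth1, but not to the other sw1
ports.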

It is possible for the switch device to not handle flooding and push the
packets up to the bridge driver for flooding.  This is not ideal as the number
of ports in the L2 domain scales, because the device is much more efficient at
flooding packets than software.

If supported by the device, flood control can be offloaded to it, preventing
certain netdevs from flooding unicast traffic for which there is no FDB entry.

IGMP Snooping
^^^^^^^^^^^^^

In order to support IGMP snooping, the port netdevs should trap to the bridge
driver all IGMP join and leave messages.
The bridge multicast module will notify port netdevs on every multicast group
change, whether the group is statically configured or dynamically joined/left.
The hardware implementation should forward all registered multicast
traffic groups only to the configured ports.

L3 Routing Offload
------------------

Offloading L3 routing requires that the device be programmed with FIB entries
from the kernel, with the device doing the FIB lookup and forwarding.  The
device does a longest prefix match (LPM) on FIB entries matching route prefix
and forwards the packet to the matching FIB entry's nexthop(s) egress ports.
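To make the lookup concrete, here is a minimal user-space sketch of a longest
prefix match over a small table.  Real devices use dedicated LPM hardware; the
linear scan, the ``struct fib_entry`` layout and the host-byte-order addresses
are all assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: prefix, prefix length and egress port. */
struct fib_entry {
	uint32_t dst;		/* IPv4 prefix, host byte order */
	int dst_len;		/* prefix length, 0..32 */
	int nexthop_port;
};

/* Build the netmask for a prefix length; length 0 matches everything. */
static uint32_t prefix_mask(int len)
{
	return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* Return the matching entry with the longest prefix, or NULL. */
static const struct fib_entry *lpm_lookup(const struct fib_entry *tbl,
					  size_t n, uint32_t addr)
{
	const struct fib_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(tbl[i].dst_len);

		if ((addr & mask) != (tbl[i].dst & mask))
			continue;
		if (!best || tbl[i].dst_len > best->dst_len)
			best = &tbl[i];
	}
	return best;
}
```

With entries for 11.0.0.0/30, 11.0.0.0/8 and a default route, a packet to
11.0.0.2 matches all three and the /30 entry is chosen as the most specific.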

To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier. The following events are available:

===================  ===================================================
FIB_EVENT_ENTRY_ADD  used for both adding a new FIB entry to the device,
		     or modifying an existing entry on the device.
FIB_EVENT_ENTRY_DEL  used for removing a FIB entry
FIB_EVENT_RULE_ADD,
FIB_EVENT_RULE_DEL   used to propagate FIB rule changes
===================  ===================================================

FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::

	struct fib_entry_notifier_info {
		struct fib_notifier_info info; /* must be first */
		u32 dst;
		int dst_len;
		struct fib_info *fi;
		u8 tos;
		u8 type;
		u32 tb_id;
		u32 nlflags;
	};

to add/modify/delete IPv4 dst/dst_len prefix on table tb_id.  The ``*fi``
structure holds details on the route and route's nexthops.  ``*dev`` is one
of the port netdevs mentioned in the route's next hop list.

Routes offloaded to the device are labeled with "offload" in the ip route
listing::

	$ ip route show
	default via 192.168.0.2 dev eth0
	11.0.0.0/30 dev sw1p1  proto kernel  scope link  src 11.0.0.2 offload
	11.0.0.4/30 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
	11.0.0.8/30 dev sw1p2  proto kernel  scope link  src 11.0.0.10 offload
	11.0.0.12/30 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
	12.0.0.2  proto zebra  metric 30 offload
		nexthop via 11.0.0.1  dev sw1p1 weight 1
		nexthop via 11.0.0.9  dev sw1p2 weight 1
	12.0.0.3 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
	12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
	192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15

The "offload" flag is set in case at least one device offloads the FIB entry.

XXX: add/mod/del IPv6 FIB API

Nexthop Resolution
^^^^^^^^^^^^^^^^^^

The FIB entry's nexthop list contains the nexthop tuple (gateway, dev), but for
the switch device to forward the packet with the correct dst MAC address, the
nexthop gateways must be resolved to the neighbor's MAC address.  Neighbor MAC
address discovery comes via the ARP (or ND) process and is available via the
arp_tbl neighbor table.  To resolve the route's nexthop gateways, the driver
should trigger the kernel's neighbor resolution process.  See the rocker
driver's rocker_port_ipv4_resolve() for an example.

The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
for the routes as arp_tbl is updated.  The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
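The dependency between neighbor resolution and nexthop programming can be
modelled as a small state update.  This is a user-space sketch under stated
assumptions: ``struct nexthop``, ``nexthop_neigh_update()`` and
``nexthop_can_offload()`` are hypothetical names, and a NULL MAC is used here
to stand for a neighbor entry that became invalid or was destroyed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver-private view of one route nexthop. */
struct nexthop {
	uint32_t gw;		/* gateway IPv4 address */
	bool resolved;		/* true once the gateway MAC is known */
	unsigned char dmac[6];	/* destination MAC to program */
};

/* Modelled neighbor update for address ip: a non-NULL mac resolves the
 * nexthop, a NULL mac marks it unresolved again.
 */
static void nexthop_neigh_update(struct nexthop *nh, uint32_t ip,
				 const unsigned char *mac)
{
	if (nh->gw != ip)
		return;		/* update is for another neighbor */
	if (mac) {
		memcpy(nh->dmac, mac, sizeof(nh->dmac));
		nh->resolved = true;
	} else {
		memset(nh->dmac, 0, sizeof(nh->dmac));
		nh->resolved = false;
	}
}

/* Only resolved nexthops can be written to the device. */
static bool nexthop_can_offload(const struct nexthop *nh)
{
	return nh->resolved;
}
```

A driver following this pattern would reprogram or withdraw the affected
routes each time the resolved state of one of their nexthops changes.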