============
Architecture
============

This document describes the **Distributed Switch Architecture (DSA)** subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.

Design principles
=================

The Distributed Switch Architecture is a subsystem which was primarily designed
to support Marvell Ethernet switches (MV88E6xxx, a.k.a. Linkstreet product line)
using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to be able to use unmodified
Linux tools such as bridge, iproute2, ifconfig to work transparently whether
they configured/queried a switch port network device or a regular network
device.

An Ethernet switch is typically comprised of multiple front-panel ports, and one
or more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in Small Home and Office products: routers,
gateways, or even top-of-the-rack switches.
This host Ethernet controller will be later referred to as "master" and "cpu"
in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA will create specialized network devices which are
used as controlling and data-flowing endpoints for use by the Linux networking
stack. These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag",
a hardware feature that makes the switch insert a specific tag into each
Ethernet frame it receives from or sends to specific ports, to help the
management interface figure out:

- what port is this frame coming from
- what was the reason why this frame got forwarded
- how to send CPU originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).
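
To make the role of a switch tag more concrete, here is a minimal user-space
sketch of packing and parsing a tag. The 4-byte layout, the marker value and all
``demo_*`` names are invented for illustration; real tag formats are vendor
specific and differ from this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 4-byte switch tag, invented for illustration only:
 *   byte 0: marker (DEMO_TAG_MAGIC)
 *   byte 1: reason code (why the switch forwarded the frame to the CPU)
 *   byte 2: reserved, always zero
 *   byte 3: switch port (source on receive, destination on transmit)
 */
#define DEMO_TAG_LEN   4
#define DEMO_TAG_MAGIC 0xA5

enum demo_reason {
	DEMO_REASON_FORWARD = 0,	/* ordinary forwarded traffic */
	DEMO_REASON_MGMT    = 1,	/* trapped management frame */
};

/* Build the tag the CPU would add to reach a given port. */
void demo_tag_encode(uint8_t tag[DEMO_TAG_LEN], uint8_t port,
		     enum demo_reason reason)
{
	assert(tag != NULL);

	tag[0] = DEMO_TAG_MAGIC;
	tag[1] = (uint8_t)reason;
	tag[2] = 0;
	tag[3] = port;
}

/*
 * Parse a tag added by the switch.
 * Returns 0 on success, -1 if the bytes do not look like a tag.
 */
int demo_tag_decode(const uint8_t tag[DEMO_TAG_LEN], uint8_t *port,
		    enum demo_reason *reason)
{
	assert(tag != NULL && port != NULL && reason != NULL);

	if (tag[0] != DEMO_TAG_MAGIC || tag[2] != 0)
		return -1;

	*reason = (enum demo_reason)tag[1];
	*port = tag[3];
	return 0;
}
```

Decoding answers the first two questions in the list above (source port and
forwarding reason), while encoding covers the third (steering CPU originated
traffic to a specific port).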

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the Ethernet switch facing side of the management
  controller, and as such, would create a duplication of feature, since you
  would get two interfaces for the same conduit: master netdev, and "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either, only the
  downstream, or the top-most upstream interface makes sense with that model

Switch tagging protocols
------------------------

DSA currently supports 5 different tagging protocols, and a tag-less mode as
well.
The different protocols are implemented in:

- ``net/dsa/tag_trailer.c``: Marvell's 4-byte trailer tag mode (legacy)
- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
- ``net/dsa/tag_brcm.c``: Broadcom's 4-byte tag
- ``net/dsa/tag_qca.c``: Qualcomm's 2-byte tag

The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
``e1000e``, ``mv643xx_eth`` etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
networking stack, this is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet Frame receive sequence looks like this:

Master network device (e.g.: e1000e):

1. Receive interrupt fires:

   - receive function is invoked
   - basic packet processing is done: getting length, status etc.
   - packet is prepared to be processed by the Ethernet layer by calling
     ``eth_type_trans``

2. net/ethernet/eth.c::

          eth_type_trans(skb, dev)
                  if (dev->dsa_ptr != NULL)
                          -> skb->protocol = ETH_P_XDSA

3. drivers/net/ethernet/\*::

          netif_receive_skb(skb)
                  -> iterate over registered packet_type
                          -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

4. net/dsa/dsa.c::

          -> dsa_switch_rcv()
                  -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'

5. net/dsa/tag_*.c:

   - inspect and strip switch tag protocol to determine originating port
   - locate per-port network device
   - invoke ``eth_type_trans()`` with the DSA slave network device
   - invoke ``netif_receive_skb()``

Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.

Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device. Each of these network interfaces will be responsible for being a
controlling and data-flowing end-point for each front-panel port of the switch.
These interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- external/internal PHY management: link, auto-negotiation etc.
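
The receive sequence from the "Networking stack hooks" section and the per-port
slave devices can be modeled together in user space. The sketch below is not
kernel code: the ``demo_*`` types, the 4-byte tag placed after the two MAC
addresses, and the port byte at offset 3 of the tag are all assumptions made for
illustration. It only shows the idea of checking ``dsa_ptr``, reading the
originating port from the tag, stripping the tag, and handing the frame to the
matching per-port device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN  6	/* length of one MAC address */
#define DEMO_TAG_LEN   4	/* hypothetical tag: marker, reason, zero, port */
#define DEMO_NUM_PORTS 3

struct demo_netdev {
	const char *name;
	unsigned long rx_packets;
};

/* Stand-in for the DSA state reachable through the master's dsa_ptr. */
struct demo_switch {
	struct demo_netdev ports[DEMO_NUM_PORTS];
};

struct demo_master {
	struct demo_netdev dev;
	struct demo_switch *dsa_ptr;	/* non-NULL once DSA is attached */
};

/*
 * Model of steps 2 to 5. Returns the device that receives the frame, or
 * NULL if the frame is dropped. On return *frame and *len describe the
 * frame as that device sees it, with the tag removed for slave devices.
 */
struct demo_netdev *demo_master_rx(struct demo_master *master,
				   uint8_t **frame, size_t *len)
{
	const size_t off = 2 * DEMO_ETH_ALEN;	/* tag sits after both MACs */
	uint8_t port;

	assert(master != NULL && frame != NULL && *frame != NULL && len != NULL);

	if (master->dsa_ptr == NULL) {
		/* DSA not attached: ordinary delivery to the master. */
		master->dev.rx_packets++;
		return &master->dev;
	}

	if (*len < off + DEMO_TAG_LEN)
		return NULL;			/* too short to hold a tag */

	port = (*frame)[off + 3];
	if (port >= DEMO_NUM_PORTS)
		return NULL;			/* unknown source port */

	/* Strip the tag: slide the MAC addresses up against the EtherType. */
	memmove(*frame + DEMO_TAG_LEN, *frame, off);
	*frame += DEMO_TAG_LEN;
	*len -= DEMO_TAG_LEN;

	master->dsa_ptr->ports[port].rx_packets++;
	return &master->dsa_ptr->ports[port];
}
```

Note that the master's own receive counter is left untouched once DSA is
attached in this model; the frame is accounted on the slave device only.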

These slave network devices have custom net_device_ops and ethtool_ops function
pointers which allow DSA to introduce a level of layering between the networking
stack/ethtool, and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA will look up which
switch tagging protocol is currently registered with these network devices, and
invoke a specific transmit routine which takes care of adding the relevant
switch tag in the Ethernet frames.

These frames are then queued for transmission using the master network device
``ndo_start_xmit()`` function. Since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.
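
As a rough illustration of what such a transmit routine does, the following
user-space sketch opens a gap in a frame buffer and writes a tag into it. The
tag layout, its position between the source MAC address and the EtherType, and
the ``demo_tag_xmit()`` name are assumptions for this example; trailer-style
protocols, for instance, append their tag at the end of the frame instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6		/* length of one MAC address */
#define DEMO_TAG_LEN  4		/* hypothetical tag size */

/*
 * Insert a hypothetical 4-byte tag between the source MAC address and the
 * EtherType of the frame held in buf. cap is the total size of buf.
 * Returns the new frame length, or 0 if the frame is too short or the
 * buffer cannot hold the tagged frame.
 */
size_t demo_tag_xmit(uint8_t *buf, size_t len, size_t cap, uint8_t port)
{
	const size_t off = 2 * DEMO_ETH_ALEN;	/* tag goes after dst + src MAC */

	assert(buf != NULL);

	if (len < off + 2 || cap < len + DEMO_TAG_LEN)
		return 0;

	/* Shift EtherType and payload to open a gap for the tag. */
	memmove(buf + off + DEMO_TAG_LEN, buf + off, len - off);

	buf[off + 0] = 0xA5;	/* made-up marker byte */
	buf[off + 1] = 0;	/* reason: unused on transmit in this sketch */
	buf[off + 2] = 0;	/* reserved */
	buf[off + 3] = port;	/* destination switch port */

	return len + DEMO_TAG_LEN;
}
```

The tagged buffer is what would then be handed to the master network device for
transmission.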

Graphical representation
------------------------

Summarized, this is basically what DSA looks like from a network device
perspective::

                |---------------------------
                | CPU network device (eth0)|
                ----------------------------
                | <tag added by switch     |
                |                          |
                |                          |
                |        tag added by CPU> |
        |--------------------------------------------|
        | Switch driver                              |
        |--------------------------------------------|
            ||        ||         ||
        |-------|  |-------|  |-------|
        | sw0p0 |  | sw0p1 |  | sw0p2 |
        |-------|  |-------|  |-------|



Slave MDIO bus
--------------

In order to be able to read from and write to a PHY built into the switch, DSA
creates a slave MDIO bus which allows a specific switch driver to divert and
intercept MDIO reads/writes towards specific PHY addresses. In most
MDIO-connected switches, these functions would utilize direct or indirect PHY
addressing mode to return standard MII registers from the switch builtin PHYs,
allowing the PHY library to report link status, link partner pages,
auto-negotiation results etc.
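
The diversion performed by the slave MDIO bus can be pictured with a small
user-space model. Everything named ``demo_*`` here is hypothetical, and the
in-memory register file stands in for whatever direct or indirect addressing a
real switch uses. Reads aimed at an address the switch does not own return
``0xffff``, the same value this document asks ``phy_read`` to return when a
register is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PHY_ADDR 32	/* MDIO PHY addresses range over 0..31 */
#define DEMO_NUM_REGS     32	/* MII register numbers range over 0..31 */

/* Hypothetical switch with a handful of built-in PHYs. */
struct demo_switch {
	uint32_t phys_mask;	/* bit N set: address N is an internal PHY */
	uint16_t regs[DEMO_MAX_PHY_ADDR][DEMO_NUM_REGS];
};

/*
 * Read callback of the modeled slave MDIO bus. Accesses to addresses owned
 * by the switch are diverted to its register file; everything else reads
 * back as 0xffff.
 */
uint16_t demo_slave_mii_read(const struct demo_switch *sw, int addr, int reg)
{
	assert(sw != NULL);

	if (addr < 0 || addr >= DEMO_MAX_PHY_ADDR ||
	    reg < 0 || reg >= DEMO_NUM_REGS)
		return 0xffff;

	if (!(sw->phys_mask & (UINT32_C(1) << addr)))
		return 0xffff;

	return sw->regs[addr][reg];
}
```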

For Ethernet switches which have both external and internal MDIO busses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.

Data structures
---------------

DSA data structures are defined in ``include/net/dsa.h`` as well as
``net/dsa/dsa_priv.h``:

- ``dsa_chip_data``: platform data configuration for a given switch device,
  this structure describes a switch device's parent device, its address, as
  well as various properties of its ports: names/labels, and finally a routing
  table indication (when cascading switches)

- ``dsa_platform_data``: platform device configuration data which can reference
  a collection of dsa_chip_data structures if multiple switches are cascaded,
  the master network device this switch tree is attached to needs to be
  referenced

- ``dsa_switch_tree``: structure assigned to the master network device under
  ``dsa_ptr``, this structure references a dsa_platform_data structure as well as
  the tagging protocol supported by the switch tree, and which receive/transmit
  function hooks should be invoked, information about the directly attached
  switch is also provided: CPU port. Finally, a collection of dsa_switch are
  referenced to address individual switches in the tree.

- ``dsa_switch``: structure describing a switch device in the tree, referencing
  a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
  device, and a reference to the backing ``dsa_switch_ops``

- ``dsa_switch_ops``: structure referencing function pointers, see below for a
  full description.

Design limitations
==================

Limits on the number of devices and ports
-----------------------------------------

DSA currently limits the maximum number of switches within a tree to 4
(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``).
These limits could be extended to support larger configurations should this need
arise.

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before.
This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO switches connected using xMII interfaces

- inability to configure the CPU port link parameters based on the Ethernet
  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup

Common pitfalls using DSA setups
--------------------------------

Once a master network device is configured to use DSA (dev->dsa_ptr becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets
directly through this interface (e.g.: opening a socket using this interface)
will not make us go through the switch tagging protocol transmit function, so
the Ethernet switch on the other end, expecting a tag, will typically drop this
frame.

Slave network devices check that the master network device is UP before allowing
you to administratively bring UP these slave network devices. A common
configuration mistake is forgetting to bring UP the master network device first.
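
The ordering requirement between master and slave devices can be sketched as a
tiny user-space model. The ``demo_*`` names are hypothetical, and the
``-ENETDOWN`` return value is an assumption made for this sketch rather than a
statement about what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_netdev {
	const char *name;
	bool up;			/* administrative state */
	struct demo_netdev *master;	/* NULL for the master itself */
};

/*
 * Bring an interface up. A slave refuses to come up while its master is
 * down. The -ENETDOWN error code is an assumption made for this sketch.
 */
int demo_dev_open(struct demo_netdev *dev)
{
	assert(dev != NULL);

	if (dev->master != NULL && !dev->master->up)
		return -ENETDOWN;

	dev->up = true;
	return 0;
}
```

In this model, opening a slave port before its master fails, and the same call
succeeds once the master has been opened first.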

Interactions with other subsystems
==================================

DSA currently leverages the following subsystems:

- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
- Switchdev: ``net/switchdev/*``
- Device Tree for various of_* functions

MDIO/PHY library
----------------

Slave network devices exposed by DSA may or may not be interfacing with PHY
devices (``struct phy_device`` as defined in ``include/linux/phy.h``), but the DSA
subsystem deals with all possible combinations:

- internal PHY devices, built into the Ethernet switch hardware
- external PHY devices, connected via an internal or external MDIO bus
- internal PHY devices, connected via an internal MDIO bus
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a.
  fixed PHYs

The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
logic basically looks like this:

- if Device Tree is used, the PHY device is looked up using the standard
  "phy-handle" property, if found, this PHY device is created and registered
  using ``of_phy_connect()``

- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
  the definition of a non-MDIO managed PHY as defined in
  ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
  and connected transparently using the special fixed MDIO bus driver

- finally, if the PHY is built into the switch, as is very common with
  standalone switch packages, the PHY is probed using the slave MII bus created
  by DSA


SWITCHDEV
---------

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
more specifically with its VLAN filtering portion when configuring VLANs on top
of per-port slave network devices. Since DSA primarily deals with
MDIO-connected switches, although not exclusively, SWITCHDEV's
prepare/abort/commit phases are often simplified into a prepare phase which
checks whether the operation is supported by the DSA switch driver, and a commit
phase which applies the changes.

As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
objects.

Device Tree
-----------

DSA features a standardized binding which is documented in
``Documentation/devicetree/bindings/net/dsa/dsa.txt``.
PHY/MDIO library helper
functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query
per-port PHY specific details: interface connection, MDIO bus location etc.

Driver development
==================

DSA switch drivers need to implement a dsa_switch_ops structure which will
contain the various members described below.

``register_switch_driver()`` registers this dsa_switch_ops in its internal list
of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.

Unless requested differently by setting the priv_size member accordingly, DSA
does not allocate any driver private context space.

Switch configuration
--------------------

- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported,
  should be a valid value from the ``dsa_tag_protocol`` enum

- ``probe``: probe routine which will be invoked by the DSA platform device upon
  registration to test for the presence/absence of a switch device. For MDIO
  devices, it is recommended to issue a read towards internal registers using
  the switch pseudo-PHY and return whether this is a supported device. For other
  buses, return a non-NULL string

- ``setup``: setup function for the switch, this function is responsible for setting
  up the ``dsa_switch_ops`` private structure with all it needs: register maps,
  interrupts, mutexes, locks etc. This function is also expected to properly
  configure the switch to separate all network interfaces from each other, that
  is, they should be isolated by the switch hardware itself, typically by creating
  a Port-based VLAN ID for each port and allowing only the CPU port and the
  specific port to be in the forwarding vector. Ports that are unused by the
  platform should be disabled. Past this function, the switch is expected to be
  fully configured and ready to serve any kind of request. It is recommended
  to issue a software reset of the switch during this setup function in order to
  avoid relying on what a previous software agent such as a bootloader/firmware
  may have previously configured.

PHY devices and link management
-------------------------------

- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs,
  if the PHY library PHY driver needs to know about information it cannot obtain
  on its own (e.g.: coming from switch memory mapped registers), this function
  should return a 32-bit bitmask of "flags" that is private between the switch
  driver and the Ethernet PHY driver in ``drivers/net/phy/*``.
3868c2ecf20Sopenharmony_ci 3878c2ecf20Sopenharmony_ci- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read 3888c2ecf20Sopenharmony_ci the switch port MDIO registers. If unavailable, return 0xffff for each read. 3898c2ecf20Sopenharmony_ci For builtin switch Ethernet PHYs, this function should allow reading the link 3908c2ecf20Sopenharmony_ci status, auto-negotiation results, link partner pages etc.. 3918c2ecf20Sopenharmony_ci 3928c2ecf20Sopenharmony_ci- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write 3938c2ecf20Sopenharmony_ci to the switch port MDIO registers. If unavailable return a negative error 3948c2ecf20Sopenharmony_ci code. 3958c2ecf20Sopenharmony_ci 3968c2ecf20Sopenharmony_ci- ``adjust_link``: Function invoked by the PHY library when a slave network device 3978c2ecf20Sopenharmony_ci is attached to a PHY device. This function is responsible for appropriately 3988c2ecf20Sopenharmony_ci configuring the switch port link parameters: speed, duplex, pause based on 3998c2ecf20Sopenharmony_ci what the ``phy_device`` is providing. 4008c2ecf20Sopenharmony_ci 4018c2ecf20Sopenharmony_ci- ``fixed_link_update``: Function invoked by the PHY library, and specifically by 4028c2ecf20Sopenharmony_ci the fixed PHY driver asking the switch driver for link parameters that could 4038c2ecf20Sopenharmony_ci not be auto-negotiated, or obtained by reading the PHY registers through MDIO. 
4048c2ecf20Sopenharmony_ci This is particularly useful for specific kinds of hardware such as QSGMII, 4058c2ecf20Sopenharmony_ci MoCA or other kinds of non-MDIO managed PHYs where out of band link 4068c2ecf20Sopenharmony_ci information is obtained 4078c2ecf20Sopenharmony_ci 4088c2ecf20Sopenharmony_ciEthtool operations 4098c2ecf20Sopenharmony_ci------------------ 4108c2ecf20Sopenharmony_ci 4118c2ecf20Sopenharmony_ci- ``get_strings``: ethtool function used to query the driver's strings, will 4128c2ecf20Sopenharmony_ci typically return statistics strings, private flags strings etc. 4138c2ecf20Sopenharmony_ci 4148c2ecf20Sopenharmony_ci- ``get_ethtool_stats``: ethtool function used to query per-port statistics and 4158c2ecf20Sopenharmony_ci return their values. DSA overlays slave network devices general statistics: 4168c2ecf20Sopenharmony_ci RX/TX counters from the network device, with switch driver specific statistics 4178c2ecf20Sopenharmony_ci per port 4188c2ecf20Sopenharmony_ci 4198c2ecf20Sopenharmony_ci- ``get_sset_count``: ethtool function used to query the number of statistics items 4208c2ecf20Sopenharmony_ci 4218c2ecf20Sopenharmony_ci- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this 4228c2ecf20Sopenharmony_ci function may, for certain implementations also query the master network device 4238c2ecf20Sopenharmony_ci Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN 4248c2ecf20Sopenharmony_ci 4258c2ecf20Sopenharmony_ci- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port, 4268c2ecf20Sopenharmony_ci direct counterpart to set_wol with similar restrictions 4278c2ecf20Sopenharmony_ci 4288c2ecf20Sopenharmony_ci- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green 4298c2ecf20Sopenharmony_ci Ethernet) settings, can optionally invoke the PHY library to enable EEE at the 4308c2ecf20Sopenharmony_ci PHY level if relevant. 
This function should enable EEE at the switch port MAC 4318c2ecf20Sopenharmony_ci controller and data-processing logic 4328c2ecf20Sopenharmony_ci 4338c2ecf20Sopenharmony_ci- ``get_eee``: ethtool function which is used to query a switch port EEE settings, 4348c2ecf20Sopenharmony_ci this function should return the EEE state of the switch port MAC controller 4358c2ecf20Sopenharmony_ci and data-processing logic as well as query the PHY for its currently configured 4368c2ecf20Sopenharmony_ci EEE settings 4378c2ecf20Sopenharmony_ci 4388c2ecf20Sopenharmony_ci- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM 4398c2ecf20Sopenharmony_ci length/size in bytes 4408c2ecf20Sopenharmony_ci 4418c2ecf20Sopenharmony_ci- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents 4428c2ecf20Sopenharmony_ci 4438c2ecf20Sopenharmony_ci- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM 4448c2ecf20Sopenharmony_ci 4458c2ecf20Sopenharmony_ci- ``get_regs_len``: ethtool function returning the register length for a given 4468c2ecf20Sopenharmony_ci switch 4478c2ecf20Sopenharmony_ci 4488c2ecf20Sopenharmony_ci- ``get_regs``: ethtool function returning the Ethernet switch internal register 4498c2ecf20Sopenharmony_ci contents. 
This function might require user-land code in ethtool to 4508c2ecf20Sopenharmony_ci pretty-print register values and registers 4518c2ecf20Sopenharmony_ci 4528c2ecf20Sopenharmony_ciPower management 4538c2ecf20Sopenharmony_ci---------------- 4548c2ecf20Sopenharmony_ci 4558c2ecf20Sopenharmony_ci- ``suspend``: function invoked by the DSA platform device when the system goes to 4568c2ecf20Sopenharmony_ci suspend, should quiesce all Ethernet switch activities, but keep ports 4578c2ecf20Sopenharmony_ci participating in Wake-on-LAN active as well as additional wake-up logic if 4588c2ecf20Sopenharmony_ci supported 4598c2ecf20Sopenharmony_ci 4608c2ecf20Sopenharmony_ci- ``resume``: function invoked by the DSA platform device when the system resumes, 4618c2ecf20Sopenharmony_ci should resume all Ethernet switch activities and re-configure the switch to be 4628c2ecf20Sopenharmony_ci in a fully active state 4638c2ecf20Sopenharmony_ci 4648c2ecf20Sopenharmony_ci- ``port_enable``: function invoked by the DSA slave network device ndo_open 4658c2ecf20Sopenharmony_ci function when a port is administratively brought up, this function should be 4668c2ecf20Sopenharmony_ci fully enabling a given switch port. DSA takes care of marking the port with 4678c2ecf20Sopenharmony_ci ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it 4688c2ecf20Sopenharmony_ci was not, and propagating these changes down to the hardware 4698c2ecf20Sopenharmony_ci 4708c2ecf20Sopenharmony_ci- ``port_disable``: function invoked by the DSA slave network device ndo_close 4718c2ecf20Sopenharmony_ci function when a port is administratively brought down, this function should be 4728c2ecf20Sopenharmony_ci fully disabling a given switch port. 
  DSA takes care of marking the port with
  ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is
  disabled while being a bridge member

Bridge layer
------------

- ``port_bridge_join``: bridge layer function invoked when a given switch port is
  added to a bridge. This function should do what is necessary at the switch
  level to allow the joining port to be added to the relevant logical
  domain for it to ingress/egress traffic with other members of the bridge.

- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
  removed from a bridge. This function should do what is necessary at the
  switch level to prevent the leaving port from exchanging ingress/egress
  traffic with the remaining bridge members. When the port leaves the bridge,
  it should be aged out at the switch hardware for the switch to (re)learn MAC
  addresses behind this port.

- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP
  state is computed by the bridge layer and should be propagated to switch
  hardware to forward/block/learn traffic.
  The switch driver is responsible for
  computing an STP state change based on the current and requested parameters
  and for performing the relevant ageing based on the intersection results

Bridge VLAN filtering
---------------------

- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
  configured for turning on or off VLAN filtering. If nothing specific needs to
  be done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed to reject
  802.1Q frames which have VLAN IDs outside of the programmed allowed
  VLAN ID map/rules. If there is no PVID programmed into the switch port,
  untagged frames must be rejected as well. When turned off the switch must
  accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
  allowed.

- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares the
  configuration of a VLAN on the given port. If the operation is not supported
  by the hardware, this function should return ``-EOPNOTSUPP`` to inform the bridge
  code to fall back to a software implementation. No hardware setup must be done
  in this function. See ``port_vlan_add`` for this and details.

- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port

- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
  given switch port

- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each VLAN the given port is a member
  of. A switchdev object is used to carry the VID and bridge flags.

- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a
  Forwarding Database entry. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID. If the operation is not supported, this
  function should return ``-EOPNOTSUPP`` to inform the bridge code to fall back to
  a software implementation.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.
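
As a rough illustration of the ``port_fdb_add`` contract described above, the
following user-space C sketch models a forwarding database keyed by VLAN ID and
MAC address. It is not kernel code: ``toy_fdb``, ``toy_fdb_add``,
``toy_fdb_lookup`` and ``TOY_FDB_SIZE`` are hypothetical names, and returning
``-ENOSPC`` for a full table is an assumption of this sketch rather than
something DSA mandates.

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stdint.h>
   #include <string.h>

   #define TOY_FDB_SIZE 4          /* hypothetical table size */

   struct toy_fdb_entry {
       uint8_t  mac[6];
       uint16_t vid;               /* 0 selects the port private database */
       int      port;
       int      in_use;
   };

   struct toy_fdb {
       int supported;              /* 0 models hardware without FDB offload */
       struct toy_fdb_entry entries[TOY_FDB_SIZE];
   };

   /* Install (or refresh) the entry for (vid, mac) so it points at port. */
   static int toy_fdb_add(struct toy_fdb *fdb, int port,
                          const uint8_t mac[6], uint16_t vid)
   {
       struct toy_fdb_entry *free_slot = NULL;
       int i;

       if (!fdb->supported)
           return -EOPNOTSUPP;     /* caller falls back to software */

       for (i = 0; i < TOY_FDB_SIZE; i++) {
           struct toy_fdb_entry *e = &fdb->entries[i];

           if (e->in_use && e->vid == vid && !memcmp(e->mac, mac, 6)) {
               e->port = port;     /* same key: update in place */
               return 0;
           }
           if (!e->in_use && !free_slot)
               free_slot = e;      /* remember the first free slot */
       }
       if (!free_slot)
           return -ENOSPC;         /* assumption: table full */

       memcpy(free_slot->mac, mac, 6);
       free_slot->vid = vid;
       free_slot->port = port;
       free_slot->in_use = 1;
       return 0;
   }

   /* Return the port for (vid, mac), or -ENOENT if no entry matches. */
   static int toy_fdb_lookup(const struct toy_fdb *fdb,
                             const uint8_t mac[6], uint16_t vid)
   {
       int i;

       for (i = 0; i < TOY_FDB_SIZE; i++) {
           const struct toy_fdb_entry *e = &fdb->entries[i];

           if (e->in_use && e->vid == vid && !memcmp(e->mac, mac, 6))
               return e->port;
       }
       return -ENOENT;
   }

Because the key includes the VLAN ID, the same MAC address can be installed
once in the port private database (VLAN ID 0) and again in any other VLAN
without the entries overwriting each other.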

- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
  Forwarding Database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database

- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and FDB info.

- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares the
  installation of a multicast database entry. If the operation is not supported,
  this function should return ``-EOPNOTSUPP`` to inform the bridge code to fall back
  to a software implementation. No hardware setup must be done in this function.
  See ``port_mdb_add`` for this and details.

- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install
  a multicast database entry. The switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID.

.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
        of DSA, would be its port-based VLAN, used by the associated bridge device.
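
The split between ``port_mdb_prepare`` and ``port_mdb_add`` described above can
be pictured as a validate-then-commit sequence. The user-space C sketch below is
only a model of that idea: ``toy_mdb``, ``toy_mdb_prepare``, ``toy_mdb_add``,
``toy_mdb_install`` and ``TOY_MDB_SIZE`` are hypothetical names, and the
``-EINVAL`` and ``-ENOSPC`` checks are assumptions of this sketch, not
behaviour required by DSA.

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stdint.h>

   #define TOY_MDB_SIZE 2          /* hypothetical hardware capacity */

   struct toy_mdb {
       int supported;              /* 0 models hardware without MDB offload */
       int used;                   /* entries currently installed */
       int hw_writes;              /* counts simulated hardware accesses */
   };

   /* Phase 1: validate only; must not touch the (simulated) hardware. */
   static int toy_mdb_prepare(const struct toy_mdb *mdb, uint16_t vid)
   {
       if (!mdb->supported)
           return -EOPNOTSUPP;     /* caller falls back to software */
       if (vid > 4094)
           return -EINVAL;         /* assumption of this sketch */
       if (mdb->used >= TOY_MDB_SIZE)
           return -ENOSPC;         /* assumption of this sketch */
       return 0;
   }

   /* Phase 2: commit; only called after a successful prepare. */
   static void toy_mdb_add(struct toy_mdb *mdb)
   {
       mdb->used++;
       mdb->hw_writes++;
   }

   /* How a caller might chain both phases. */
   static int toy_mdb_install(struct toy_mdb *mdb, uint16_t vid)
   {
       int err = toy_mdb_prepare(mdb, vid);

       if (err)
           return err;
       toy_mdb_add(mdb);
       return 0;
   }

Keeping every failure path in the prepare step means the commit step has
nothing left to reject, which is why no hardware setup belongs in the prepare
callback.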

- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
  multicast database entry. The switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database.

- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and MDB info.

TODO
====

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals with
most of the switch specifics. At some point we should envision a merger between
these two subsystems and get the best of both worlds.

Other hanging fruits
--------------------

- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS``
- allowing more than one CPU/management interface:
  http://comments.gmane.org/gmane.linux.network/365657
- porting more drivers from other vendors:
  http://comments.gmane.org/gmane.linux.network/365510