============
Architecture
============

This document describes the **Distributed Switch Architecture (DSA)** subsystem:
its design principles, limitations and interactions with other subsystems, how
to develop drivers for it, and a TODO for developers interested in joining the
effort.

Design principles
=================

The Distributed Switch Architecture is a subsystem which was primarily designed
to support Marvell Ethernet switches (MV88E6xxx, a.k.a. Linkstreet product line)
using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to be able to use unmodified
Linux tools such as bridge, iproute2 and ifconfig, which work transparently
whether they configure/query a switch port network device or a regular network
device.

An Ethernet switch is typically composed of multiple front-panel ports and one
or more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in Small Office/Home Office (SOHO) products:
routers, gateways, or even top-of-the-rack switches. This host Ethernet
controller is referred to as "master" and "cpu" in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA creates a specialized network device which is
used as a controlling and data-flowing endpoint by the Linux networking stack.
These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag",
a hardware feature whereby a specific tag is carried by each Ethernet frame
exchanged between the switch and the CPU, to help the management interface
figure out:

- which port this frame is coming from
- why this frame was forwarded
- how to send CPU-originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the switch-facing side of the management controller, and
  as such, creating an interface for it would duplicate functionality, since
  you would get two interfaces for the same conduit: the master netdev and a
  "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either; only the
  downstream, or the top-most upstream interface makes sense with that model

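The rule above boils down to a small decision per port. The sketch below uses a
hypothetical ``enum port_role`` (not a DSA data structure) and simply counts
which ports of a given switch layout would receive a slave network device:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical port roles, named after the terminology used above. */
    enum port_role { PORT_UNUSED, PORT_CPU, PORT_DSA, PORT_USER };

    /* Only front-panel ("user") ports get a slave network device. */
    static bool port_needs_netdev(enum port_role role)
    {
        return role == PORT_USER;
    }

    /* Count the slave netdevs a switch with this port layout would get. */
    static int count_slave_netdevs(const enum port_role *roles, int nports)
    {
        int i, n = 0;

        assert(nports >= 0);
        for (i = 0; i < nports; i++)
            if (port_needs_netdev(roles[i]))
                n++;
        return n;
    }

For a switch with four front-panel ports, one "dsa" port and one "cpu" port,
only four slave network devices would be created.
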
Switch tagging protocols
------------------------

DSA currently supports 5 different tagging protocols, and a tag-less mode as
well. The different protocols are implemented in:

- ``net/dsa/tag_trailer.c``: Marvell's 4-byte trailer tag mode (legacy)
- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
- ``net/dsa/tag_brcm.c``: Broadcom's 4-byte tag
- ``net/dsa/tag_qca.c``: Qualcomm's 2-byte tag

The exact format of the tag protocol is vendor-specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

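To make these two pieces of information concrete, the sketch below packs and
parses a made-up 4-byte tag. The layout (direction, port, reason, reserved byte)
is an assumption for illustration only and does not correspond to any of the
vendor formats listed above:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Hypothetical 4-byte switch tag, NOT any real vendor format:
     *   byte 0: direction (0 = switch to CPU, 1 = CPU to switch)
     *   byte 1: port number (source or destination, depending on direction)
     *   byte 2: reason code, only meaningful for switch-to-CPU frames
     *   byte 3: reserved, always 0
     */
    #define EXAMPLE_TAG_LEN 4

    enum example_tag_dir { EXAMPLE_TO_CPU = 0, EXAMPLE_FROM_CPU = 1 };

    enum example_tag_reason {
        EXAMPLE_REASON_NORMAL = 0,  /* regular forwarding decision */
        EXAMPLE_REASON_TRAP = 1,    /* e.g. a management frame trapped to the CPU */
        EXAMPLE_REASON_MIRROR = 2,  /* copy of a mirrored frame */
    };

    struct example_tag {
        uint8_t dir;
        uint8_t port;
        uint8_t reason;
    };

    /* Serialize a tag into the bytes placed on the wire. */
    static void example_tag_pack(const struct example_tag *t,
                                 uint8_t buf[EXAMPLE_TAG_LEN])
    {
        assert(t->dir <= EXAMPLE_FROM_CPU);
        buf[0] = t->dir;
        buf[1] = t->port;
        buf[2] = t->reason;
        buf[3] = 0;
    }

    /* Parse a tag; returns 0 on success, -1 if the bytes are not a valid tag. */
    static int example_tag_unpack(const uint8_t buf[EXAMPLE_TAG_LEN],
                                  struct example_tag *t)
    {
        if (buf[0] > EXAMPLE_FROM_CPU || buf[3] != 0)
            return -1;
        t->dir = buf[0];
        t->port = buf[1];
        t->reason = buf[2];
        return 0;
    }

A frame the CPU wants to send out of port 3 would carry the bytes ``01 03 00 00``
in this made-up format.
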
Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry-standard drivers
such as ``e1000e`` and ``mv643xx_eth`` without requiring modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch-specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the
networking stack; this is also known as a ``ptype`` or ``packet_type``. A typical
Ethernet frame receive sequence looks like this:

Master network device (e.g.: e1000e):

1. Receive interrupt fires:

        - receive function is invoked
        - basic packet processing is done: getting length, status etc.
        - packet is prepared to be processed by the Ethernet layer by calling
          ``eth_type_trans``

2. net/ethernet/eth.c::

          eth_type_trans(skb, dev)
                  if (dev->dsa_ptr != NULL)
                          -> skb->protocol = ETH_P_XDSA

3. drivers/net/ethernet/\*::

          netif_receive_skb(skb)
                  -> iterate over registered packet_type
                          -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

4. net/dsa/dsa.c::

          -> dsa_switch_rcv()
                  -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c'

5. net/dsa/tag_*.c:

        - inspect and strip the switch tag to determine the originating port
        - locate the per-port network device
        - invoke ``eth_type_trans()`` with the DSA slave network device
        - invoke ``netif_receive_skb()``

Past this point, the DSA slave network devices receive regular Ethernet frames
that can be processed by the networking stack.

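The receive sequence above can be modelled in a few lines of self-contained C.
This is a simulation, not kernel code: ``struct netdev``, ``struct tree``,
``classify()`` and ``switch_rcv()`` are hypothetical stand-ins for
``struct net_device``, ``dsa_switch_tree``, ``eth_type_trans()`` and
``dsa_switch_rcv()``, and the tag is an assumed 4-byte tag inserted right after
the source MAC address, whose second byte carries the port number:

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_ALEN 6
    #define TAG_LEN 4      /* assumed tag, inserted right after the source MAC */
    #define NUM_PORTS 4

    /* Hypothetical stand-ins for the skb->protocol values involved. */
    enum proto { PROTO_NORMAL, PROTO_XDSA };

    struct netdev {
        const char *name;
        void *dsa_ptr;            /* non-NULL once the master is used by DSA */
        unsigned long rx_packets;
        size_t last_len;
    };

    struct tree {
        struct netdev *slaves[NUM_PORTS];  /* NULL for cpu/dsa/unused ports */
    };

    /* Step 2: like eth_type_trans(), divert frames received on a DSA master. */
    static enum proto classify(const struct netdev *dev)
    {
        return dev->dsa_ptr ? PROTO_XDSA : PROTO_NORMAL;
    }

    /* Steps 4 and 5: parse and strip the tag, find the slave, deliver. */
    static int switch_rcv(struct netdev *master, uint8_t *frame, size_t *len)
    {
        struct tree *t = master->dsa_ptr;
        uint8_t *tag = frame + 2 * ETH_ALEN;
        struct netdev *slave;
        uint8_t port;

        assert(t != NULL);
        if (*len < 2 * ETH_ALEN + TAG_LEN + 2)
            return -1;                     /* too short to carry a tag */
        port = tag[1];                     /* byte 1 holds the source port */
        if (port >= NUM_PORTS || (slave = t->slaves[port]) == NULL)
            return -1;                     /* no slave device: drop */
        /* Strip the tag by sliding the EtherType and payload over it. */
        memmove(tag, tag + TAG_LEN, *len - 2 * ETH_ALEN - TAG_LEN);
        *len -= TAG_LEN;
        slave->rx_packets++;               /* stands in for netif_receive_skb() */
        slave->last_len = *len;
        return 0;
    }

    /* Step 3: dispatch on the protocol, as netif_receive_skb() would. */
    static int master_receive(struct netdev *master, uint8_t *frame, size_t *len)
    {
        if (classify(master) == PROTO_XDSA)
            return switch_rcv(master, frame, len);
        master->rx_packets++;              /* ordinary, untagged traffic */
        return 0;
    }

A frame received on port 1 thus reaches ``lan1`` four bytes shorter, with the
EtherType back at offset 12, while a frame for a port without a slave device is
dropped.
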
Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device; each of these network interfaces is responsible for being a controlling
and data-flowing endpoint for one front-panel port of the switch. These
interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- manage external/internal PHYs: link, auto-negotiation etc.

These slave network devices have custom ``net_device_ops`` and ``ethtool_ops``
function pointers which allow DSA to introduce a level of layering between the
networking stack/ethtool and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA looks up which
switch tagging protocol is currently registered with these network devices and
invokes a specific transmit routine which takes care of adding the relevant
switch tag to the Ethernet frames.

These frames are then queued for transmission using the master network device's
``ndo_start_xmit()`` function. Since they contain the appropriate switch tag, the
Ethernet switch is able to process these incoming frames from the management
interface and deliver them to the physical switch port.

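The transmit direction is the mirror image. In the sketch below, which assumes a
made-up 4-byte tag (a direction byte, a port byte and two zero bytes),
``slave_xmit()`` inserts the tag after the source MAC address and hands the
result to a stand-in for the master's ``ndo_start_xmit()``. All names are
hypothetical, and copying into a stack buffer only keeps the example
self-contained; a real tagger would typically push the tag into the socket
buffer's headroom instead:

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_ALEN 6
    #define TAG_LEN 4        /* assumed tag: direction, port, two zero bytes */
    #define MAX_FRAME 1536

    struct master {
        uint8_t last_tx[MAX_FRAME];   /* what "went out on the wire" */
        size_t last_tx_len;
    };

    struct slave {
        uint8_t port;                 /* front-panel port this netdev represents */
        struct master *master;
    };

    /* Stand-in for the master driver's ndo_start_xmit(). */
    static int master_start_xmit(struct master *m, const uint8_t *frame, size_t len)
    {
        memcpy(m->last_tx, frame, len);
        m->last_tx_len = len;
        return 0;
    }

    /* Insert the tag after DA/SA, then queue the frame on the master. */
    static int slave_xmit(struct slave *s, const uint8_t *frame, size_t len)
    {
        uint8_t buf[MAX_FRAME];

        assert(len >= 2 * ETH_ALEN);
        if (len + TAG_LEN > sizeof(buf))
            return -1;                                /* would not fit */
        memcpy(buf, frame, 2 * ETH_ALEN);             /* destination + source MAC */
        buf[2 * ETH_ALEN + 0] = 1;                    /* direction: CPU to switch */
        buf[2 * ETH_ALEN + 1] = s->port;              /* egress port */
        buf[2 * ETH_ALEN + 2] = 0;
        buf[2 * ETH_ALEN + 3] = 0;
        memcpy(buf + 2 * ETH_ALEN + TAG_LEN, frame + 2 * ETH_ALEN,
               len - 2 * ETH_ALEN);                   /* EtherType + payload */
        return master_start_xmit(s->master, buf, len + TAG_LEN);
    }

A 60-byte frame sent on the slave device for port 2 leaves the master as a
64-byte frame whose tag names port 2 as the egress port.
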
Graphical representation
------------------------

Summarized, this is basically what DSA looks like from a network device
perspective::

                |--------------------------|
                | CPU network device (eth0)|
                |--------------------------|
                | <tag added by switch     |
                |                          |
                |                          |
                |        tag added by CPU> |
        |--------------------------------------------|
        |            Switch driver                   |
        |--------------------------------------------|
                  ||        ||         ||
              |-------|  |-------|  |-------|
              | sw0p0 |  | sw0p1 |  | sw0p2 |
              |-------|  |-------|  |-------|

Slave MDIO bus
--------------

In order to read from and write to the PHYs built into a switch, DSA creates a
slave MDIO bus which allows a specific switch driver to divert and intercept
MDIO reads/writes towards specific PHY addresses. In most MDIO-connected
switches, these functions would utilize direct or indirect PHY addressing mode
to return standard MII registers from the switch's built-in PHYs, allowing the
PHY library to return link status, link partner pages, auto-negotiation
results, etc.

For Ethernet switches which have both external and internal MDIO buses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.

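The mux/demux role can be sketched as a read routine that picks a bus by PHY
address. Everything here is hypothetical: ``struct slave_mii_bus``, the
``internal_mask`` routing bitmask and the in-memory register backends stand in
for whatever a real switch driver does with its own registers. An address with
nothing behind it reads back as 0xffff, as a read from an absent device on a
real MDIO bus would:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define PHY_MAX_ADDR 32   /* MDIO addresses are 5 bits wide */

    /* Hypothetical register backend for one MDIO bus. */
    struct mdio_backend {
        uint32_t present_mask;            /* bit N: a device answers at address N */
        uint16_t regs[PHY_MAX_ADDR][32];  /* 32 standard MII registers per address */
    };

    struct slave_mii_bus {
        uint32_t internal_mask;           /* addresses routed to built-in PHYs */
        struct mdio_backend *internal;
        struct mdio_backend *external;
    };

    /* Read one register, choosing the bus by address; 0xffff means "no device". */
    static uint16_t slave_mii_read(const struct slave_mii_bus *bus, int addr, int reg)
    {
        const struct mdio_backend *be;

        assert(addr >= 0 && addr < PHY_MAX_ADDR && reg >= 0 && reg < 32);
        be = (bus->internal_mask & (1u << addr)) ? bus->internal : bus->external;
        if (!be || !(be->present_mask & (1u << addr)))
            return 0xffff;
        return be->regs[addr][reg];
    }

Keeping the routing decision in one bitmask means the PHY library sees a single
bus, while the switch driver decides per address which physical bus serves it.
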
Data structures
---------------

DSA data structures are defined in ``include/net/dsa.h`` as well as
``net/dsa/dsa_priv.h``:

- ``dsa_chip_data``: platform data configuration for a given switch device.
  This structure describes a switch device's parent device, its address, as
  well as various properties of its ports: names/labels, and finally a routing
  table indication (when cascading switches)

- ``dsa_platform_data``: platform device configuration data which references
  a collection of ``dsa_chip_data`` structures if multiple switches are
  cascaded, as well as the master network device this switch tree is attached
  to

- ``dsa_switch_tree``: structure assigned to the master network device under
  ``dsa_ptr``. This structure references a ``dsa_platform_data`` structure as
  well as the tagging protocol supported by the switch tree and which
  receive/transmit function hooks should be invoked. Information about the
  directly attached switch (its CPU port) is also provided. Finally, a
  collection of ``dsa_switch`` structures is referenced to address individual
  switches in the tree.

- ``dsa_switch``: structure describing a switch device in the tree, referencing
  a ``dsa_switch_tree`` as a backpointer, slave network devices, master network
  device, and a reference to the backing ``dsa_switch_ops``

- ``dsa_switch_ops``: structure referencing function pointers, see below for a
  full description.

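The relationships between these structures, and how a routing table is used when
switches are cascaded, can be illustrated with a deliberately simplified model.
The ``sketch_*`` types and field names below are hypothetical and much smaller
than the real definitions in ``include/net/dsa.h``; the two limits are the
values quoted under "Design limitations":

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Limits quoted in this document; redefined locally for the sketch. */
    #define DSA_MAX_SWITCHES 4
    #define DSA_MAX_PORTS 12

    struct sketch_switch_tree;

    /* Simplified, hypothetical counterpart of struct dsa_switch. */
    struct sketch_switch {
        struct sketch_switch_tree *dst;   /* back-pointer to the tree */
        int index;                        /* position within the tree */
        int rtable[DSA_MAX_SWITCHES];     /* local port towards switch i, -1 if none */
    };

    /* Simplified, hypothetical counterpart of struct dsa_switch_tree. */
    struct sketch_switch_tree {
        int cpu_switch;                   /* switch directly attached to the master */
        int cpu_port;                     /* its CPU port */
        struct sketch_switch *ds[DSA_MAX_SWITCHES];
    };

    /* Port on switch `from` that a frame must leave through to reach `to`. */
    static int sketch_route(const struct sketch_switch_tree *dst, int from, int to)
    {
        int port;

        assert(from >= 0 && from < DSA_MAX_SWITCHES);
        assert(to >= 0 && to < DSA_MAX_SWITCHES);
        if (!dst->ds[from] || !dst->ds[to])
            return -1;                    /* switch not present in this tree */
        port = dst->ds[from]->rtable[to];
        assert(port < DSA_MAX_PORTS);
        return port;
    }

For example, with switch 0 attached to the master and reaching switch 1 through
its port 6, while switch 1 reaches switch 0 through its port 4, the two routing
tables are ``{ -1, 6, -1, -1 }`` and ``{ 4, -1, -1, -1 }``.
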
Design limitations
==================

Limits on the number of devices and ports
-----------------------------------------

DSA currently limits the maximum number of switches within a tree to 4
(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``).
These limits could be extended to support larger configurations should the need
arise.

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before. This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO-connected switches using xMII interfaces

- inability to configure the CPU port link parameters based on the capabilities
  of the Ethernet controller attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup

Common pitfalls using DSA setups
--------------------------------

Once a master network device is configured to use DSA (``dev->dsa_ptr`` becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets directly
through this interface (e.g.: opening a socket using this interface) bypasses
the switch tagging protocol transmit function, so the Ethernet switch on the
other end, which expects a tag, will typically drop this frame.

Slave network devices check that the master network device is UP before allowing
you to administratively bring UP these slave network devices. A common
configuration mistake is forgetting to bring UP the master network device first.

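The check described above amounts to a guard at the start of the slave's open
path. The sketch uses a hypothetical ``struct sketch_netdev`` instead of
``struct net_device`` and a plain boolean instead of the interface flags;
returning ``-ENETDOWN`` is an assumption made for illustration:

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical, minimal stand-in for struct net_device. */
    struct sketch_netdev {
        bool up;
        struct sketch_netdev *master;   /* set on DSA slave devices */
    };

    /* Mimics the check a slave performs before it may come up. */
    static int sketch_slave_open(struct sketch_netdev *slave)
    {
        assert(slave->master != NULL);
        if (!slave->master->up)
            return -ENETDOWN;           /* the master must be brought up first */
        slave->up = true;
        return 0;
    }

In practice this means bringing the master up first, for example
``ip link set eth0 up`` before ``ip link set lan1 up``, where ``eth0`` and
``lan1`` are placeholder interface names.
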
Interactions with other subsystems
==================================

DSA currently leverages the following subsystems:

- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c``
- Switchdev: ``net/switchdev/*``
- Device Tree for various ``of_*`` functions

MDIO/PHY library
----------------

Slave network devices exposed by DSA may or may not be interfacing with PHY
devices (``struct phy_device`` as defined in ``include/linux/phy.h``), but the DSA
subsystem deals with all possible combinations:

- internal PHY devices, built into the Ethernet switch hardware
- external PHY devices, connected via an internal or external MDIO bus
- internal PHY devices, connected via an internal MDIO bus
- special, non-autonegotiated or non-MDIO-managed PHY devices: SFPs, MoCA; a.k.a.
  fixed PHYs

The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the
logic basically looks like this:

- if Device Tree is used, the PHY device is looked up using the standard
  "phy-handle" property; if found, this PHY device is created and registered
  using ``of_phy_connect()``

- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
  the definition of a non-MDIO-managed PHY as defined in
  ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered
  and connected transparently using the special fixed MDIO bus driver

- finally, if the PHY is built into the switch, as is very common with
  standalone switch packages, the PHY is probed using the slave MII bus created
  by DSA

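The decision logic of that list can be expressed as a small function returning
which attachment method applies. The ``port_phy_desc`` fields and the
``phy_attach_method`` values are hypothetical summaries of the Device Tree
properties and hardware facts mentioned above; the checks are taken in the order
the list presents them, and the real ``dsa_slave_phy_setup()`` may structure
them differently:

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical per-port facts available to PHY setup. */
    struct port_phy_desc {
        bool has_phy_handle;      /* Device Tree "phy-handle" property present */
        bool is_fixed_link;       /* Device Tree fixed-link description */
        bool has_builtin_phy;     /* PHY integrated in the switch package */
    };

    enum phy_attach_method {
        ATTACH_NONE,
        ATTACH_PHY_HANDLE,        /* of_phy_connect() on the referenced PHY */
        ATTACH_FIXED_LINK,        /* fixed PHY on the special fixed MDIO bus */
        ATTACH_SLAVE_MII,         /* probe the PHY on the DSA slave MII bus */
    };

    /* Checks follow the order of the list above. */
    static enum phy_attach_method choose_phy_attach(const struct port_phy_desc *d)
    {
        assert(d != NULL);
        if (d->has_phy_handle)
            return ATTACH_PHY_HANDLE;
        if (d->is_fixed_link)
            return ATTACH_FIXED_LINK;
        if (d->has_builtin_phy)
            return ATTACH_SLAVE_MII;
        return ATTACH_NONE;
    }
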

SWITCHDEV
---------

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
more specifically with its VLAN filtering portion when configuring VLANs on top
of per-port slave network devices. Since DSA primarily deals with
MDIO-connected switches, although not exclusively, SWITCHDEV's
prepare/abort/commit phases are often simplified into a prepare phase which
checks whether the operation is supported by the DSA switch driver, and a commit
phase which applies the changes.

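The simplified two-phase model can be sketched as follows for a VLAN object: a
prepare step that only validates and a commit step that only applies, so an
abort has nothing to undo. The VLAN table layout, the helper names and the
chosen error codes are hypothetical; this is not the switchdev API:

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>

    #define VLAN_N_VID 4096          /* number of 12-bit VLAN IDs */
    #define MAX_SKETCH_PORTS 32      /* one bit per port in a uint32_t */

    /* Hypothetical hardware VLAN table: bit p of members[vid] = port p. */
    struct sketch_vlan_table {
        uint32_t members[VLAN_N_VID];
        int num_ports;
    };

    /* Prepare: check whether the switch can do it, never touch hardware. */
    static int vlan_prepare(const struct sketch_vlan_table *t, int port, uint16_t vid)
    {
        assert(t->num_ports <= MAX_SKETCH_PORTS);
        if (vid == 0 || vid >= VLAN_N_VID - 1)
            return -EINVAL;          /* VIDs 0 and 4095 are reserved */
        if (port < 0 || port >= t->num_ports)
            return -EOPNOTSUPP;
        return 0;
    }

    /* Commit: apply the change; cannot fail once prepare succeeded. */
    static void vlan_commit(struct sketch_vlan_table *t, int port, uint16_t vid)
    {
        assert(vlan_prepare(t, port, vid) == 0);
        t->members[vid] |= 1u << port;
    }

    static int vlan_add(struct sketch_vlan_table *t, int port, uint16_t vid)
    {
        int err = vlan_prepare(t, port, vid);

        if (err)
            return err;              /* nothing was changed, nothing to abort */
        vlan_commit(t, port, vid);
        return 0;
    }

Because validation and application are strictly separated, a failed request
leaves the table exactly as it was.
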
As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
objects.

Device Tree
-----------

DSA features a standardized binding which is documented in
``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper
functions such as ``of_get_phy_mode()`` and ``of_phy_connect()`` are also used to
query per-port PHY specific details: interface connection, MDIO bus location, etc.

Driver development
==================

DSA switch drivers need to implement a ``dsa_switch_ops`` structure which will
contain the various members described below.

``register_switch_driver()`` registers this ``dsa_switch_ops`` in its internal list
of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite.

Unless requested differently by setting the ``priv_size`` member accordingly, DSA
does not allocate any driver private context space.

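A rough model of such a registration list, and of how probing walks it, is
sketched below. ``struct sketch_switch_ops`` only carries the two members this
section mentions (``probe`` and ``priv_size``); its ``probe`` signature, the chip
ID argument and the ``example_*`` driver are hypothetical and differ from the
real ``dsa_switch_ops``:

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical, trimmed-down ops structure. */
    struct sketch_switch_ops {
        const char *(*probe)(int chip_id);  /* name if supported, NULL otherwise */
        size_t priv_size;                    /* driver private context, may be 0 */
        struct sketch_switch_ops *next;      /* link in the registry list */
    };

    static struct sketch_switch_ops *registry;

    static void sketch_register_switch_driver(struct sketch_switch_ops *ops)
    {
        assert(ops->probe != NULL);
        ops->next = registry;
        registry = ops;
    }

    static void sketch_unregister_switch_driver(struct sketch_switch_ops *ops)
    {
        struct sketch_switch_ops **pp;

        for (pp = &registry; *pp; pp = &(*pp)->next) {
            if (*pp == ops) {
                *pp = ops->next;
                return;
            }
        }
    }

    /* Try every registered driver; on a match, allocate its private area. */
    static struct sketch_switch_ops *sketch_probe(int chip_id, const char **name,
                                                  void **priv)
    {
        struct sketch_switch_ops *ops;

        for (ops = registry; ops; ops = ops->next) {
            *name = ops->probe(chip_id);
            if (*name) {
                *priv = ops->priv_size ? calloc(1, ops->priv_size) : NULL;
                return ops;
            }
        }
        return NULL;
    }

    /* Example driver claiming a made-up chip ID. */
    static const char *example_probe(int chip_id)
    {
        return chip_id == 0x1234 ? "example-switch" : NULL;
    }

    static struct sketch_switch_ops example_ops = { example_probe, 64, NULL };

Since ``example_ops`` sets a non-zero ``priv_size``, a successful probe also
yields a zeroed private area; a driver leaving it at 0 gets none.
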
Switch configuration
--------------------

- ``tag_protocol``: indicates what kind of tagging protocol is supported; it
  should be a valid value from the ``dsa_tag_protocol`` enum

- ``probe``: probe routine which will be invoked by the DSA platform device upon
  registration to test for the presence/absence of a switch device. For MDIO
  devices, it is recommended to issue a read towards internal registers using
  the switch pseudo-PHY and return whether this is a supported device. For other
  buses, return a non-NULL string

- ``setup``: setup function for the switch. This function is responsible for
  setting up the ``dsa_switch_ops`` private structure with all it needs: register
  maps, interrupts, mutexes, locks, etc. This function is also expected to
  properly configure the switch to separate all network interfaces from each
  other, that is, they should be isolated by the switch hardware itself,
  typically by creating a Port-based VLAN ID for each port and allowing only the
  CPU port and the specific port to be in the forwarding vector. Ports that are
  unused by the platform should be disabled. Past this function, the switch is
  expected to be fully configured and ready to serve any kind of request. It is
  recommended to issue a software reset of the switch during this setup function
  in order to avoid relying on what a previous software agent such as a
  bootloader/firmware may have previously configured.

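The isolation requirement for ``setup`` can be illustrated by computing one
forwarding vector per port. The sketch assumes a hypothetical switch where bit N
of a port's vector allows forwarding to port N. Following the description above,
each enabled port may reach only itself and the CPU port, the CPU port may reach
every enabled port, and unused ports get an empty vector:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define SKETCH_NUM_PORTS 8

    /*
     * Fill fwd[] with one forwarding vector per port.
     * enabled_mask: ports used by the platform (bit N = port N).
     */
    static void compute_isolation(uint32_t enabled_mask, int cpu_port,
                                  uint32_t fwd[SKETCH_NUM_PORTS])
    {
        int p;

        assert(cpu_port >= 0 && cpu_port < SKETCH_NUM_PORTS);
        enabled_mask |= 1u << cpu_port;                /* CPU port is always used */
        for (p = 0; p < SKETCH_NUM_PORTS; p++) {
            if (!(enabled_mask & (1u << p)))
                fwd[p] = 0;                            /* unused: disabled */
            else if (p == cpu_port)
                fwd[p] = enabled_mask;                 /* CPU reaches every port */
            else
                fwd[p] = (1u << cpu_port) | (1u << p); /* itself + CPU only */
        }
    }

With ports 0 to 2 enabled and the CPU on port 7, ports 1 and 2 cannot reach each
other directly, so any traffic between them has to go through the CPU.
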
PHY devices and link management
-------------------------------

- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet
  PHYs. If the PHY library PHY driver needs to know about information it cannot
  obtain on its own (e.g.: coming from switch memory mapped registers), this
  function should return a 32-bit bitmask of "flags" that is private between the
  switch driver and the Ethernet PHY driver in ``drivers/net/phy/*``.

- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read
  the switch port MDIO registers. If unavailable, return 0xffff for each read.
  For built-in switch Ethernet PHYs, this function should allow reading the link
  status, auto-negotiation results, link partner pages, etc.

- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write
  to the switch port MDIO registers. If unavailable, return a negative error
  code.

- ``adjust_link``: Function invoked by the PHY library when a slave network device
  is attached to a PHY device. This function is responsible for appropriately
  configuring the switch port link parameters: speed, duplex, pause based on
  what the ``phy_device`` is providing.

- ``fixed_link_update``: Function invoked by the PHY library, and specifically by
  the fixed PHY driver, asking the switch driver for link parameters that could
  not be auto-negotiated or obtained by reading the PHY registers through MDIO.
  This is particularly useful for specific kinds of hardware such as QSGMII,
  MoCA or other kinds of non-MDIO-managed PHYs where link information is
  obtained out of band

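Taken together, these callbacks are function pointers in the ops structure. The
sketch below wires up hypothetical ``phy_read``, ``phy_write`` and
``adjust_link`` implementations against an in-memory switch model;
``struct sketch_switch``, ``struct sketch_link`` and the choice of ``-ENODEV`` as
the negative error code are assumptions, and the real callbacks take different
arguments:

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_NUM_PORTS 5

    /* Hypothetical link parameters, standing in for what phy_device reports. */
    struct sketch_link {
        int speed;                /* Mb/s */
        bool full_duplex;
        bool pause;
    };

    struct sketch_switch {
        uint32_t phy_mask;                          /* ports with a built-in PHY */
        uint16_t phy_regs[SKETCH_NUM_PORTS][32];
        struct sketch_link mac[SKETCH_NUM_PORTS];   /* programmed MAC settings */
    };

    /* phy_read: 0xffff when no PHY sits behind this port. */
    static int sketch_phy_read(struct sketch_switch *ds, int port, int reg)
    {
        assert(reg >= 0 && reg < 32);
        if (port < 0 || port >= SKETCH_NUM_PORTS || !(ds->phy_mask & (1u << port)))
            return 0xffff;
        return ds->phy_regs[port][reg];
    }

    /* phy_write: negative error code when no PHY sits behind this port. */
    static int sketch_phy_write(struct sketch_switch *ds, int port, int reg,
                                uint16_t val)
    {
        assert(reg >= 0 && reg < 32);
        if (port < 0 || port >= SKETCH_NUM_PORTS || !(ds->phy_mask & (1u << port)))
            return -ENODEV;
        ds->phy_regs[port][reg] = val;
        return 0;
    }

    /* adjust_link: copy the resolved PHY link parameters into the port MAC. */
    static void sketch_adjust_link(struct sketch_switch *ds, int port,
                                   const struct sketch_link *phy)
    {
        assert(port >= 0 && port < SKETCH_NUM_PORTS);
        ds->mac[port] = *phy;
    }

    struct sketch_switch_ops {
        int (*phy_read)(struct sketch_switch *ds, int port, int reg);
        int (*phy_write)(struct sketch_switch *ds, int port, int reg, uint16_t val);
        void (*adjust_link)(struct sketch_switch *ds, int port,
                            const struct sketch_link *phy);
    };

    static const struct sketch_switch_ops sketch_ops = {
        .phy_read = sketch_phy_read,
        .phy_write = sketch_phy_write,
        .adjust_link = sketch_adjust_link,
    };
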
Ethtool operations
------------------

- ``get_strings``: ethtool function used to query the driver's strings; it will
  typically return statistics strings, private flags strings, etc.

- ``get_ethtool_stats``: ethtool function used to query per-port statistics and
  return their values. DSA overlays the slave network devices' general statistics
  (RX/TX counters from the network device) with switch driver specific statistics
  per port

- ``get_sset_count``: ethtool function used to query the number of statistics items

- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port. For
  certain implementations, this function may also query the master network
  device's Wake-on-LAN settings if this interface needs to participate in
  Wake-on-LAN

- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port,
  direct counterpart to ``get_wol`` with similar restrictions

- ``set_eee``: ethtool function which is used to configure a switch port's EEE
  (Green Ethernet) settings; it can optionally invoke the PHY library to enable
  EEE at the PHY level if relevant. This function should enable EEE at the switch
  port MAC controller and data-processing logic

- ``get_eee``: ethtool function which is used to query a switch port's EEE
  settings. This function should return the EEE state of the switch port MAC
  controller and data-processing logic as well as query the PHY for its currently
  configured EEE settings

- ``get_eeprom_len``: ethtool function returning the EEPROM length/size in bytes
  for a given switch

- ``get_eeprom``: ethtool function returning the EEPROM contents for a given switch

- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM

- ``get_regs_len``: ethtool function returning the register length for a given
  switch

- ``get_regs``: ethtool function returning the Ethernet switch internal register
  contents. This function might require user-land code in ethtool to
  pretty-print register values and registers

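The three statistics callbacks must agree with each other: ``get_sset_count``
announces how many entries exist, ``get_strings`` names them and
``get_ethtool_stats`` fills in the values in the same order. The sketch shows
only the switch-specific part for a hypothetical set of MIB counters;
``ETH_GSTRING_LEN`` and ``ETH_SS_STATS`` are redefined locally with the values
ethtool uses, and the counter names and ``sketch_*`` functions are made up:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_GSTRING_LEN 32   /* width of one ethtool string, in bytes */
    #define ETH_SS_STATS 1       /* string set holding statistics names */
    #define SKETCH_NUM_PORTS 4

    /* Hypothetical MIB counter names, in the order they are reported. */
    static const char *const mib_names[] = {
        "rx_bytes", "rx_unicast", "tx_bytes", "tx_unicast", "rx_discards",
    };
    #define NUM_MIBS ((int)(sizeof(mib_names) / sizeof(mib_names[0])))

    /* Stand-in for hardware counters a real driver reads from registers. */
    static uint64_t hw_counters[SKETCH_NUM_PORTS][NUM_MIBS];

    static int sketch_get_sset_count(int sset)
    {
        return sset == ETH_SS_STATS ? NUM_MIBS : 0;
    }

    static void sketch_get_strings(int sset, uint8_t *data)
    {
        int i;

        if (sset != ETH_SS_STATS)
            return;
        for (i = 0; i < NUM_MIBS; i++)       /* zero-padded fixed-width slots */
            strncpy((char *)data + i * ETH_GSTRING_LEN, mib_names[i],
                    ETH_GSTRING_LEN);
    }

    static void sketch_get_ethtool_stats(int port, uint64_t *data)
    {
        int i;

        assert(port >= 0 && port < SKETCH_NUM_PORTS);
        for (i = 0; i < NUM_MIBS; i++)       /* same order as the strings */
            data[i] = hw_counters[port][i];
    }

Driving all three callbacks from the single ``mib_names`` table keeps the count,
the names and the values from drifting apart.
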
4528c2ecf20Sopenharmony_ciPower management
4538c2ecf20Sopenharmony_ci----------------
4548c2ecf20Sopenharmony_ci
- ``suspend``: function invoked by the DSA platform device when the system goes
  to suspend; it should quiesce all Ethernet switch activities, but keep ports
  participating in Wake-on-LAN active, as well as any additional wake-up logic
  if supported

- ``resume``: function invoked by the DSA platform device when the system
  resumes; it should resume all Ethernet switch activities and re-configure the
  switch to be in a fully active state

- ``port_enable``: function invoked by the DSA slave network device ndo_open
  function when a port is administratively brought up; this function should
  fully enable the given switch port. DSA takes care of marking the port with
  ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING``
  if it is not, and of propagating these changes down to the hardware

- ``port_disable``: function invoked by the DSA slave network device ndo_close
  function when a port is administratively brought down; this function should
  fully disable the given switch port. If the port is disabled while being a
  bridge member, DSA takes care of marking it with ``BR_STATE_DISABLED`` and
  propagating the change to the hardware

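The state selection described for ``port_enable`` and ``port_disable`` can be
modelled as follows. This is a user-space sketch only: ``enum sketch_stp_state``
mirrors the numeric values of the kernel's ``BR_STATE_*`` constants, while
``struct sketch_port`` and the ``sketch_*`` functions are hypothetical stand-ins
for the DSA core and the driver callbacks. The ordering of the driver call and
the state change is an assumption of this sketch.

```c
#include <stdbool.h>

/* Local mirror of the kernel's BR_STATE_* values from <linux/if_bridge.h>. */
enum sketch_stp_state {
	SKETCH_BR_STATE_DISABLED = 0,
	SKETCH_BR_STATE_LISTENING = 1,
	SKETCH_BR_STATE_LEARNING = 2,
	SKETCH_BR_STATE_FORWARDING = 3,
	SKETCH_BR_STATE_BLOCKING = 4,
};

/* Hypothetical per-port bookkeeping kept by the "core" in this sketch. */
struct sketch_port {
	bool bridge_member;
	bool enabled;                     /* what the driver has programmed */
	enum sketch_stp_state stp_state;  /* last state pushed to hardware */
};

/* Stand-in for the driver's STP callback: record the state "written". */
static void sketch_set_stp_state(struct sketch_port *p,
				 enum sketch_stp_state state)
{
	p->stp_state = state;
}

/* ndo_open path: the driver fully enables the port (port_enable), then
 * the core picks BLOCKING for bridge members and FORWARDING otherwise. */
static void sketch_port_open(struct sketch_port *p)
{
	p->enabled = true;
	sketch_set_stp_state(p, p->bridge_member ? SKETCH_BR_STATE_BLOCKING
						 : SKETCH_BR_STATE_FORWARDING);
}

/* ndo_close path: a bridge member is marked DISABLED and the change is
 * propagated, then the driver fully disables the port (port_disable). */
static void sketch_port_close(struct sketch_port *p)
{
	if (p->bridge_member)
		sketch_set_stp_state(p, SKETCH_BR_STATE_DISABLED);
	p->enabled = false;
}
```
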
Bridge layer
------------

- ``port_bridge_join``: bridge layer function invoked when a given switch port is
  added to a bridge; this function should do what is necessary at the switch
  level to add the joining port to the relevant logical domain, so that it can
  ingress/egress traffic with the other members of the bridge.

- ``port_bridge_leave``: bridge layer function invoked when a given switch port is
  removed from a bridge; this function should do what is necessary at the
  switch level to stop the leaving port from exchanging (ingress/egress) traffic
  with the remaining bridge members. When the port leaves the bridge, the
  addresses learned on it should be aged out in the switch hardware so that the
  switch can (re)learn the MAC addresses behind this port.

- ``port_stp_state_set``: bridge layer function invoked when a given switch
  port's STP state is computed by the bridge layer and should be propagated to
  the switch hardware to forward/block/learn traffic. The switch driver is
  responsible for computing an STP state change based on the current and
  requested parameters, and for performing the relevant ageing based on the
  outcome of that comparison

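A minimal model of the three bridge layer callbacks follows, assuming a
hypothetical 8-port switch whose port 7 is the CPU port and which isolates
ports with per-port forwarding masks (a port-based VLAN). Everything named
``sketch_*`` is hypothetical. The ageing policy shown (flush learned addresses
whenever a port moves from a learning state to a non-learning one) is one
plausible reading of the text, not a requirement it states.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_PORTS 8
#define SKETCH_CPU_PORT 7   /* hypothetical CPU/management port */

/* Local mirror of the kernel's BR_STATE_* values. */
enum sketch_stp_state {
	SKETCH_BR_STATE_DISABLED = 0,
	SKETCH_BR_STATE_LISTENING = 1,
	SKETCH_BR_STATE_LEARNING = 2,
	SKETCH_BR_STATE_FORWARDING = 3,
	SKETCH_BR_STATE_BLOCKING = 4,
};

/* Hypothetical switch model. */
struct sketch_switch {
	int bridge_id[SKETCH_NUM_PORTS];           /* 0 = standalone port */
	uint8_t fwd_mask[SKETCH_NUM_PORTS];        /* allowed egress ports */
	bool learning[SKETCH_NUM_PORTS];
	bool forwarding[SKETCH_NUM_PORTS];
	unsigned int fast_ages[SKETCH_NUM_PORTS];  /* number of FDB flushes */
};

/* Rebuild the port-based VLANs: every user port may reach the CPU port
 * and the other members of its own bridge; the CPU port reaches all. */
static void sketch_update_masks(struct sketch_switch *sw)
{
	for (int i = 0; i < SKETCH_NUM_PORTS; i++) {
		uint8_t mask = (uint8_t)(1u << SKETCH_CPU_PORT);

		if (i == SKETCH_CPU_PORT) {
			sw->fwd_mask[i] = (uint8_t)~mask;
			continue;
		}
		for (int j = 0; j < SKETCH_NUM_PORTS; j++)
			if (j != i && sw->bridge_id[i] != 0 &&
			    sw->bridge_id[j] == sw->bridge_id[i])
				mask |= (uint8_t)(1u << j);
		sw->fwd_mask[i] = mask;
	}
}

/* Stand-in for flushing the addresses learned on a port. */
static void sketch_fast_age(struct sketch_switch *sw, int port)
{
	sw->fast_ages[port]++;
}

static void sketch_port_bridge_join(struct sketch_switch *sw, int port,
				    int bridge)
{
	sw->bridge_id[port] = bridge;
	sketch_update_masks(sw);
}

static void sketch_port_bridge_leave(struct sketch_switch *sw, int port)
{
	sw->bridge_id[port] = 0;
	sketch_update_masks(sw);
	sketch_fast_age(sw, port);  /* let the switch re-learn behind it */
}

/* Map the requested STP state to learn/forward bits, ageing out learned
 * entries when the port stops learning. */
static void sketch_port_stp_state_set(struct sketch_switch *sw, int port,
				      enum sketch_stp_state state)
{
	bool learn = state == SKETCH_BR_STATE_LEARNING ||
		     state == SKETCH_BR_STATE_FORWARDING;
	bool fwd = state == SKETCH_BR_STATE_FORWARDING;

	if (sw->learning[port] && !learn)
		sketch_fast_age(sw, port);
	sw->learning[port] = learn;
	sw->forwarding[port] = fwd;
}
```

Recomputing every mask on each join or leave keeps the sketch short; a real
driver would typically only touch the registers of the affected ports.
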
Bridge VLAN filtering
---------------------

- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets
  configured to turn VLAN filtering on or off. If nothing specific needs to be
  done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed to reject
  802.1Q frames whose VLAN IDs fall outside the programmed allowed VLAN ID
  map/rules. If there is no PVID programmed into the switch port, untagged
  frames must be rejected as well. When it is turned off, the switch must
  accept any 802.1Q frame irrespective of its VLAN ID, and untagged frames are
  allowed.

- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares
  the configuration of a VLAN on the given port. If the operation is not
  supported by the hardware, this function should return ``-EOPNOTSUPP`` to
  inform the bridge code to fall back to a software implementation. No hardware
  setup must be done in this function; see ``port_vlan_add`` for the hardware
  setup and further details.

- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port

- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the
  given switch port

- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each VLAN the given port is a member
  of. A switchdev object is used to carry the VID and bridge flags.

- ``port_fdb_add``: bridge layer function invoked when the bridge wants to
  install a Forwarding Database entry; the switch hardware should be programmed
  with the specified address in the specified VLAN ID in the forwarding
  database associated with this VLAN ID. If the operation is not supported,
  this function should return ``-EOPNOTSUPP`` to inform the bridge code to fall
  back to a software implementation.

.. note:: VLAN ID 0 corresponds to the port's private database, which, in the
        context of DSA, would be its port-based VLAN, used by the associated
        bridge device.

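The ingress rules given for ``port_vlan_filtering``, together with
``port_vlan_add`` and ``port_vlan_del``, can be expressed as a small decision
function. This sketch assumes a per-port bitmap of allowed VLAN IDs and uses a
PVID of 0 to mean "no PVID programmed"; the types and functions are
hypothetical and ignore details such as priority-tagged frames.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VID_MAX 4095u

/* Hypothetical per-port VLAN state. */
struct sketch_vlan_port {
	bool filtering;                             /* set by port_vlan_filtering */
	uint8_t allowed[(SKETCH_VID_MAX + 1) / 8];  /* bitmap of member VIDs */
	uint16_t pvid;                              /* 0: no PVID programmed */
};

static bool sketch_vid_allowed(const struct sketch_vlan_port *p, uint16_t vid)
{
	return vid <= SKETCH_VID_MAX && (p->allowed[vid / 8] & (1u << (vid % 8)));
}

/* port_vlan_add counterpart: program membership and, optionally, the PVID.
 * VID 0 is refused here because this sketch uses it to mean "no PVID". */
static void sketch_vlan_add(struct sketch_vlan_port *p, uint16_t vid, bool pvid)
{
	if (vid == 0 || vid > SKETCH_VID_MAX)
		return;
	p->allowed[vid / 8] |= (uint8_t)(1u << (vid % 8));
	if (pvid)
		p->pvid = vid;
}

/* port_vlan_del counterpart: drop membership, and the PVID if it matches. */
static void sketch_vlan_del(struct sketch_vlan_port *p, uint16_t vid)
{
	if (vid == 0 || vid > SKETCH_VID_MAX)
		return;
	p->allowed[vid / 8] &= (uint8_t)~(1u << (vid % 8));
	if (p->pvid == vid)
		p->pvid = 0;
}

/* Ingress decision for one frame, following the rules in the text. */
static bool sketch_accept_frame(const struct sketch_vlan_port *p, bool tagged,
				uint16_t vid)
{
	if (!p->filtering)
		return true;            /* any VID, tagged or untagged */
	if (!tagged)
		return p->pvid != 0;    /* untagged frames need a PVID */
	return sketch_vid_allowed(p, vid);
}
```
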
- ``port_fdb_del``: bridge layer function invoked when the bridge wants to
  remove a Forwarding Database entry; the switch hardware should be programmed
  to delete the specified MAC address from the specified VLAN ID if it was
  mapped into this port's forwarding database

- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and FDB info.

- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares
  the installation of a multicast database entry. If the operation is not
  supported, this function should return ``-EOPNOTSUPP`` to inform the bridge
  code to fall back to a software implementation. No hardware setup must be
  done in this function; see ``port_mdb_add`` for the hardware setup and
  further details.

- ``port_mdb_add``: bridge layer function invoked when the bridge wants to
  install a multicast database entry; the switch hardware should be programmed
  with the specified address in the specified VLAN ID in the forwarding
  database associated with this VLAN ID.

.. note:: VLAN ID 0 corresponds to the port's private database, which, in the
        context of DSA, would be its port-based VLAN, used by the associated
        bridge device.

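The prepare/commit split required of ``port_mdb_prepare`` (and likewise of
``port_vlan_prepare``) is sketched below: the prepare step only validates and
never touches the simulated hardware, while the add step programs the table.
The fixed-size table, the ``has_mdb`` capability flag and the ``hw_writes``
counter are hypothetical devices of this sketch, and returning ``-ENOSPC`` for
a full table is an assumption rather than something the text specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MDB_SIZE 4   /* hypothetical hardware table size */

struct sketch_mdb_entry {
	uint8_t addr[6];
	uint16_t vid;
	uint8_t port_mask;   /* ports subscribed to this group */
	bool valid;
};

/* Hypothetical switch model with a simulated multicast table. */
struct sketch_switch {
	bool has_mdb;                               /* multicast offload? */
	struct sketch_mdb_entry mdb[SKETCH_MDB_SIZE];
	unsigned int hw_writes;                     /* simulated register writes */
};

static int sketch_mdb_lookup(const struct sketch_switch *sw,
			     const uint8_t *addr, uint16_t vid)
{
	for (int i = 0; i < SKETCH_MDB_SIZE; i++)
		if (sw->mdb[i].valid && sw->mdb[i].vid == vid &&
		    memcmp(sw->mdb[i].addr, addr, 6) == 0)
			return i;
	return -1;
}

static int sketch_mdb_free_slot(const struct sketch_switch *sw)
{
	for (int i = 0; i < SKETCH_MDB_SIZE; i++)
		if (!sw->mdb[i].valid)
			return i;
	return -1;
}

/* Prepare phase: validate only; the hardware is never touched here. */
static int sketch_port_mdb_prepare(const struct sketch_switch *sw,
				   const uint8_t *addr, uint16_t vid)
{
	if (!sw->has_mdb)
		return -EOPNOTSUPP;   /* bridge falls back to software */
	if (sketch_mdb_lookup(sw, addr, vid) < 0 && sketch_mdb_free_slot(sw) < 0)
		return -ENOSPC;
	return 0;
}

/* Commit phase: program the entry for this port. */
static int sketch_port_mdb_add(struct sketch_switch *sw, int port,
			       const uint8_t *addr, uint16_t vid)
{
	int i = sketch_mdb_lookup(sw, addr, vid);

	if (i < 0) {
		i = sketch_mdb_free_slot(sw);
		if (i < 0)
			return -ENOSPC;   /* only reachable if prepare was skipped */
		memcpy(sw->mdb[i].addr, addr, 6);
		sw->mdb[i].vid = vid;
		sw->mdb[i].port_mask = 0;
		sw->mdb[i].valid = true;
	}
	sw->mdb[i].port_mask |= (uint8_t)(1u << port);
	sw->hw_writes++;
	return 0;
}

/* port_mdb_del counterpart: unsubscribe the port, freeing empty entries. */
static void sketch_port_mdb_del(struct sketch_switch *sw, int port,
				const uint8_t *addr, uint16_t vid)
{
	int i = sketch_mdb_lookup(sw, addr, vid);

	if (i < 0)
		return;
	sw->mdb[i].port_mask &= (uint8_t)~(1u << port);
	if (sw->mdb[i].port_mask == 0)
		sw->mdb[i].valid = false;
	sw->hw_writes++;
}
```

Keeping the prepare step free of side effects means a failed prepare leaves
nothing to undo, which is the property the text asks for.
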
- ``port_mdb_del``: bridge layer function invoked when the bridge wants to
  remove a multicast database entry; the switch hardware should be programmed
  to delete the specified MAC address from the specified VLAN ID if it was
  mapped into this port's forwarding database.

- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and MDB info.

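The dump callbacks hand each entry to a function supplied by the caller. The
sketch below models ``port_fdb_add``, ``port_fdb_del`` and ``port_fdb_dump``
over a software table keyed by (MAC address, VID). The ``sketch_fdb_dump_cb_t``
callback type, the table layout and the error codes for a full table
(``-ENOSPC``) or a missing entry (``-ENOENT``) are all assumptions, and the
sketch is not meant to reproduce the switchdev callback signature.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FDB_SIZE 16   /* hypothetical table size */

struct sketch_fdb_entry {
	uint8_t addr[6];
	uint16_t vid;        /* 0 = port-private database */
	int port;
	bool is_static;
	bool valid;
};

struct sketch_switch {
	struct sketch_fdb_entry fdb[SKETCH_FDB_SIZE];
};

/* Hypothetical callback type; a non-zero return value stops the dump. */
typedef int (*sketch_fdb_dump_cb_t)(const uint8_t *addr, uint16_t vid,
				    bool is_static, void *data);

static int sketch_fdb_lookup(const struct sketch_switch *sw,
			     const uint8_t *addr, uint16_t vid)
{
	for (int i = 0; i < SKETCH_FDB_SIZE; i++)
		if (sw->fdb[i].valid && sw->fdb[i].vid == vid &&
		    memcmp(sw->fdb[i].addr, addr, 6) == 0)
			return i;
	return -1;
}

/* port_fdb_add counterpart: install (or move) a static entry. */
static int sketch_port_fdb_add(struct sketch_switch *sw, int port,
			       const uint8_t *addr, uint16_t vid)
{
	int i = sketch_fdb_lookup(sw, addr, vid);

	for (int j = 0; i < 0 && j < SKETCH_FDB_SIZE; j++)
		if (!sw->fdb[j].valid)
			i = j;    /* first free slot */
	if (i < 0)
		return -ENOSPC;
	memcpy(sw->fdb[i].addr, addr, 6);
	sw->fdb[i].vid = vid;
	sw->fdb[i].port = port;
	sw->fdb[i].is_static = true;
	sw->fdb[i].valid = true;
	return 0;
}

/* port_fdb_del counterpart: remove the entry if it maps to this port. */
static int sketch_port_fdb_del(struct sketch_switch *sw, int port,
			       const uint8_t *addr, uint16_t vid)
{
	int i = sketch_fdb_lookup(sw, addr, vid);

	if (i < 0 || sw->fdb[i].port != port)
		return -ENOENT;
	sw->fdb[i].valid = false;
	return 0;
}

/* port_fdb_dump counterpart: report every entry behind the given port. */
static int sketch_port_fdb_dump(const struct sketch_switch *sw, int port,
				sketch_fdb_dump_cb_t cb, void *data)
{
	for (int i = 0; i < SKETCH_FDB_SIZE; i++) {
		const struct sketch_fdb_entry *e = &sw->fdb[i];
		int err;

		if (!e->valid || e->port != port)
			continue;
		err = cb(e->addr, e->vid, e->is_static, data);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count the entries reported. */
static int sketch_count_cb(const uint8_t *addr, uint16_t vid, bool is_static,
			   void *data)
{
	(void)addr;
	(void)vid;
	(void)is_static;
	(*(int *)data)++;
	return 0;
}
```

The same walk-and-callback pattern would apply to ``port_mdb_dump`` and
``port_vlan_dump``, with the callback receiving group or VLAN data instead.
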
TODO
====

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting offload-capable hardware for the
networking stack, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model and deals with
most of the switch specifics. At some point we should envision a merger between
these two subsystems to get the best of both worlds.

Other low-hanging fruit
-----------------------

- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS``
- allowing more than one CPU/management interface:
  http://comments.gmane.org/gmane.linux.network/365657
- porting more drivers from other vendors:
  http://comments.gmane.org/gmane.linux.network/365510