.. SPDX-License-Identifier: GPL-2.0

=============
Devlink DPIPE
=============

Background
==========

While performing the hardware offloading process, much of the hardware
specifics cannot be presented. These details are useful for debugging, and
``devlink-dpipe`` provides a standardized way to provide visibility into the
offloading process.

For example, the routing longest prefix match (LPM) algorithm used by the
Linux kernel may differ from the hardware implementation. The pipeline debug
API (DPIPE) is aimed at providing the user visibility into the ASIC's
pipeline in a generic way.

The hardware offload process is expected to be done in a way that the user
should not be able to distinguish between the hardware vs. software
implementation. In this process, hardware specifics are neglected. In
reality those details can have lots of meaning and should be exposed in some
standard way.

This problem is made even more complex when one wishes to offload the
control path of the whole networking stack to a switch ASIC. Due to
differences in the hardware and software models some processes cannot be
represented correctly.

One example is the kernel's LPM algorithm which in many cases differs
greatly from the hardware implementation. The configuration API is the same,
but one cannot rely on the Forwarding Information Base (FIB) to look like the
Level Path Compression trie (LPC-trie) in hardware.

In many situations trying to analyze a system failure solely based on the
kernel's dump may not be enough. By combining this data with complementary
information about the underlying hardware, this debugging can be made
easier; additionally, the information can be useful when debugging
performance issues.

Overview
========

The ``devlink-dpipe`` interface closes this gap. The hardware's pipeline is
modeled as a graph of match/action tables. Each table represents a specific
hardware block. This model is not new, first being used by the P4 language.

Traditionally it has been used as an alternative model for hardware
configuration, but the ``devlink-dpipe`` interface uses it for visibility
purposes as a standard complementary tool. The system's view from
``devlink-dpipe`` should change according to the changes done by the
standard configuration tools.

For example, it’s quite common to implement Access Control Lists (ACL)
using Ternary Content Addressable Memory (TCAM).
The TCAM memory can be
divided into TCAM regions. Complex TC filters can have multiple rules with
different priorities and different lookup keys. On the other hand, hardware
TCAM regions have a predefined lookup key. Offloading the TC filter rules
using the TCAM engine can result in multiple TCAM regions being interconnected
in a chain (which may affect the data path latency). In response to a new TC
filter, new tables should be created describing those regions.

Model
=====

The ``DPIPE`` model introduces several objects:

 * headers
 * tables
 * entries

A ``header`` describes packet formats and provides names for fields within
the packet. A ``table`` describes hardware blocks. An ``entry`` describes
the actual content of a specific table.

The hardware pipeline is not port specific, but rather describes the whole
ASIC. Thus it is tied to the top of the ``devlink`` infrastructure.

Drivers can register and unregister tables at run time, in order to support
dynamic behavior. This dynamic behavior is mandatory for describing hardware
blocks like TCAM regions which can be allocated and freed dynamically.

``devlink-dpipe`` generally is not intended for configuration. The exception
is hardware counting for a specific table.
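
As a rough illustration of the object model described above, the following
Python sketch represents headers, tables and entries as plain data classes and
keeps a device-wide table registry that changes at run time. Every name in it
(``Pipeline``, ``register_table``, ``tcam_region_0`` and so on) is hypothetical
and chosen for this sketch; none of it is the kernel's or a driver's actual
API.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """A packet format: a name plus the names of its fields."""
    name: str
    fields: tuple


@dataclass
class Entry:
    """The content of one table row: an index plus match/action values."""
    index: int
    match_values: dict
    action_values: dict
    counter: int = 0


@dataclass
class Table:
    """One hardware block, for example a single TCAM region."""
    name: str
    size: int
    counters_enabled: bool = False
    entries: list = field(default_factory=list)


class Pipeline:
    """Device-wide (not per-port) registry of tables. Hypothetical."""

    def __init__(self):
        self._tables = {}

    def register_table(self, table):
        if table.name in self._tables:
            raise ValueError(f"table {table.name!r} already registered")
        self._tables[table.name] = table

    def unregister_table(self, name):
        del self._tables[name]

    def table_names(self):
        return sorted(self._tables)

    def counters_set(self, name, enabled):
        # Counting is the one configuration knob the text mentions.
        self._tables[name].counters_enabled = enabled


ipv4 = Header("ipv4", ("src_addr", "dst_addr"))

pipeline = Pipeline()
region0 = Table("tcam_region_0", size=512)
pipeline.register_table(region0)
pipeline.register_table(Table("tcam_region_1", size=512))

# An entry is the actual content of a table at a given moment.
region0.entries.append(
    Entry(index=0,
          match_values={"ipv4.dst_addr": "192.0.2.1"},
          action_values={"meta.l3_drop": 1}))

pipeline.counters_set("tcam_region_0", True)
pipeline.unregister_table("tcam_region_1")  # region freed at run time
```

After the last call only ``tcam_region_0`` remains registered, which mirrors a
TCAM region being freed while the rest of the pipeline description stays
valid.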

The following commands are used to obtain the ``dpipe`` objects from
userspace:

 * ``table_get``: Receive a table's description.
 * ``headers_get``: Receive a device's supported headers.
 * ``entries_get``: Receive a table's current entries.
 * ``counters_set``: Enable or disable counters on a table.

Table
-----

The driver should implement the following operations for each table:

 * ``matches_dump``: Dump the supported matches.
 * ``actions_dump``: Dump the supported actions.
 * ``entries_dump``: Dump the actual content of the table.
 * ``counters_set_update``: Synchronize hardware with counters enabled or
   disabled.

Header/Field
------------

In a similar way to P4, headers and fields are used to describe a table's
behavior. There is a slight difference between the standard protocol headers
and specific ASIC metadata. The protocol headers should be declared in the
``devlink`` core API. On the other hand, ASIC metadata is driver specific
and should be defined in the driver. Additionally, each driver-specific
devlink documentation file should document the driver-specific ``dpipe``
headers it implements. The headers and fields are identified by enumeration.
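
The split between core protocol headers and driver-specific metadata, both
identified by enumeration, can be pictured with the sketch below. The enum
names and numeric values are invented for this illustration and are not the
identifiers used by the ``devlink`` core or by any driver. The scope flag in
``field_ref`` is likewise an assumption of the sketch, added because two
independent enumerations may reuse the same numbers.

```python
from enum import IntEnum


class CoreHeaderId(IntEnum):
    """Standard protocol headers, declared once in the core (hypothetical IDs)."""
    ETHERNET = 0
    IPV4 = 1
    IPV6 = 2


class Ipv4FieldId(IntEnum):
    """Fields of the IPv4 header (hypothetical IDs)."""
    DST_ADDR = 0


class DriverHeaderId(IntEnum):
    """ASIC metadata header, private to one driver (hypothetical IDs)."""
    META = 0


class MetaFieldId(IntEnum):
    """Fields of the driver's metadata header (hypothetical IDs)."""
    RIF_PORT = 0
    ADJ_INDEX = 1


def field_ref(header_id, field_id):
    """Identify a field as (is_core, header id, field id).

    Core and driver enumerations are numbered independently, so the
    is_core flag keeps overlapping numbers apart.
    """
    is_core = isinstance(header_id, CoreHeaderId)
    return (is_core, int(header_id), int(field_id))


ipv4_dst = field_ref(CoreHeaderId.IPV4, Ipv4FieldId.DST_ADDR)
rif_port = field_ref(DriverHeaderId.META, MetaFieldId.RIF_PORT)
```

Here ``ipv4_dst`` and ``rif_port`` share the field number 0 but remain
distinct references, because one belongs to a core header and the other to a
driver header.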

In order to provide further visibility some ASIC metadata fields could be
mapped to kernel objects. For example, internal router interface indexes can
be directly mapped to the net device ifindex. FIB table indexes used by
different Virtual Routing and Forwarding (VRF) tables can be mapped to
internal routing table indexes.

Match
-----

Matches are kept primitive and close to hardware operation. Match types like
LPM are not supported because this is exactly the process we wish to
describe in full detail. Examples of matches:

 * ``field_exact``: Exact match on a specific field.
 * ``field_exact_mask``: Exact match on a specific field after masking.
 * ``field_range``: Match on a specific range.

The IDs of the header and the field should be specified in order to
identify the specific field. Furthermore, the header index should be
specified in order to distinguish multiple headers of the same type in a
packet (tunneling).

Action
------

Similar to match, the actions are kept primitive and close to hardware
operation. For example:

 * ``field_modify``: Modify the field value.
 * ``field_inc``: Increment the field value.
 * ``push_header``: Add a header.
 * ``pop_header``: Remove a header.

Entry
-----

Entries of a specific table can be dumped on demand. Each entry is
identified with an index and its properties are described by a list of
match/action values and a specific counter. By dumping the tables' content the
interactions between tables can be resolved.

Abstraction Example
===================

The following is an example of the abstraction model of the L3 part of
the Mellanox Spectrum ASIC. The blocks are described in the order they appear in
the pipeline. The table sizes in the following examples are not real
hardware sizes and are provided for demonstration purposes.

LPM
---

The LPM algorithm can be implemented as a list of hash tables. Each hash
table contains routes with the same prefix length. The root of the list is
/32, and in case of a miss the hardware will continue to the next hash
table. The depth of the search will affect the data path latency.

In case of a hit the entry contains information about the next stage of the
pipeline which resolves the MAC address. The next stage can be either the local
host table for directly connected routes, or the adjacency table for next-hops.
The ``meta.lpm_prefix`` field is used to connect two LPM tables.

.. code::

    table lpm_prefix_16 {
      size: 4096,
      counters_enabled: true,
      match: { meta.vr_id: exact,
               ipv4.dst_addr: exact_mask,
               ipv6.dst_addr: exact_mask,
               meta.lpm_prefix: exact },
      action: { meta.adj_index: set,
                meta.adj_group_size: set,
                meta.rif_port: set,
                meta.lpm_prefix: set },
    }

Local Host
----------

In the case of local routes the LPM lookup already resolves the egress
router interface (RIF), yet the exact MAC address is not known. The local
host table is a hash table combining the output interface id with
destination IP address as a key. The result is the MAC address.

.. code::

    table local_host {
      size: 4096,
      counters_enabled: true,
      match: { meta.rif_port: exact,
               ipv4.dst_addr: exact },
      action: { ethernet.daddr: set }
    }

Adjacency
---------

In case of remote routes this table does the ECMP.
The LPM lookup results in
an ECMP group size and an index that serves as a global offset into this table.
Concurrently a hash of the packet is generated. Based on the ECMP group size
and the packet's hash a local offset is generated. Multiple LPM entries can
point to the same adjacency group.

.. code::

    table adjacency {
      size: 4096,
      counters_enabled: true,
      match: { meta.adj_index: exact,
               meta.adj_group_size: exact,
               meta.packet_hash_index: exact },
      action: { ethernet.daddr: set,
                meta.erif: set }
    }

ERIF
----

In case the egress RIF and destination MAC have been resolved by previous
tables this table does multiple operations like TTL decrease and MTU check.
Then the decision of forward/drop is taken and the port L3 statistics are
updated based on the packet's type (broadcast, unicast, multicast).

.. code::

    table erif {
      size: 800,
      counters_enabled: true,
      match: { meta.rif_port: exact,
               meta.is_l3_unicast: exact,
               meta.is_l3_broadcast: exact,
               meta.is_l3_multicast: exact },
      action: { meta.l3_drop: set,
                meta.l3_forward: set }
    }
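
To make the interaction between the LPM, local host and adjacency stages
easier to follow, here is a small software sketch of the lookup walk. It is
only a model under stated assumptions: the tables are Python dictionaries, the
chain that hardware builds with ``meta.lpm_prefix`` is replaced by a loop over
prefix lengths, and the local offset is assumed to be the packet hash modulo
the group size, which the text does not specify. All addresses, table contents
and function names are made up for the example.

```python
import ipaddress

# One exact-match hash table per prefix length, searched longest first.
# Keys are (vr_id, masked destination); values select the next stage.
lpm_tables = {
    32: {},
    24: {(0, "192.0.2.0"): {"rif_port": 1}},  # directly connected route
    16: {(0, "10.1.0.0"): {"adj_index": 4, "adj_group_size": 2}},
}

# Local host table: (rif_port, destination IP) -> destination MAC.
local_host = {(1, "192.0.2.7"): "00:00:5e:00:53:07"}

# Adjacency table: global offset -> (destination MAC, egress RIF).
adjacency = {
    4: ("00:00:5e:00:53:a0", 10),
    5: ("00:00:5e:00:53:a1", 11),
}


def mask(addr, prefix_len):
    """Return addr with all bits beyond prefix_len cleared."""
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(net.network_address)


def lpm_lookup(vr_id, dst):
    """Walk the hash tables from /32 down; return (hit, tables searched)."""
    order = sorted(lpm_tables, reverse=True)
    for depth, prefix_len in enumerate(order, start=1):
        hit = lpm_tables[prefix_len].get((vr_id, mask(dst, prefix_len)))
        if hit is not None:
            return hit, depth
    return None, len(order)


def resolve(vr_id, dst, packet_hash):
    """Return (destination MAC, egress RIF, search depth), or None on a miss."""
    hit, depth = lpm_lookup(vr_id, dst)
    if hit is None:
        return None
    if "rif_port" in hit:
        # Local route: the RIF is known, the MAC comes from the host table.
        return local_host.get((hit["rif_port"], dst)), hit["rif_port"], depth
    # Remote route: ECMP. The modulo below is an assumption of this sketch.
    offset = packet_hash % hit["adj_group_size"]
    mac, erif = adjacency[hit["adj_index"] + offset]
    return mac, erif, depth


local = resolve(0, "192.0.2.7", packet_hash=0)
remote = resolve(0, "10.1.2.3", packet_hash=7)
```

The third element of each result is the number of hash tables searched, the
quantity the LPM section ties to data path latency: the local route is found
in the second table, the remote one only in the third.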