.. SPDX-License-Identifier: GPL-2.0

====================
mlx5 devlink support
====================

This document describes the devlink features implemented by the ``mlx5``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented

   * - Name
     - Mode
     - Validation
   * - ``enable_roce``
     - driverinit
     - Type: Boolean

       If the device supports disabling RoCE, this parameter controls whether
       the device exposes the RoCE capability. Otherwise, the control occurs in
       the driver stack. When RoCE is disabled at the driver level, only raw
       ethernet QPs are supported.
   * - ``io_eq_size``
     - driverinit
     - The range is between 64 and 4096.
   * - ``event_eq_size``
     - driverinit
     - The range is between 64 and 4096.
   * - ``max_macs``
     - driverinit
     - The range is between 1 and 2^31. Only power-of-2 values are supported.
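
Parameters in ``driverinit`` mode are applied when the driver initializes, so a
new value takes effect only after a reload. The following is a minimal sketch
rather than a definitive procedure: ``max_macs_is_valid`` and
``set_max_macs_cmds`` are hypothetical helpers, the first mirrors the
``max_macs`` validation rule from the table above, and the PCI address and the
value 128 are placeholders.

.. code-block:: shell

   # Hypothetical helper: succeeds only for a power of 2 between 1 and 2^31.
   max_macs_is_valid() {
       v=$1
       [ "$v" -ge 1 ] && [ "$v" -le 2147483648 ] && [ $(( v & (v - 1) )) -eq 0 ]
   }

   # Print the commands instead of running them, so the sketch can be tried
   # on a machine without an mlx5 device.
   set_max_macs_cmds() {
       dev=$1
       value=$2
       max_macs_is_valid "$value" || { echo "invalid max_macs: $value" >&2; return 1; }
       echo "devlink dev param set $dev name max_macs value $value cmode driverinit"
       echo "devlink dev reload $dev"
   }

   set_max_macs_cmds pci/0000:82:00.0 128

Running the two printed commands in order sets the value and then reloads the
driver so that the new value is picked up.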

The ``mlx5`` driver also implements the following driver-specific
parameters.

.. list-table:: Driver-specific parameters implemented
   :widths: 5 5 5 85

   * - Name
     - Type
     - Mode
     - Description
   * - ``flow_steering_mode``
     - string
     - runtime
     - Controls the flow steering mode of the driver.

       * ``dmfs`` Device managed flow steering. In DMFS mode, the HW
         steering entities are created and managed through firmware.
       * ``smfs`` Software managed flow steering. In SMFS mode, the HW
         steering entities are created and managed through the driver without
         firmware intervention.

       SMFS mode is faster and provides a better rule insertion rate than the
       default DMFS mode.
   * - ``fdb_large_groups``
     - u32
     - driverinit
     - Control the number of large groups (size > 1) in the FDB table.

       * The default value is 15, and the range is between 1 and 1024.
   * - ``esw_multiport``
     - Boolean
     - runtime
     - Control MultiPort E-Switch shared fdb mode.

       An experimental mode where a single E-Switch is used and all the vports
       and physical ports on the NIC are connected to it.

       An example is to send traffic from a VF that is created on PF0 to the
       uplink that is natively associated with PF1.

       Note: Future devices, ConnectX-8 and onward, will eventually have this
       as the default to allow forwarding between all NIC ports in a single
       E-switch environment and the dual E-switch mode will likely get
       deprecated.

       Default: disabled
   * - ``esw_port_metadata``
     - Boolean
     - runtime
     - When applicable, disabling eswitch metadata can increase packet rate by
       up to 20% depending on the use case and packet sizes.

       Eswitch port metadata state controls whether to internally tag packets
       with metadata. Metadata tagging must be enabled for multi-port RoCE,
       failover between representors and stacked devices. By default metadata is
       enabled on the supported devices in E-switch. Metadata is applicable only
       for E-switch in switchdev mode and users may disable it when NONE of the
       below use cases will be in use:

       1. HCA is in Dual/multi-port RoCE mode.
       2. VF/SF representor bonding (usually used for live migration).
       3. Stacked devices.

       When metadata is disabled, the above use cases will fail to initialize if
       users try to enable them.
   * - ``hairpin_num_queues``
     - u32
     - driverinit
     - We refer to a TC NIC rule that involves forwarding as "hairpin".
       Hairpin queues are an mlx5 hardware-specific implementation for hardware
       forwarding of such packets.

       Control the number of hairpin queues.
   * - ``hairpin_queue_size``
     - u32
     - driverinit
     - Control the size (in packets) of the hairpin queues.
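
Parameters in ``runtime`` mode can be changed while the driver is running,
without a reload. As an illustration only, the sketch below builds the command
that selects a flow steering mode. ``flow_steering_cmd`` is a hypothetical
helper, it accepts just the two modes listed in the table above, and the PCI
address is a placeholder.

.. code-block:: shell

   # Hypothetical helper: print the command that selects a flow steering mode.
   flow_steering_cmd() {
       dev=$1
       mode=$2
       case "$mode" in
           dmfs|smfs) ;;
           *) echo "unknown flow steering mode: $mode" >&2; return 1 ;;
       esac
       echo "devlink dev param set $dev name flow_steering_mode value $mode cmode runtime"
   }

   flow_steering_cmd pci/0000:82:00.0 smfs

   # The current value can be read back with:
   #   devlink dev param show pci/0000:82:00.0 name flow_steering_mode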

The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``.

Info versions
=============

The ``mlx5`` driver reports the following versions:

.. list-table:: devlink info versions implemented
   :widths: 5 5 90

   * - Name
     - Type
     - Description
   * - ``fw.psid``
     - fixed
     - Used to represent the board id of the device.
   * - ``fw.version``
     - stored, running
     - Three-part major.minor.subminor firmware version number.
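
These versions are reported by ``devlink dev info``. The sketch below is
illustrative only: the PCI address is a placeholder, ``split_fw_version`` is a
hypothetical helper, and ``16.35.2000`` is a made-up value used to show the
major.minor.subminor layout.

.. code-block:: shell

   # Query the versions on a real device (placeholder address):
   #   devlink dev info pci/0000:82:00.0

   # Hypothetical helper: split a major.minor.subminor string into its fields.
   split_fw_version() {
       ver=$1
       major=${ver%%.*}      # text before the first dot
       rest=${ver#*.}        # text after the first dot
       minor=${rest%%.*}
       subminor=${rest#*.}
       echo "major=$major minor=$minor subminor=$subminor"
   }

   split_fw_version 16.35.2000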

Health reporters
================

tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following three error scenarios:

- tx timeout
    Report on kernel tx timeout detection.
    Recover by searching for lost interrupts.
- tx error completion
    Report on error tx completion.
    Recover by flushing the tx queue and resetting it.
- tx PTP port timestamping CQ unhealthy
    Report too many CQEs never delivered on port ts CQ.
    Recover by flushing and re-creating all PTP channels.

The tx reporter also supports an on-demand diagnose callback, which provides
real-time information about the status of its send queues.

User commands examples:

- Diagnose send queues status::

    $ devlink health diagnose pci/0000:82:00.0 reporter tx

.. note::
   This command has valid output only when the interface is up. Otherwise, the command has empty output.

- Show the number of tx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter tx
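
The autorecover flag and the grace period shown by ``devlink health show`` can
be changed with ``devlink health set``. A minimal sketch, assuming a
placeholder PCI address and an arbitrary grace period of 500 ms;
``health_set_cmd`` is a hypothetical helper that only prints the command:

.. code-block:: shell

   # Hypothetical helper: print the command that configures a health reporter.
   health_set_cmd() {
       dev=$1
       reporter=$2
       grace_ms=$3
       auto_recover=$4
       case "$auto_recover" in
           true|false) ;;
           *) echo "auto_recover must be true or false" >&2; return 1 ;;
       esac
       echo "devlink health set $dev reporter $reporter grace_period $grace_ms auto_recover $auto_recover"
   }

   health_set_cmd pci/0000:82:00.0 tx 500 true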

rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following two error scenarios:

- rx queues' initialization (population) timeout
    Population of rx queues' descriptors on ring initialization is done
    in napi context via triggering an irq. In case of a failure to get
    the minimum amount of descriptors, a timeout would occur, and
    descriptors could be recovered by polling the EQ (Event Queue).
- rx completions with errors (reported by HW on interrupt context)
    Report on rx completion error.
    Recover (if needed) by flushing the related queue and resetting it.

The rx reporter also supports an on-demand diagnose callback, which
provides real-time information about the status of its receive queues.

- Diagnose rx queues' status and corresponding completion queue::

    $ devlink health diagnose pci/0000:82:00.0 reporter rx

.. note::
   This command has valid output only when the interface is up. Otherwise, the command has empty output.

- Show the number of rx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter rx

fw reporter
-----------
The fw reporter implements `diagnose` and `dump` callbacks.
It follows symptoms of fw error, such as a fw syndrome, by triggering
a fw core dump and storing it into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
the current fw status.

User commands examples:

- Check fw health status::

    $ devlink health diagnose pci/0000:82:00.0 reporter fw

- Read FW core dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.0 reporter fw

.. note::
   This command can run only on the PF which has fw tracer ownership.
   Running it on another PF or on any VF will return "Operation not permitted".
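
Because ``dump show`` returns the stored dump when one exists, a fresh dump is
generated only after the stored one has been cleared with
``devlink health dump clear``. A sketch of that sequence, with a placeholder
PCI address; ``fresh_dump_cmds`` is a hypothetical helper that prints the
commands rather than running them:

.. code-block:: shell

   # Hypothetical helper: print the commands that discard the stored dump
   # and then request a new one.
   fresh_dump_cmds() {
       dev=$1
       reporter=$2
       echo "devlink health dump clear $dev reporter $reporter"
       echo "devlink health dump show $dev reporter $reporter"
   }

   fresh_dump_cmds pci/0000:82:00.0 fw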

fw fatal reporter
-----------------
The fw fatal reporter implements `dump` and `recover` callbacks.
It follows fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs the recover flow, which reloads the driver and
triggers a fw reset if needed.
On firmware error, the health buffer is dumped into dmesg. The log
level is derived from the error's severity (given in the health buffer).

User commands examples:

- Run fw recover flow manually::

    $ devlink health recover pci/0000:82:00.0 reporter fw_fatal

- Read FW CR-space dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

.. note::
   This command can run only on PF.

vnic reporter
-------------
The vnic reporter implements only the `diagnose` callback.
It is responsible for querying the vnic diagnostic counters from fw and displaying
them in real time.

Description of the vnic counters:

- total_q_under_processor_handle
        number of queues in an error state due to
        an async error or errored command.
- send_queue_priority_update_flow
        number of QP/SQ priority/SL update events.
- cq_overrun
        number of times CQ entered an error state due to an overflow.
- async_eq_overrun
        number of times an EQ mapped to async events was overrun.
- comp_eq_overrun
        number of times an EQ mapped to completion events was
        overrun.
- quota_exceeded_command
        number of commands issued and failed due to quota exceeded.
- invalid_command
        number of commands issued and failed due to any reason other than quota
        exceeded.
- nic_receive_steering_discard
        number of packets that completed RX flow
        steering but were discarded due to a mismatch in flow table.
- generated_pkt_steering_fail
        number of packets generated by the VNIC experiencing unexpected steering
        failure (at any point in steering flow).
- handled_pkt_steering_fail
        number of packets handled by the VNIC experiencing unexpected steering
        failure (at any point in steering flow owned by the VNIC, including the FDB
        for the eswitch owner).

User commands examples:

- Diagnose PF/VF vnic counters::

        $ devlink health diagnose pci/0000:82:00.1 reporter vnic

- Diagnose representor vnic counters (performed by supplying devlink port of the
  representor, which can be obtained via devlink port command)::

        $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic

.. note::
   This command can run over all interfaces such as PF/VF and representor ports.
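
The handle passed to ``diagnose`` is either a device (``pci/<address>``) or a
devlink port (``pci/<address>/<port index>``), and ``devlink port show`` lists
the available ports. The sketch below only prints the resulting command.
``vnic_diagnose_cmd`` is a hypothetical helper, and the address and port index
are the ones used in the examples above.

.. code-block:: shell

   # Hypothetical helper: build the diagnose command for a device or,
   # when a port index is given, for a devlink port of that device.
   vnic_diagnose_cmd() {
       dev=$1
       port=${2:-}
       if [ -n "$port" ]; then
           handle="$dev/$port"
       else
           handle=$dev
       fi
       echo "devlink health diagnose $handle reporter vnic"
   }

   vnic_diagnose_cmd pci/0000:82:00.1          # PF/VF counters
   vnic_diagnose_cmd pci/0000:82:00.1 65537    # representor counters

The port index for a representor is taken from the ``devlink port show``
output on the system in question.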