.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

.. _napi:

====
NAPI
====

NAPI is the event handling mechanism used by the Linux networking stack.
The name NAPI no longer stands for anything in particular [#]_.

In basic operation, the device notifies the host about new events
via an interrupt.
The host then schedules a NAPI instance to process the events.
The device may also be polled for events via NAPI without receiving
interrupts first (:ref:`busy polling<poll>`).

NAPI processing usually happens in the software interrupt context,
but there is an option to use :ref:`separate kernel threads<threaded>`
for NAPI processing.

All in all, NAPI abstracts the context and configuration of event
(packet Rx and Tx) processing away from the drivers.

Driver API
==========

The two most important elements of NAPI are the struct napi_struct
and the associated poll method. struct napi_struct holds the state
of the NAPI instance while the method is the driver-specific event
handler. The method will typically free Tx packets that have been
transmitted and process newly received packets.

.. _drv_ctrl:

Control API
-----------

netif_napi_add() and netif_napi_del() add/remove a NAPI instance
from the system. The instances are attached to the netdevice passed
as argument (and will be deleted automatically when the netdevice is
unregistered). Instances are added in a disabled state.

napi_enable() and napi_disable() manage the disabled state.
A disabled NAPI can't be scheduled and its poll method is guaranteed
to not be invoked. napi_disable() waits for ownership of the NAPI
instance to be released.

The control APIs are not idempotent. Control API calls are safe against
concurrent use of datapath APIs but an incorrect sequence of control API
calls may result in crashes, deadlocks, or race conditions. For example,
calling napi_disable() multiple times in a row will deadlock.
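
The pairing rules above can be illustrated with a small user-space model.
This is only a sketch, not driver code: ``struct model_napi`` and the
``model_*()`` helpers are hypothetical stand-ins that merely track state,
where a real driver would call netif_napi_add(), napi_enable(),
napi_disable() and netif_napi_del(). The model additionally assumes that
an instance is disabled before it is removed.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical model of the control state of one NAPI instance. */
  enum model_state { MODEL_ABSENT, MODEL_DISABLED, MODEL_ENABLED };

  struct model_napi {
      enum model_state state;
  };

  /* Models netif_napi_add(): instances are added in a disabled state. */
  bool model_add(struct model_napi *n)
  {
      if (n->state != MODEL_ABSENT)
          return false;
      n->state = MODEL_DISABLED;
      return true;
  }

  /* Models napi_enable(): only valid on a disabled instance. */
  bool model_enable(struct model_napi *n)
  {
      if (n->state != MODEL_DISABLED)
          return false;
      n->state = MODEL_ENABLED;
      return true;
  }

  /* Models napi_disable(): only valid on an enabled instance. */
  bool model_disable(struct model_napi *n)
  {
      if (n->state != MODEL_ENABLED)
          return false;
      n->state = MODEL_DISABLED;
      return true;
  }

  /* Models netif_napi_del(); assumes the instance was disabled first. */
  bool model_del(struct model_napi *n)
  {
      if (n->state != MODEL_DISABLED)
          return false;
      n->state = MODEL_ABSENT;
      return true;
  }

In the model a repeated ``model_disable()`` simply returns ``false``;
the real napi_disable() would deadlock instead, which is why drivers
have to keep the calls balanced themselves.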

Datapath API
------------

napi_schedule() is the basic method of scheduling a NAPI poll.
Drivers should call this function in their interrupt handler
(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
will take ownership of the NAPI instance.

Later, after NAPI is scheduled, the driver's poll method will be
called to process the events/packets. The method takes a ``budget``
argument - drivers can process completions for any number of Tx
packets but should only process up to ``budget`` Rx packets.
Rx processing is usually much more expensive.

In other words, for Rx processing the ``budget`` argument limits how many
packets the driver can process in a single poll. Rx specific APIs like page
pool or XDP cannot be used at all when ``budget`` is 0.
skb Tx processing should happen regardless of the ``budget``, but if
the argument is 0 the driver cannot call any XDP (or page pool) APIs.

.. warning::

   The ``budget`` argument may be 0 if the core tries to only process
   skb Tx completions and no Rx or XDP packets.

The poll method returns the amount of work done. If the driver still
has outstanding work to do (e.g. ``budget`` was exhausted)
the poll method should return exactly ``budget``. In that case,
the NAPI instance will be serviced/polled again (without the
need to be scheduled).

If event processing has been completed (all outstanding packets
processed) the poll method should call napi_complete_done()
before returning. napi_complete_done() releases the ownership
of the instance.

.. warning::

   The case of finishing all events and using exactly ``budget``
   must be handled carefully. There is no way to report this
   (rare) condition to the stack, so the driver must either
   not call napi_complete_done() and wait to be called again,
   or return ``budget - 1``.

   If the ``budget`` is 0, napi_complete_done() should never be called.
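
The return-value rules above can be sketched as a user-space model of a
poll method. Everything in it is hypothetical (``struct model_ring``,
``model_poll()``); the ``completed`` flag marks the point where a real
driver would call napi_complete_done(). The sketch takes the first of the
two options from the warning: when exactly ``budget`` packets were
processed it does not complete and waits to be polled again.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical device state used by the model. */
  struct model_ring {
      int tx_pending;   /* Tx completions waiting to be reclaimed */
      int rx_pending;   /* received packets waiting to be processed */
      bool completed;   /* set where napi_complete_done() would be called */
  };

  /* Model of a poll method; returns the amount of Rx work done. */
  int model_poll(struct model_ring *r, int budget)
  {
      int work_done = 0;

      /* Tx completions are reclaimed regardless of the budget. */
      r->tx_pending = 0;

      /* Zero budget: no Rx work and the instance must not be completed. */
      if (budget == 0)
          return 0;

      while (work_done < budget && r->rx_pending > 0) {
          r->rx_pending--;
          work_done++;
      }

      /*
       * Budget used up: report exactly budget and wait for the next poll,
       * even if the ring happens to be empty now.
       */
      if (work_done == budget)
          return budget;

      /* All events handled with budget to spare: release ownership. */
      r->completed = true;
      return work_done;
  }

With 64 pending packets and a budget of 64 the first call returns 64
without completing; the following call finds nothing to do, completes
and returns 0.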

Call sequence
-------------

Drivers should not make assumptions about the exact sequencing
of calls. The poll method may be called without the driver scheduling
the instance (unless the instance is disabled). Similarly,
it's not guaranteed that the poll method will be called, even
if napi_schedule() succeeded (e.g. if the instance gets disabled).

As mentioned in the :ref:`drv_ctrl` section, napi_disable() and subsequent
calls to the poll method only wait for the ownership of the instance
to be released, not for the poll method to exit. This means that
drivers should avoid accessing any data structures after calling
napi_complete_done().

.. _drv_sched:

Scheduling and IRQ masking
--------------------------

Drivers should keep the interrupts masked after scheduling
the NAPI instance; until NAPI polling finishes, any further
interrupts are unnecessary.

Drivers which have to mask the interrupts explicitly (as opposed
to IRQ being auto-masked by the device) should use the napi_schedule_prep()
and __napi_schedule() calls:

.. code-block:: c

  if (napi_schedule_prep(&v->napi)) {
      mydrv_mask_rxtx_irq(v->idx);
      /* schedule after masking to avoid races */
      __napi_schedule(&v->napi);
  }

The IRQ should only be unmasked after a successful call to
napi_complete_done():

.. code-block:: c

  if (budget && napi_complete_done(&v->napi, work_done)) {
      mydrv_unmask_rxtx_irq(v->idx);
      return min(work_done, budget - 1);
  }

napi_schedule_irqoff() is a variant of napi_schedule() which takes advantage
of guarantees given by being invoked in IRQ context (no need to
mask interrupts). Note that PREEMPT_RT forces all interrupts
to be threaded so the interrupt may need to be marked ``IRQF_NO_THREAD``
to avoid issues on real-time kernel configurations.

Instance to queue mapping
-------------------------

Modern devices have multiple NAPI instances (struct napi_struct) per
interface. There is no strong requirement on how the instances are
mapped to queues and interrupts. NAPI is primarily a polling/processing
abstraction without specific user-facing semantics. That said, most networking
devices end up using NAPI in fairly similar ways.

NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
(queue pair is a set of a single Rx and single Tx queue).

In less common cases a NAPI instance may be used for multiple queues
or Rx and Tx queues can be serviced by separate NAPI instances on a single
core. Regardless of the queue assignment, however, there is usually still
a 1:1 mapping between NAPI instances and interrupts.

It's worth noting that the ethtool API uses a "channel" terminology where
each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
what constitutes a channel; the recommended interpretation is to understand
a channel as an IRQ/NAPI which services queues of a given type. For example,
a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
to utilize 3 interrupts, 2 Rx and 2 Tx queues.
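
The arithmetic behind that example can be written down as a short
helper. This is a sketch of the recommended interpretation only; the
names ``struct channel_resources`` and ``channels_to_resources()`` are
made up for the illustration and are not part of the ethtool API.

.. code-block:: c

  /* Resources implied by an ethtool channel configuration. */
  struct channel_resources {
      unsigned int irqs;
      unsigned int rx_queues;
      unsigned int tx_queues;
  };

  /*
   * Treat every channel as one IRQ/NAPI; a combined channel services
   * both an Rx and a Tx queue.
   */
  struct channel_resources
  channels_to_resources(unsigned int rx, unsigned int tx, unsigned int combined)
  {
      struct channel_resources res;

      res.irqs = rx + tx + combined;
      res.rx_queues = rx + combined;
      res.tx_queues = tx + combined;
      return res;
  }

Calling ``channels_to_resources(1, 1, 1)`` yields 3 interrupts, 2 Rx
queues and 2 Tx queues, matching the configuration described above.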

User API
========

User interactions with NAPI depend on NAPI instance ID. The instance IDs
are only visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option.
It's not currently possible to query IDs used by a given device.
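
A minimal sketch of reading the ID from user space with getsockopt() is
shown below. The helper name ``get_incoming_napi_id()`` is an assumption
made for the example, and the fallback constant is the asm-generic value,
which differs on a few architectures. A returned ID of 0 is taken to mean
that no NAPI ID has been recorded for the socket yet.

.. code-block:: c

  #include <errno.h>
  #include <sys/socket.h>

  /* Fallback for older libc headers (asm-generic value, an assumption). */
  #ifndef SO_INCOMING_NAPI_ID
  #define SO_INCOMING_NAPI_ID 56
  #endif

  /*
   * Store the NAPI ID recorded for the socket in *napi_id.
   * Returns 0 on success or a negative errno value.
   */
  int get_incoming_napi_id(int fd, unsigned int *napi_id)
  {
      socklen_t len = sizeof(*napi_id);

      if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len) < 0)
          return -errno;
      return 0;
  }

The ID is only meaningful once the socket has received traffic, so an
application would typically query it after the first packet or after a
connection has been accepted.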

Software IRQ coalescing
-----------------------

NAPI does not perform any explicit event coalescing by default.
In most scenarios batching happens due to IRQ coalescing which is done
by the device. There are cases where software coalescing is helpful.

NAPI can be configured to arm a repoll timer instead of unmasking
the hardware interrupts as soon as all packets are processed.
The ``gro_flush_timeout`` sysfs configuration of the netdevice
is reused to control the delay of the timer, while
``napi_defer_hard_irqs`` controls the number of consecutive empty polls
before NAPI gives up and goes back to using hardware IRQs.
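
Both knobs can be set from a program by writing to the netdevice's sysfs
files. The sketch below assumes the usual ``/sys/class/net/<ifname>/``
location and that ``gro_flush_timeout`` is expressed in nanoseconds; the
helper names are made up for the example.

.. code-block:: c

  #include <errno.h>
  #include <stdio.h>

  /*
   * Write an unsigned value to a per-netdevice sysfs attribute.
   * Returns 0 on success or a negative errno value.
   */
  int netdev_sysfs_write(const char *ifname, const char *attr,
                         unsigned long value)
  {
      char path[256];
      FILE *f;
      int n, err = 0;

      n = snprintf(path, sizeof(path), "/sys/class/net/%s/%s", ifname, attr);
      if (n < 0 || (size_t)n >= sizeof(path))
          return -ENAMETOOLONG;

      f = fopen(path, "w");
      if (!f)
          return -errno;
      if (fprintf(f, "%lu\n", value) < 0)
          err = -EIO;
      /* the buffered write reaches sysfs when the stream is closed */
      if (fclose(f) != 0 && !err)
          err = -errno;
      return err;
  }

  /* Set the repoll timer delay and the number of tolerated empty polls. */
  int netdev_set_sw_coalescing(const char *ifname, unsigned long timeout_ns,
                               unsigned long defer_irqs)
  {
      int err = netdev_sysfs_write(ifname, "gro_flush_timeout", timeout_ns);

      if (err)
          return err;
      return netdev_sysfs_write(ifname, "napi_defer_hard_irqs", defer_irqs);
  }

A call such as ``netdev_set_sw_coalescing("eth0", 20000, 2)`` would ask
for a 20 microsecond timer and two empty polls; the interface name and
both values are purely illustrative.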

.. _poll:

Busy polling
------------

Busy polling allows a user process to check for incoming packets before
the device interrupt fires. As is the case with any busy polling, it trades
off CPU cycles for lower latency (production uses of NAPI busy polling
are not well known).

Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
also exists.
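
The per-socket variant might look like the sketch below. The helper name
``socket_set_busy_poll()`` is hypothetical; the option value is the
approximate busy poll time in microseconds, and raising it may require
``CAP_NET_ADMIN``. The fallback constant is the asm-generic value and is
only an assumption for headers that lack the definition.

.. code-block:: c

  #include <errno.h>
  #include <sys/socket.h>

  /* Fallback for older libc headers (asm-generic value, an assumption). */
  #ifndef SO_BUSY_POLL
  #define SO_BUSY_POLL 46
  #endif

  /*
   * Ask the kernel to busy poll for roughly `usecs` microseconds on
   * blocking receives of this socket.
   * Returns 0 on success or a negative errno value.
   */
  int socket_set_busy_poll(int fd, int usecs)
  {
      if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
          return -errno;
      return 0;
  }

Setting the option on selected sockets keeps the extra CPU cost limited
to the latency-sensitive flows, whereas the sysctls apply system-wide.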

IRQ mitigation
--------------

While busy polling is supposed to be used by low latency applications,
a similar mechanism can be used for IRQ mitigation.

Very high request-per-second applications (especially routing/forwarding
applications and especially applications using AF_XDP sockets) may not
want to be interrupted until they finish processing a request or a batch
of packets.

Such applications can pledge to the kernel that they will perform a busy
polling operation periodically, and the driver should keep the device IRQs
permanently masked. This mode is enabled by using the ``SO_PREFER_BUSY_POLL``
socket option. To avoid system misbehavior the pledge is revoked
if ``gro_flush_timeout`` passes without any busy poll call.

The NAPI budget for busy polling is lower than the default (which makes
sense given the low latency intention of normal busy polling). This is
not the case with IRQ mitigation, however, so the budget can be adjusted
with the ``SO_BUSY_POLL_BUDGET`` socket option.
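
As a sketch, an application could make the pledge and raise the budget
as follows. The helper name ``socket_prefer_busy_poll()`` is hypothetical,
the fallback constants are the asm-generic values (an assumption for
older headers), and busy polling itself is assumed to be configured
separately. Both options may require ``CAP_NET_ADMIN``.

.. code-block:: c

  #include <errno.h>
  #include <sys/socket.h>

  /* Fallbacks for older libc headers (asm-generic values, an assumption). */
  #ifndef SO_PREFER_BUSY_POLL
  #define SO_PREFER_BUSY_POLL 69
  #endif
  #ifndef SO_BUSY_POLL_BUDGET
  #define SO_BUSY_POLL_BUDGET 70
  #endif

  /*
   * Pledge periodic busy polling on the socket and set the NAPI budget
   * used for those polls.
   * Returns 0 on success or a negative errno value.
   */
  int socket_prefer_busy_poll(int fd, int budget)
  {
      int one = 1;

      if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                     &one, sizeof(one)) < 0)
          return -errno;
      if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                     &budget, sizeof(budget)) < 0)
          return -errno;
      return 0;
  }

A suitable budget depends on the batch size the application processes
per poll, so it is left as a parameter here rather than hard-coded.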

.. _threaded:

Threaded NAPI
-------------

Threaded NAPI is an operating mode that uses dedicated kernel
threads rather than software IRQ context for NAPI processing.
The configuration is per netdevice and will affect all
NAPI instances of that device. Each NAPI instance will spawn a separate
thread (called ``napi/${ifc-name}-${napi-id}``).

It is recommended to pin each kernel thread to a single CPU, the same
CPU as the CPU which services the interrupt. Note that the mapping
between IRQs and NAPI instances may not be trivial (and is driver
dependent). The NAPI instance IDs will be assigned in the opposite
order to the process IDs of the kernel threads.

Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
netdev's sysfs directory.
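
A minimal sketch of toggling the mode from a program is shown below. It
assumes the netdevice directory lives under ``/sys/class/net/``; the
helper name ``netdev_set_threaded()`` is made up for the example.

.. code-block:: c

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>

  /*
   * Switch threaded NAPI on or off for one netdevice by writing 1 or 0
   * to its "threaded" sysfs file.
   * Returns 0 on success or a negative errno value.
   */
  int netdev_set_threaded(const char *ifname, bool enable)
  {
      char path[256];
      FILE *f;
      int n, err = 0;

      n = snprintf(path, sizeof(path), "/sys/class/net/%s/threaded", ifname);
      if (n < 0 || (size_t)n >= sizeof(path))
          return -ENAMETOOLONG;

      f = fopen(path, "w");
      if (!f)
          return -errno;
      if (fputs(enable ? "1\n" : "0\n", f) == EOF)
          err = -EIO;
      /* the buffered write reaches sysfs when the stream is closed */
      if (fclose(f) != 0 && !err)
          err = -errno;
      return err;
  }

Because the setting is per netdevice, a single call affects every NAPI
instance of the interface.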

.. rubric:: Footnotes

.. [#] NAPI was originally referred to as New API in Linux 2.4.