.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

.. _napi:

====
NAPI
====

NAPI is the event handling mechanism used by the Linux networking stack.
The name NAPI no longer stands for anything in particular [#]_.

In basic operation the device notifies the host about new events
via an interrupt.
The host then schedules a NAPI instance to process the events.
The device may also be polled for events via NAPI without receiving
interrupts first (:ref:`busy polling<poll>`).

NAPI processing usually happens in the software interrupt context,
but there is an option to use :ref:`separate kernel threads<threaded>`
for NAPI processing.

All in all, NAPI abstracts the context and configuration of event
(packet Rx and Tx) processing away from the drivers.

Driver API
==========

The two most important elements of NAPI are the struct napi_struct
and the associated poll method. struct napi_struct holds the state
of the NAPI instance while the method is the driver-specific event
handler. The method will typically free Tx packets that have been
transmitted and process newly received packets.

.. _drv_ctrl:

Control API
-----------

netif_napi_add() and netif_napi_del() add/remove a NAPI instance
from the system. The instances are attached to the netdevice passed
as an argument (and will be deleted automatically when the netdevice is
unregistered). Instances are added in a disabled state.

napi_enable() and napi_disable() manage the disabled state.
A disabled NAPI can't be scheduled and its poll method is guaranteed
not to be invoked. napi_disable() waits for ownership of the NAPI
instance to be released.

The control APIs are not idempotent. Control API calls are safe against
concurrent use of datapath APIs but an incorrect sequence of control API
calls may result in crashes, deadlocks, or race conditions. For example,
calling napi_disable() multiple times in a row will deadlock.

Datapath API
------------

napi_schedule() is the basic method of scheduling a NAPI poll.
Drivers should call this function in their interrupt handler
(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
will take ownership of the NAPI instance.

Later, after NAPI is scheduled, the driver's poll method will be
called to process the events/packets.
The method takes a ``budget``
argument: drivers can process completions for any number of Tx
packets but should only process up to ``budget`` Rx packets.
Rx processing is usually much more expensive.

In other words, for Rx processing the ``budget`` argument limits how many
packets the driver can process in a single poll. Rx-specific APIs like page
pool or XDP cannot be used at all when ``budget`` is 0.
skb Tx processing should happen regardless of the ``budget``, but if
the argument is 0 the driver cannot call any XDP (or page pool) APIs.

.. warning::

   The ``budget`` argument may be 0 if the core tries to only process
   skb Tx completions and no Rx or XDP packets.

The poll method returns the amount of work done. If the driver still
has outstanding work to do (e.g. ``budget`` was exhausted),
the poll method should return exactly ``budget``. In that case,
the NAPI instance will be serviced/polled again (without the
need to be scheduled).

If event processing has been completed (all outstanding packets
processed), the poll method should call napi_complete_done()
before returning. napi_complete_done() releases the ownership
of the instance.

.. warning::

   The case of finishing all events and using exactly ``budget``
   must be handled carefully. There is no way to report this
   (rare) condition to the stack, so the driver must either
   not call napi_complete_done() and wait to be called again,
   or return ``budget - 1``.

   If ``budget`` is 0, napi_complete_done() should never be called.

Call sequence
-------------

Drivers should not make assumptions about the exact sequencing
of calls. The poll method may be called without the driver scheduling
the instance (unless the instance is disabled). Similarly,
it's not guaranteed that the poll method will be called, even
if napi_schedule() succeeded (e.g. if the instance gets disabled).

As mentioned in the :ref:`drv_ctrl` section, napi_disable() and subsequent
calls to the poll method only wait for the ownership of the instance
to be released, not for the poll method to exit. This means that
drivers should avoid accessing any data structures after calling
napi_complete_done().

.. _drv_sched:

Scheduling and IRQ masking
--------------------------

Drivers should keep interrupts masked after scheduling
the NAPI instance; until NAPI polling finishes, any further
interrupts are unnecessary.

Drivers which have to mask the interrupts explicitly (as opposed
to the IRQ being auto-masked by the device) should use the
napi_schedule_prep() and __napi_schedule() calls:

.. code-block:: c

  if (napi_schedule_prep(&v->napi)) {
      mydrv_mask_rxtx_irq(v->idx);
      /* schedule after masking to avoid races */
      __napi_schedule(&v->napi);
  }

The IRQ should only be unmasked after a successful call to napi_complete_done():

.. code-block:: c

  if (budget && napi_complete_done(&v->napi, work_done)) {
      mydrv_unmask_rxtx_irq(v->idx);
      return min(work_done, budget - 1);
  }

napi_schedule_irqoff() is a variant of napi_schedule() which takes advantage
of guarantees given by being invoked in IRQ context (no need to
mask interrupts). Note that PREEMPT_RT forces all interrupts
to be threaded, so the interrupt may need to be marked ``IRQF_NO_THREAD``
to avoid issues on real-time kernel configurations.
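
To make the ownership and ``budget`` rules above concrete, the following is a
userspace model of a single queue. It is a sketch, not kernel code:
``struct model_queue``, model_irq(), model_poll() and model_complete_done()
are hypothetical stand-ins for the driver's ring state, its interrupt handler,
its poll method and napi_complete_done(), and the real NAPI state machine
tracks more than the single ownership flag used here. When all work finishes
in exactly ``budget`` packets, the model takes the first option from the
warning above: it does not complete and waits to be polled again.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical stand-in for the driver's ring and NAPI state. */
  struct model_queue {
      int rx_pending;   /* Rx packets waiting to be processed */
      int tx_done;      /* completed Tx descriptors waiting to be freed */
      bool owned;       /* models NAPI ownership (instance scheduled) */
      bool irq_masked;  /* models the device interrupt mask */
  };

  /* Models the IRQ handler: napi_schedule_prep(), mask, __napi_schedule(). */
  static bool model_irq(struct model_queue *q)
  {
      if (q->owned)
          return false;      /* already scheduled, nothing to do */
      q->owned = true;       /* take ownership of the instance */
      q->irq_masked = true;  /* keep IRQs masked until polling finishes */
      return true;
  }

  /* Models napi_complete_done(): releases ownership of the instance. */
  static bool model_complete_done(struct model_queue *q, int work_done)
  {
      (void)work_done;
      q->owned = false;
      return true;
  }

  /* Models a poll method that follows the budget rules. */
  static int model_poll(struct model_queue *q, int budget)
  {
      int work_done = 0;

      /* Tx completions are freed regardless of the budget. */
      q->tx_done = 0;

      /* Rx work is capped at budget; with budget == 0 no Rx is done. */
      while (work_done < budget && q->rx_pending > 0) {
          q->rx_pending--;
          work_done++;
      }
      assert(work_done <= budget);

      /*
       * Budget used up (possibly with nothing left, the rare case from
       * the warning): return exactly budget without completing, so the
       * instance is polled again. This also covers budget == 0, where
       * the completion call must not happen.
       */
      if (work_done == budget)
          return budget;

      /* All work finished: release ownership, then unmask the IRQ. */
      if (model_complete_done(q, work_done))
          q->irq_masked = false;
      return work_done;
  }

For example, with 100 pending Rx packets and a budget of 64, the first
model_poll() call returns 64 and keeps ownership with the IRQ masked; the
second returns 36, completes, and unmasks the IRQ.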

Instance to queue mapping
-------------------------

Modern devices have multiple NAPI instances (struct napi_struct) per
interface. There is no strong requirement on how the instances are
mapped to queues and interrupts. NAPI is primarily a polling/processing
abstraction without specific user-facing semantics. That said, most networking
devices end up using NAPI in fairly similar ways.

NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
(a queue pair is a set of a single Rx and a single Tx queue).

In less common cases a NAPI instance may be used for multiple queues,
or Rx and Tx queues can be serviced by separate NAPI instances on a single
core. Regardless of the queue assignment, however, there is usually still
a 1:1 mapping between NAPI instances and interrupts.

It's worth noting that the ethtool API uses "channel" terminology where
each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
what constitutes a channel; the recommended interpretation is to understand
a channel as an IRQ/NAPI which services queues of a given type. For example,
a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
to utilize 3 interrupts, 2 Rx and 2 Tx queues.
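
Under the recommended interpretation, the resources a channel configuration
implies reduce to simple counting. The sketch below only illustrates that
reading and is not an ethtool or kernel API: ``struct channel_cfg``,
``struct channel_res`` and channel_resources() are hypothetical names, and it
assumes each ``combined`` channel services exactly one Rx and one Tx queue.

.. code-block:: c

  /* Hypothetical: channel counts as reported by ethtool. */
  struct channel_cfg {
      unsigned int rx;        /* channels servicing only an Rx queue */
      unsigned int tx;        /* channels servicing only a Tx queue */
      unsigned int combined;  /* channels servicing one Rx and one Tx queue */
  };

  /* Hypothetical: resources implied by a channel configuration. */
  struct channel_res {
      unsigned int irqs;
      unsigned int rx_queues;
      unsigned int tx_queues;
  };

  static struct channel_res channel_resources(struct channel_cfg c)
  {
      struct channel_res r = {
          .irqs = c.rx + c.tx + c.combined,  /* one IRQ/NAPI per channel */
          .rx_queues = c.rx + c.combined,    /* rx and combined own an Rx queue */
          .tx_queues = c.tx + c.combined,    /* tx and combined own a Tx queue */
      };
      return r;
  }

Calling channel_resources() with one channel of each type yields 3 IRQs,
2 Rx queues and 2 Tx queues, matching the example in the text.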

User API
========

User interactions with NAPI depend on the NAPI instance ID. The instance IDs
are only visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option.
It's not currently possible to query IDs used by a given device.

Software IRQ coalescing
-----------------------

NAPI does not perform any explicit event coalescing by default.
In most scenarios batching happens due to IRQ coalescing, which is done
by the device. There are cases where software coalescing is helpful.

NAPI can be configured to arm a repoll timer instead of unmasking
the hardware interrupts as soon as all packets are processed.
The ``gro_flush_timeout`` sysfs configuration of the netdevice
is reused to control the delay of the timer, while
``napi_defer_hard_irqs`` controls the number of consecutive empty polls
before NAPI gives up and goes back to using hardware IRQs.

.. _poll:

Busy polling
------------

Busy polling allows a user process to check for incoming packets before
the device interrupt fires. As is the case with any busy polling, it trades
off CPU cycles for lower latency (production uses of NAPI busy polling
are not well known).
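
The per-socket controls used for busy polling and for reading NAPI IDs can be
exercised from an application roughly as follows. This is a minimal sketch,
assuming a Linux host whose ``<sys/socket.h>`` defines the options;
enable_busy_poll() and incoming_napi_id() are hypothetical helper names, the
``#ifdef`` guards cover headers that lack the options, and error handling is
reduced to returning a negative ``errno``.

.. code-block:: c

  #include <errno.h>
  #include <sys/socket.h>

  /*
   * Request busy polling for up to usec microseconds on blocking reads.
   * Returns 0 or a negative errno; raising the value may need CAP_NET_ADMIN.
   */
  static int enable_busy_poll(int fd, int usec)
  {
  #ifdef SO_BUSY_POLL
      if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof(usec)) < 0)
          return -errno;
      return 0;
  #else
      (void)fd;
      (void)usec;
      return -ENOPROTOOPT;  /* headers do not define the option */
  #endif
  }

  /*
   * Read the NAPI ID associated with the last packet the socket received.
   * For a socket that has not received anything this is expected to be 0.
   */
  static int incoming_napi_id(int fd, unsigned int *napi_id)
  {
  #ifdef SO_INCOMING_NAPI_ID
      socklen_t len = sizeof(*napi_id);

      if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len) < 0)
          return -errno;
      return 0;
  #else
      (void)fd;
      (void)napi_id;
      return -ENOPROTOOPT;  /* headers do not define the option */
  #endif
  }

Calling enable_busy_poll(fd, 50) before a blocking recv() asks for roughly
50 microseconds of busy polling per read. incoming_napi_id() becomes useful
once the socket has received traffic, since the ID then identifies the NAPI
instance that delivered the last packet.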

Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
also exists.

IRQ mitigation
--------------

While busy polling is intended for low-latency applications,
a similar mechanism can be used for IRQ mitigation.

Very high request-per-second applications (especially routing/forwarding
applications and, in particular, applications using AF_XDP sockets) may not
want to be interrupted until they finish processing a request or a batch
of packets.

Such applications can pledge to the kernel that they will perform a busy
polling operation periodically, and the driver should keep the device IRQs
permanently masked. This mode is enabled by using the ``SO_PREFER_BUSY_POLL``
socket option. To avoid system misbehavior, the pledge is revoked
if ``gro_flush_timeout`` passes without any busy poll call.

The NAPI budget for busy polling is lower than the default (which makes
sense given the low latency intention of normal busy polling). This is
not the case with IRQ mitigation, however, so the budget can be adjusted
with the ``SO_BUSY_POLL_BUDGET`` socket option.

.. _threaded:

Threaded NAPI
-------------

Threaded NAPI is an operating mode that uses dedicated kernel
threads rather than software IRQ context for NAPI processing.
The configuration is per netdevice and will affect all
NAPI instances of that device. Each NAPI instance will spawn a separate
thread (called ``napi/${ifc-name}-${napi-id}``).

It is recommended to pin each kernel thread to a single CPU, the same
CPU as the CPU which services the interrupt. Note that the mapping
between IRQs and NAPI instances may not be trivial (and is driver
dependent). The NAPI instance IDs will be assigned in the opposite
order to the process IDs of the kernel threads.

Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
the netdev's sysfs directory.

.. rubric:: Footnotes

.. [#] NAPI was originally referred to as New API in 2.4 Linux.