18c2ecf20Sopenharmony_ci===================================================
28c2ecf20Sopenharmony_ciA Tour Through TREE_RCU's Data Structures [LWN.net]
38c2ecf20Sopenharmony_ci===================================================
48c2ecf20Sopenharmony_ci
58c2ecf20Sopenharmony_ciDecember 18, 2016
68c2ecf20Sopenharmony_ci
78c2ecf20Sopenharmony_ciThis article was contributed by Paul E. McKenney
88c2ecf20Sopenharmony_ci
98c2ecf20Sopenharmony_ciIntroduction
108c2ecf20Sopenharmony_ci============
118c2ecf20Sopenharmony_ci
128c2ecf20Sopenharmony_ciThis document describes RCU's major data structures and their relationship
138c2ecf20Sopenharmony_cito each other.
148c2ecf20Sopenharmony_ci
158c2ecf20Sopenharmony_ciData-Structure Relationships
168c2ecf20Sopenharmony_ci============================
178c2ecf20Sopenharmony_ci
188c2ecf20Sopenharmony_ciRCU is for all intents and purposes a large state machine, and its
198c2ecf20Sopenharmony_cidata structures maintain the state in such a way as to allow RCU readers
208c2ecf20Sopenharmony_cito execute extremely quickly, while also processing the RCU grace periods
218c2ecf20Sopenharmony_cirequested by updaters in an efficient and extremely scalable fashion.
228c2ecf20Sopenharmony_ciThe efficiency and scalability of RCU updaters is provided primarily
238c2ecf20Sopenharmony_ciby a combining tree, as shown below:
248c2ecf20Sopenharmony_ci
258c2ecf20Sopenharmony_ci.. kernel-figure:: BigTreeClassicRCU.svg
268c2ecf20Sopenharmony_ci
278c2ecf20Sopenharmony_ciThis diagram shows an enclosing ``rcu_state`` structure containing a tree
288c2ecf20Sopenharmony_ciof ``rcu_node`` structures. Each leaf node of the ``rcu_node`` tree has up
298c2ecf20Sopenharmony_cito 16 ``rcu_data`` structures associated with it, so that there are
308c2ecf20Sopenharmony_ci``NR_CPUS`` number of ``rcu_data`` structures, one for each possible CPU.
318c2ecf20Sopenharmony_ciThis structure is adjusted at boot time, if needed, to handle the common
328c2ecf20Sopenharmony_cicase where ``nr_cpu_ids`` is much less than ``NR_CPUs``.
338c2ecf20Sopenharmony_ciFor example, a number of Linux distributions set ``NR_CPUs=4096``,
348c2ecf20Sopenharmony_ciwhich results in a three-level ``rcu_node`` tree.
358c2ecf20Sopenharmony_ciIf the actual hardware has only 16 CPUs, RCU will adjust itself
368c2ecf20Sopenharmony_ciat boot time, resulting in an ``rcu_node`` tree with only a single node.
378c2ecf20Sopenharmony_ci
388c2ecf20Sopenharmony_ciThe purpose of this combining tree is to allow per-CPU events
398c2ecf20Sopenharmony_cisuch as quiescent states, dyntick-idle transitions,
408c2ecf20Sopenharmony_ciand CPU hotplug operations to be processed efficiently
418c2ecf20Sopenharmony_ciand scalably.
428c2ecf20Sopenharmony_ciQuiescent states are recorded by the per-CPU ``rcu_data`` structures,
438c2ecf20Sopenharmony_ciand other events are recorded by the leaf-level ``rcu_node``
448c2ecf20Sopenharmony_cistructures.
458c2ecf20Sopenharmony_ciAll of these events are combined at each level of the tree until finally
468c2ecf20Sopenharmony_cigrace periods are completed at the tree's root ``rcu_node``
478c2ecf20Sopenharmony_cistructure.
488c2ecf20Sopenharmony_ciA grace period can be completed at the root once every CPU
498c2ecf20Sopenharmony_ci(or, in the case of ``CONFIG_PREEMPT_RCU``, task)
508c2ecf20Sopenharmony_cihas passed through a quiescent state.
518c2ecf20Sopenharmony_ciOnce a grace period has completed, record of that fact is propagated
528c2ecf20Sopenharmony_ciback down the tree.
538c2ecf20Sopenharmony_ci
548c2ecf20Sopenharmony_ciAs can be seen from the diagram, on a 64-bit system
558c2ecf20Sopenharmony_cia two-level tree with 64 leaves can accommodate 1,024 CPUs, with a fanout
568c2ecf20Sopenharmony_ciof 64 at the root and a fanout of 16 at the leaves.
578c2ecf20Sopenharmony_ci
588c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
598c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
608c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
618c2ecf20Sopenharmony_ci| Why isn't the fanout at the leaves also 64?                           |
628c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
638c2ecf20Sopenharmony_ci| **Answer**:                                                           |
648c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
658c2ecf20Sopenharmony_ci| Because there are more types of events that affect the leaf-level     |
668c2ecf20Sopenharmony_ci| ``rcu_node`` structures than further up the tree. Therefore, if the   |
678c2ecf20Sopenharmony_ci| leaf ``rcu_node`` structures have fanout of 64, the contention on     |
688c2ecf20Sopenharmony_ci| these structures' ``->structures`` becomes excessive. Experimentation |
698c2ecf20Sopenharmony_ci| on a wide variety of systems has shown that a fanout of 16 works well |
708c2ecf20Sopenharmony_ci| for the leaves of the ``rcu_node`` tree.                              |
718c2ecf20Sopenharmony_ci|                                                                       |
728c2ecf20Sopenharmony_ci| Of course, further experience with systems having hundreds or         |
738c2ecf20Sopenharmony_ci| thousands of CPUs may demonstrate that the fanout for the non-leaf    |
748c2ecf20Sopenharmony_ci| ``rcu_node`` structures must also be reduced. Such reduction can be   |
758c2ecf20Sopenharmony_ci| easily carried out when and if it proves necessary. In the meantime,  |
768c2ecf20Sopenharmony_ci| if you are using such a system and running into contention problems   |
778c2ecf20Sopenharmony_ci| on the non-leaf ``rcu_node`` structures, you may use the              |
788c2ecf20Sopenharmony_ci| ``CONFIG_RCU_FANOUT`` kernel configuration parameter to reduce the    |
798c2ecf20Sopenharmony_ci| non-leaf fanout as needed.                                            |
808c2ecf20Sopenharmony_ci|                                                                       |
818c2ecf20Sopenharmony_ci| Kernels built for systems with strong NUMA characteristics might      |
828c2ecf20Sopenharmony_ci| also need to adjust ``CONFIG_RCU_FANOUT`` so that the domains of      |
838c2ecf20Sopenharmony_ci| the ``rcu_node`` structures align with hardware boundaries.           |
848c2ecf20Sopenharmony_ci| However, there has thus far been no need for this.                    |
858c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
868c2ecf20Sopenharmony_ci
878c2ecf20Sopenharmony_ciIf your system has more than 1,024 CPUs (or more than 512 CPUs on a
888c2ecf20Sopenharmony_ci32-bit system), then RCU will automatically add more levels to the tree.
898c2ecf20Sopenharmony_ciFor example, if you are crazy enough to build a 64-bit system with
908c2ecf20Sopenharmony_ci65,536 CPUs, RCU would configure the ``rcu_node`` tree as follows:
918c2ecf20Sopenharmony_ci
928c2ecf20Sopenharmony_ci.. kernel-figure:: HugeTreeClassicRCU.svg
938c2ecf20Sopenharmony_ci
948c2ecf20Sopenharmony_ciRCU currently permits up to a four-level tree, which on a 64-bit system
958c2ecf20Sopenharmony_ciaccommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for
968c2ecf20Sopenharmony_ci32-bit systems. On the other hand, you can set both
978c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT`` and ``CONFIG_RCU_FANOUT_LEAF`` to be as small as
988c2ecf20Sopenharmony_ci2, which would result in a 16-CPU test using a 4-level tree. This can be
998c2ecf20Sopenharmony_ciuseful for testing large-system capabilities on small test machines.
1008c2ecf20Sopenharmony_ci
1018c2ecf20Sopenharmony_ciThis multi-level combining tree allows us to get most of the performance
1028c2ecf20Sopenharmony_ciand scalability benefits of partitioning, even though RCU grace-period
1038c2ecf20Sopenharmony_cidetection is inherently a global operation. The trick here is that only
1048c2ecf20Sopenharmony_cithe last CPU to report a quiescent state into a given ``rcu_node``
1058c2ecf20Sopenharmony_cistructure need advance to the ``rcu_node`` structure at the next level
1068c2ecf20Sopenharmony_ciup the tree. This means that at the leaf-level ``rcu_node`` structure,
1078c2ecf20Sopenharmony_cionly one access out of sixteen will progress up the tree. For the
1088c2ecf20Sopenharmony_ciinternal ``rcu_node`` structures, the situation is even more extreme:
1098c2ecf20Sopenharmony_ciOnly one access out of sixty-four will progress up the tree. Because the
1108c2ecf20Sopenharmony_civast majority of the CPUs do not progress up the tree, the lock
1118c2ecf20Sopenharmony_cicontention remains roughly constant up the tree. No matter how many CPUs
1128c2ecf20Sopenharmony_cithere are in the system, at most 64 quiescent-state reports per grace
1138c2ecf20Sopenharmony_ciperiod will progress all the way to the root ``rcu_node`` structure,
1148c2ecf20Sopenharmony_cithus ensuring that the lock contention on that root ``rcu_node``
1158c2ecf20Sopenharmony_cistructure remains acceptably low.
1168c2ecf20Sopenharmony_ci
1178c2ecf20Sopenharmony_ciIn effect, the combining tree acts like a big shock absorber, keeping
1188c2ecf20Sopenharmony_cilock contention under control at all tree levels regardless of the level
1198c2ecf20Sopenharmony_ciof loading on the system.
1208c2ecf20Sopenharmony_ci
1218c2ecf20Sopenharmony_ciRCU updaters wait for normal grace periods by registering RCU callbacks,
1228c2ecf20Sopenharmony_cieither directly via ``call_rcu()`` or indirectly via
1238c2ecf20Sopenharmony_ci``synchronize_rcu()`` and friends. RCU callbacks are represented by
1248c2ecf20Sopenharmony_ci``rcu_head`` structures, which are queued on ``rcu_data`` structures
1258c2ecf20Sopenharmony_ciwhile they are waiting for a grace period to elapse, as shown in the
1268c2ecf20Sopenharmony_cifollowing figure:
1278c2ecf20Sopenharmony_ci
1288c2ecf20Sopenharmony_ci.. kernel-figure:: BigTreePreemptRCUBHdyntickCB.svg
1298c2ecf20Sopenharmony_ci
1308c2ecf20Sopenharmony_ciThis figure shows how ``TREE_RCU``'s and ``PREEMPT_RCU``'s major data
1318c2ecf20Sopenharmony_cistructures are related. Lesser data structures will be introduced with
1328c2ecf20Sopenharmony_cithe algorithms that make use of them.
1338c2ecf20Sopenharmony_ci
1348c2ecf20Sopenharmony_ciNote that each of the data structures in the above figure has its own
1358c2ecf20Sopenharmony_cisynchronization:
1368c2ecf20Sopenharmony_ci
1378c2ecf20Sopenharmony_ci#. Each ``rcu_state`` structures has a lock and a mutex, and some fields
1388c2ecf20Sopenharmony_ci   are protected by the corresponding root ``rcu_node`` structure's lock.
1398c2ecf20Sopenharmony_ci#. Each ``rcu_node`` structure has a spinlock.
1408c2ecf20Sopenharmony_ci#. The fields in ``rcu_data`` are private to the corresponding CPU,
1418c2ecf20Sopenharmony_ci   although a few can be read and written by other CPUs.
1428c2ecf20Sopenharmony_ci
1438c2ecf20Sopenharmony_ciIt is important to note that different data structures can have very
1448c2ecf20Sopenharmony_cidifferent ideas about the state of RCU at any given time. For but one
1458c2ecf20Sopenharmony_ciexample, awareness of the start or end of a given RCU grace period
1468c2ecf20Sopenharmony_cipropagates slowly through the data structures. This slow propagation is
1478c2ecf20Sopenharmony_ciabsolutely necessary for RCU to have good read-side performance. If this
1488c2ecf20Sopenharmony_cibalkanized implementation seems foreign to you, one useful trick is to
1498c2ecf20Sopenharmony_ciconsider each instance of these data structures to be a different
1508c2ecf20Sopenharmony_ciperson, each having the usual slightly different view of reality.
1518c2ecf20Sopenharmony_ci
1528c2ecf20Sopenharmony_ciThe general role of each of these data structures is as follows:
1538c2ecf20Sopenharmony_ci
1548c2ecf20Sopenharmony_ci#. ``rcu_state``: This structure forms the interconnection between the
1558c2ecf20Sopenharmony_ci   ``rcu_node`` and ``rcu_data`` structures, tracks grace periods,
1568c2ecf20Sopenharmony_ci   serves as short-term repository for callbacks orphaned by CPU-hotplug
1578c2ecf20Sopenharmony_ci   events, maintains ``rcu_barrier()`` state, tracks expedited
1588c2ecf20Sopenharmony_ci   grace-period state, and maintains state used to force quiescent
1598c2ecf20Sopenharmony_ci   states when grace periods extend too long,
1608c2ecf20Sopenharmony_ci#. ``rcu_node``: This structure forms the combining tree that propagates
1618c2ecf20Sopenharmony_ci   quiescent-state information from the leaves to the root, and also
1628c2ecf20Sopenharmony_ci   propagates grace-period information from the root to the leaves. It
1638c2ecf20Sopenharmony_ci   provides local copies of the grace-period state in order to allow
1648c2ecf20Sopenharmony_ci   this information to be accessed in a synchronized manner without
1658c2ecf20Sopenharmony_ci   suffering the scalability limitations that would otherwise be imposed
1668c2ecf20Sopenharmony_ci   by global locking. In ``CONFIG_PREEMPT_RCU`` kernels, it manages the
1678c2ecf20Sopenharmony_ci   lists of tasks that have blocked while in their current RCU read-side
1688c2ecf20Sopenharmony_ci   critical section. In ``CONFIG_PREEMPT_RCU`` with
1698c2ecf20Sopenharmony_ci   ``CONFIG_RCU_BOOST``, it manages the per-\ ``rcu_node``
1708c2ecf20Sopenharmony_ci   priority-boosting kernel threads (kthreads) and state. Finally, it
1718c2ecf20Sopenharmony_ci   records CPU-hotplug state in order to determine which CPUs should be
1728c2ecf20Sopenharmony_ci   ignored during a given grace period.
1738c2ecf20Sopenharmony_ci#. ``rcu_data``: This per-CPU structure is the focus of quiescent-state
1748c2ecf20Sopenharmony_ci   detection and RCU callback queuing. It also tracks its relationship
1758c2ecf20Sopenharmony_ci   to the corresponding leaf ``rcu_node`` structure to allow
1768c2ecf20Sopenharmony_ci   more-efficient propagation of quiescent states up the ``rcu_node``
1778c2ecf20Sopenharmony_ci   combining tree. Like the ``rcu_node`` structure, it provides a local
1788c2ecf20Sopenharmony_ci   copy of the grace-period information to allow for-free synchronized
1798c2ecf20Sopenharmony_ci   access to this information from the corresponding CPU. Finally, this
1808c2ecf20Sopenharmony_ci   structure records past dyntick-idle state for the corresponding CPU
1818c2ecf20Sopenharmony_ci   and also tracks statistics.
1828c2ecf20Sopenharmony_ci#. ``rcu_head``: This structure represents RCU callbacks, and is the
1838c2ecf20Sopenharmony_ci   only structure allocated and managed by RCU users. The ``rcu_head``
1848c2ecf20Sopenharmony_ci   structure is normally embedded within the RCU-protected data
1858c2ecf20Sopenharmony_ci   structure.
1868c2ecf20Sopenharmony_ci
1878c2ecf20Sopenharmony_ciIf all you wanted from this article was a general notion of how RCU's
1888c2ecf20Sopenharmony_cidata structures are related, you are done. Otherwise, each of the
1898c2ecf20Sopenharmony_cifollowing sections give more details on the ``rcu_state``, ``rcu_node``
1908c2ecf20Sopenharmony_ciand ``rcu_data`` data structures.
1918c2ecf20Sopenharmony_ci
1928c2ecf20Sopenharmony_ciThe ``rcu_state`` Structure
1938c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~
1948c2ecf20Sopenharmony_ci
1958c2ecf20Sopenharmony_ciThe ``rcu_state`` structure is the base structure that represents the
1968c2ecf20Sopenharmony_cistate of RCU in the system. This structure forms the interconnection
1978c2ecf20Sopenharmony_cibetween the ``rcu_node`` and ``rcu_data`` structures, tracks grace
1988c2ecf20Sopenharmony_ciperiods, contains the lock used to synchronize with CPU-hotplug events,
1998c2ecf20Sopenharmony_ciand maintains state used to force quiescent states when grace periods
2008c2ecf20Sopenharmony_ciextend too long,
2018c2ecf20Sopenharmony_ci
2028c2ecf20Sopenharmony_ciA few of the ``rcu_state`` structure's fields are discussed, singly and
2038c2ecf20Sopenharmony_ciin groups, in the following sections. The more specialized fields are
2048c2ecf20Sopenharmony_cicovered in the discussion of their use.
2058c2ecf20Sopenharmony_ci
2068c2ecf20Sopenharmony_ciRelationship to rcu_node and rcu_data Structures
2078c2ecf20Sopenharmony_ci''''''''''''''''''''''''''''''''''''''''''''''''
2088c2ecf20Sopenharmony_ci
2098c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows:
2108c2ecf20Sopenharmony_ci
2118c2ecf20Sopenharmony_ci::
2128c2ecf20Sopenharmony_ci
2138c2ecf20Sopenharmony_ci     1   struct rcu_node node[NUM_RCU_NODES];
2148c2ecf20Sopenharmony_ci     2   struct rcu_node *level[NUM_RCU_LVLS + 1];
2158c2ecf20Sopenharmony_ci     3   struct rcu_data __percpu *rda;
2168c2ecf20Sopenharmony_ci
2178c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
2188c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
2198c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
2208c2ecf20Sopenharmony_ci| Wait a minute! You said that the ``rcu_node`` structures formed a     |
2218c2ecf20Sopenharmony_ci| tree, but they are declared as a flat array! What gives?              |
2228c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
2238c2ecf20Sopenharmony_ci| **Answer**:                                                           |
2248c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
2258c2ecf20Sopenharmony_ci| The tree is laid out in the array. The first node In the array is the |
2268c2ecf20Sopenharmony_ci| head, the next set of nodes in the array are children of the head     |
2278c2ecf20Sopenharmony_ci| node, and so on until the last set of nodes in the array are the      |
2288c2ecf20Sopenharmony_ci| leaves.                                                               |
2298c2ecf20Sopenharmony_ci| See the following diagrams to see how this works.                     |
2308c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
2318c2ecf20Sopenharmony_ci
2328c2ecf20Sopenharmony_ciThe ``rcu_node`` tree is embedded into the ``->node[]`` array as shown
2338c2ecf20Sopenharmony_ciin the following figure:
2348c2ecf20Sopenharmony_ci
2358c2ecf20Sopenharmony_ci.. kernel-figure:: TreeMapping.svg
2368c2ecf20Sopenharmony_ci
2378c2ecf20Sopenharmony_ciOne interesting consequence of this mapping is that a breadth-first
2388c2ecf20Sopenharmony_citraversal of the tree is implemented as a simple linear scan of the
2398c2ecf20Sopenharmony_ciarray, which is in fact what the ``rcu_for_each_node_breadth_first()``
2408c2ecf20Sopenharmony_cimacro does. This macro is used at the beginning and ends of grace
2418c2ecf20Sopenharmony_ciperiods.
2428c2ecf20Sopenharmony_ci
2438c2ecf20Sopenharmony_ciEach entry of the ``->level`` array references the first ``rcu_node``
2448c2ecf20Sopenharmony_cistructure on the corresponding level of the tree, for example, as shown
2458c2ecf20Sopenharmony_cibelow:
2468c2ecf20Sopenharmony_ci
2478c2ecf20Sopenharmony_ci.. kernel-figure:: TreeMappingLevel.svg
2488c2ecf20Sopenharmony_ci
2498c2ecf20Sopenharmony_ciThe zero\ :sup:`th` element of the array references the root
2508c2ecf20Sopenharmony_ci``rcu_node`` structure, the first element references the first child of
2518c2ecf20Sopenharmony_cithe root ``rcu_node``, and finally the second element references the
2528c2ecf20Sopenharmony_cifirst leaf ``rcu_node`` structure.
2538c2ecf20Sopenharmony_ci
2548c2ecf20Sopenharmony_ciFor whatever it is worth, if you draw the tree to be tree-shaped rather
2558c2ecf20Sopenharmony_cithan array-shaped, it is easy to draw a planar representation:
2568c2ecf20Sopenharmony_ci
2578c2ecf20Sopenharmony_ci.. kernel-figure:: TreeLevel.svg
2588c2ecf20Sopenharmony_ci
2598c2ecf20Sopenharmony_ciFinally, the ``->rda`` field references a per-CPU pointer to the
2608c2ecf20Sopenharmony_cicorresponding CPU's ``rcu_data`` structure.
2618c2ecf20Sopenharmony_ci
2628c2ecf20Sopenharmony_ciAll of these fields are constant once initialization is complete, and
2638c2ecf20Sopenharmony_citherefore need no protection.
2648c2ecf20Sopenharmony_ci
2658c2ecf20Sopenharmony_ciGrace-Period Tracking
2668c2ecf20Sopenharmony_ci'''''''''''''''''''''
2678c2ecf20Sopenharmony_ci
2688c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows:
2698c2ecf20Sopenharmony_ci
2708c2ecf20Sopenharmony_ci::
2718c2ecf20Sopenharmony_ci
2728c2ecf20Sopenharmony_ci     1   unsigned long gp_seq;
2738c2ecf20Sopenharmony_ci
2748c2ecf20Sopenharmony_ciRCU grace periods are numbered, and the ``->gp_seq`` field contains the
2758c2ecf20Sopenharmony_cicurrent grace-period sequence number. The bottom two bits are the state
2768c2ecf20Sopenharmony_ciof the current grace period, which can be zero for not yet started or
2778c2ecf20Sopenharmony_cione for in progress. In other words, if the bottom two bits of
2788c2ecf20Sopenharmony_ci``->gp_seq`` are zero, then RCU is idle. Any other value in the bottom
2798c2ecf20Sopenharmony_citwo bits indicates that something is broken. This field is protected by
2808c2ecf20Sopenharmony_cithe root ``rcu_node`` structure's ``->lock`` field.
2818c2ecf20Sopenharmony_ci
2828c2ecf20Sopenharmony_ciThere are ``->gp_seq`` fields in the ``rcu_node`` and ``rcu_data``
2838c2ecf20Sopenharmony_cistructures as well. The fields in the ``rcu_state`` structure represent
2848c2ecf20Sopenharmony_cithe most current value, and those of the other structures are compared
2858c2ecf20Sopenharmony_ciin order to detect the beginnings and ends of grace periods in a
2868c2ecf20Sopenharmony_cidistributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
2878c2ecf20Sopenharmony_ci(down the tree from the root to the leaves) to ``rcu_data``.
2888c2ecf20Sopenharmony_ci
2898c2ecf20Sopenharmony_ciMiscellaneous
2908c2ecf20Sopenharmony_ci'''''''''''''
2918c2ecf20Sopenharmony_ci
2928c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows:
2938c2ecf20Sopenharmony_ci
2948c2ecf20Sopenharmony_ci::
2958c2ecf20Sopenharmony_ci
2968c2ecf20Sopenharmony_ci     1   unsigned long gp_max;
2978c2ecf20Sopenharmony_ci     2   char abbr;
2988c2ecf20Sopenharmony_ci     3   char *name;
2998c2ecf20Sopenharmony_ci
3008c2ecf20Sopenharmony_ciThe ``->gp_max`` field tracks the duration of the longest grace period
3018c2ecf20Sopenharmony_ciin jiffies. It is protected by the root ``rcu_node``'s ``->lock``.
3028c2ecf20Sopenharmony_ci
3038c2ecf20Sopenharmony_ciThe ``->name`` and ``->abbr`` fields distinguish between preemptible RCU
3048c2ecf20Sopenharmony_ci(“rcu_preempt” and “p”) and non-preemptible RCU (“rcu_sched” and “s”).
3058c2ecf20Sopenharmony_ciThese fields are used for diagnostic and tracing purposes.
3068c2ecf20Sopenharmony_ci
3078c2ecf20Sopenharmony_ciThe ``rcu_node`` Structure
3088c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~
3098c2ecf20Sopenharmony_ci
3108c2ecf20Sopenharmony_ciThe ``rcu_node`` structures form the combining tree that propagates
3118c2ecf20Sopenharmony_ciquiescent-state information from the leaves to the root and also that
3128c2ecf20Sopenharmony_cipropagates grace-period information from the root down to the leaves.
3138c2ecf20Sopenharmony_ciThey provides local copies of the grace-period state in order to allow
3148c2ecf20Sopenharmony_cithis information to be accessed in a synchronized manner without
3158c2ecf20Sopenharmony_cisuffering the scalability limitations that would otherwise be imposed by
3168c2ecf20Sopenharmony_ciglobal locking. In ``CONFIG_PREEMPT_RCU`` kernels, they manage the lists
3178c2ecf20Sopenharmony_ciof tasks that have blocked while in their current RCU read-side critical
3188c2ecf20Sopenharmony_cisection. In ``CONFIG_PREEMPT_RCU`` with ``CONFIG_RCU_BOOST``, they
3198c2ecf20Sopenharmony_cimanage the per-\ ``rcu_node`` priority-boosting kernel threads
3208c2ecf20Sopenharmony_ci(kthreads) and state. Finally, they record CPU-hotplug state in order to
3218c2ecf20Sopenharmony_cidetermine which CPUs should be ignored during a given grace period.
3228c2ecf20Sopenharmony_ci
3238c2ecf20Sopenharmony_ciThe ``rcu_node`` structure's fields are discussed, singly and in groups,
3248c2ecf20Sopenharmony_ciin the following sections.
3258c2ecf20Sopenharmony_ci
3268c2ecf20Sopenharmony_ciConnection to Combining Tree
3278c2ecf20Sopenharmony_ci''''''''''''''''''''''''''''
3288c2ecf20Sopenharmony_ci
3298c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure is declared as follows:
3308c2ecf20Sopenharmony_ci
3318c2ecf20Sopenharmony_ci::
3328c2ecf20Sopenharmony_ci
3338c2ecf20Sopenharmony_ci     1   struct rcu_node *parent;
3348c2ecf20Sopenharmony_ci     2   u8 level;
3358c2ecf20Sopenharmony_ci     3   u8 grpnum;
3368c2ecf20Sopenharmony_ci     4   unsigned long grpmask;
3378c2ecf20Sopenharmony_ci     5   int grplo;
3388c2ecf20Sopenharmony_ci     6   int grphi;
3398c2ecf20Sopenharmony_ci
3408c2ecf20Sopenharmony_ciThe ``->parent`` pointer references the ``rcu_node`` one level up in the
3418c2ecf20Sopenharmony_citree, and is ``NULL`` for the root ``rcu_node``. The RCU implementation
3428c2ecf20Sopenharmony_cimakes heavy use of this field to push quiescent states up the tree. The
3438c2ecf20Sopenharmony_ci``->level`` field gives the level in the tree, with the root being at
3448c2ecf20Sopenharmony_cilevel zero, its children at level one, and so on. The ``->grpnum`` field
3458c2ecf20Sopenharmony_cigives this node's position within the children of its parent, so this
3468c2ecf20Sopenharmony_cinumber can range between 0 and 31 on 32-bit systems and between 0 and 63
3478c2ecf20Sopenharmony_cion 64-bit systems. The ``->level`` and ``->grpnum`` fields are used only
3488c2ecf20Sopenharmony_ciduring initialization and for tracing. The ``->grpmask`` field is the
3498c2ecf20Sopenharmony_cibitmask counterpart of ``->grpnum``, and therefore always has exactly
3508c2ecf20Sopenharmony_cione bit set. This mask is used to clear the bit corresponding to this
3518c2ecf20Sopenharmony_ci``rcu_node`` structure in its parent's bitmasks, which are described
3528c2ecf20Sopenharmony_cilater. Finally, the ``->grplo`` and ``->grphi`` fields contain the
3538c2ecf20Sopenharmony_cilowest and highest numbered CPU served by this ``rcu_node`` structure,
3548c2ecf20Sopenharmony_cirespectively.
3558c2ecf20Sopenharmony_ci
3568c2ecf20Sopenharmony_ciAll of these fields are constant, and thus do not require any
3578c2ecf20Sopenharmony_cisynchronization.
3588c2ecf20Sopenharmony_ci
3598c2ecf20Sopenharmony_ciSynchronization
3608c2ecf20Sopenharmony_ci'''''''''''''''
3618c2ecf20Sopenharmony_ci
3628c2ecf20Sopenharmony_ciThis field of the ``rcu_node`` structure is declared as follows:
3638c2ecf20Sopenharmony_ci
3648c2ecf20Sopenharmony_ci::
3658c2ecf20Sopenharmony_ci
3668c2ecf20Sopenharmony_ci     1   raw_spinlock_t lock;
3678c2ecf20Sopenharmony_ci
3688c2ecf20Sopenharmony_ciThis field is used to protect the remaining fields in this structure,
3698c2ecf20Sopenharmony_ciunless otherwise stated. That said, all of the fields in this structure
3708c2ecf20Sopenharmony_cican be accessed without locking for tracing purposes. Yes, this can
3718c2ecf20Sopenharmony_ciresult in confusing traces, but better some tracing confusion than to be
3728c2ecf20Sopenharmony_ciheisenbugged out of existence.
3738c2ecf20Sopenharmony_ci
3748c2ecf20Sopenharmony_ci.. _grace-period-tracking-1:
3758c2ecf20Sopenharmony_ci
3768c2ecf20Sopenharmony_ciGrace-Period Tracking
3778c2ecf20Sopenharmony_ci'''''''''''''''''''''
3788c2ecf20Sopenharmony_ci
3798c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure is declared as follows:
3808c2ecf20Sopenharmony_ci
3818c2ecf20Sopenharmony_ci::
3828c2ecf20Sopenharmony_ci
3838c2ecf20Sopenharmony_ci     1   unsigned long gp_seq;
3848c2ecf20Sopenharmony_ci     2   unsigned long gp_seq_needed;
3858c2ecf20Sopenharmony_ci
3868c2ecf20Sopenharmony_ciThe ``rcu_node`` structures' ``->gp_seq`` fields are the counterparts of
3878c2ecf20Sopenharmony_cithe field of the same name in the ``rcu_state`` structure. They each may
3888c2ecf20Sopenharmony_cilag up to one step behind their ``rcu_state`` counterpart. If the bottom
3898c2ecf20Sopenharmony_citwo bits of a given ``rcu_node`` structure's ``->gp_seq`` field is zero,
3908c2ecf20Sopenharmony_cithen this ``rcu_node`` structure believes that RCU is idle.
3918c2ecf20Sopenharmony_ci
3928c2ecf20Sopenharmony_ciThe ``>gp_seq`` field of each ``rcu_node`` structure is updated at the
3938c2ecf20Sopenharmony_cibeginning and the end of each grace period.
3948c2ecf20Sopenharmony_ci
3958c2ecf20Sopenharmony_ciThe ``->gp_seq_needed`` fields record the furthest-in-the-future grace
3968c2ecf20Sopenharmony_ciperiod request seen by the corresponding ``rcu_node`` structure. The
3978c2ecf20Sopenharmony_cirequest is considered fulfilled when the value of the ``->gp_seq`` field
3988c2ecf20Sopenharmony_ciequals or exceeds that of the ``->gp_seq_needed`` field.
3998c2ecf20Sopenharmony_ci
4008c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4018c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
4028c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4038c2ecf20Sopenharmony_ci| Suppose that this ``rcu_node`` structure doesn't see a request for a  |
4048c2ecf20Sopenharmony_ci| very long time. Won't wrapping of the ``->gp_seq`` field cause        |
4058c2ecf20Sopenharmony_ci| problems?                                                             |
4068c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4078c2ecf20Sopenharmony_ci| **Answer**:                                                           |
4088c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4098c2ecf20Sopenharmony_ci| No, because if the ``->gp_seq_needed`` field lags behind the          |
4108c2ecf20Sopenharmony_ci| ``->gp_seq`` field, the ``->gp_seq_needed`` field will be updated at  |
4118c2ecf20Sopenharmony_ci| the end of the grace period. Modulo-arithmetic comparisons therefore  |
4128c2ecf20Sopenharmony_ci| will always get the correct answer, even with wrapping.               |
4138c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4148c2ecf20Sopenharmony_ci
4158c2ecf20Sopenharmony_ciQuiescent-State Tracking
4168c2ecf20Sopenharmony_ci''''''''''''''''''''''''
4178c2ecf20Sopenharmony_ci
4188c2ecf20Sopenharmony_ciThese fields manage the propagation of quiescent states up the combining
4198c2ecf20Sopenharmony_citree.
4208c2ecf20Sopenharmony_ci
4218c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure has fields as follows:
4228c2ecf20Sopenharmony_ci
4238c2ecf20Sopenharmony_ci::
4248c2ecf20Sopenharmony_ci
4258c2ecf20Sopenharmony_ci     1   unsigned long qsmask;
4268c2ecf20Sopenharmony_ci     2   unsigned long expmask;
4278c2ecf20Sopenharmony_ci     3   unsigned long qsmaskinit;
4288c2ecf20Sopenharmony_ci     4   unsigned long expmaskinit;
4298c2ecf20Sopenharmony_ci
4308c2ecf20Sopenharmony_ciThe ``->qsmask`` field tracks which of this ``rcu_node`` structure's
4318c2ecf20Sopenharmony_cichildren still need to report quiescent states for the current normal
4328c2ecf20Sopenharmony_cigrace period. Such children will have a value of 1 in their
4338c2ecf20Sopenharmony_cicorresponding bit. Note that the leaf ``rcu_node`` structures should be
4348c2ecf20Sopenharmony_cithought of as having ``rcu_data`` structures as their children.
4358c2ecf20Sopenharmony_ciSimilarly, the ``->expmask`` field tracks which of this ``rcu_node``
4368c2ecf20Sopenharmony_cistructure's children still need to report quiescent states for the
4378c2ecf20Sopenharmony_cicurrent expedited grace period. An expedited grace period has the same
4388c2ecf20Sopenharmony_ciconceptual properties as a normal grace period, but the expedited
4398c2ecf20Sopenharmony_ciimplementation accepts extreme CPU overhead to obtain much lower
4408c2ecf20Sopenharmony_cigrace-period latency, for example, consuming a few tens of microseconds
4418c2ecf20Sopenharmony_ciworth of CPU time to reduce grace-period duration from milliseconds to
4428c2ecf20Sopenharmony_citens of microseconds. The ``->qsmaskinit`` field tracks which of this
4438c2ecf20Sopenharmony_ci``rcu_node`` structure's children cover for at least one online CPU.
4448c2ecf20Sopenharmony_ciThis mask is used to initialize ``->qsmask``, and ``->expmaskinit`` is
4458c2ecf20Sopenharmony_ciused to initialize ``->expmask`` and the beginning of the normal and
4468c2ecf20Sopenharmony_ciexpedited grace periods, respectively.
4478c2ecf20Sopenharmony_ci
4488c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4498c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
4508c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4518c2ecf20Sopenharmony_ci| Why are these bitmasks protected by locking? Come on, haven't you     |
4528c2ecf20Sopenharmony_ci| heard of atomic instructions???                                       |
4538c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4548c2ecf20Sopenharmony_ci| **Answer**:                                                           |
4558c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4568c2ecf20Sopenharmony_ci| Lockless grace-period computation! Such a tantalizing possibility!    |
4578c2ecf20Sopenharmony_ci| But consider the following sequence of events:                        |
4588c2ecf20Sopenharmony_ci|                                                                       |
4598c2ecf20Sopenharmony_ci| #. CPU 0 has been in dyntick-idle mode for quite some time. When it   |
4608c2ecf20Sopenharmony_ci|    wakes up, it notices that the current RCU grace period needs it to |
4618c2ecf20Sopenharmony_ci|    report in, so it sets a flag where the scheduling clock interrupt  |
4628c2ecf20Sopenharmony_ci|    will find it.                                                      |
4638c2ecf20Sopenharmony_ci| #. Meanwhile, CPU 1 is running ``force_quiescent_state()``, and       |
4648c2ecf20Sopenharmony_ci|    notices that CPU 0 has been in dyntick idle mode, which qualifies  |
4658c2ecf20Sopenharmony_ci|    as an extended quiescent state.                                    |
4668c2ecf20Sopenharmony_ci| #. CPU 0's scheduling clock interrupt fires in the middle of an RCU   |
4678c2ecf20Sopenharmony_ci|    read-side critical section, and notices that the RCU core needs    |
4688c2ecf20Sopenharmony_ci|    something, so commences RCU softirq processing.                    |
4698c2ecf20Sopenharmony_ci| #. CPU 0's softirq handler executes and is just about ready to report |
4708c2ecf20Sopenharmony_ci|    its quiescent state up the ``rcu_node`` tree.                      |
4718c2ecf20Sopenharmony_ci| #. But CPU 1 beats it to the punch, completing the current grace      |
4728c2ecf20Sopenharmony_ci|    period and starting a new one.                                     |
4738c2ecf20Sopenharmony_ci| #. CPU 0 now reports its quiescent state for the wrong grace period.  |
4748c2ecf20Sopenharmony_ci|    That grace period might now end before the RCU read-side critical  |
4758c2ecf20Sopenharmony_ci|    section. If that happens, disaster will ensue.                     |
4768c2ecf20Sopenharmony_ci|                                                                       |
4778c2ecf20Sopenharmony_ci| So the locking is absolutely required in order to coordinate clearing |
4788c2ecf20Sopenharmony_ci| of the bits with updating of the grace-period sequence number in      |
4798c2ecf20Sopenharmony_ci| ``->gp_seq``.                                                         |
4808c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
4818c2ecf20Sopenharmony_ci
4828c2ecf20Sopenharmony_ciBlocked-Task Management
4838c2ecf20Sopenharmony_ci'''''''''''''''''''''''
4848c2ecf20Sopenharmony_ci
4858c2ecf20Sopenharmony_ci``PREEMPT_RCU`` allows tasks to be preempted in the midst of their RCU
4868c2ecf20Sopenharmony_ciread-side critical sections, and these tasks must be tracked explicitly.
4878c2ecf20Sopenharmony_ciThe details of exactly why and how they are tracked will be covered in a
4888c2ecf20Sopenharmony_ciseparate article on RCU read-side processing. For now, it is enough to
4898c2ecf20Sopenharmony_ciknow that the ``rcu_node`` structure tracks them.
4908c2ecf20Sopenharmony_ci
4918c2ecf20Sopenharmony_ci::
4928c2ecf20Sopenharmony_ci
4938c2ecf20Sopenharmony_ci     1   struct list_head blkd_tasks;
4948c2ecf20Sopenharmony_ci     2   struct list_head *gp_tasks;
4958c2ecf20Sopenharmony_ci     3   struct list_head *exp_tasks;
4968c2ecf20Sopenharmony_ci     4   bool wait_blkd_tasks;
4978c2ecf20Sopenharmony_ci
4988c2ecf20Sopenharmony_ciThe ``->blkd_tasks`` field is a list header for the list of blocked and
4998c2ecf20Sopenharmony_cipreempted tasks. As tasks undergo context switches within RCU read-side
5008c2ecf20Sopenharmony_cicritical sections, their ``task_struct`` structures are enqueued (via
5018c2ecf20Sopenharmony_cithe ``task_struct``'s ``->rcu_node_entry`` field) onto the head of the
5028c2ecf20Sopenharmony_ci``->blkd_tasks`` list for the leaf ``rcu_node`` structure corresponding
5038c2ecf20Sopenharmony_cito the CPU on which the outgoing context switch executed. As these tasks
5048c2ecf20Sopenharmony_cilater exit their RCU read-side critical sections, they remove themselves
5058c2ecf20Sopenharmony_cifrom the list. This list is therefore in reverse time order, so that if
5068c2ecf20Sopenharmony_cione of the tasks is blocking the current grace period, all subsequent
5078c2ecf20Sopenharmony_citasks must also be blocking that same grace period. Therefore, a single
5088c2ecf20Sopenharmony_cipointer into this list suffices to track all tasks blocking a given
5098c2ecf20Sopenharmony_cigrace period. That pointer is stored in ``->gp_tasks`` for normal grace
5108c2ecf20Sopenharmony_ciperiods and in ``->exp_tasks`` for expedited grace periods. These last
5118c2ecf20Sopenharmony_citwo fields are ``NULL`` if either there is no grace period in flight or
5128c2ecf20Sopenharmony_ciif there are no blocked tasks preventing that grace period from
5138c2ecf20Sopenharmony_cicompleting. If either of these two pointers is referencing a task that
5148c2ecf20Sopenharmony_ciremoves itself from the ``->blkd_tasks`` list, then that task must
5158c2ecf20Sopenharmony_ciadvance the pointer to the next task on the list, or set the pointer to
5168c2ecf20Sopenharmony_ci``NULL`` if there are no subsequent tasks on the list.
5178c2ecf20Sopenharmony_ci
5188c2ecf20Sopenharmony_ciFor example, suppose that tasks T1, T2, and T3 are all hard-affinitied
5198c2ecf20Sopenharmony_cito the largest-numbered CPU in the system. Then if task T1 blocked in an
5208c2ecf20Sopenharmony_ciRCU read-side critical section, then an expedited grace period started,
5218c2ecf20Sopenharmony_cithen task T2 blocked in an RCU read-side critical section, then a normal
5228c2ecf20Sopenharmony_cigrace period started, and finally task 3 blocked in an RCU read-side
5238c2ecf20Sopenharmony_cicritical section, then the state of the last leaf ``rcu_node``
5248c2ecf20Sopenharmony_cistructure's blocked-task list would be as shown below:
5258c2ecf20Sopenharmony_ci
5268c2ecf20Sopenharmony_ci.. kernel-figure:: blkd_task.svg
5278c2ecf20Sopenharmony_ci
5288c2ecf20Sopenharmony_ciTask T1 is blocking both grace periods, task T2 is blocking only the
5298c2ecf20Sopenharmony_cinormal grace period, and task T3 is blocking neither grace period. Note
5308c2ecf20Sopenharmony_cithat these tasks will not remove themselves from this list immediately
5318c2ecf20Sopenharmony_ciupon resuming execution. They will instead remain on the list until they
5328c2ecf20Sopenharmony_ciexecute the outermost ``rcu_read_unlock()`` that ends their RCU
5338c2ecf20Sopenharmony_ciread-side critical section.
5348c2ecf20Sopenharmony_ci
5358c2ecf20Sopenharmony_ciThe ``->wait_blkd_tasks`` field indicates whether or not the current
5368c2ecf20Sopenharmony_cigrace period is waiting on a blocked task.
5378c2ecf20Sopenharmony_ci
5388c2ecf20Sopenharmony_ciSizing the ``rcu_node`` Array
5398c2ecf20Sopenharmony_ci'''''''''''''''''''''''''''''
5408c2ecf20Sopenharmony_ci
5418c2ecf20Sopenharmony_ciThe ``rcu_node`` array is sized via a series of C-preprocessor
5428c2ecf20Sopenharmony_ciexpressions as follows:
5438c2ecf20Sopenharmony_ci
5448c2ecf20Sopenharmony_ci::
5458c2ecf20Sopenharmony_ci
5468c2ecf20Sopenharmony_ci    1 #ifdef CONFIG_RCU_FANOUT
5478c2ecf20Sopenharmony_ci    2 #define RCU_FANOUT CONFIG_RCU_FANOUT
5488c2ecf20Sopenharmony_ci    3 #else
5498c2ecf20Sopenharmony_ci    4 # ifdef CONFIG_64BIT
5508c2ecf20Sopenharmony_ci    5 # define RCU_FANOUT 64
5518c2ecf20Sopenharmony_ci    6 # else
5528c2ecf20Sopenharmony_ci    7 # define RCU_FANOUT 32
5538c2ecf20Sopenharmony_ci    8 # endif
5548c2ecf20Sopenharmony_ci    9 #endif
5558c2ecf20Sopenharmony_ci   10
5568c2ecf20Sopenharmony_ci   11 #ifdef CONFIG_RCU_FANOUT_LEAF
5578c2ecf20Sopenharmony_ci   12 #define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
5588c2ecf20Sopenharmony_ci   13 #else
5598c2ecf20Sopenharmony_ci   14 # ifdef CONFIG_64BIT
5608c2ecf20Sopenharmony_ci   15 # define RCU_FANOUT_LEAF 64
5618c2ecf20Sopenharmony_ci   16 # else
5628c2ecf20Sopenharmony_ci   17 # define RCU_FANOUT_LEAF 32
5638c2ecf20Sopenharmony_ci   18 # endif
5648c2ecf20Sopenharmony_ci   19 #endif
5658c2ecf20Sopenharmony_ci   20
5668c2ecf20Sopenharmony_ci   21 #define RCU_FANOUT_1        (RCU_FANOUT_LEAF)
5678c2ecf20Sopenharmony_ci   22 #define RCU_FANOUT_2        (RCU_FANOUT_1 * RCU_FANOUT)
5688c2ecf20Sopenharmony_ci   23 #define RCU_FANOUT_3        (RCU_FANOUT_2 * RCU_FANOUT)
5698c2ecf20Sopenharmony_ci   24 #define RCU_FANOUT_4        (RCU_FANOUT_3 * RCU_FANOUT)
5708c2ecf20Sopenharmony_ci   25
5718c2ecf20Sopenharmony_ci   26 #if NR_CPUS <= RCU_FANOUT_1
5728c2ecf20Sopenharmony_ci   27 #  define RCU_NUM_LVLS        1
5738c2ecf20Sopenharmony_ci   28 #  define NUM_RCU_LVL_0        1
5748c2ecf20Sopenharmony_ci   29 #  define NUM_RCU_NODES        NUM_RCU_LVL_0
5758c2ecf20Sopenharmony_ci   30 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0 }
5768c2ecf20Sopenharmony_ci   31 #  define RCU_NODE_NAME_INIT  { "rcu_node_0" }
5778c2ecf20Sopenharmony_ci   32 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0" }
5788c2ecf20Sopenharmony_ci   33 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0" }
5798c2ecf20Sopenharmony_ci   34 #elif NR_CPUS <= RCU_FANOUT_2
5808c2ecf20Sopenharmony_ci   35 #  define RCU_NUM_LVLS        2
5818c2ecf20Sopenharmony_ci   36 #  define NUM_RCU_LVL_0        1
5828c2ecf20Sopenharmony_ci   37 #  define NUM_RCU_LVL_1        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
5838c2ecf20Sopenharmony_ci   38 #  define NUM_RCU_NODES        (NUM_RCU_LVL_0 + NUM_RCU_LVL_1)
5848c2ecf20Sopenharmony_ci   39 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
5858c2ecf20Sopenharmony_ci   40 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1" }
5868c2ecf20Sopenharmony_ci   41 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1" }
5878c2ecf20Sopenharmony_ci   42 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1" }
5888c2ecf20Sopenharmony_ci   43 #elif NR_CPUS <= RCU_FANOUT_3
5898c2ecf20Sopenharmony_ci   44 #  define RCU_NUM_LVLS        3
5908c2ecf20Sopenharmony_ci   45 #  define NUM_RCU_LVL_0        1
5918c2ecf20Sopenharmony_ci   46 #  define NUM_RCU_LVL_1        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
5928c2ecf20Sopenharmony_ci   47 #  define NUM_RCU_LVL_2        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
5938c2ecf20Sopenharmony_ci   48 #  define NUM_RCU_NODES        (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2)
5948c2ecf20Sopenharmony_ci   49 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
5958c2ecf20Sopenharmony_ci   50 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
5968c2ecf20Sopenharmony_ci   51 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
5978c2ecf20Sopenharmony_ci   52 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
5988c2ecf20Sopenharmony_ci   53 #elif NR_CPUS <= RCU_FANOUT_4
5998c2ecf20Sopenharmony_ci   54 #  define RCU_NUM_LVLS        4
6008c2ecf20Sopenharmony_ci   55 #  define NUM_RCU_LVL_0        1
6018c2ecf20Sopenharmony_ci   56 #  define NUM_RCU_LVL_1        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
6028c2ecf20Sopenharmony_ci   57 #  define NUM_RCU_LVL_2        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
6038c2ecf20Sopenharmony_ci   58 #  define NUM_RCU_LVL_3        DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
6048c2ecf20Sopenharmony_ci   59 #  define NUM_RCU_NODES        (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
6058c2ecf20Sopenharmony_ci   60 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
6068c2ecf20Sopenharmony_ci   61 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
6078c2ecf20Sopenharmony_ci   62 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
6088c2ecf20Sopenharmony_ci   63 #  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
6098c2ecf20Sopenharmony_ci   64 #else
6108c2ecf20Sopenharmony_ci   65 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
6118c2ecf20Sopenharmony_ci   66 #endif
6128c2ecf20Sopenharmony_ci
6138c2ecf20Sopenharmony_ciThe maximum number of levels in the ``rcu_node`` structure is currently
6148c2ecf20Sopenharmony_cilimited to four, as specified by lines 21-24 and the structure of the
6158c2ecf20Sopenharmony_cisubsequent “if” statement. For 32-bit systems, this allows
6168c2ecf20Sopenharmony_ci16*32*32*32=524,288 CPUs, which should be sufficient for the next few
6178c2ecf20Sopenharmony_ciyears at least. For 64-bit systems, 16*64*64*64=4,194,304 CPUs is
6188c2ecf20Sopenharmony_ciallowed, which should see us through the next decade or so. This
6198c2ecf20Sopenharmony_cifour-level tree also allows kernels built with ``CONFIG_RCU_FANOUT=8``
6208c2ecf20Sopenharmony_cito support up to 4096 CPUs, which might be useful in very large systems
6218c2ecf20Sopenharmony_cihaving eight CPUs per socket (but please note that no one has yet shown
6228c2ecf20Sopenharmony_ciany measurable performance degradation due to misaligned socket and
6238c2ecf20Sopenharmony_ci``rcu_node`` boundaries). In addition, building kernels with a full four
6248c2ecf20Sopenharmony_cilevels of ``rcu_node`` tree permits better testing of RCU's
6258c2ecf20Sopenharmony_cicombining-tree code.
6268c2ecf20Sopenharmony_ci
6278c2ecf20Sopenharmony_ciThe ``RCU_FANOUT`` symbol controls how many children are permitted at
6288c2ecf20Sopenharmony_cieach non-leaf level of the ``rcu_node`` tree. If the
6298c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT`` Kconfig option is not specified, it is set based
6308c2ecf20Sopenharmony_cion the word size of the system, which is also the Kconfig default.
6318c2ecf20Sopenharmony_ci
6328c2ecf20Sopenharmony_ciThe ``RCU_FANOUT_LEAF`` symbol controls how many CPUs are handled by
6338c2ecf20Sopenharmony_cieach leaf ``rcu_node`` structure. Experience has shown that allowing a
6348c2ecf20Sopenharmony_cigiven leaf ``rcu_node`` structure to handle 64 CPUs, as permitted by the
6358c2ecf20Sopenharmony_cinumber of bits in the ``->qsmask`` field on a 64-bit system, results in
6368c2ecf20Sopenharmony_ciexcessive contention for the leaf ``rcu_node`` structures' ``->lock``
6378c2ecf20Sopenharmony_cifields. The number of CPUs per leaf ``rcu_node`` structure is therefore
6388c2ecf20Sopenharmony_cilimited to 16 given the default value of ``CONFIG_RCU_FANOUT_LEAF``. If
6398c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT_LEAF`` is unspecified, the value selected is based
6408c2ecf20Sopenharmony_cion the word size of the system, just as for ``CONFIG_RCU_FANOUT``.
6418c2ecf20Sopenharmony_ciLines 11-19 perform this computation.
6428c2ecf20Sopenharmony_ci
6438c2ecf20Sopenharmony_ciLines 21-24 compute the maximum number of CPUs supported by a
6448c2ecf20Sopenharmony_cisingle-level (which contains a single ``rcu_node`` structure),
6458c2ecf20Sopenharmony_citwo-level, three-level, and four-level ``rcu_node`` tree, respectively,
6468c2ecf20Sopenharmony_cigiven the fanout specified by ``RCU_FANOUT`` and ``RCU_FANOUT_LEAF``.
6478c2ecf20Sopenharmony_ciThese numbers of CPUs are retained in the ``RCU_FANOUT_1``,
6488c2ecf20Sopenharmony_ci``RCU_FANOUT_2``, ``RCU_FANOUT_3``, and ``RCU_FANOUT_4`` C-preprocessor
6498c2ecf20Sopenharmony_civariables, respectively.
6508c2ecf20Sopenharmony_ci
6518c2ecf20Sopenharmony_ciThese variables are used to control the C-preprocessor ``#if`` statement
6528c2ecf20Sopenharmony_cispanning lines 26-66 that computes the number of ``rcu_node`` structures
6538c2ecf20Sopenharmony_cirequired for each level of the tree, as well as the number of levels
6548c2ecf20Sopenharmony_cirequired. The number of levels is placed in the ``NUM_RCU_LVLS``
6558c2ecf20Sopenharmony_ciC-preprocessor variable by lines 27, 35, 44, and 54. The number of
6568c2ecf20Sopenharmony_ci``rcu_node`` structures for the topmost level of the tree is always
6578c2ecf20Sopenharmony_ciexactly one, and this value is unconditionally placed into
6588c2ecf20Sopenharmony_ci``NUM_RCU_LVL_0`` by lines 28, 36, 45, and 55. The rest of the levels
6598c2ecf20Sopenharmony_ci(if any) of the ``rcu_node`` tree are computed by dividing the maximum
6608c2ecf20Sopenharmony_cinumber of CPUs by the fanout supported by the number of levels from the
6618c2ecf20Sopenharmony_cicurrent level down, rounding up. This computation is performed by
6628c2ecf20Sopenharmony_cilines 37, 46-47, and 56-58. Lines 31-33, 40-42, 50-52, and 62-63 create
6638c2ecf20Sopenharmony_ciinitializers for lockdep lock-class names. Finally, lines 64-66 produce
6648c2ecf20Sopenharmony_cian error if the maximum number of CPUs is too large for the specified
6658c2ecf20Sopenharmony_cifanout.
6668c2ecf20Sopenharmony_ci
6678c2ecf20Sopenharmony_ciThe ``rcu_segcblist`` Structure
6688c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6698c2ecf20Sopenharmony_ci
6708c2ecf20Sopenharmony_ciThe ``rcu_segcblist`` structure maintains a segmented list of callbacks
6718c2ecf20Sopenharmony_cias follows:
6728c2ecf20Sopenharmony_ci
6738c2ecf20Sopenharmony_ci::
6748c2ecf20Sopenharmony_ci
6758c2ecf20Sopenharmony_ci    1 #define RCU_DONE_TAIL        0
6768c2ecf20Sopenharmony_ci    2 #define RCU_WAIT_TAIL        1
6778c2ecf20Sopenharmony_ci    3 #define RCU_NEXT_READY_TAIL  2
6788c2ecf20Sopenharmony_ci    4 #define RCU_NEXT_TAIL        3
6798c2ecf20Sopenharmony_ci    5 #define RCU_CBLIST_NSEGS     4
6808c2ecf20Sopenharmony_ci    6
6818c2ecf20Sopenharmony_ci    7 struct rcu_segcblist {
6828c2ecf20Sopenharmony_ci    8   struct rcu_head *head;
6838c2ecf20Sopenharmony_ci    9   struct rcu_head **tails[RCU_CBLIST_NSEGS];
6848c2ecf20Sopenharmony_ci   10   unsigned long gp_seq[RCU_CBLIST_NSEGS];
6858c2ecf20Sopenharmony_ci   11   long len;
6868c2ecf20Sopenharmony_ci   12   long len_lazy;
6878c2ecf20Sopenharmony_ci   13 };
6888c2ecf20Sopenharmony_ci
6898c2ecf20Sopenharmony_ciThe segments are as follows:
6908c2ecf20Sopenharmony_ci
6918c2ecf20Sopenharmony_ci#. ``RCU_DONE_TAIL``: Callbacks whose grace periods have elapsed. These
6928c2ecf20Sopenharmony_ci   callbacks are ready to be invoked.
6938c2ecf20Sopenharmony_ci#. ``RCU_WAIT_TAIL``: Callbacks that are waiting for the current grace
6948c2ecf20Sopenharmony_ci   period. Note that different CPUs can have different ideas about which
6958c2ecf20Sopenharmony_ci   grace period is current, hence the ``->gp_seq`` field.
6968c2ecf20Sopenharmony_ci#. ``RCU_NEXT_READY_TAIL``: Callbacks waiting for the next grace period
6978c2ecf20Sopenharmony_ci   to start.
6988c2ecf20Sopenharmony_ci#. ``RCU_NEXT_TAIL``: Callbacks that have not yet been associated with a
6998c2ecf20Sopenharmony_ci   grace period.
7008c2ecf20Sopenharmony_ci
7018c2ecf20Sopenharmony_ciThe ``->head`` pointer references the first callback or is ``NULL`` if
7028c2ecf20Sopenharmony_cithe list contains no callbacks (which is *not* the same as being empty).
7038c2ecf20Sopenharmony_ciEach element of the ``->tails[]`` array references the ``->next``
7048c2ecf20Sopenharmony_cipointer of the last callback in the corresponding segment of the list,
7058c2ecf20Sopenharmony_cior the list's ``->head`` pointer if that segment and all previous
7068c2ecf20Sopenharmony_cisegments are empty. If the corresponding segment is empty but some
7078c2ecf20Sopenharmony_ciprevious segment is not empty, then the array element is identical to
7088c2ecf20Sopenharmony_ciits predecessor. Older callbacks are closer to the head of the list, and
7098c2ecf20Sopenharmony_cinew callbacks are added at the tail. This relationship between the
7108c2ecf20Sopenharmony_ci``->head`` pointer, the ``->tails[]`` array, and the callbacks is shown
7118c2ecf20Sopenharmony_ciin this diagram:
7128c2ecf20Sopenharmony_ci
7138c2ecf20Sopenharmony_ci.. kernel-figure:: nxtlist.svg
7148c2ecf20Sopenharmony_ci
7158c2ecf20Sopenharmony_ciIn this figure, the ``->head`` pointer references the first RCU callback
7168c2ecf20Sopenharmony_ciin the list. The ``->tails[RCU_DONE_TAIL]`` array element references the
7178c2ecf20Sopenharmony_ci``->head`` pointer itself, indicating that none of the callbacks is
7188c2ecf20Sopenharmony_ciready to invoke. The ``->tails[RCU_WAIT_TAIL]`` array element references
7198c2ecf20Sopenharmony_cicallback CB 2's ``->next`` pointer, which indicates that CB 1 and CB 2
7208c2ecf20Sopenharmony_ciare both waiting on the current grace period, give or take possible
7218c2ecf20Sopenharmony_cidisagreements about exactly which grace period is the current one. The
7228c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_READY_TAIL]`` array element references the same RCU
7238c2ecf20Sopenharmony_cicallback that ``->tails[RCU_WAIT_TAIL]`` does, which indicates that
7248c2ecf20Sopenharmony_cithere are no callbacks waiting on the next RCU grace period. The
7258c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element references CB 4's ``->next``
7268c2ecf20Sopenharmony_cipointer, indicating that all the remaining RCU callbacks have not yet
7278c2ecf20Sopenharmony_cibeen assigned to an RCU grace period. Note that the
7288c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element always references the last RCU
7298c2ecf20Sopenharmony_cicallback's ``->next`` pointer unless the callback list is empty, in
7308c2ecf20Sopenharmony_ciwhich case it references the ``->head`` pointer.
7318c2ecf20Sopenharmony_ci
7328c2ecf20Sopenharmony_ciThere is one additional important special case for the
7338c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element: It can be ``NULL`` when this
7348c2ecf20Sopenharmony_cilist is *disabled*. Lists are disabled when the corresponding CPU is
7358c2ecf20Sopenharmony_cioffline or when the corresponding CPU's callbacks are offloaded to a
7368c2ecf20Sopenharmony_cikthread, both of which are described elsewhere.
7378c2ecf20Sopenharmony_ci
7388c2ecf20Sopenharmony_ciCPUs advance their callbacks from the ``RCU_NEXT_TAIL`` to the
7398c2ecf20Sopenharmony_ci``RCU_NEXT_READY_TAIL`` to the ``RCU_WAIT_TAIL`` to the
7408c2ecf20Sopenharmony_ci``RCU_DONE_TAIL`` list segments as grace periods advance.
7418c2ecf20Sopenharmony_ci
7428c2ecf20Sopenharmony_ciThe ``->gp_seq[]`` array records grace-period numbers corresponding to
7438c2ecf20Sopenharmony_cithe list segments. This is what allows different CPUs to have different
7448c2ecf20Sopenharmony_ciideas as to which is the current grace period while still avoiding
7458c2ecf20Sopenharmony_cipremature invocation of their callbacks. In particular, this allows CPUs
7468c2ecf20Sopenharmony_cithat go idle for extended periods to determine which of their callbacks
7478c2ecf20Sopenharmony_ciare ready to be invoked after reawakening.
7488c2ecf20Sopenharmony_ci
7498c2ecf20Sopenharmony_ciThe ``->len`` counter contains the number of callbacks in ``->head``,
7508c2ecf20Sopenharmony_ciand the ``->len_lazy`` contains the number of those callbacks that are
7518c2ecf20Sopenharmony_ciknown to only free memory, and whose invocation can therefore be safely
7528c2ecf20Sopenharmony_cideferred.
7538c2ecf20Sopenharmony_ci
7548c2ecf20Sopenharmony_ci.. important::
7558c2ecf20Sopenharmony_ci
7568c2ecf20Sopenharmony_ci   It is the ``->len`` field that determines whether or
7578c2ecf20Sopenharmony_ci   not there are callbacks associated with this ``rcu_segcblist``
7588c2ecf20Sopenharmony_ci   structure, *not* the ``->head`` pointer. The reason for this is that all
7598c2ecf20Sopenharmony_ci   the ready-to-invoke callbacks (that is, those in the ``RCU_DONE_TAIL``
7608c2ecf20Sopenharmony_ci   segment) are extracted all at once at callback-invocation time
7618c2ecf20Sopenharmony_ci   (``rcu_do_batch``), due to which ``->head`` may be set to NULL if there
7628c2ecf20Sopenharmony_ci   are no not-done callbacks remaining in the ``rcu_segcblist``. If
7638c2ecf20Sopenharmony_ci   callback invocation must be postponed, for example, because a
7648c2ecf20Sopenharmony_ci   high-priority process just woke up on this CPU, then the remaining
7658c2ecf20Sopenharmony_ci   callbacks are placed back on the ``RCU_DONE_TAIL`` segment and
7668c2ecf20Sopenharmony_ci   ``->head`` once again points to the start of the segment. In short, the
7678c2ecf20Sopenharmony_ci   head field can briefly be ``NULL`` even though the CPU has callbacks
7688c2ecf20Sopenharmony_ci   present the entire time. Therefore, it is not appropriate to test the
7698c2ecf20Sopenharmony_ci   ``->head`` pointer for ``NULL``.
7708c2ecf20Sopenharmony_ci
7718c2ecf20Sopenharmony_ciIn contrast, the ``->len`` and ``->len_lazy`` counts are adjusted only
7728c2ecf20Sopenharmony_ciafter the corresponding callbacks have been invoked. This means that the
7738c2ecf20Sopenharmony_ci``->len`` count is zero only if the ``rcu_segcblist`` structure really
7748c2ecf20Sopenharmony_ciis devoid of callbacks. Of course, off-CPU sampling of the ``->len``
7758c2ecf20Sopenharmony_cicount requires careful use of appropriate synchronization, for example,
7768c2ecf20Sopenharmony_cimemory barriers. This synchronization can be a bit subtle, particularly
7778c2ecf20Sopenharmony_ciin the case of ``rcu_barrier()``.
7788c2ecf20Sopenharmony_ci
7798c2ecf20Sopenharmony_ciThe ``rcu_data`` Structure
7808c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~
7818c2ecf20Sopenharmony_ci
7828c2ecf20Sopenharmony_ciThe ``rcu_data`` maintains the per-CPU state for the RCU subsystem. The
7838c2ecf20Sopenharmony_cifields in this structure may be accessed only from the corresponding CPU
7848c2ecf20Sopenharmony_ci(and from tracing) unless otherwise stated. This structure is the focus
7858c2ecf20Sopenharmony_ciof quiescent-state detection and RCU callback queuing. It also tracks
7868c2ecf20Sopenharmony_ciits relationship to the corresponding leaf ``rcu_node`` structure to
7878c2ecf20Sopenharmony_ciallow more-efficient propagation of quiescent states up the ``rcu_node``
7888c2ecf20Sopenharmony_cicombining tree. Like the ``rcu_node`` structure, it provides a local
7898c2ecf20Sopenharmony_cicopy of the grace-period information to allow for-free synchronized
7908c2ecf20Sopenharmony_ciaccess to this information from the corresponding CPU. Finally, this
7918c2ecf20Sopenharmony_cistructure records past dyntick-idle state for the corresponding CPU and
7928c2ecf20Sopenharmony_cialso tracks statistics.
7938c2ecf20Sopenharmony_ci
7948c2ecf20Sopenharmony_ciThe ``rcu_data`` structure's fields are discussed, singly and in groups,
7958c2ecf20Sopenharmony_ciin the following sections.
7968c2ecf20Sopenharmony_ci
7978c2ecf20Sopenharmony_ciConnection to Other Data Structures
7988c2ecf20Sopenharmony_ci'''''''''''''''''''''''''''''''''''
7998c2ecf20Sopenharmony_ci
8008c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows:
8018c2ecf20Sopenharmony_ci
8028c2ecf20Sopenharmony_ci::
8038c2ecf20Sopenharmony_ci
8048c2ecf20Sopenharmony_ci     1   int cpu;
8058c2ecf20Sopenharmony_ci     2   struct rcu_node *mynode;
8068c2ecf20Sopenharmony_ci     3   unsigned long grpmask;
8078c2ecf20Sopenharmony_ci     4   bool beenonline;
8088c2ecf20Sopenharmony_ci
8098c2ecf20Sopenharmony_ciThe ``->cpu`` field contains the number of the corresponding CPU and the
8108c2ecf20Sopenharmony_ci``->mynode`` field references the corresponding ``rcu_node`` structure.
8118c2ecf20Sopenharmony_ciThe ``->mynode`` is used to propagate quiescent states up the combining
8128c2ecf20Sopenharmony_citree. These two fields are constant and therefore do not require
8138c2ecf20Sopenharmony_cisynchronization.
8148c2ecf20Sopenharmony_ci
8158c2ecf20Sopenharmony_ciThe ``->grpmask`` field indicates the bit in the ``->mynode->qsmask``
8168c2ecf20Sopenharmony_cicorresponding to this ``rcu_data`` structure, and is also used when
8178c2ecf20Sopenharmony_cipropagating quiescent states. The ``->beenonline`` flag is set whenever
8188c2ecf20Sopenharmony_cithe corresponding CPU comes online, which means that the debugfs tracing
8198c2ecf20Sopenharmony_cineed not dump out any ``rcu_data`` structure for which this flag is not
8208c2ecf20Sopenharmony_ciset.
8218c2ecf20Sopenharmony_ci
8228c2ecf20Sopenharmony_ciQuiescent-State and Grace-Period Tracking
8238c2ecf20Sopenharmony_ci'''''''''''''''''''''''''''''''''''''''''
8248c2ecf20Sopenharmony_ci
8258c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows:
8268c2ecf20Sopenharmony_ci
8278c2ecf20Sopenharmony_ci::
8288c2ecf20Sopenharmony_ci
8298c2ecf20Sopenharmony_ci     1   unsigned long gp_seq;
8308c2ecf20Sopenharmony_ci     2   unsigned long gp_seq_needed;
8318c2ecf20Sopenharmony_ci     3   bool cpu_no_qs;
8328c2ecf20Sopenharmony_ci     4   bool core_needs_qs;
8338c2ecf20Sopenharmony_ci     5   bool gpwrap;
8348c2ecf20Sopenharmony_ci
8358c2ecf20Sopenharmony_ciThe ``->gp_seq`` field is the counterpart of the field of the same name
8368c2ecf20Sopenharmony_ciin the ``rcu_state`` and ``rcu_node`` structures. The
8378c2ecf20Sopenharmony_ci``->gp_seq_needed`` field is the counterpart of the field of the same
8388c2ecf20Sopenharmony_ciname in the rcu_node structure. They may each lag up to one behind their
8398c2ecf20Sopenharmony_ci``rcu_node`` counterparts, but in ``CONFIG_NO_HZ_IDLE`` and
8408c2ecf20Sopenharmony_ci``CONFIG_NO_HZ_FULL`` kernels can lag arbitrarily far behind for CPUs in
8418c2ecf20Sopenharmony_cidyntick-idle mode (but these counters will catch up upon exit from
8428c2ecf20Sopenharmony_cidyntick-idle mode). If the lower two bits of a given ``rcu_data``
8438c2ecf20Sopenharmony_cistructure's ``->gp_seq`` are zero, then this ``rcu_data`` structure
8448c2ecf20Sopenharmony_cibelieves that RCU is idle.
8458c2ecf20Sopenharmony_ci
8468c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
8478c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
8488c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
8498c2ecf20Sopenharmony_ci| All this replication of the grace period numbers can only cause       |
8508c2ecf20Sopenharmony_ci| massive confusion. Why not just keep a global sequence number and be  |
8518c2ecf20Sopenharmony_ci| done with it???                                                       |
8528c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
8538c2ecf20Sopenharmony_ci| **Answer**:                                                           |
8548c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
8558c2ecf20Sopenharmony_ci| Because if there was only a single global sequence numbers, there     |
8568c2ecf20Sopenharmony_ci| would need to be a single global lock to allow safely accessing and   |
8578c2ecf20Sopenharmony_ci| updating it. And if we are not going to have a single global lock, we |
8588c2ecf20Sopenharmony_ci| need to carefully manage the numbers on a per-node basis. Recall from |
8598c2ecf20Sopenharmony_ci| the answer to a previous Quick Quiz that the consequences of applying |
8608c2ecf20Sopenharmony_ci| a previously sampled quiescent state to the wrong grace period are    |
8618c2ecf20Sopenharmony_ci| quite severe.                                                         |
8628c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
8638c2ecf20Sopenharmony_ci
8648c2ecf20Sopenharmony_ciThe ``->cpu_no_qs`` flag indicates that the CPU has not yet passed
8658c2ecf20Sopenharmony_cithrough a quiescent state, while the ``->core_needs_qs`` flag indicates
8668c2ecf20Sopenharmony_cithat the RCU core needs a quiescent state from the corresponding CPU.
8678c2ecf20Sopenharmony_ciThe ``->gpwrap`` field indicates that the corresponding CPU has remained
8688c2ecf20Sopenharmony_ciidle for so long that the ``gp_seq`` counter is in danger of overflow,
8698c2ecf20Sopenharmony_ciwhich will cause the CPU to disregard the values of its counters on its
8708c2ecf20Sopenharmony_cinext exit from idle.
8718c2ecf20Sopenharmony_ci
8728c2ecf20Sopenharmony_ciRCU Callback Handling
8738c2ecf20Sopenharmony_ci'''''''''''''''''''''
8748c2ecf20Sopenharmony_ci
8758c2ecf20Sopenharmony_ciIn the absence of CPU-hotplug events, RCU callbacks are invoked by the
8768c2ecf20Sopenharmony_cisame CPU that registered them. This is strictly a cache-locality
8778c2ecf20Sopenharmony_cioptimization: callbacks can and do get invoked on CPUs other than the
8788c2ecf20Sopenharmony_cione that registered them. After all, if the CPU that registered a given
8798c2ecf20Sopenharmony_cicallback has gone offline before the callback can be invoked, there
8808c2ecf20Sopenharmony_cireally is no other choice.
8818c2ecf20Sopenharmony_ci
8828c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows:
8838c2ecf20Sopenharmony_ci
8848c2ecf20Sopenharmony_ci::
8858c2ecf20Sopenharmony_ci
8868c2ecf20Sopenharmony_ci    1 struct rcu_segcblist cblist;
8878c2ecf20Sopenharmony_ci    2 long qlen_last_fqs_check;
8888c2ecf20Sopenharmony_ci    3 unsigned long n_cbs_invoked;
8898c2ecf20Sopenharmony_ci    4 unsigned long n_nocbs_invoked;
8908c2ecf20Sopenharmony_ci    5 unsigned long n_cbs_orphaned;
8918c2ecf20Sopenharmony_ci    6 unsigned long n_cbs_adopted;
8928c2ecf20Sopenharmony_ci    7 unsigned long n_force_qs_snap;
8938c2ecf20Sopenharmony_ci    8 long blimit;
8948c2ecf20Sopenharmony_ci
8958c2ecf20Sopenharmony_ciThe ``->cblist`` structure is the segmented callback list described
8968c2ecf20Sopenharmony_ciearlier. The CPU advances the callbacks in its ``rcu_data`` structure
8978c2ecf20Sopenharmony_ciwhenever it notices that another RCU grace period has completed. The CPU
8988c2ecf20Sopenharmony_cidetects the completion of an RCU grace period by noticing that the value
8998c2ecf20Sopenharmony_ciof its ``rcu_data`` structure's ``->gp_seq`` field differs from that of
9008c2ecf20Sopenharmony_ciits leaf ``rcu_node`` structure. Recall that each ``rcu_node``
9018c2ecf20Sopenharmony_cistructure's ``->gp_seq`` field is updated at the beginnings and ends of
9028c2ecf20Sopenharmony_cieach grace period.
9038c2ecf20Sopenharmony_ci
9048c2ecf20Sopenharmony_ciThe ``->qlen_last_fqs_check`` and ``->n_force_qs_snap`` coordinate the
9058c2ecf20Sopenharmony_ciforcing of quiescent states from ``call_rcu()`` and friends when
9068c2ecf20Sopenharmony_cicallback lists grow excessively long.
9078c2ecf20Sopenharmony_ci
9088c2ecf20Sopenharmony_ciThe ``->n_cbs_invoked``, ``->n_cbs_orphaned``, and ``->n_cbs_adopted``
9098c2ecf20Sopenharmony_cifields count the number of callbacks invoked, sent to other CPUs when
9108c2ecf20Sopenharmony_cithis CPU goes offline, and received from other CPUs when those other
9118c2ecf20Sopenharmony_ciCPUs go offline. The ``->n_nocbs_invoked`` is used when the CPU's
9128c2ecf20Sopenharmony_cicallbacks are offloaded to a kthread.
9138c2ecf20Sopenharmony_ci
9148c2ecf20Sopenharmony_ciFinally, the ``->blimit`` counter is the maximum number of RCU callbacks
9158c2ecf20Sopenharmony_cithat may be invoked at a given time.
9168c2ecf20Sopenharmony_ci
9178c2ecf20Sopenharmony_ciDyntick-Idle Handling
9188c2ecf20Sopenharmony_ci'''''''''''''''''''''
9198c2ecf20Sopenharmony_ci
9208c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows:
9218c2ecf20Sopenharmony_ci
9228c2ecf20Sopenharmony_ci::
9238c2ecf20Sopenharmony_ci
9248c2ecf20Sopenharmony_ci     1   int dynticks_snap;
9258c2ecf20Sopenharmony_ci     2   unsigned long dynticks_fqs;
9268c2ecf20Sopenharmony_ci
9278c2ecf20Sopenharmony_ciThe ``->dynticks_snap`` field is used to take a snapshot of the
9288c2ecf20Sopenharmony_cicorresponding CPU's dyntick-idle state when forcing quiescent states,
9298c2ecf20Sopenharmony_ciand is therefore accessed from other CPUs. Finally, the
9308c2ecf20Sopenharmony_ci``->dynticks_fqs`` field is used to count the number of times this CPU
9318c2ecf20Sopenharmony_ciis determined to be in dyntick-idle state, and is used for tracing and
9328c2ecf20Sopenharmony_cidebugging purposes.
9338c2ecf20Sopenharmony_ci
9348c2ecf20Sopenharmony_ciThis portion of the rcu_data structure is declared as follows:
9358c2ecf20Sopenharmony_ci
9368c2ecf20Sopenharmony_ci::
9378c2ecf20Sopenharmony_ci
9388c2ecf20Sopenharmony_ci     1   long dynticks_nesting;
9398c2ecf20Sopenharmony_ci     2   long dynticks_nmi_nesting;
9408c2ecf20Sopenharmony_ci     3   atomic_t dynticks;
9418c2ecf20Sopenharmony_ci     4   bool rcu_need_heavy_qs;
9428c2ecf20Sopenharmony_ci     5   bool rcu_urgent_qs;
9438c2ecf20Sopenharmony_ci
9448c2ecf20Sopenharmony_ciThese fields in the rcu_data structure maintain the per-CPU dyntick-idle
9458c2ecf20Sopenharmony_cistate for the corresponding CPU. The fields may be accessed only from
9468c2ecf20Sopenharmony_cithe corresponding CPU (and from tracing) unless otherwise stated.
9478c2ecf20Sopenharmony_ci
9488c2ecf20Sopenharmony_ciThe ``->dynticks_nesting`` field counts the nesting depth of process
9498c2ecf20Sopenharmony_ciexecution, so that in normal circumstances this counter has value zero
9508c2ecf20Sopenharmony_cior one. NMIs, irqs, and tracers are counted by the
9518c2ecf20Sopenharmony_ci``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
9528c2ecf20Sopenharmony_cito this variable have to be undertaken carefully using an algorithm
9538c2ecf20Sopenharmony_ciprovided by Andy Lutomirski. The initial transition from idle adds one,
9548c2ecf20Sopenharmony_ciand nested transitions add two, so that a nesting level of five is
9558c2ecf20Sopenharmony_cirepresented by a ``->dynticks_nmi_nesting`` value of nine. This counter
9568c2ecf20Sopenharmony_cican therefore be thought of as counting the number of reasons why this
9578c2ecf20Sopenharmony_ciCPU cannot be permitted to enter dyntick-idle mode, aside from
9588c2ecf20Sopenharmony_ciprocess-level transitions.
9598c2ecf20Sopenharmony_ci
9608c2ecf20Sopenharmony_ciHowever, it turns out that when running in non-idle kernel context, the
9618c2ecf20Sopenharmony_ciLinux kernel is fully capable of entering interrupt handlers that never
9628c2ecf20Sopenharmony_ciexit and perhaps also vice versa. Therefore, whenever the
9638c2ecf20Sopenharmony_ci``->dynticks_nesting`` field is incremented up from zero, the
9648c2ecf20Sopenharmony_ci``->dynticks_nmi_nesting`` field is set to a large positive number, and
9658c2ecf20Sopenharmony_ciwhenever the ``->dynticks_nesting`` field is decremented down to zero,
9668c2ecf20Sopenharmony_cithe ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
9678c2ecf20Sopenharmony_cithe number of misnested interrupts is not sufficient to overflow the
9688c2ecf20Sopenharmony_cicounter, this approach corrects the ``->dynticks_nmi_nesting`` field
9698c2ecf20Sopenharmony_cievery time the corresponding CPU enters the idle loop from process
9708c2ecf20Sopenharmony_cicontext.
9718c2ecf20Sopenharmony_ci
9728c2ecf20Sopenharmony_ciThe ``->dynticks`` field counts the corresponding CPU's transitions to
9738c2ecf20Sopenharmony_ciand from either dyntick-idle or user mode, so that this counter has an
9748c2ecf20Sopenharmony_cieven value when the CPU is in dyntick-idle mode or user mode and an odd
9758c2ecf20Sopenharmony_civalue otherwise. The transitions to/from user mode need to be counted
9768c2ecf20Sopenharmony_cifor user mode adaptive-ticks support (see timers/NO_HZ.txt).
9778c2ecf20Sopenharmony_ci
9788c2ecf20Sopenharmony_ciThe ``->rcu_need_heavy_qs`` field is used to record the fact that the
9798c2ecf20Sopenharmony_ciRCU core code would really like to see a quiescent state from the
9808c2ecf20Sopenharmony_cicorresponding CPU, so much so that it is willing to call for
9818c2ecf20Sopenharmony_ciheavy-weight dyntick-counter operations. This flag is checked by RCU's
9828c2ecf20Sopenharmony_cicontext-switch and ``cond_resched()`` code, which provide a momentary
9838c2ecf20Sopenharmony_ciidle sojourn in response.
9848c2ecf20Sopenharmony_ci
9858c2ecf20Sopenharmony_ciFinally, the ``->rcu_urgent_qs`` field is used to record the fact that
9868c2ecf20Sopenharmony_cithe RCU core code would really like to see a quiescent state from the
9878c2ecf20Sopenharmony_cicorresponding CPU, with the various other fields indicating just how
9888c2ecf20Sopenharmony_cibadly RCU wants this quiescent state. This flag is checked by RCU's
9898c2ecf20Sopenharmony_cicontext-switch path (``rcu_note_context_switch``) and the cond_resched
9908c2ecf20Sopenharmony_cicode.
9918c2ecf20Sopenharmony_ci
9928c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
9938c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
9948c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
9958c2ecf20Sopenharmony_ci| Why not simply combine the ``->dynticks_nesting`` and                 |
9968c2ecf20Sopenharmony_ci| ``->dynticks_nmi_nesting`` counters into a single counter that just   |
9978c2ecf20Sopenharmony_ci| counts the number of reasons that the corresponding CPU is non-idle?  |
9988c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
9998c2ecf20Sopenharmony_ci| **Answer**:                                                           |
10008c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10018c2ecf20Sopenharmony_ci| Because this would fail in the presence of interrupts whose handlers  |
10028c2ecf20Sopenharmony_ci| never return and of handlers that manage to return from a made-up     |
10038c2ecf20Sopenharmony_ci| interrupt.                                                            |
10048c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10058c2ecf20Sopenharmony_ci
10068c2ecf20Sopenharmony_ciAdditional fields are present for some special-purpose builds, and are
10078c2ecf20Sopenharmony_cidiscussed separately.
10088c2ecf20Sopenharmony_ci
10098c2ecf20Sopenharmony_ciThe ``rcu_head`` Structure
10108c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~
10118c2ecf20Sopenharmony_ci
10128c2ecf20Sopenharmony_ciEach ``rcu_head`` structure represents an RCU callback. These structures
10138c2ecf20Sopenharmony_ciare normally embedded within RCU-protected data structures whose
10148c2ecf20Sopenharmony_cialgorithms use asynchronous grace periods. In contrast, when using
10158c2ecf20Sopenharmony_cialgorithms that block waiting for RCU grace periods, RCU users need not
10168c2ecf20Sopenharmony_ciprovide ``rcu_head`` structures.
10178c2ecf20Sopenharmony_ci
10188c2ecf20Sopenharmony_ciThe ``rcu_head`` structure has fields as follows:
10198c2ecf20Sopenharmony_ci
10208c2ecf20Sopenharmony_ci::
10218c2ecf20Sopenharmony_ci
10228c2ecf20Sopenharmony_ci     1   struct rcu_head *next;
10238c2ecf20Sopenharmony_ci     2   void (*func)(struct rcu_head *head);
10248c2ecf20Sopenharmony_ci
10258c2ecf20Sopenharmony_ciThe ``->next`` field is used to link the ``rcu_head`` structures
10268c2ecf20Sopenharmony_citogether in the lists within the ``rcu_data`` structures. The ``->func``
10278c2ecf20Sopenharmony_cifield is a pointer to the function to be called when the callback is
10288c2ecf20Sopenharmony_ciready to be invoked, and this function is passed a pointer to the
10298c2ecf20Sopenharmony_ci``rcu_head`` structure. However, ``kfree_rcu()`` uses the ``->func``
10308c2ecf20Sopenharmony_cifield to record the offset of the ``rcu_head`` structure within the
10318c2ecf20Sopenharmony_cienclosing RCU-protected data structure.
10328c2ecf20Sopenharmony_ci
10338c2ecf20Sopenharmony_ciBoth of these fields are used internally by RCU. From the viewpoint of
10348c2ecf20Sopenharmony_ciRCU users, this structure is an opaque “cookie”.
10358c2ecf20Sopenharmony_ci
10368c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10378c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
10388c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10398c2ecf20Sopenharmony_ci| Given that the callback function ``->func`` is passed a pointer to    |
10408c2ecf20Sopenharmony_ci| the ``rcu_head`` structure, how is that function supposed to find the |
10418c2ecf20Sopenharmony_ci| beginning of the enclosing RCU-protected data structure?              |
10428c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10438c2ecf20Sopenharmony_ci| **Answer**:                                                           |
10448c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10458c2ecf20Sopenharmony_ci| In actual practice, there is a separate callback function per type of |
10468c2ecf20Sopenharmony_ci| RCU-protected data structure. The callback function can therefore use |
10478c2ecf20Sopenharmony_ci| the ``container_of()`` macro in the Linux kernel (or other            |
10488c2ecf20Sopenharmony_ci| pointer-manipulation facilities in other software environments) to    |
10498c2ecf20Sopenharmony_ci| find the beginning of the enclosing structure.                        |
10508c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
10518c2ecf20Sopenharmony_ci
10528c2ecf20Sopenharmony_ciRCU-Specific Fields in the ``task_struct`` Structure
10538c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10548c2ecf20Sopenharmony_ci
10558c2ecf20Sopenharmony_ciThe ``CONFIG_PREEMPT_RCU`` implementation uses some additional fields in
10568c2ecf20Sopenharmony_cithe ``task_struct`` structure:
10578c2ecf20Sopenharmony_ci
10588c2ecf20Sopenharmony_ci::
10598c2ecf20Sopenharmony_ci
10608c2ecf20Sopenharmony_ci    1 #ifdef CONFIG_PREEMPT_RCU
10618c2ecf20Sopenharmony_ci    2   int rcu_read_lock_nesting;
10628c2ecf20Sopenharmony_ci    3   union rcu_special rcu_read_unlock_special;
10638c2ecf20Sopenharmony_ci    4   struct list_head rcu_node_entry;
10648c2ecf20Sopenharmony_ci    5   struct rcu_node *rcu_blocked_node;
10658c2ecf20Sopenharmony_ci    6 #endif /* #ifdef CONFIG_PREEMPT_RCU */
10668c2ecf20Sopenharmony_ci    7 #ifdef CONFIG_TASKS_RCU
10678c2ecf20Sopenharmony_ci    8   unsigned long rcu_tasks_nvcsw;
10688c2ecf20Sopenharmony_ci    9   bool rcu_tasks_holdout;
10698c2ecf20Sopenharmony_ci   10   struct list_head rcu_tasks_holdout_list;
10708c2ecf20Sopenharmony_ci   11   int rcu_tasks_idle_cpu;
10718c2ecf20Sopenharmony_ci   12 #endif /* #ifdef CONFIG_TASKS_RCU */
10728c2ecf20Sopenharmony_ci
10738c2ecf20Sopenharmony_ciThe ``->rcu_read_lock_nesting`` field records the nesting level for RCU
10748c2ecf20Sopenharmony_ciread-side critical sections, and the ``->rcu_read_unlock_special`` field
10758c2ecf20Sopenharmony_ciis a bitmask that records special conditions that require
10768c2ecf20Sopenharmony_ci``rcu_read_unlock()`` to do additional work. The ``->rcu_node_entry``
10778c2ecf20Sopenharmony_cifield is used to form lists of tasks that have blocked within
10788c2ecf20Sopenharmony_cipreemptible-RCU read-side critical sections and the
10798c2ecf20Sopenharmony_ci``->rcu_blocked_node`` field references the ``rcu_node`` structure whose
10808c2ecf20Sopenharmony_cilist this task is a member of, or ``NULL`` if it is not blocked within a
10818c2ecf20Sopenharmony_cipreemptible-RCU read-side critical section.
10828c2ecf20Sopenharmony_ci
10838c2ecf20Sopenharmony_ciThe ``->rcu_tasks_nvcsw`` field tracks the number of voluntary context
10848c2ecf20Sopenharmony_ciswitches that this task had undergone at the beginning of the current
10858c2ecf20Sopenharmony_citasks-RCU grace period, ``->rcu_tasks_holdout`` is set if the current
10868c2ecf20Sopenharmony_citasks-RCU grace period is waiting on this task,
10878c2ecf20Sopenharmony_ci``->rcu_tasks_holdout_list`` is a list element enqueuing this task on
10888c2ecf20Sopenharmony_cithe holdout list, and ``->rcu_tasks_idle_cpu`` tracks which CPU this
10898c2ecf20Sopenharmony_ciidle task is running, but only if the task is currently running, that
10908c2ecf20Sopenharmony_ciis, if the CPU is currently idle.
10918c2ecf20Sopenharmony_ci
10928c2ecf20Sopenharmony_ciAccessor Functions
10938c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~
10948c2ecf20Sopenharmony_ci
10958c2ecf20Sopenharmony_ciThe following listing shows the ``rcu_get_root()``,
10968c2ecf20Sopenharmony_ci``rcu_for_each_node_breadth_first`` and ``rcu_for_each_leaf_node()``
10978c2ecf20Sopenharmony_cifunction and macros:
10988c2ecf20Sopenharmony_ci
10998c2ecf20Sopenharmony_ci::
11008c2ecf20Sopenharmony_ci
11018c2ecf20Sopenharmony_ci     1 static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
11028c2ecf20Sopenharmony_ci     2 {
11038c2ecf20Sopenharmony_ci     3   return &rsp->node[0];
11048c2ecf20Sopenharmony_ci     4 }
11058c2ecf20Sopenharmony_ci     5
11068c2ecf20Sopenharmony_ci     6 #define rcu_for_each_node_breadth_first(rsp, rnp) \
11078c2ecf20Sopenharmony_ci     7   for ((rnp) = &(rsp)->node[0]; \
11088c2ecf20Sopenharmony_ci     8        (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
11098c2ecf20Sopenharmony_ci     9
11108c2ecf20Sopenharmony_ci    10 #define rcu_for_each_leaf_node(rsp, rnp) \
11118c2ecf20Sopenharmony_ci    11   for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
11128c2ecf20Sopenharmony_ci    12        (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
11138c2ecf20Sopenharmony_ci
11148c2ecf20Sopenharmony_ciThe ``rcu_get_root()`` simply returns a pointer to the first element of
11158c2ecf20Sopenharmony_cithe specified ``rcu_state`` structure's ``->node[]`` array, which is the
11168c2ecf20Sopenharmony_ciroot ``rcu_node`` structure.
11178c2ecf20Sopenharmony_ci
11188c2ecf20Sopenharmony_ciAs noted earlier, the ``rcu_for_each_node_breadth_first()`` macro takes
11198c2ecf20Sopenharmony_ciadvantage of the layout of the ``rcu_node`` structures in the
11208c2ecf20Sopenharmony_ci``rcu_state`` structure's ``->node[]`` array, performing a breadth-first
11218c2ecf20Sopenharmony_citraversal by simply traversing the array in order. Similarly, the
11228c2ecf20Sopenharmony_ci``rcu_for_each_leaf_node()`` macro traverses only the last part of the
11238c2ecf20Sopenharmony_ciarray, thus traversing only the leaf ``rcu_node`` structures.
11248c2ecf20Sopenharmony_ci
11258c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
11268c2ecf20Sopenharmony_ci| **Quick Quiz**:                                                       |
11278c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
11288c2ecf20Sopenharmony_ci| What does ``rcu_for_each_leaf_node()`` do if the ``rcu_node`` tree    |
11298c2ecf20Sopenharmony_ci| contains only a single node?                                          |
11308c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
11318c2ecf20Sopenharmony_ci| **Answer**:                                                           |
11328c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
11338c2ecf20Sopenharmony_ci| In the single-node case, ``rcu_for_each_leaf_node()`` traverses the   |
11348c2ecf20Sopenharmony_ci| single node.                                                          |
11358c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+
11368c2ecf20Sopenharmony_ci
11378c2ecf20Sopenharmony_ciSummary
11388c2ecf20Sopenharmony_ci~~~~~~~
11398c2ecf20Sopenharmony_ci
11408c2ecf20Sopenharmony_ciSo the state of RCU is represented by an ``rcu_state`` structure, which
11418c2ecf20Sopenharmony_cicontains a combining tree of ``rcu_node`` and ``rcu_data`` structures.
11428c2ecf20Sopenharmony_ciFinally, in ``CONFIG_NO_HZ_IDLE`` kernels, each CPU's dyntick-idle state
11438c2ecf20Sopenharmony_ciis tracked by dynticks-related fields in the ``rcu_data`` structure. If
11448c2ecf20Sopenharmony_ciyou made it this far, you are well prepared to read the code
11458c2ecf20Sopenharmony_ciwalkthroughs in the other articles in this series.
11468c2ecf20Sopenharmony_ci
11478c2ecf20Sopenharmony_ciAcknowledgments
11488c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~
11498c2ecf20Sopenharmony_ci
11508c2ecf20Sopenharmony_ciI owe thanks to Cyrill Gorcunov, Mathieu Desnoyers, Dhaval Giani, Paul
11518c2ecf20Sopenharmony_ciTurner, Abhishek Srivastava, Matt Kowalczyk, and Serge Hallyn for
11528c2ecf20Sopenharmony_cihelping me get this document into a more human-readable state.
11538c2ecf20Sopenharmony_ci
11548c2ecf20Sopenharmony_ciLegal Statement
11558c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~
11568c2ecf20Sopenharmony_ci
11578c2ecf20Sopenharmony_ciThis work represents the view of the author and does not necessarily
11588c2ecf20Sopenharmony_cirepresent the view of IBM.
11598c2ecf20Sopenharmony_ci
11608c2ecf20Sopenharmony_ciLinux is a registered trademark of Linus Torvalds.
11618c2ecf20Sopenharmony_ci
11628c2ecf20Sopenharmony_ciOther company, product, and service names may be trademarks or service
11638c2ecf20Sopenharmony_cimarks of others.
1164