18c2ecf20Sopenharmony_ci=================================================== 28c2ecf20Sopenharmony_ciA Tour Through TREE_RCU's Data Structures [LWN.net] 38c2ecf20Sopenharmony_ci=================================================== 48c2ecf20Sopenharmony_ci 58c2ecf20Sopenharmony_ciDecember 18, 2016 68c2ecf20Sopenharmony_ci 78c2ecf20Sopenharmony_ciThis article was contributed by Paul E. McKenney 88c2ecf20Sopenharmony_ci 98c2ecf20Sopenharmony_ciIntroduction 108c2ecf20Sopenharmony_ci============ 118c2ecf20Sopenharmony_ci 128c2ecf20Sopenharmony_ciThis document describes RCU's major data structures and their relationship 138c2ecf20Sopenharmony_cito each other. 148c2ecf20Sopenharmony_ci 158c2ecf20Sopenharmony_ciData-Structure Relationships 168c2ecf20Sopenharmony_ci============================ 178c2ecf20Sopenharmony_ci 188c2ecf20Sopenharmony_ciRCU is for all intents and purposes a large state machine, and its 198c2ecf20Sopenharmony_cidata structures maintain the state in such a way as to allow RCU readers 208c2ecf20Sopenharmony_cito execute extremely quickly, while also processing the RCU grace periods 218c2ecf20Sopenharmony_cirequested by updaters in an efficient and extremely scalable fashion. 228c2ecf20Sopenharmony_ciThe efficiency and scalability of RCU updaters is provided primarily 238c2ecf20Sopenharmony_ciby a combining tree, as shown below: 248c2ecf20Sopenharmony_ci 258c2ecf20Sopenharmony_ci.. kernel-figure:: BigTreeClassicRCU.svg 268c2ecf20Sopenharmony_ci 278c2ecf20Sopenharmony_ciThis diagram shows an enclosing ``rcu_state`` structure containing a tree 288c2ecf20Sopenharmony_ciof ``rcu_node`` structures. Each leaf node of the ``rcu_node`` tree has up 298c2ecf20Sopenharmony_cito 16 ``rcu_data`` structures associated with it, so that there are 308c2ecf20Sopenharmony_ci``NR_CPUS`` number of ``rcu_data`` structures, one for each possible CPU. 318c2ecf20Sopenharmony_ciThis structure is adjusted at boot time, if needed, to handle the common 328c2ecf20Sopenharmony_cicase where ``nr_cpu_ids`` is much less than ``NR_CPUs``. 338c2ecf20Sopenharmony_ciFor example, a number of Linux distributions set ``NR_CPUs=4096``, 348c2ecf20Sopenharmony_ciwhich results in a three-level ``rcu_node`` tree. 358c2ecf20Sopenharmony_ciIf the actual hardware has only 16 CPUs, RCU will adjust itself 368c2ecf20Sopenharmony_ciat boot time, resulting in an ``rcu_node`` tree with only a single node. 378c2ecf20Sopenharmony_ci 388c2ecf20Sopenharmony_ciThe purpose of this combining tree is to allow per-CPU events 398c2ecf20Sopenharmony_cisuch as quiescent states, dyntick-idle transitions, 408c2ecf20Sopenharmony_ciand CPU hotplug operations to be processed efficiently 418c2ecf20Sopenharmony_ciand scalably. 428c2ecf20Sopenharmony_ciQuiescent states are recorded by the per-CPU ``rcu_data`` structures, 438c2ecf20Sopenharmony_ciand other events are recorded by the leaf-level ``rcu_node`` 448c2ecf20Sopenharmony_cistructures. 458c2ecf20Sopenharmony_ciAll of these events are combined at each level of the tree until finally 468c2ecf20Sopenharmony_cigrace periods are completed at the tree's root ``rcu_node`` 478c2ecf20Sopenharmony_cistructure. 488c2ecf20Sopenharmony_ciA grace period can be completed at the root once every CPU 498c2ecf20Sopenharmony_ci(or, in the case of ``CONFIG_PREEMPT_RCU``, task) 508c2ecf20Sopenharmony_cihas passed through a quiescent state. 518c2ecf20Sopenharmony_ciOnce a grace period has completed, record of that fact is propagated 528c2ecf20Sopenharmony_ciback down the tree. 538c2ecf20Sopenharmony_ci 548c2ecf20Sopenharmony_ciAs can be seen from the diagram, on a 64-bit system 558c2ecf20Sopenharmony_cia two-level tree with 64 leaves can accommodate 1,024 CPUs, with a fanout 568c2ecf20Sopenharmony_ciof 64 at the root and a fanout of 16 at the leaves. 578c2ecf20Sopenharmony_ci 588c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 598c2ecf20Sopenharmony_ci| **Quick Quiz**: | 608c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 618c2ecf20Sopenharmony_ci| Why isn't the fanout at the leaves also 64? | 628c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 638c2ecf20Sopenharmony_ci| **Answer**: | 648c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 658c2ecf20Sopenharmony_ci| Because there are more types of events that affect the leaf-level | 668c2ecf20Sopenharmony_ci| ``rcu_node`` structures than further up the tree. Therefore, if the | 678c2ecf20Sopenharmony_ci| leaf ``rcu_node`` structures have fanout of 64, the contention on | 688c2ecf20Sopenharmony_ci| these structures' ``->structures`` becomes excessive. Experimentation | 698c2ecf20Sopenharmony_ci| on a wide variety of systems has shown that a fanout of 16 works well | 708c2ecf20Sopenharmony_ci| for the leaves of the ``rcu_node`` tree. | 718c2ecf20Sopenharmony_ci| | 728c2ecf20Sopenharmony_ci| Of course, further experience with systems having hundreds or | 738c2ecf20Sopenharmony_ci| thousands of CPUs may demonstrate that the fanout for the non-leaf | 748c2ecf20Sopenharmony_ci| ``rcu_node`` structures must also be reduced. Such reduction can be | 758c2ecf20Sopenharmony_ci| easily carried out when and if it proves necessary. In the meantime, | 768c2ecf20Sopenharmony_ci| if you are using such a system and running into contention problems | 778c2ecf20Sopenharmony_ci| on the non-leaf ``rcu_node`` structures, you may use the | 788c2ecf20Sopenharmony_ci| ``CONFIG_RCU_FANOUT`` kernel configuration parameter to reduce the | 798c2ecf20Sopenharmony_ci| non-leaf fanout as needed. | 808c2ecf20Sopenharmony_ci| | 818c2ecf20Sopenharmony_ci| Kernels built for systems with strong NUMA characteristics might | 828c2ecf20Sopenharmony_ci| also need to adjust ``CONFIG_RCU_FANOUT`` so that the domains of | 838c2ecf20Sopenharmony_ci| the ``rcu_node`` structures align with hardware boundaries. | 848c2ecf20Sopenharmony_ci| However, there has thus far been no need for this. | 858c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 868c2ecf20Sopenharmony_ci 878c2ecf20Sopenharmony_ciIf your system has more than 1,024 CPUs (or more than 512 CPUs on a 888c2ecf20Sopenharmony_ci32-bit system), then RCU will automatically add more levels to the tree. 898c2ecf20Sopenharmony_ciFor example, if you are crazy enough to build a 64-bit system with 908c2ecf20Sopenharmony_ci65,536 CPUs, RCU would configure the ``rcu_node`` tree as follows: 918c2ecf20Sopenharmony_ci 928c2ecf20Sopenharmony_ci.. kernel-figure:: HugeTreeClassicRCU.svg 938c2ecf20Sopenharmony_ci 948c2ecf20Sopenharmony_ciRCU currently permits up to a four-level tree, which on a 64-bit system 958c2ecf20Sopenharmony_ciaccommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for 968c2ecf20Sopenharmony_ci32-bit systems. On the other hand, you can set both 978c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT`` and ``CONFIG_RCU_FANOUT_LEAF`` to be as small as 988c2ecf20Sopenharmony_ci2, which would result in a 16-CPU test using a 4-level tree. This can be 998c2ecf20Sopenharmony_ciuseful for testing large-system capabilities on small test machines. 1008c2ecf20Sopenharmony_ci 1018c2ecf20Sopenharmony_ciThis multi-level combining tree allows us to get most of the performance 1028c2ecf20Sopenharmony_ciand scalability benefits of partitioning, even though RCU grace-period 1038c2ecf20Sopenharmony_cidetection is inherently a global operation. The trick here is that only 1048c2ecf20Sopenharmony_cithe last CPU to report a quiescent state into a given ``rcu_node`` 1058c2ecf20Sopenharmony_cistructure need advance to the ``rcu_node`` structure at the next level 1068c2ecf20Sopenharmony_ciup the tree. This means that at the leaf-level ``rcu_node`` structure, 1078c2ecf20Sopenharmony_cionly one access out of sixteen will progress up the tree. For the 1088c2ecf20Sopenharmony_ciinternal ``rcu_node`` structures, the situation is even more extreme: 1098c2ecf20Sopenharmony_ciOnly one access out of sixty-four will progress up the tree. Because the 1108c2ecf20Sopenharmony_civast majority of the CPUs do not progress up the tree, the lock 1118c2ecf20Sopenharmony_cicontention remains roughly constant up the tree. No matter how many CPUs 1128c2ecf20Sopenharmony_cithere are in the system, at most 64 quiescent-state reports per grace 1138c2ecf20Sopenharmony_ciperiod will progress all the way to the root ``rcu_node`` structure, 1148c2ecf20Sopenharmony_cithus ensuring that the lock contention on that root ``rcu_node`` 1158c2ecf20Sopenharmony_cistructure remains acceptably low. 1168c2ecf20Sopenharmony_ci 1178c2ecf20Sopenharmony_ciIn effect, the combining tree acts like a big shock absorber, keeping 1188c2ecf20Sopenharmony_cilock contention under control at all tree levels regardless of the level 1198c2ecf20Sopenharmony_ciof loading on the system. 1208c2ecf20Sopenharmony_ci 1218c2ecf20Sopenharmony_ciRCU updaters wait for normal grace periods by registering RCU callbacks, 1228c2ecf20Sopenharmony_cieither directly via ``call_rcu()`` or indirectly via 1238c2ecf20Sopenharmony_ci``synchronize_rcu()`` and friends. RCU callbacks are represented by 1248c2ecf20Sopenharmony_ci``rcu_head`` structures, which are queued on ``rcu_data`` structures 1258c2ecf20Sopenharmony_ciwhile they are waiting for a grace period to elapse, as shown in the 1268c2ecf20Sopenharmony_cifollowing figure: 1278c2ecf20Sopenharmony_ci 1288c2ecf20Sopenharmony_ci.. kernel-figure:: BigTreePreemptRCUBHdyntickCB.svg 1298c2ecf20Sopenharmony_ci 1308c2ecf20Sopenharmony_ciThis figure shows how ``TREE_RCU``'s and ``PREEMPT_RCU``'s major data 1318c2ecf20Sopenharmony_cistructures are related. Lesser data structures will be introduced with 1328c2ecf20Sopenharmony_cithe algorithms that make use of them. 1338c2ecf20Sopenharmony_ci 1348c2ecf20Sopenharmony_ciNote that each of the data structures in the above figure has its own 1358c2ecf20Sopenharmony_cisynchronization: 1368c2ecf20Sopenharmony_ci 1378c2ecf20Sopenharmony_ci#. Each ``rcu_state`` structures has a lock and a mutex, and some fields 1388c2ecf20Sopenharmony_ci are protected by the corresponding root ``rcu_node`` structure's lock. 1398c2ecf20Sopenharmony_ci#. Each ``rcu_node`` structure has a spinlock. 1408c2ecf20Sopenharmony_ci#. The fields in ``rcu_data`` are private to the corresponding CPU, 1418c2ecf20Sopenharmony_ci although a few can be read and written by other CPUs. 1428c2ecf20Sopenharmony_ci 1438c2ecf20Sopenharmony_ciIt is important to note that different data structures can have very 1448c2ecf20Sopenharmony_cidifferent ideas about the state of RCU at any given time. For but one 1458c2ecf20Sopenharmony_ciexample, awareness of the start or end of a given RCU grace period 1468c2ecf20Sopenharmony_cipropagates slowly through the data structures. This slow propagation is 1478c2ecf20Sopenharmony_ciabsolutely necessary for RCU to have good read-side performance. If this 1488c2ecf20Sopenharmony_cibalkanized implementation seems foreign to you, one useful trick is to 1498c2ecf20Sopenharmony_ciconsider each instance of these data structures to be a different 1508c2ecf20Sopenharmony_ciperson, each having the usual slightly different view of reality. 1518c2ecf20Sopenharmony_ci 1528c2ecf20Sopenharmony_ciThe general role of each of these data structures is as follows: 1538c2ecf20Sopenharmony_ci 1548c2ecf20Sopenharmony_ci#. ``rcu_state``: This structure forms the interconnection between the 1558c2ecf20Sopenharmony_ci ``rcu_node`` and ``rcu_data`` structures, tracks grace periods, 1568c2ecf20Sopenharmony_ci serves as short-term repository for callbacks orphaned by CPU-hotplug 1578c2ecf20Sopenharmony_ci events, maintains ``rcu_barrier()`` state, tracks expedited 1588c2ecf20Sopenharmony_ci grace-period state, and maintains state used to force quiescent 1598c2ecf20Sopenharmony_ci states when grace periods extend too long, 1608c2ecf20Sopenharmony_ci#. ``rcu_node``: This structure forms the combining tree that propagates 1618c2ecf20Sopenharmony_ci quiescent-state information from the leaves to the root, and also 1628c2ecf20Sopenharmony_ci propagates grace-period information from the root to the leaves. It 1638c2ecf20Sopenharmony_ci provides local copies of the grace-period state in order to allow 1648c2ecf20Sopenharmony_ci this information to be accessed in a synchronized manner without 1658c2ecf20Sopenharmony_ci suffering the scalability limitations that would otherwise be imposed 1668c2ecf20Sopenharmony_ci by global locking. In ``CONFIG_PREEMPT_RCU`` kernels, it manages the 1678c2ecf20Sopenharmony_ci lists of tasks that have blocked while in their current RCU read-side 1688c2ecf20Sopenharmony_ci critical section. In ``CONFIG_PREEMPT_RCU`` with 1698c2ecf20Sopenharmony_ci ``CONFIG_RCU_BOOST``, it manages the per-\ ``rcu_node`` 1708c2ecf20Sopenharmony_ci priority-boosting kernel threads (kthreads) and state. Finally, it 1718c2ecf20Sopenharmony_ci records CPU-hotplug state in order to determine which CPUs should be 1728c2ecf20Sopenharmony_ci ignored during a given grace period. 1738c2ecf20Sopenharmony_ci#. ``rcu_data``: This per-CPU structure is the focus of quiescent-state 1748c2ecf20Sopenharmony_ci detection and RCU callback queuing. It also tracks its relationship 1758c2ecf20Sopenharmony_ci to the corresponding leaf ``rcu_node`` structure to allow 1768c2ecf20Sopenharmony_ci more-efficient propagation of quiescent states up the ``rcu_node`` 1778c2ecf20Sopenharmony_ci combining tree. Like the ``rcu_node`` structure, it provides a local 1788c2ecf20Sopenharmony_ci copy of the grace-period information to allow for-free synchronized 1798c2ecf20Sopenharmony_ci access to this information from the corresponding CPU. Finally, this 1808c2ecf20Sopenharmony_ci structure records past dyntick-idle state for the corresponding CPU 1818c2ecf20Sopenharmony_ci and also tracks statistics. 1828c2ecf20Sopenharmony_ci#. ``rcu_head``: This structure represents RCU callbacks, and is the 1838c2ecf20Sopenharmony_ci only structure allocated and managed by RCU users. The ``rcu_head`` 1848c2ecf20Sopenharmony_ci structure is normally embedded within the RCU-protected data 1858c2ecf20Sopenharmony_ci structure. 1868c2ecf20Sopenharmony_ci 1878c2ecf20Sopenharmony_ciIf all you wanted from this article was a general notion of how RCU's 1888c2ecf20Sopenharmony_cidata structures are related, you are done. Otherwise, each of the 1898c2ecf20Sopenharmony_cifollowing sections give more details on the ``rcu_state``, ``rcu_node`` 1908c2ecf20Sopenharmony_ciand ``rcu_data`` data structures. 1918c2ecf20Sopenharmony_ci 1928c2ecf20Sopenharmony_ciThe ``rcu_state`` Structure 1938c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1948c2ecf20Sopenharmony_ci 1958c2ecf20Sopenharmony_ciThe ``rcu_state`` structure is the base structure that represents the 1968c2ecf20Sopenharmony_cistate of RCU in the system. This structure forms the interconnection 1978c2ecf20Sopenharmony_cibetween the ``rcu_node`` and ``rcu_data`` structures, tracks grace 1988c2ecf20Sopenharmony_ciperiods, contains the lock used to synchronize with CPU-hotplug events, 1998c2ecf20Sopenharmony_ciand maintains state used to force quiescent states when grace periods 2008c2ecf20Sopenharmony_ciextend too long, 2018c2ecf20Sopenharmony_ci 2028c2ecf20Sopenharmony_ciA few of the ``rcu_state`` structure's fields are discussed, singly and 2038c2ecf20Sopenharmony_ciin groups, in the following sections. The more specialized fields are 2048c2ecf20Sopenharmony_cicovered in the discussion of their use. 2058c2ecf20Sopenharmony_ci 2068c2ecf20Sopenharmony_ciRelationship to rcu_node and rcu_data Structures 2078c2ecf20Sopenharmony_ci'''''''''''''''''''''''''''''''''''''''''''''''' 2088c2ecf20Sopenharmony_ci 2098c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows: 2108c2ecf20Sopenharmony_ci 2118c2ecf20Sopenharmony_ci:: 2128c2ecf20Sopenharmony_ci 2138c2ecf20Sopenharmony_ci 1 struct rcu_node node[NUM_RCU_NODES]; 2148c2ecf20Sopenharmony_ci 2 struct rcu_node *level[NUM_RCU_LVLS + 1]; 2158c2ecf20Sopenharmony_ci 3 struct rcu_data __percpu *rda; 2168c2ecf20Sopenharmony_ci 2178c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 2188c2ecf20Sopenharmony_ci| **Quick Quiz**: | 2198c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 2208c2ecf20Sopenharmony_ci| Wait a minute! You said that the ``rcu_node`` structures formed a | 2218c2ecf20Sopenharmony_ci| tree, but they are declared as a flat array! What gives? | 2228c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 2238c2ecf20Sopenharmony_ci| **Answer**: | 2248c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 2258c2ecf20Sopenharmony_ci| The tree is laid out in the array. The first node In the array is the | 2268c2ecf20Sopenharmony_ci| head, the next set of nodes in the array are children of the head | 2278c2ecf20Sopenharmony_ci| node, and so on until the last set of nodes in the array are the | 2288c2ecf20Sopenharmony_ci| leaves. | 2298c2ecf20Sopenharmony_ci| See the following diagrams to see how this works. | 2308c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 2318c2ecf20Sopenharmony_ci 2328c2ecf20Sopenharmony_ciThe ``rcu_node`` tree is embedded into the ``->node[]`` array as shown 2338c2ecf20Sopenharmony_ciin the following figure: 2348c2ecf20Sopenharmony_ci 2358c2ecf20Sopenharmony_ci.. kernel-figure:: TreeMapping.svg 2368c2ecf20Sopenharmony_ci 2378c2ecf20Sopenharmony_ciOne interesting consequence of this mapping is that a breadth-first 2388c2ecf20Sopenharmony_citraversal of the tree is implemented as a simple linear scan of the 2398c2ecf20Sopenharmony_ciarray, which is in fact what the ``rcu_for_each_node_breadth_first()`` 2408c2ecf20Sopenharmony_cimacro does. This macro is used at the beginning and ends of grace 2418c2ecf20Sopenharmony_ciperiods. 2428c2ecf20Sopenharmony_ci 2438c2ecf20Sopenharmony_ciEach entry of the ``->level`` array references the first ``rcu_node`` 2448c2ecf20Sopenharmony_cistructure on the corresponding level of the tree, for example, as shown 2458c2ecf20Sopenharmony_cibelow: 2468c2ecf20Sopenharmony_ci 2478c2ecf20Sopenharmony_ci.. kernel-figure:: TreeMappingLevel.svg 2488c2ecf20Sopenharmony_ci 2498c2ecf20Sopenharmony_ciThe zero\ :sup:`th` element of the array references the root 2508c2ecf20Sopenharmony_ci``rcu_node`` structure, the first element references the first child of 2518c2ecf20Sopenharmony_cithe root ``rcu_node``, and finally the second element references the 2528c2ecf20Sopenharmony_cifirst leaf ``rcu_node`` structure. 2538c2ecf20Sopenharmony_ci 2548c2ecf20Sopenharmony_ciFor whatever it is worth, if you draw the tree to be tree-shaped rather 2558c2ecf20Sopenharmony_cithan array-shaped, it is easy to draw a planar representation: 2568c2ecf20Sopenharmony_ci 2578c2ecf20Sopenharmony_ci.. kernel-figure:: TreeLevel.svg 2588c2ecf20Sopenharmony_ci 2598c2ecf20Sopenharmony_ciFinally, the ``->rda`` field references a per-CPU pointer to the 2608c2ecf20Sopenharmony_cicorresponding CPU's ``rcu_data`` structure. 2618c2ecf20Sopenharmony_ci 2628c2ecf20Sopenharmony_ciAll of these fields are constant once initialization is complete, and 2638c2ecf20Sopenharmony_citherefore need no protection. 2648c2ecf20Sopenharmony_ci 2658c2ecf20Sopenharmony_ciGrace-Period Tracking 2668c2ecf20Sopenharmony_ci''''''''''''''''''''' 2678c2ecf20Sopenharmony_ci 2688c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows: 2698c2ecf20Sopenharmony_ci 2708c2ecf20Sopenharmony_ci:: 2718c2ecf20Sopenharmony_ci 2728c2ecf20Sopenharmony_ci 1 unsigned long gp_seq; 2738c2ecf20Sopenharmony_ci 2748c2ecf20Sopenharmony_ciRCU grace periods are numbered, and the ``->gp_seq`` field contains the 2758c2ecf20Sopenharmony_cicurrent grace-period sequence number. The bottom two bits are the state 2768c2ecf20Sopenharmony_ciof the current grace period, which can be zero for not yet started or 2778c2ecf20Sopenharmony_cione for in progress. In other words, if the bottom two bits of 2788c2ecf20Sopenharmony_ci``->gp_seq`` are zero, then RCU is idle. Any other value in the bottom 2798c2ecf20Sopenharmony_citwo bits indicates that something is broken. This field is protected by 2808c2ecf20Sopenharmony_cithe root ``rcu_node`` structure's ``->lock`` field. 2818c2ecf20Sopenharmony_ci 2828c2ecf20Sopenharmony_ciThere are ``->gp_seq`` fields in the ``rcu_node`` and ``rcu_data`` 2838c2ecf20Sopenharmony_cistructures as well. The fields in the ``rcu_state`` structure represent 2848c2ecf20Sopenharmony_cithe most current value, and those of the other structures are compared 2858c2ecf20Sopenharmony_ciin order to detect the beginnings and ends of grace periods in a 2868c2ecf20Sopenharmony_cidistributed fashion. The values flow from ``rcu_state`` to ``rcu_node`` 2878c2ecf20Sopenharmony_ci(down the tree from the root to the leaves) to ``rcu_data``. 2888c2ecf20Sopenharmony_ci 2898c2ecf20Sopenharmony_ciMiscellaneous 2908c2ecf20Sopenharmony_ci''''''''''''' 2918c2ecf20Sopenharmony_ci 2928c2ecf20Sopenharmony_ciThis portion of the ``rcu_state`` structure is declared as follows: 2938c2ecf20Sopenharmony_ci 2948c2ecf20Sopenharmony_ci:: 2958c2ecf20Sopenharmony_ci 2968c2ecf20Sopenharmony_ci 1 unsigned long gp_max; 2978c2ecf20Sopenharmony_ci 2 char abbr; 2988c2ecf20Sopenharmony_ci 3 char *name; 2998c2ecf20Sopenharmony_ci 3008c2ecf20Sopenharmony_ciThe ``->gp_max`` field tracks the duration of the longest grace period 3018c2ecf20Sopenharmony_ciin jiffies. It is protected by the root ``rcu_node``'s ``->lock``. 3028c2ecf20Sopenharmony_ci 3038c2ecf20Sopenharmony_ciThe ``->name`` and ``->abbr`` fields distinguish between preemptible RCU 3048c2ecf20Sopenharmony_ci(“rcu_preempt” and “p”) and non-preemptible RCU (“rcu_sched” and “s”). 3058c2ecf20Sopenharmony_ciThese fields are used for diagnostic and tracing purposes. 3068c2ecf20Sopenharmony_ci 3078c2ecf20Sopenharmony_ciThe ``rcu_node`` Structure 3088c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~ 3098c2ecf20Sopenharmony_ci 3108c2ecf20Sopenharmony_ciThe ``rcu_node`` structures form the combining tree that propagates 3118c2ecf20Sopenharmony_ciquiescent-state information from the leaves to the root and also that 3128c2ecf20Sopenharmony_cipropagates grace-period information from the root down to the leaves. 3138c2ecf20Sopenharmony_ciThey provides local copies of the grace-period state in order to allow 3148c2ecf20Sopenharmony_cithis information to be accessed in a synchronized manner without 3158c2ecf20Sopenharmony_cisuffering the scalability limitations that would otherwise be imposed by 3168c2ecf20Sopenharmony_ciglobal locking. In ``CONFIG_PREEMPT_RCU`` kernels, they manage the lists 3178c2ecf20Sopenharmony_ciof tasks that have blocked while in their current RCU read-side critical 3188c2ecf20Sopenharmony_cisection. In ``CONFIG_PREEMPT_RCU`` with ``CONFIG_RCU_BOOST``, they 3198c2ecf20Sopenharmony_cimanage the per-\ ``rcu_node`` priority-boosting kernel threads 3208c2ecf20Sopenharmony_ci(kthreads) and state. Finally, they record CPU-hotplug state in order to 3218c2ecf20Sopenharmony_cidetermine which CPUs should be ignored during a given grace period. 3228c2ecf20Sopenharmony_ci 3238c2ecf20Sopenharmony_ciThe ``rcu_node`` structure's fields are discussed, singly and in groups, 3248c2ecf20Sopenharmony_ciin the following sections. 3258c2ecf20Sopenharmony_ci 3268c2ecf20Sopenharmony_ciConnection to Combining Tree 3278c2ecf20Sopenharmony_ci'''''''''''''''''''''''''''' 3288c2ecf20Sopenharmony_ci 3298c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure is declared as follows: 3308c2ecf20Sopenharmony_ci 3318c2ecf20Sopenharmony_ci:: 3328c2ecf20Sopenharmony_ci 3338c2ecf20Sopenharmony_ci 1 struct rcu_node *parent; 3348c2ecf20Sopenharmony_ci 2 u8 level; 3358c2ecf20Sopenharmony_ci 3 u8 grpnum; 3368c2ecf20Sopenharmony_ci 4 unsigned long grpmask; 3378c2ecf20Sopenharmony_ci 5 int grplo; 3388c2ecf20Sopenharmony_ci 6 int grphi; 3398c2ecf20Sopenharmony_ci 3408c2ecf20Sopenharmony_ciThe ``->parent`` pointer references the ``rcu_node`` one level up in the 3418c2ecf20Sopenharmony_citree, and is ``NULL`` for the root ``rcu_node``. The RCU implementation 3428c2ecf20Sopenharmony_cimakes heavy use of this field to push quiescent states up the tree. The 3438c2ecf20Sopenharmony_ci``->level`` field gives the level in the tree, with the root being at 3448c2ecf20Sopenharmony_cilevel zero, its children at level one, and so on. The ``->grpnum`` field 3458c2ecf20Sopenharmony_cigives this node's position within the children of its parent, so this 3468c2ecf20Sopenharmony_cinumber can range between 0 and 31 on 32-bit systems and between 0 and 63 3478c2ecf20Sopenharmony_cion 64-bit systems. The ``->level`` and ``->grpnum`` fields are used only 3488c2ecf20Sopenharmony_ciduring initialization and for tracing. The ``->grpmask`` field is the 3498c2ecf20Sopenharmony_cibitmask counterpart of ``->grpnum``, and therefore always has exactly 3508c2ecf20Sopenharmony_cione bit set. This mask is used to clear the bit corresponding to this 3518c2ecf20Sopenharmony_ci``rcu_node`` structure in its parent's bitmasks, which are described 3528c2ecf20Sopenharmony_cilater. Finally, the ``->grplo`` and ``->grphi`` fields contain the 3538c2ecf20Sopenharmony_cilowest and highest numbered CPU served by this ``rcu_node`` structure, 3548c2ecf20Sopenharmony_cirespectively. 3558c2ecf20Sopenharmony_ci 3568c2ecf20Sopenharmony_ciAll of these fields are constant, and thus do not require any 3578c2ecf20Sopenharmony_cisynchronization. 3588c2ecf20Sopenharmony_ci 3598c2ecf20Sopenharmony_ciSynchronization 3608c2ecf20Sopenharmony_ci''''''''''''''' 3618c2ecf20Sopenharmony_ci 3628c2ecf20Sopenharmony_ciThis field of the ``rcu_node`` structure is declared as follows: 3638c2ecf20Sopenharmony_ci 3648c2ecf20Sopenharmony_ci:: 3658c2ecf20Sopenharmony_ci 3668c2ecf20Sopenharmony_ci 1 raw_spinlock_t lock; 3678c2ecf20Sopenharmony_ci 3688c2ecf20Sopenharmony_ciThis field is used to protect the remaining fields in this structure, 3698c2ecf20Sopenharmony_ciunless otherwise stated. That said, all of the fields in this structure 3708c2ecf20Sopenharmony_cican be accessed without locking for tracing purposes. Yes, this can 3718c2ecf20Sopenharmony_ciresult in confusing traces, but better some tracing confusion than to be 3728c2ecf20Sopenharmony_ciheisenbugged out of existence. 3738c2ecf20Sopenharmony_ci 3748c2ecf20Sopenharmony_ci.. _grace-period-tracking-1: 3758c2ecf20Sopenharmony_ci 3768c2ecf20Sopenharmony_ciGrace-Period Tracking 3778c2ecf20Sopenharmony_ci''''''''''''''''''''' 3788c2ecf20Sopenharmony_ci 3798c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure is declared as follows: 3808c2ecf20Sopenharmony_ci 3818c2ecf20Sopenharmony_ci:: 3828c2ecf20Sopenharmony_ci 3838c2ecf20Sopenharmony_ci 1 unsigned long gp_seq; 3848c2ecf20Sopenharmony_ci 2 unsigned long gp_seq_needed; 3858c2ecf20Sopenharmony_ci 3868c2ecf20Sopenharmony_ciThe ``rcu_node`` structures' ``->gp_seq`` fields are the counterparts of 3878c2ecf20Sopenharmony_cithe field of the same name in the ``rcu_state`` structure. They each may 3888c2ecf20Sopenharmony_cilag up to one step behind their ``rcu_state`` counterpart. If the bottom 3898c2ecf20Sopenharmony_citwo bits of a given ``rcu_node`` structure's ``->gp_seq`` field is zero, 3908c2ecf20Sopenharmony_cithen this ``rcu_node`` structure believes that RCU is idle. 3918c2ecf20Sopenharmony_ci 3928c2ecf20Sopenharmony_ciThe ``>gp_seq`` field of each ``rcu_node`` structure is updated at the 3938c2ecf20Sopenharmony_cibeginning and the end of each grace period. 3948c2ecf20Sopenharmony_ci 3958c2ecf20Sopenharmony_ciThe ``->gp_seq_needed`` fields record the furthest-in-the-future grace 3968c2ecf20Sopenharmony_ciperiod request seen by the corresponding ``rcu_node`` structure. The 3978c2ecf20Sopenharmony_cirequest is considered fulfilled when the value of the ``->gp_seq`` field 3988c2ecf20Sopenharmony_ciequals or exceeds that of the ``->gp_seq_needed`` field. 3998c2ecf20Sopenharmony_ci 4008c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4018c2ecf20Sopenharmony_ci| **Quick Quiz**: | 4028c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4038c2ecf20Sopenharmony_ci| Suppose that this ``rcu_node`` structure doesn't see a request for a | 4048c2ecf20Sopenharmony_ci| very long time. Won't wrapping of the ``->gp_seq`` field cause | 4058c2ecf20Sopenharmony_ci| problems? | 4068c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4078c2ecf20Sopenharmony_ci| **Answer**: | 4088c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4098c2ecf20Sopenharmony_ci| No, because if the ``->gp_seq_needed`` field lags behind the | 4108c2ecf20Sopenharmony_ci| ``->gp_seq`` field, the ``->gp_seq_needed`` field will be updated at | 4118c2ecf20Sopenharmony_ci| the end of the grace period. Modulo-arithmetic comparisons therefore | 4128c2ecf20Sopenharmony_ci| will always get the correct answer, even with wrapping. | 4138c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4148c2ecf20Sopenharmony_ci 4158c2ecf20Sopenharmony_ciQuiescent-State Tracking 4168c2ecf20Sopenharmony_ci'''''''''''''''''''''''' 4178c2ecf20Sopenharmony_ci 4188c2ecf20Sopenharmony_ciThese fields manage the propagation of quiescent states up the combining 4198c2ecf20Sopenharmony_citree. 4208c2ecf20Sopenharmony_ci 4218c2ecf20Sopenharmony_ciThis portion of the ``rcu_node`` structure has fields as follows: 4228c2ecf20Sopenharmony_ci 4238c2ecf20Sopenharmony_ci:: 4248c2ecf20Sopenharmony_ci 4258c2ecf20Sopenharmony_ci 1 unsigned long qsmask; 4268c2ecf20Sopenharmony_ci 2 unsigned long expmask; 4278c2ecf20Sopenharmony_ci 3 unsigned long qsmaskinit; 4288c2ecf20Sopenharmony_ci 4 unsigned long expmaskinit; 4298c2ecf20Sopenharmony_ci 4308c2ecf20Sopenharmony_ciThe ``->qsmask`` field tracks which of this ``rcu_node`` structure's 4318c2ecf20Sopenharmony_cichildren still need to report quiescent states for the current normal 4328c2ecf20Sopenharmony_cigrace period. Such children will have a value of 1 in their 4338c2ecf20Sopenharmony_cicorresponding bit. Note that the leaf ``rcu_node`` structures should be 4348c2ecf20Sopenharmony_cithought of as having ``rcu_data`` structures as their children. 4358c2ecf20Sopenharmony_ciSimilarly, the ``->expmask`` field tracks which of this ``rcu_node`` 4368c2ecf20Sopenharmony_cistructure's children still need to report quiescent states for the 4378c2ecf20Sopenharmony_cicurrent expedited grace period. An expedited grace period has the same 4388c2ecf20Sopenharmony_ciconceptual properties as a normal grace period, but the expedited 4398c2ecf20Sopenharmony_ciimplementation accepts extreme CPU overhead to obtain much lower 4408c2ecf20Sopenharmony_cigrace-period latency, for example, consuming a few tens of microseconds 4418c2ecf20Sopenharmony_ciworth of CPU time to reduce grace-period duration from milliseconds to 4428c2ecf20Sopenharmony_citens of microseconds. The ``->qsmaskinit`` field tracks which of this 4438c2ecf20Sopenharmony_ci``rcu_node`` structure's children cover for at least one online CPU. 4448c2ecf20Sopenharmony_ciThis mask is used to initialize ``->qsmask``, and ``->expmaskinit`` is 4458c2ecf20Sopenharmony_ciused to initialize ``->expmask`` and the beginning of the normal and 4468c2ecf20Sopenharmony_ciexpedited grace periods, respectively. 4478c2ecf20Sopenharmony_ci 4488c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4498c2ecf20Sopenharmony_ci| **Quick Quiz**: | 4508c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4518c2ecf20Sopenharmony_ci| Why are these bitmasks protected by locking? Come on, haven't you | 4528c2ecf20Sopenharmony_ci| heard of atomic instructions??? | 4538c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4548c2ecf20Sopenharmony_ci| **Answer**: | 4558c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4568c2ecf20Sopenharmony_ci| Lockless grace-period computation! Such a tantalizing possibility! | 4578c2ecf20Sopenharmony_ci| But consider the following sequence of events: | 4588c2ecf20Sopenharmony_ci| | 4598c2ecf20Sopenharmony_ci| #. CPU 0 has been in dyntick-idle mode for quite some time. When it | 4608c2ecf20Sopenharmony_ci| wakes up, it notices that the current RCU grace period needs it to | 4618c2ecf20Sopenharmony_ci| report in, so it sets a flag where the scheduling clock interrupt | 4628c2ecf20Sopenharmony_ci| will find it. | 4638c2ecf20Sopenharmony_ci| #. Meanwhile, CPU 1 is running ``force_quiescent_state()``, and | 4648c2ecf20Sopenharmony_ci| notices that CPU 0 has been in dyntick idle mode, which qualifies | 4658c2ecf20Sopenharmony_ci| as an extended quiescent state. | 4668c2ecf20Sopenharmony_ci| #. CPU 0's scheduling clock interrupt fires in the middle of an RCU | 4678c2ecf20Sopenharmony_ci| read-side critical section, and notices that the RCU core needs | 4688c2ecf20Sopenharmony_ci| something, so commences RCU softirq processing. | 4698c2ecf20Sopenharmony_ci| #. CPU 0's softirq handler executes and is just about ready to report | 4708c2ecf20Sopenharmony_ci| its quiescent state up the ``rcu_node`` tree. | 4718c2ecf20Sopenharmony_ci| #. But CPU 1 beats it to the punch, completing the current grace | 4728c2ecf20Sopenharmony_ci| period and starting a new one. | 4738c2ecf20Sopenharmony_ci| #. CPU 0 now reports its quiescent state for the wrong grace period. | 4748c2ecf20Sopenharmony_ci| That grace period might now end before the RCU read-side critical | 4758c2ecf20Sopenharmony_ci| section. If that happens, disaster will ensue. | 4768c2ecf20Sopenharmony_ci| | 4778c2ecf20Sopenharmony_ci| So the locking is absolutely required in order to coordinate clearing | 4788c2ecf20Sopenharmony_ci| of the bits with updating of the grace-period sequence number in | 4798c2ecf20Sopenharmony_ci| ``->gp_seq``. | 4808c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 4818c2ecf20Sopenharmony_ci 4828c2ecf20Sopenharmony_ciBlocked-Task Management 4838c2ecf20Sopenharmony_ci''''''''''''''''''''''' 4848c2ecf20Sopenharmony_ci 4858c2ecf20Sopenharmony_ci``PREEMPT_RCU`` allows tasks to be preempted in the midst of their RCU 4868c2ecf20Sopenharmony_ciread-side critical sections, and these tasks must be tracked explicitly. 4878c2ecf20Sopenharmony_ciThe details of exactly why and how they are tracked will be covered in a 4888c2ecf20Sopenharmony_ciseparate article on RCU read-side processing. For now, it is enough to 4898c2ecf20Sopenharmony_ciknow that the ``rcu_node`` structure tracks them. 4908c2ecf20Sopenharmony_ci 4918c2ecf20Sopenharmony_ci:: 4928c2ecf20Sopenharmony_ci 4938c2ecf20Sopenharmony_ci 1 struct list_head blkd_tasks; 4948c2ecf20Sopenharmony_ci 2 struct list_head *gp_tasks; 4958c2ecf20Sopenharmony_ci 3 struct list_head *exp_tasks; 4968c2ecf20Sopenharmony_ci 4 bool wait_blkd_tasks; 4978c2ecf20Sopenharmony_ci 4988c2ecf20Sopenharmony_ciThe ``->blkd_tasks`` field is a list header for the list of blocked and 4998c2ecf20Sopenharmony_cipreempted tasks. As tasks undergo context switches within RCU read-side 5008c2ecf20Sopenharmony_cicritical sections, their ``task_struct`` structures are enqueued (via 5018c2ecf20Sopenharmony_cithe ``task_struct``'s ``->rcu_node_entry`` field) onto the head of the 5028c2ecf20Sopenharmony_ci``->blkd_tasks`` list for the leaf ``rcu_node`` structure corresponding 5038c2ecf20Sopenharmony_cito the CPU on which the outgoing context switch executed. As these tasks 5048c2ecf20Sopenharmony_cilater exit their RCU read-side critical sections, they remove themselves 5058c2ecf20Sopenharmony_cifrom the list. This list is therefore in reverse time order, so that if 5068c2ecf20Sopenharmony_cione of the tasks is blocking the current grace period, all subsequent 5078c2ecf20Sopenharmony_citasks must also be blocking that same grace period. Therefore, a single 5088c2ecf20Sopenharmony_cipointer into this list suffices to track all tasks blocking a given 5098c2ecf20Sopenharmony_cigrace period. That pointer is stored in ``->gp_tasks`` for normal grace 5108c2ecf20Sopenharmony_ciperiods and in ``->exp_tasks`` for expedited grace periods. These last 5118c2ecf20Sopenharmony_citwo fields are ``NULL`` if either there is no grace period in flight or 5128c2ecf20Sopenharmony_ciif there are no blocked tasks preventing that grace period from 5138c2ecf20Sopenharmony_cicompleting. If either of these two pointers is referencing a task that 5148c2ecf20Sopenharmony_ciremoves itself from the ``->blkd_tasks`` list, then that task must 5158c2ecf20Sopenharmony_ciadvance the pointer to the next task on the list, or set the pointer to 5168c2ecf20Sopenharmony_ci``NULL`` if there are no subsequent tasks on the list. 5178c2ecf20Sopenharmony_ci 5188c2ecf20Sopenharmony_ciFor example, suppose that tasks T1, T2, and T3 are all hard-affinitied 5198c2ecf20Sopenharmony_cito the largest-numbered CPU in the system. Then if task T1 blocked in an 5208c2ecf20Sopenharmony_ciRCU read-side critical section, then an expedited grace period started, 5218c2ecf20Sopenharmony_cithen task T2 blocked in an RCU read-side critical section, then a normal 5228c2ecf20Sopenharmony_cigrace period started, and finally task 3 blocked in an RCU read-side 5238c2ecf20Sopenharmony_cicritical section, then the state of the last leaf ``rcu_node`` 5248c2ecf20Sopenharmony_cistructure's blocked-task list would be as shown below: 5258c2ecf20Sopenharmony_ci 5268c2ecf20Sopenharmony_ci.. kernel-figure:: blkd_task.svg 5278c2ecf20Sopenharmony_ci 5288c2ecf20Sopenharmony_ciTask T1 is blocking both grace periods, task T2 is blocking only the 5298c2ecf20Sopenharmony_cinormal grace period, and task T3 is blocking neither grace period. Note 5308c2ecf20Sopenharmony_cithat these tasks will not remove themselves from this list immediately 5318c2ecf20Sopenharmony_ciupon resuming execution. They will instead remain on the list until they 5328c2ecf20Sopenharmony_ciexecute the outermost ``rcu_read_unlock()`` that ends their RCU 5338c2ecf20Sopenharmony_ciread-side critical section. 5348c2ecf20Sopenharmony_ci 5358c2ecf20Sopenharmony_ciThe ``->wait_blkd_tasks`` field indicates whether or not the current 5368c2ecf20Sopenharmony_cigrace period is waiting on a blocked task. 5378c2ecf20Sopenharmony_ci 5388c2ecf20Sopenharmony_ciSizing the ``rcu_node`` Array 5398c2ecf20Sopenharmony_ci''''''''''''''''''''''''''''' 5408c2ecf20Sopenharmony_ci 5418c2ecf20Sopenharmony_ciThe ``rcu_node`` array is sized via a series of C-preprocessor 5428c2ecf20Sopenharmony_ciexpressions as follows: 5438c2ecf20Sopenharmony_ci 5448c2ecf20Sopenharmony_ci:: 5458c2ecf20Sopenharmony_ci 5468c2ecf20Sopenharmony_ci 1 #ifdef CONFIG_RCU_FANOUT 5478c2ecf20Sopenharmony_ci 2 #define RCU_FANOUT CONFIG_RCU_FANOUT 5488c2ecf20Sopenharmony_ci 3 #else 5498c2ecf20Sopenharmony_ci 4 # ifdef CONFIG_64BIT 5508c2ecf20Sopenharmony_ci 5 # define RCU_FANOUT 64 5518c2ecf20Sopenharmony_ci 6 # else 5528c2ecf20Sopenharmony_ci 7 # define RCU_FANOUT 32 5538c2ecf20Sopenharmony_ci 8 # endif 5548c2ecf20Sopenharmony_ci 9 #endif 5558c2ecf20Sopenharmony_ci 10 5568c2ecf20Sopenharmony_ci 11 #ifdef CONFIG_RCU_FANOUT_LEAF 5578c2ecf20Sopenharmony_ci 12 #define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF 5588c2ecf20Sopenharmony_ci 13 #else 5598c2ecf20Sopenharmony_ci 14 # ifdef CONFIG_64BIT 5608c2ecf20Sopenharmony_ci 15 # define RCU_FANOUT_LEAF 64 5618c2ecf20Sopenharmony_ci 16 # else 5628c2ecf20Sopenharmony_ci 17 # define RCU_FANOUT_LEAF 32 5638c2ecf20Sopenharmony_ci 18 # endif 5648c2ecf20Sopenharmony_ci 19 #endif 5658c2ecf20Sopenharmony_ci 20 5668c2ecf20Sopenharmony_ci 21 #define RCU_FANOUT_1 (RCU_FANOUT_LEAF) 5678c2ecf20Sopenharmony_ci 22 #define RCU_FANOUT_2 (RCU_FANOUT_1 * RCU_FANOUT) 5688c2ecf20Sopenharmony_ci 23 #define RCU_FANOUT_3 (RCU_FANOUT_2 * RCU_FANOUT) 5698c2ecf20Sopenharmony_ci 24 #define RCU_FANOUT_4 (RCU_FANOUT_3 * RCU_FANOUT) 5708c2ecf20Sopenharmony_ci 25 5718c2ecf20Sopenharmony_ci 26 #if NR_CPUS <= RCU_FANOUT_1 5728c2ecf20Sopenharmony_ci 27 # define RCU_NUM_LVLS 1 5738c2ecf20Sopenharmony_ci 28 # define NUM_RCU_LVL_0 1 5748c2ecf20Sopenharmony_ci 29 # define NUM_RCU_NODES NUM_RCU_LVL_0 5758c2ecf20Sopenharmony_ci 30 # define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0 } 5768c2ecf20Sopenharmony_ci 31 # define RCU_NODE_NAME_INIT { "rcu_node_0" } 5778c2ecf20Sopenharmony_ci 32 # define RCU_FQS_NAME_INIT { "rcu_node_fqs_0" } 5788c2ecf20Sopenharmony_ci 33 # define RCU_EXP_NAME_INIT { "rcu_node_exp_0" } 5798c2ecf20Sopenharmony_ci 34 #elif NR_CPUS <= RCU_FANOUT_2 5808c2ecf20Sopenharmony_ci 35 # define RCU_NUM_LVLS 2 5818c2ecf20Sopenharmony_ci 36 # define NUM_RCU_LVL_0 1 5828c2ecf20Sopenharmony_ci 37 # define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 5838c2ecf20Sopenharmony_ci 38 # define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1) 5848c2ecf20Sopenharmony_ci 39 # define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1 } 5858c2ecf20Sopenharmony_ci 40 # define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1" } 5868c2ecf20Sopenharmony_ci 41 # define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1" } 5878c2ecf20Sopenharmony_ci 42 # define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1" } 5888c2ecf20Sopenharmony_ci 43 #elif NR_CPUS <= RCU_FANOUT_3 5898c2ecf20Sopenharmony_ci 44 # define RCU_NUM_LVLS 3 5908c2ecf20Sopenharmony_ci 45 # define NUM_RCU_LVL_0 1 5918c2ecf20Sopenharmony_ci 46 # define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2) 5928c2ecf20Sopenharmony_ci 47 # define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 5938c2ecf20Sopenharmony_ci 48 # define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2) 5948c2ecf20Sopenharmony_ci 49 # define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 } 5958c2ecf20Sopenharmony_ci 50 # define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2" } 5968c2ecf20Sopenharmony_ci 51 # define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" } 5978c2ecf20Sopenharmony_ci 52 # define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" } 5988c2ecf20Sopenharmony_ci 53 #elif NR_CPUS <= RCU_FANOUT_4 5998c2ecf20Sopenharmony_ci 54 # define RCU_NUM_LVLS 4 6008c2ecf20Sopenharmony_ci 55 # define NUM_RCU_LVL_0 1 6018c2ecf20Sopenharmony_ci 56 # define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3) 6028c2ecf20Sopenharmony_ci 57 # define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2) 6038c2ecf20Sopenharmony_ci 58 # define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1) 6048c2ecf20Sopenharmony_ci 59 # define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3) 6058c2ecf20Sopenharmony_ci 60 # define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 } 6068c2ecf20Sopenharmony_ci 61 # define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" } 6078c2ecf20Sopenharmony_ci 62 # define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" } 6088c2ecf20Sopenharmony_ci 63 # define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" } 6098c2ecf20Sopenharmony_ci 64 #else 6108c2ecf20Sopenharmony_ci 65 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS" 6118c2ecf20Sopenharmony_ci 66 #endif 6128c2ecf20Sopenharmony_ci 6138c2ecf20Sopenharmony_ciThe maximum number of levels in the ``rcu_node`` structure is currently 6148c2ecf20Sopenharmony_cilimited to four, as specified by lines 21-24 and the structure of the 6158c2ecf20Sopenharmony_cisubsequent “if” statement. For 32-bit systems, this allows 6168c2ecf20Sopenharmony_ci16*32*32*32=524,288 CPUs, which should be sufficient for the next few 6178c2ecf20Sopenharmony_ciyears at least. For 64-bit systems, 16*64*64*64=4,194,304 CPUs is 6188c2ecf20Sopenharmony_ciallowed, which should see us through the next decade or so. This 6198c2ecf20Sopenharmony_cifour-level tree also allows kernels built with ``CONFIG_RCU_FANOUT=8`` 6208c2ecf20Sopenharmony_cito support up to 4096 CPUs, which might be useful in very large systems 6218c2ecf20Sopenharmony_cihaving eight CPUs per socket (but please note that no one has yet shown 6228c2ecf20Sopenharmony_ciany measurable performance degradation due to misaligned socket and 6238c2ecf20Sopenharmony_ci``rcu_node`` boundaries). In addition, building kernels with a full four 6248c2ecf20Sopenharmony_cilevels of ``rcu_node`` tree permits better testing of RCU's 6258c2ecf20Sopenharmony_cicombining-tree code. 6268c2ecf20Sopenharmony_ci 6278c2ecf20Sopenharmony_ciThe ``RCU_FANOUT`` symbol controls how many children are permitted at 6288c2ecf20Sopenharmony_cieach non-leaf level of the ``rcu_node`` tree. If the 6298c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT`` Kconfig option is not specified, it is set based 6308c2ecf20Sopenharmony_cion the word size of the system, which is also the Kconfig default. 6318c2ecf20Sopenharmony_ci 6328c2ecf20Sopenharmony_ciThe ``RCU_FANOUT_LEAF`` symbol controls how many CPUs are handled by 6338c2ecf20Sopenharmony_cieach leaf ``rcu_node`` structure. Experience has shown that allowing a 6348c2ecf20Sopenharmony_cigiven leaf ``rcu_node`` structure to handle 64 CPUs, as permitted by the 6358c2ecf20Sopenharmony_cinumber of bits in the ``->qsmask`` field on a 64-bit system, results in 6368c2ecf20Sopenharmony_ciexcessive contention for the leaf ``rcu_node`` structures' ``->lock`` 6378c2ecf20Sopenharmony_cifields. The number of CPUs per leaf ``rcu_node`` structure is therefore 6388c2ecf20Sopenharmony_cilimited to 16 given the default value of ``CONFIG_RCU_FANOUT_LEAF``. If 6398c2ecf20Sopenharmony_ci``CONFIG_RCU_FANOUT_LEAF`` is unspecified, the value selected is based 6408c2ecf20Sopenharmony_cion the word size of the system, just as for ``CONFIG_RCU_FANOUT``. 6418c2ecf20Sopenharmony_ciLines 11-19 perform this computation. 6428c2ecf20Sopenharmony_ci 6438c2ecf20Sopenharmony_ciLines 21-24 compute the maximum number of CPUs supported by a 6448c2ecf20Sopenharmony_cisingle-level (which contains a single ``rcu_node`` structure), 6458c2ecf20Sopenharmony_citwo-level, three-level, and four-level ``rcu_node`` tree, respectively, 6468c2ecf20Sopenharmony_cigiven the fanout specified by ``RCU_FANOUT`` and ``RCU_FANOUT_LEAF``. 6478c2ecf20Sopenharmony_ciThese numbers of CPUs are retained in the ``RCU_FANOUT_1``, 6488c2ecf20Sopenharmony_ci``RCU_FANOUT_2``, ``RCU_FANOUT_3``, and ``RCU_FANOUT_4`` C-preprocessor 6498c2ecf20Sopenharmony_civariables, respectively. 6508c2ecf20Sopenharmony_ci 6518c2ecf20Sopenharmony_ciThese variables are used to control the C-preprocessor ``#if`` statement 6528c2ecf20Sopenharmony_cispanning lines 26-66 that computes the number of ``rcu_node`` structures 6538c2ecf20Sopenharmony_cirequired for each level of the tree, as well as the number of levels 6548c2ecf20Sopenharmony_cirequired. The number of levels is placed in the ``NUM_RCU_LVLS`` 6558c2ecf20Sopenharmony_ciC-preprocessor variable by lines 27, 35, 44, and 54. The number of 6568c2ecf20Sopenharmony_ci``rcu_node`` structures for the topmost level of the tree is always 6578c2ecf20Sopenharmony_ciexactly one, and this value is unconditionally placed into 6588c2ecf20Sopenharmony_ci``NUM_RCU_LVL_0`` by lines 28, 36, 45, and 55. The rest of the levels 6598c2ecf20Sopenharmony_ci(if any) of the ``rcu_node`` tree are computed by dividing the maximum 6608c2ecf20Sopenharmony_cinumber of CPUs by the fanout supported by the number of levels from the 6618c2ecf20Sopenharmony_cicurrent level down, rounding up. This computation is performed by 6628c2ecf20Sopenharmony_cilines 37, 46-47, and 56-58. Lines 31-33, 40-42, 50-52, and 62-63 create 6638c2ecf20Sopenharmony_ciinitializers for lockdep lock-class names. Finally, lines 64-66 produce 6648c2ecf20Sopenharmony_cian error if the maximum number of CPUs is too large for the specified 6658c2ecf20Sopenharmony_cifanout. 6668c2ecf20Sopenharmony_ci 6678c2ecf20Sopenharmony_ciThe ``rcu_segcblist`` Structure 6688c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6698c2ecf20Sopenharmony_ci 6708c2ecf20Sopenharmony_ciThe ``rcu_segcblist`` structure maintains a segmented list of callbacks 6718c2ecf20Sopenharmony_cias follows: 6728c2ecf20Sopenharmony_ci 6738c2ecf20Sopenharmony_ci:: 6748c2ecf20Sopenharmony_ci 6758c2ecf20Sopenharmony_ci 1 #define RCU_DONE_TAIL 0 6768c2ecf20Sopenharmony_ci 2 #define RCU_WAIT_TAIL 1 6778c2ecf20Sopenharmony_ci 3 #define RCU_NEXT_READY_TAIL 2 6788c2ecf20Sopenharmony_ci 4 #define RCU_NEXT_TAIL 3 6798c2ecf20Sopenharmony_ci 5 #define RCU_CBLIST_NSEGS 4 6808c2ecf20Sopenharmony_ci 6 6818c2ecf20Sopenharmony_ci 7 struct rcu_segcblist { 6828c2ecf20Sopenharmony_ci 8 struct rcu_head *head; 6838c2ecf20Sopenharmony_ci 9 struct rcu_head **tails[RCU_CBLIST_NSEGS]; 6848c2ecf20Sopenharmony_ci 10 unsigned long gp_seq[RCU_CBLIST_NSEGS]; 6858c2ecf20Sopenharmony_ci 11 long len; 6868c2ecf20Sopenharmony_ci 12 long len_lazy; 6878c2ecf20Sopenharmony_ci 13 }; 6888c2ecf20Sopenharmony_ci 6898c2ecf20Sopenharmony_ciThe segments are as follows: 6908c2ecf20Sopenharmony_ci 6918c2ecf20Sopenharmony_ci#. ``RCU_DONE_TAIL``: Callbacks whose grace periods have elapsed. These 6928c2ecf20Sopenharmony_ci callbacks are ready to be invoked. 6938c2ecf20Sopenharmony_ci#. ``RCU_WAIT_TAIL``: Callbacks that are waiting for the current grace 6948c2ecf20Sopenharmony_ci period. Note that different CPUs can have different ideas about which 6958c2ecf20Sopenharmony_ci grace period is current, hence the ``->gp_seq`` field. 6968c2ecf20Sopenharmony_ci#. ``RCU_NEXT_READY_TAIL``: Callbacks waiting for the next grace period 6978c2ecf20Sopenharmony_ci to start. 6988c2ecf20Sopenharmony_ci#. ``RCU_NEXT_TAIL``: Callbacks that have not yet been associated with a 6998c2ecf20Sopenharmony_ci grace period. 7008c2ecf20Sopenharmony_ci 7018c2ecf20Sopenharmony_ciThe ``->head`` pointer references the first callback or is ``NULL`` if 7028c2ecf20Sopenharmony_cithe list contains no callbacks (which is *not* the same as being empty). 7038c2ecf20Sopenharmony_ciEach element of the ``->tails[]`` array references the ``->next`` 7048c2ecf20Sopenharmony_cipointer of the last callback in the corresponding segment of the list, 7058c2ecf20Sopenharmony_cior the list's ``->head`` pointer if that segment and all previous 7068c2ecf20Sopenharmony_cisegments are empty. If the corresponding segment is empty but some 7078c2ecf20Sopenharmony_ciprevious segment is not empty, then the array element is identical to 7088c2ecf20Sopenharmony_ciits predecessor. Older callbacks are closer to the head of the list, and 7098c2ecf20Sopenharmony_cinew callbacks are added at the tail. This relationship between the 7108c2ecf20Sopenharmony_ci``->head`` pointer, the ``->tails[]`` array, and the callbacks is shown 7118c2ecf20Sopenharmony_ciin this diagram: 7128c2ecf20Sopenharmony_ci 7138c2ecf20Sopenharmony_ci.. kernel-figure:: nxtlist.svg 7148c2ecf20Sopenharmony_ci 7158c2ecf20Sopenharmony_ciIn this figure, the ``->head`` pointer references the first RCU callback 7168c2ecf20Sopenharmony_ciin the list. The ``->tails[RCU_DONE_TAIL]`` array element references the 7178c2ecf20Sopenharmony_ci``->head`` pointer itself, indicating that none of the callbacks is 7188c2ecf20Sopenharmony_ciready to invoke. The ``->tails[RCU_WAIT_TAIL]`` array element references 7198c2ecf20Sopenharmony_cicallback CB 2's ``->next`` pointer, which indicates that CB 1 and CB 2 7208c2ecf20Sopenharmony_ciare both waiting on the current grace period, give or take possible 7218c2ecf20Sopenharmony_cidisagreements about exactly which grace period is the current one. The 7228c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_READY_TAIL]`` array element references the same RCU 7238c2ecf20Sopenharmony_cicallback that ``->tails[RCU_WAIT_TAIL]`` does, which indicates that 7248c2ecf20Sopenharmony_cithere are no callbacks waiting on the next RCU grace period. The 7258c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element references CB 4's ``->next`` 7268c2ecf20Sopenharmony_cipointer, indicating that all the remaining RCU callbacks have not yet 7278c2ecf20Sopenharmony_cibeen assigned to an RCU grace period. Note that the 7288c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element always references the last RCU 7298c2ecf20Sopenharmony_cicallback's ``->next`` pointer unless the callback list is empty, in 7308c2ecf20Sopenharmony_ciwhich case it references the ``->head`` pointer. 7318c2ecf20Sopenharmony_ci 7328c2ecf20Sopenharmony_ciThere is one additional important special case for the 7338c2ecf20Sopenharmony_ci``->tails[RCU_NEXT_TAIL]`` array element: It can be ``NULL`` when this 7348c2ecf20Sopenharmony_cilist is *disabled*. Lists are disabled when the corresponding CPU is 7358c2ecf20Sopenharmony_cioffline or when the corresponding CPU's callbacks are offloaded to a 7368c2ecf20Sopenharmony_cikthread, both of which are described elsewhere. 7378c2ecf20Sopenharmony_ci 7388c2ecf20Sopenharmony_ciCPUs advance their callbacks from the ``RCU_NEXT_TAIL`` to the 7398c2ecf20Sopenharmony_ci``RCU_NEXT_READY_TAIL`` to the ``RCU_WAIT_TAIL`` to the 7408c2ecf20Sopenharmony_ci``RCU_DONE_TAIL`` list segments as grace periods advance. 7418c2ecf20Sopenharmony_ci 7428c2ecf20Sopenharmony_ciThe ``->gp_seq[]`` array records grace-period numbers corresponding to 7438c2ecf20Sopenharmony_cithe list segments. This is what allows different CPUs to have different 7448c2ecf20Sopenharmony_ciideas as to which is the current grace period while still avoiding 7458c2ecf20Sopenharmony_cipremature invocation of their callbacks. In particular, this allows CPUs 7468c2ecf20Sopenharmony_cithat go idle for extended periods to determine which of their callbacks 7478c2ecf20Sopenharmony_ciare ready to be invoked after reawakening. 7488c2ecf20Sopenharmony_ci 7498c2ecf20Sopenharmony_ciThe ``->len`` counter contains the number of callbacks in ``->head``, 7508c2ecf20Sopenharmony_ciand the ``->len_lazy`` contains the number of those callbacks that are 7518c2ecf20Sopenharmony_ciknown to only free memory, and whose invocation can therefore be safely 7528c2ecf20Sopenharmony_cideferred. 7538c2ecf20Sopenharmony_ci 7548c2ecf20Sopenharmony_ci.. important:: 7558c2ecf20Sopenharmony_ci 7568c2ecf20Sopenharmony_ci It is the ``->len`` field that determines whether or 7578c2ecf20Sopenharmony_ci not there are callbacks associated with this ``rcu_segcblist`` 7588c2ecf20Sopenharmony_ci structure, *not* the ``->head`` pointer. The reason for this is that all 7598c2ecf20Sopenharmony_ci the ready-to-invoke callbacks (that is, those in the ``RCU_DONE_TAIL`` 7608c2ecf20Sopenharmony_ci segment) are extracted all at once at callback-invocation time 7618c2ecf20Sopenharmony_ci (``rcu_do_batch``), due to which ``->head`` may be set to NULL if there 7628c2ecf20Sopenharmony_ci are no not-done callbacks remaining in the ``rcu_segcblist``. If 7638c2ecf20Sopenharmony_ci callback invocation must be postponed, for example, because a 7648c2ecf20Sopenharmony_ci high-priority process just woke up on this CPU, then the remaining 7658c2ecf20Sopenharmony_ci callbacks are placed back on the ``RCU_DONE_TAIL`` segment and 7668c2ecf20Sopenharmony_ci ``->head`` once again points to the start of the segment. In short, the 7678c2ecf20Sopenharmony_ci head field can briefly be ``NULL`` even though the CPU has callbacks 7688c2ecf20Sopenharmony_ci present the entire time. Therefore, it is not appropriate to test the 7698c2ecf20Sopenharmony_ci ``->head`` pointer for ``NULL``. 7708c2ecf20Sopenharmony_ci 7718c2ecf20Sopenharmony_ciIn contrast, the ``->len`` and ``->len_lazy`` counts are adjusted only 7728c2ecf20Sopenharmony_ciafter the corresponding callbacks have been invoked. This means that the 7738c2ecf20Sopenharmony_ci``->len`` count is zero only if the ``rcu_segcblist`` structure really 7748c2ecf20Sopenharmony_ciis devoid of callbacks. Of course, off-CPU sampling of the ``->len`` 7758c2ecf20Sopenharmony_cicount requires careful use of appropriate synchronization, for example, 7768c2ecf20Sopenharmony_cimemory barriers. This synchronization can be a bit subtle, particularly 7778c2ecf20Sopenharmony_ciin the case of ``rcu_barrier()``. 7788c2ecf20Sopenharmony_ci 7798c2ecf20Sopenharmony_ciThe ``rcu_data`` Structure 7808c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~ 7818c2ecf20Sopenharmony_ci 7828c2ecf20Sopenharmony_ciThe ``rcu_data`` maintains the per-CPU state for the RCU subsystem. The 7838c2ecf20Sopenharmony_cifields in this structure may be accessed only from the corresponding CPU 7848c2ecf20Sopenharmony_ci(and from tracing) unless otherwise stated. This structure is the focus 7858c2ecf20Sopenharmony_ciof quiescent-state detection and RCU callback queuing. It also tracks 7868c2ecf20Sopenharmony_ciits relationship to the corresponding leaf ``rcu_node`` structure to 7878c2ecf20Sopenharmony_ciallow more-efficient propagation of quiescent states up the ``rcu_node`` 7888c2ecf20Sopenharmony_cicombining tree. Like the ``rcu_node`` structure, it provides a local 7898c2ecf20Sopenharmony_cicopy of the grace-period information to allow for-free synchronized 7908c2ecf20Sopenharmony_ciaccess to this information from the corresponding CPU. Finally, this 7918c2ecf20Sopenharmony_cistructure records past dyntick-idle state for the corresponding CPU and 7928c2ecf20Sopenharmony_cialso tracks statistics. 7938c2ecf20Sopenharmony_ci 7948c2ecf20Sopenharmony_ciThe ``rcu_data`` structure's fields are discussed, singly and in groups, 7958c2ecf20Sopenharmony_ciin the following sections. 7968c2ecf20Sopenharmony_ci 7978c2ecf20Sopenharmony_ciConnection to Other Data Structures 7988c2ecf20Sopenharmony_ci''''''''''''''''''''''''''''''''''' 7998c2ecf20Sopenharmony_ci 8008c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows: 8018c2ecf20Sopenharmony_ci 8028c2ecf20Sopenharmony_ci:: 8038c2ecf20Sopenharmony_ci 8048c2ecf20Sopenharmony_ci 1 int cpu; 8058c2ecf20Sopenharmony_ci 2 struct rcu_node *mynode; 8068c2ecf20Sopenharmony_ci 3 unsigned long grpmask; 8078c2ecf20Sopenharmony_ci 4 bool beenonline; 8088c2ecf20Sopenharmony_ci 8098c2ecf20Sopenharmony_ciThe ``->cpu`` field contains the number of the corresponding CPU and the 8108c2ecf20Sopenharmony_ci``->mynode`` field references the corresponding ``rcu_node`` structure. 8118c2ecf20Sopenharmony_ciThe ``->mynode`` is used to propagate quiescent states up the combining 8128c2ecf20Sopenharmony_citree. These two fields are constant and therefore do not require 8138c2ecf20Sopenharmony_cisynchronization. 8148c2ecf20Sopenharmony_ci 8158c2ecf20Sopenharmony_ciThe ``->grpmask`` field indicates the bit in the ``->mynode->qsmask`` 8168c2ecf20Sopenharmony_cicorresponding to this ``rcu_data`` structure, and is also used when 8178c2ecf20Sopenharmony_cipropagating quiescent states. The ``->beenonline`` flag is set whenever 8188c2ecf20Sopenharmony_cithe corresponding CPU comes online, which means that the debugfs tracing 8198c2ecf20Sopenharmony_cineed not dump out any ``rcu_data`` structure for which this flag is not 8208c2ecf20Sopenharmony_ciset. 8218c2ecf20Sopenharmony_ci 8228c2ecf20Sopenharmony_ciQuiescent-State and Grace-Period Tracking 8238c2ecf20Sopenharmony_ci''''''''''''''''''''''''''''''''''''''''' 8248c2ecf20Sopenharmony_ci 8258c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows: 8268c2ecf20Sopenharmony_ci 8278c2ecf20Sopenharmony_ci:: 8288c2ecf20Sopenharmony_ci 8298c2ecf20Sopenharmony_ci 1 unsigned long gp_seq; 8308c2ecf20Sopenharmony_ci 2 unsigned long gp_seq_needed; 8318c2ecf20Sopenharmony_ci 3 bool cpu_no_qs; 8328c2ecf20Sopenharmony_ci 4 bool core_needs_qs; 8338c2ecf20Sopenharmony_ci 5 bool gpwrap; 8348c2ecf20Sopenharmony_ci 8358c2ecf20Sopenharmony_ciThe ``->gp_seq`` field is the counterpart of the field of the same name 8368c2ecf20Sopenharmony_ciin the ``rcu_state`` and ``rcu_node`` structures. The 8378c2ecf20Sopenharmony_ci``->gp_seq_needed`` field is the counterpart of the field of the same 8388c2ecf20Sopenharmony_ciname in the rcu_node structure. They may each lag up to one behind their 8398c2ecf20Sopenharmony_ci``rcu_node`` counterparts, but in ``CONFIG_NO_HZ_IDLE`` and 8408c2ecf20Sopenharmony_ci``CONFIG_NO_HZ_FULL`` kernels can lag arbitrarily far behind for CPUs in 8418c2ecf20Sopenharmony_cidyntick-idle mode (but these counters will catch up upon exit from 8428c2ecf20Sopenharmony_cidyntick-idle mode). If the lower two bits of a given ``rcu_data`` 8438c2ecf20Sopenharmony_cistructure's ``->gp_seq`` are zero, then this ``rcu_data`` structure 8448c2ecf20Sopenharmony_cibelieves that RCU is idle. 8458c2ecf20Sopenharmony_ci 8468c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 8478c2ecf20Sopenharmony_ci| **Quick Quiz**: | 8488c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 8498c2ecf20Sopenharmony_ci| All this replication of the grace period numbers can only cause | 8508c2ecf20Sopenharmony_ci| massive confusion. Why not just keep a global sequence number and be | 8518c2ecf20Sopenharmony_ci| done with it??? | 8528c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 8538c2ecf20Sopenharmony_ci| **Answer**: | 8548c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 8558c2ecf20Sopenharmony_ci| Because if there was only a single global sequence numbers, there | 8568c2ecf20Sopenharmony_ci| would need to be a single global lock to allow safely accessing and | 8578c2ecf20Sopenharmony_ci| updating it. And if we are not going to have a single global lock, we | 8588c2ecf20Sopenharmony_ci| need to carefully manage the numbers on a per-node basis. Recall from | 8598c2ecf20Sopenharmony_ci| the answer to a previous Quick Quiz that the consequences of applying | 8608c2ecf20Sopenharmony_ci| a previously sampled quiescent state to the wrong grace period are | 8618c2ecf20Sopenharmony_ci| quite severe. | 8628c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 8638c2ecf20Sopenharmony_ci 8648c2ecf20Sopenharmony_ciThe ``->cpu_no_qs`` flag indicates that the CPU has not yet passed 8658c2ecf20Sopenharmony_cithrough a quiescent state, while the ``->core_needs_qs`` flag indicates 8668c2ecf20Sopenharmony_cithat the RCU core needs a quiescent state from the corresponding CPU. 8678c2ecf20Sopenharmony_ciThe ``->gpwrap`` field indicates that the corresponding CPU has remained 8688c2ecf20Sopenharmony_ciidle for so long that the ``gp_seq`` counter is in danger of overflow, 8698c2ecf20Sopenharmony_ciwhich will cause the CPU to disregard the values of its counters on its 8708c2ecf20Sopenharmony_cinext exit from idle. 8718c2ecf20Sopenharmony_ci 8728c2ecf20Sopenharmony_ciRCU Callback Handling 8738c2ecf20Sopenharmony_ci''''''''''''''''''''' 8748c2ecf20Sopenharmony_ci 8758c2ecf20Sopenharmony_ciIn the absence of CPU-hotplug events, RCU callbacks are invoked by the 8768c2ecf20Sopenharmony_cisame CPU that registered them. This is strictly a cache-locality 8778c2ecf20Sopenharmony_cioptimization: callbacks can and do get invoked on CPUs other than the 8788c2ecf20Sopenharmony_cione that registered them. After all, if the CPU that registered a given 8798c2ecf20Sopenharmony_cicallback has gone offline before the callback can be invoked, there 8808c2ecf20Sopenharmony_cireally is no other choice. 8818c2ecf20Sopenharmony_ci 8828c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows: 8838c2ecf20Sopenharmony_ci 8848c2ecf20Sopenharmony_ci:: 8858c2ecf20Sopenharmony_ci 8868c2ecf20Sopenharmony_ci 1 struct rcu_segcblist cblist; 8878c2ecf20Sopenharmony_ci 2 long qlen_last_fqs_check; 8888c2ecf20Sopenharmony_ci 3 unsigned long n_cbs_invoked; 8898c2ecf20Sopenharmony_ci 4 unsigned long n_nocbs_invoked; 8908c2ecf20Sopenharmony_ci 5 unsigned long n_cbs_orphaned; 8918c2ecf20Sopenharmony_ci 6 unsigned long n_cbs_adopted; 8928c2ecf20Sopenharmony_ci 7 unsigned long n_force_qs_snap; 8938c2ecf20Sopenharmony_ci 8 long blimit; 8948c2ecf20Sopenharmony_ci 8958c2ecf20Sopenharmony_ciThe ``->cblist`` structure is the segmented callback list described 8968c2ecf20Sopenharmony_ciearlier. The CPU advances the callbacks in its ``rcu_data`` structure 8978c2ecf20Sopenharmony_ciwhenever it notices that another RCU grace period has completed. The CPU 8988c2ecf20Sopenharmony_cidetects the completion of an RCU grace period by noticing that the value 8998c2ecf20Sopenharmony_ciof its ``rcu_data`` structure's ``->gp_seq`` field differs from that of 9008c2ecf20Sopenharmony_ciits leaf ``rcu_node`` structure. Recall that each ``rcu_node`` 9018c2ecf20Sopenharmony_cistructure's ``->gp_seq`` field is updated at the beginnings and ends of 9028c2ecf20Sopenharmony_cieach grace period. 9038c2ecf20Sopenharmony_ci 9048c2ecf20Sopenharmony_ciThe ``->qlen_last_fqs_check`` and ``->n_force_qs_snap`` coordinate the 9058c2ecf20Sopenharmony_ciforcing of quiescent states from ``call_rcu()`` and friends when 9068c2ecf20Sopenharmony_cicallback lists grow excessively long. 9078c2ecf20Sopenharmony_ci 9088c2ecf20Sopenharmony_ciThe ``->n_cbs_invoked``, ``->n_cbs_orphaned``, and ``->n_cbs_adopted`` 9098c2ecf20Sopenharmony_cifields count the number of callbacks invoked, sent to other CPUs when 9108c2ecf20Sopenharmony_cithis CPU goes offline, and received from other CPUs when those other 9118c2ecf20Sopenharmony_ciCPUs go offline. The ``->n_nocbs_invoked`` is used when the CPU's 9128c2ecf20Sopenharmony_cicallbacks are offloaded to a kthread. 9138c2ecf20Sopenharmony_ci 9148c2ecf20Sopenharmony_ciFinally, the ``->blimit`` counter is the maximum number of RCU callbacks 9158c2ecf20Sopenharmony_cithat may be invoked at a given time. 9168c2ecf20Sopenharmony_ci 9178c2ecf20Sopenharmony_ciDyntick-Idle Handling 9188c2ecf20Sopenharmony_ci''''''''''''''''''''' 9198c2ecf20Sopenharmony_ci 9208c2ecf20Sopenharmony_ciThis portion of the ``rcu_data`` structure is declared as follows: 9218c2ecf20Sopenharmony_ci 9228c2ecf20Sopenharmony_ci:: 9238c2ecf20Sopenharmony_ci 9248c2ecf20Sopenharmony_ci 1 int dynticks_snap; 9258c2ecf20Sopenharmony_ci 2 unsigned long dynticks_fqs; 9268c2ecf20Sopenharmony_ci 9278c2ecf20Sopenharmony_ciThe ``->dynticks_snap`` field is used to take a snapshot of the 9288c2ecf20Sopenharmony_cicorresponding CPU's dyntick-idle state when forcing quiescent states, 9298c2ecf20Sopenharmony_ciand is therefore accessed from other CPUs. Finally, the 9308c2ecf20Sopenharmony_ci``->dynticks_fqs`` field is used to count the number of times this CPU 9318c2ecf20Sopenharmony_ciis determined to be in dyntick-idle state, and is used for tracing and 9328c2ecf20Sopenharmony_cidebugging purposes. 9338c2ecf20Sopenharmony_ci 9348c2ecf20Sopenharmony_ciThis portion of the rcu_data structure is declared as follows: 9358c2ecf20Sopenharmony_ci 9368c2ecf20Sopenharmony_ci:: 9378c2ecf20Sopenharmony_ci 9388c2ecf20Sopenharmony_ci 1 long dynticks_nesting; 9398c2ecf20Sopenharmony_ci 2 long dynticks_nmi_nesting; 9408c2ecf20Sopenharmony_ci 3 atomic_t dynticks; 9418c2ecf20Sopenharmony_ci 4 bool rcu_need_heavy_qs; 9428c2ecf20Sopenharmony_ci 5 bool rcu_urgent_qs; 9438c2ecf20Sopenharmony_ci 9448c2ecf20Sopenharmony_ciThese fields in the rcu_data structure maintain the per-CPU dyntick-idle 9458c2ecf20Sopenharmony_cistate for the corresponding CPU. The fields may be accessed only from 9468c2ecf20Sopenharmony_cithe corresponding CPU (and from tracing) unless otherwise stated. 9478c2ecf20Sopenharmony_ci 9488c2ecf20Sopenharmony_ciThe ``->dynticks_nesting`` field counts the nesting depth of process 9498c2ecf20Sopenharmony_ciexecution, so that in normal circumstances this counter has value zero 9508c2ecf20Sopenharmony_cior one. NMIs, irqs, and tracers are counted by the 9518c2ecf20Sopenharmony_ci``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes 9528c2ecf20Sopenharmony_cito this variable have to be undertaken carefully using an algorithm 9538c2ecf20Sopenharmony_ciprovided by Andy Lutomirski. The initial transition from idle adds one, 9548c2ecf20Sopenharmony_ciand nested transitions add two, so that a nesting level of five is 9558c2ecf20Sopenharmony_cirepresented by a ``->dynticks_nmi_nesting`` value of nine. This counter 9568c2ecf20Sopenharmony_cican therefore be thought of as counting the number of reasons why this 9578c2ecf20Sopenharmony_ciCPU cannot be permitted to enter dyntick-idle mode, aside from 9588c2ecf20Sopenharmony_ciprocess-level transitions. 9598c2ecf20Sopenharmony_ci 9608c2ecf20Sopenharmony_ciHowever, it turns out that when running in non-idle kernel context, the 9618c2ecf20Sopenharmony_ciLinux kernel is fully capable of entering interrupt handlers that never 9628c2ecf20Sopenharmony_ciexit and perhaps also vice versa. Therefore, whenever the 9638c2ecf20Sopenharmony_ci``->dynticks_nesting`` field is incremented up from zero, the 9648c2ecf20Sopenharmony_ci``->dynticks_nmi_nesting`` field is set to a large positive number, and 9658c2ecf20Sopenharmony_ciwhenever the ``->dynticks_nesting`` field is decremented down to zero, 9668c2ecf20Sopenharmony_cithe ``->dynticks_nmi_nesting`` field is set to zero. Assuming that 9678c2ecf20Sopenharmony_cithe number of misnested interrupts is not sufficient to overflow the 9688c2ecf20Sopenharmony_cicounter, this approach corrects the ``->dynticks_nmi_nesting`` field 9698c2ecf20Sopenharmony_cievery time the corresponding CPU enters the idle loop from process 9708c2ecf20Sopenharmony_cicontext. 9718c2ecf20Sopenharmony_ci 9728c2ecf20Sopenharmony_ciThe ``->dynticks`` field counts the corresponding CPU's transitions to 9738c2ecf20Sopenharmony_ciand from either dyntick-idle or user mode, so that this counter has an 9748c2ecf20Sopenharmony_cieven value when the CPU is in dyntick-idle mode or user mode and an odd 9758c2ecf20Sopenharmony_civalue otherwise. The transitions to/from user mode need to be counted 9768c2ecf20Sopenharmony_cifor user mode adaptive-ticks support (see timers/NO_HZ.txt). 9778c2ecf20Sopenharmony_ci 9788c2ecf20Sopenharmony_ciThe ``->rcu_need_heavy_qs`` field is used to record the fact that the 9798c2ecf20Sopenharmony_ciRCU core code would really like to see a quiescent state from the 9808c2ecf20Sopenharmony_cicorresponding CPU, so much so that it is willing to call for 9818c2ecf20Sopenharmony_ciheavy-weight dyntick-counter operations. This flag is checked by RCU's 9828c2ecf20Sopenharmony_cicontext-switch and ``cond_resched()`` code, which provide a momentary 9838c2ecf20Sopenharmony_ciidle sojourn in response. 9848c2ecf20Sopenharmony_ci 9858c2ecf20Sopenharmony_ciFinally, the ``->rcu_urgent_qs`` field is used to record the fact that 9868c2ecf20Sopenharmony_cithe RCU core code would really like to see a quiescent state from the 9878c2ecf20Sopenharmony_cicorresponding CPU, with the various other fields indicating just how 9888c2ecf20Sopenharmony_cibadly RCU wants this quiescent state. This flag is checked by RCU's 9898c2ecf20Sopenharmony_cicontext-switch path (``rcu_note_context_switch``) and the cond_resched 9908c2ecf20Sopenharmony_cicode. 9918c2ecf20Sopenharmony_ci 9928c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 9938c2ecf20Sopenharmony_ci| **Quick Quiz**: | 9948c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 9958c2ecf20Sopenharmony_ci| Why not simply combine the ``->dynticks_nesting`` and | 9968c2ecf20Sopenharmony_ci| ``->dynticks_nmi_nesting`` counters into a single counter that just | 9978c2ecf20Sopenharmony_ci| counts the number of reasons that the corresponding CPU is non-idle? | 9988c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 9998c2ecf20Sopenharmony_ci| **Answer**: | 10008c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10018c2ecf20Sopenharmony_ci| Because this would fail in the presence of interrupts whose handlers | 10028c2ecf20Sopenharmony_ci| never return and of handlers that manage to return from a made-up | 10038c2ecf20Sopenharmony_ci| interrupt. | 10048c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10058c2ecf20Sopenharmony_ci 10068c2ecf20Sopenharmony_ciAdditional fields are present for some special-purpose builds, and are 10078c2ecf20Sopenharmony_cidiscussed separately. 10088c2ecf20Sopenharmony_ci 10098c2ecf20Sopenharmony_ciThe ``rcu_head`` Structure 10108c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~ 10118c2ecf20Sopenharmony_ci 10128c2ecf20Sopenharmony_ciEach ``rcu_head`` structure represents an RCU callback. These structures 10138c2ecf20Sopenharmony_ciare normally embedded within RCU-protected data structures whose 10148c2ecf20Sopenharmony_cialgorithms use asynchronous grace periods. In contrast, when using 10158c2ecf20Sopenharmony_cialgorithms that block waiting for RCU grace periods, RCU users need not 10168c2ecf20Sopenharmony_ciprovide ``rcu_head`` structures. 10178c2ecf20Sopenharmony_ci 10188c2ecf20Sopenharmony_ciThe ``rcu_head`` structure has fields as follows: 10198c2ecf20Sopenharmony_ci 10208c2ecf20Sopenharmony_ci:: 10218c2ecf20Sopenharmony_ci 10228c2ecf20Sopenharmony_ci 1 struct rcu_head *next; 10238c2ecf20Sopenharmony_ci 2 void (*func)(struct rcu_head *head); 10248c2ecf20Sopenharmony_ci 10258c2ecf20Sopenharmony_ciThe ``->next`` field is used to link the ``rcu_head`` structures 10268c2ecf20Sopenharmony_citogether in the lists within the ``rcu_data`` structures. The ``->func`` 10278c2ecf20Sopenharmony_cifield is a pointer to the function to be called when the callback is 10288c2ecf20Sopenharmony_ciready to be invoked, and this function is passed a pointer to the 10298c2ecf20Sopenharmony_ci``rcu_head`` structure. However, ``kfree_rcu()`` uses the ``->func`` 10308c2ecf20Sopenharmony_cifield to record the offset of the ``rcu_head`` structure within the 10318c2ecf20Sopenharmony_cienclosing RCU-protected data structure. 10328c2ecf20Sopenharmony_ci 10338c2ecf20Sopenharmony_ciBoth of these fields are used internally by RCU. From the viewpoint of 10348c2ecf20Sopenharmony_ciRCU users, this structure is an opaque “cookie”. 10358c2ecf20Sopenharmony_ci 10368c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10378c2ecf20Sopenharmony_ci| **Quick Quiz**: | 10388c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10398c2ecf20Sopenharmony_ci| Given that the callback function ``->func`` is passed a pointer to | 10408c2ecf20Sopenharmony_ci| the ``rcu_head`` structure, how is that function supposed to find the | 10418c2ecf20Sopenharmony_ci| beginning of the enclosing RCU-protected data structure? | 10428c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10438c2ecf20Sopenharmony_ci| **Answer**: | 10448c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10458c2ecf20Sopenharmony_ci| In actual practice, there is a separate callback function per type of | 10468c2ecf20Sopenharmony_ci| RCU-protected data structure. The callback function can therefore use | 10478c2ecf20Sopenharmony_ci| the ``container_of()`` macro in the Linux kernel (or other | 10488c2ecf20Sopenharmony_ci| pointer-manipulation facilities in other software environments) to | 10498c2ecf20Sopenharmony_ci| find the beginning of the enclosing structure. | 10508c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 10518c2ecf20Sopenharmony_ci 10528c2ecf20Sopenharmony_ciRCU-Specific Fields in the ``task_struct`` Structure 10538c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 10548c2ecf20Sopenharmony_ci 10558c2ecf20Sopenharmony_ciThe ``CONFIG_PREEMPT_RCU`` implementation uses some additional fields in 10568c2ecf20Sopenharmony_cithe ``task_struct`` structure: 10578c2ecf20Sopenharmony_ci 10588c2ecf20Sopenharmony_ci:: 10598c2ecf20Sopenharmony_ci 10608c2ecf20Sopenharmony_ci 1 #ifdef CONFIG_PREEMPT_RCU 10618c2ecf20Sopenharmony_ci 2 int rcu_read_lock_nesting; 10628c2ecf20Sopenharmony_ci 3 union rcu_special rcu_read_unlock_special; 10638c2ecf20Sopenharmony_ci 4 struct list_head rcu_node_entry; 10648c2ecf20Sopenharmony_ci 5 struct rcu_node *rcu_blocked_node; 10658c2ecf20Sopenharmony_ci 6 #endif /* #ifdef CONFIG_PREEMPT_RCU */ 10668c2ecf20Sopenharmony_ci 7 #ifdef CONFIG_TASKS_RCU 10678c2ecf20Sopenharmony_ci 8 unsigned long rcu_tasks_nvcsw; 10688c2ecf20Sopenharmony_ci 9 bool rcu_tasks_holdout; 10698c2ecf20Sopenharmony_ci 10 struct list_head rcu_tasks_holdout_list; 10708c2ecf20Sopenharmony_ci 11 int rcu_tasks_idle_cpu; 10718c2ecf20Sopenharmony_ci 12 #endif /* #ifdef CONFIG_TASKS_RCU */ 10728c2ecf20Sopenharmony_ci 10738c2ecf20Sopenharmony_ciThe ``->rcu_read_lock_nesting`` field records the nesting level for RCU 10748c2ecf20Sopenharmony_ciread-side critical sections, and the ``->rcu_read_unlock_special`` field 10758c2ecf20Sopenharmony_ciis a bitmask that records special conditions that require 10768c2ecf20Sopenharmony_ci``rcu_read_unlock()`` to do additional work. The ``->rcu_node_entry`` 10778c2ecf20Sopenharmony_cifield is used to form lists of tasks that have blocked within 10788c2ecf20Sopenharmony_cipreemptible-RCU read-side critical sections and the 10798c2ecf20Sopenharmony_ci``->rcu_blocked_node`` field references the ``rcu_node`` structure whose 10808c2ecf20Sopenharmony_cilist this task is a member of, or ``NULL`` if it is not blocked within a 10818c2ecf20Sopenharmony_cipreemptible-RCU read-side critical section. 10828c2ecf20Sopenharmony_ci 10838c2ecf20Sopenharmony_ciThe ``->rcu_tasks_nvcsw`` field tracks the number of voluntary context 10848c2ecf20Sopenharmony_ciswitches that this task had undergone at the beginning of the current 10858c2ecf20Sopenharmony_citasks-RCU grace period, ``->rcu_tasks_holdout`` is set if the current 10868c2ecf20Sopenharmony_citasks-RCU grace period is waiting on this task, 10878c2ecf20Sopenharmony_ci``->rcu_tasks_holdout_list`` is a list element enqueuing this task on 10888c2ecf20Sopenharmony_cithe holdout list, and ``->rcu_tasks_idle_cpu`` tracks which CPU this 10898c2ecf20Sopenharmony_ciidle task is running, but only if the task is currently running, that 10908c2ecf20Sopenharmony_ciis, if the CPU is currently idle. 10918c2ecf20Sopenharmony_ci 10928c2ecf20Sopenharmony_ciAccessor Functions 10938c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~~~~ 10948c2ecf20Sopenharmony_ci 10958c2ecf20Sopenharmony_ciThe following listing shows the ``rcu_get_root()``, 10968c2ecf20Sopenharmony_ci``rcu_for_each_node_breadth_first`` and ``rcu_for_each_leaf_node()`` 10978c2ecf20Sopenharmony_cifunction and macros: 10988c2ecf20Sopenharmony_ci 10998c2ecf20Sopenharmony_ci:: 11008c2ecf20Sopenharmony_ci 11018c2ecf20Sopenharmony_ci 1 static struct rcu_node *rcu_get_root(struct rcu_state *rsp) 11028c2ecf20Sopenharmony_ci 2 { 11038c2ecf20Sopenharmony_ci 3 return &rsp->node[0]; 11048c2ecf20Sopenharmony_ci 4 } 11058c2ecf20Sopenharmony_ci 5 11068c2ecf20Sopenharmony_ci 6 #define rcu_for_each_node_breadth_first(rsp, rnp) \ 11078c2ecf20Sopenharmony_ci 7 for ((rnp) = &(rsp)->node[0]; \ 11088c2ecf20Sopenharmony_ci 8 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++) 11098c2ecf20Sopenharmony_ci 9 11108c2ecf20Sopenharmony_ci 10 #define rcu_for_each_leaf_node(rsp, rnp) \ 11118c2ecf20Sopenharmony_ci 11 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \ 11128c2ecf20Sopenharmony_ci 12 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++) 11138c2ecf20Sopenharmony_ci 11148c2ecf20Sopenharmony_ciThe ``rcu_get_root()`` simply returns a pointer to the first element of 11158c2ecf20Sopenharmony_cithe specified ``rcu_state`` structure's ``->node[]`` array, which is the 11168c2ecf20Sopenharmony_ciroot ``rcu_node`` structure. 11178c2ecf20Sopenharmony_ci 11188c2ecf20Sopenharmony_ciAs noted earlier, the ``rcu_for_each_node_breadth_first()`` macro takes 11198c2ecf20Sopenharmony_ciadvantage of the layout of the ``rcu_node`` structures in the 11208c2ecf20Sopenharmony_ci``rcu_state`` structure's ``->node[]`` array, performing a breadth-first 11218c2ecf20Sopenharmony_citraversal by simply traversing the array in order. Similarly, the 11228c2ecf20Sopenharmony_ci``rcu_for_each_leaf_node()`` macro traverses only the last part of the 11238c2ecf20Sopenharmony_ciarray, thus traversing only the leaf ``rcu_node`` structures. 11248c2ecf20Sopenharmony_ci 11258c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 11268c2ecf20Sopenharmony_ci| **Quick Quiz**: | 11278c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 11288c2ecf20Sopenharmony_ci| What does ``rcu_for_each_leaf_node()`` do if the ``rcu_node`` tree | 11298c2ecf20Sopenharmony_ci| contains only a single node? | 11308c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 11318c2ecf20Sopenharmony_ci| **Answer**: | 11328c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 11338c2ecf20Sopenharmony_ci| In the single-node case, ``rcu_for_each_leaf_node()`` traverses the | 11348c2ecf20Sopenharmony_ci| single node. | 11358c2ecf20Sopenharmony_ci+-----------------------------------------------------------------------+ 11368c2ecf20Sopenharmony_ci 11378c2ecf20Sopenharmony_ciSummary 11388c2ecf20Sopenharmony_ci~~~~~~~ 11398c2ecf20Sopenharmony_ci 11408c2ecf20Sopenharmony_ciSo the state of RCU is represented by an ``rcu_state`` structure, which 11418c2ecf20Sopenharmony_cicontains a combining tree of ``rcu_node`` and ``rcu_data`` structures. 11428c2ecf20Sopenharmony_ciFinally, in ``CONFIG_NO_HZ_IDLE`` kernels, each CPU's dyntick-idle state 11438c2ecf20Sopenharmony_ciis tracked by dynticks-related fields in the ``rcu_data`` structure. If 11448c2ecf20Sopenharmony_ciyou made it this far, you are well prepared to read the code 11458c2ecf20Sopenharmony_ciwalkthroughs in the other articles in this series. 11468c2ecf20Sopenharmony_ci 11478c2ecf20Sopenharmony_ciAcknowledgments 11488c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~ 11498c2ecf20Sopenharmony_ci 11508c2ecf20Sopenharmony_ciI owe thanks to Cyrill Gorcunov, Mathieu Desnoyers, Dhaval Giani, Paul 11518c2ecf20Sopenharmony_ciTurner, Abhishek Srivastava, Matt Kowalczyk, and Serge Hallyn for 11528c2ecf20Sopenharmony_cihelping me get this document into a more human-readable state. 11538c2ecf20Sopenharmony_ci 11548c2ecf20Sopenharmony_ciLegal Statement 11558c2ecf20Sopenharmony_ci~~~~~~~~~~~~~~~ 11568c2ecf20Sopenharmony_ci 11578c2ecf20Sopenharmony_ciThis work represents the view of the author and does not necessarily 11588c2ecf20Sopenharmony_cirepresent the view of IBM. 11598c2ecf20Sopenharmony_ci 11608c2ecf20Sopenharmony_ciLinux is a registered trademark of Linus Torvalds. 11618c2ecf20Sopenharmony_ci 11628c2ecf20Sopenharmony_ciOther company, product, and service names may be trademarks or service 11638c2ecf20Sopenharmony_cimarks of others. 1164