.. _kernel_hacking_lock:

===========================
Unreliable Guide To Locking
===========================

:Author: Rusty Russell

Introduction
============

Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking
issues. This document describes the locking systems in the Linux Kernel
in 2.6.

With the wide availability of HyperThreading, and preemption in the
Linux Kernel, everyone hacking on the kernel needs to know the
fundamentals of concurrency and locking for SMP.

The Problem With Concurrency
============================

(Skip this if you know what a Race Condition is).

In a normal program, you can increment a counter like so::

          very_important_count++;

This is what you would expect to happen:


.. table:: Expected Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (6)      |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (7)                          |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (7)     |
  +------------------------------------+------------------------------------+

This is what might happen:

.. table:: Possible Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (5)      |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (6)                          |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (6)     |
  +------------------------------------+------------------------------------+


Race Conditions and Critical Regions
------------------------------------

This overlap, where the result depends on the relative timing of
multiple tasks, is called a race condition. The piece of code containing
the concurrency issue is called a critical region. Especially since
Linux started running on SMP machines, race conditions became one of the
major issues in kernel design and implementation.

Preemption can have the same effect, even if there is only one CPU: by
preempting one task during the critical region, we have exactly the same
race condition. In this case the thread which preempts might run the
critical region itself.

The solution is to recognize when these simultaneous accesses occur, and
use locks to make sure that only one instance can enter the critical
region at any time. There are many friendly primitives in the Linux
kernel to help you do this. And then there are the unfriendly
primitives, but I'll pretend they don't exist.

Locking in the Linux Kernel
===========================

If I could give you one piece of advice: never sleep with anyone crazier
than yourself. But if I had to give you advice on locking: **keep it
simple**.

Be reluctant to introduce new locks.

Strangely enough, this last one is the exact reverse of my advice when
you **have** slept with someone crazier than yourself. And you should
think about getting a big dog.

Two Main Types of Kernel Locks: Spinlocks and Mutexes
-----------------------------------------------------

There are two main types of kernel locks. The fundamental type is the
spinlock (``include/asm/spinlock.h``), which is a very simple
single-holder lock: if you can't get the spinlock, you keep trying
(spinning) until you can. Spinlocks are very small and fast, and can be
used anywhere.

The second type is a mutex (``include/linux/mutex.h``): it is like a
spinlock, but you may block holding a mutex. If you can't lock a mutex,
your task will suspend itself, and be woken up when the mutex is
released. This means the CPU can do something else while you are
waiting. There are many cases when you simply can't sleep (see
`What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__),
and so have to use a spinlock instead.

Neither type of lock is recursive: see
`Deadlock: Simple and Advanced <#deadlock>`__.

Locks and Uniprocessor Kernels
------------------------------

For kernels compiled without ``CONFIG_SMP``, and without
``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
design decision: when no-one else can run at the same time, there is no
reason to have a lock.

If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT``
is set, then spinlocks simply disable preemption, which is sufficient to
prevent any races. For most purposes, we can think of preemption as
equivalent to SMP, and not worry about it separately.

You should always test your locking code with ``CONFIG_SMP`` and
``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box,
because it will still catch some kinds of locking bugs.

Mutexes still exist, because they are required for synchronization
between user contexts, as we will see below.

Locking Only In User Context
----------------------------

If you have a data structure which is only ever accessed from user
context, then you can use a simple mutex (``include/linux/mutex.h``) to
protect it. This is the most trivial case: you initialize the mutex.
Then you can call mutex_lock_interruptible() to grab the
mutex, and mutex_unlock() to release it. There is also a
mutex_lock(), which should be avoided, because it will
not return if a signal is received.

Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
setsockopt() and getsockopt() calls, with
nf_register_sockopt(). Registration and de-registration
are only done on module load and unload (and boot time, where there is
no concurrency), and the list of registrations is only consulted for an
unknown setsockopt() or getsockopt() system
call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
since the setsockopt and getsockopt calls may well sleep.

Locking Between User Context and Softirqs
-----------------------------------------

If a softirq shares data with user context, you have two problems.
Firstly, the current user context can be interrupted by a softirq, and
secondly, the critical region could be entered from another CPU. This is
where spin_lock_bh() (``include/linux/spinlock.h``) is
used. It disables softirqs on that CPU, then grabs the lock.
spin_unlock_bh() does the reverse. (The '_bh' suffix is
a historical reference to "Bottom Halves", the old name for software
interrupts. It should really be called spin_lock_softirq() in a
perfect world.)

Note that you can also use spin_lock_irq() or
spin_lock_irqsave() here, which stop hardware interrupts
as well: see `Hard IRQ Context <#hard-irq-context>`__.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes local_bh_disable()
(``include/linux/interrupt.h``), which protects you from the softirq
being run.
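As a sketch of the pattern (kernel-style code, not compilable on its
own; the counter and function names are invented for illustration), the
user-context side takes the ``_bh`` variant while the softirq side can
take the plain lock:

```c
#include <linux/spinlock.h>

/* Hypothetical data shared between user context and a softirq. */
static DEFINE_SPINLOCK(stats_lock);
static unsigned long packet_count;

/* Called from user context (e.g. a read() handler): block softirqs
 * on this CPU and take the lock before touching the data. */
static unsigned long get_packet_count(void)
{
	unsigned long count;

	spin_lock_bh(&stats_lock);
	count = packet_count;
	spin_unlock_bh(&stats_lock);
	return count;
}

/* Called from softirq context: a plain spin_lock() suffices here,
 * since another softirq cannot preempt us on this CPU. */
static void bump_packet_count(void)
{
	spin_lock(&stats_lock);
	packet_count++;
	spin_unlock(&stats_lock);
}
```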

Locking Between User Context and Tasklets
-----------------------------------------

This is exactly the same as above, because tasklets are actually run
from a softirq.

Locking Between User Context and Timers
---------------------------------------

This, too, is exactly the same as above, because timers are actually run
from a softirq. From a locking point of view, tasklets and timers are
identical.

Locking Between Tasklets/Timers
-------------------------------

Sometimes a tasklet or timer might want to share data with another
tasklet or timer.

The Same Tasklet/Timer
~~~~~~~~~~~~~~~~~~~~~~

Since a tasklet is never run on two CPUs at once, you don't need to
worry about your tasklet being reentrant (running twice at once), even
on SMP.

Different Tasklets/Timers
~~~~~~~~~~~~~~~~~~~~~~~~~

If another tasklet/timer wants to share data with your tasklet or
timer, you will both need to use spin_lock() and
spin_unlock() calls. spin_lock_bh() is
unnecessary here, as you are already in a tasklet, and none will be run
on the same CPU.
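For example (a sketch with invented names, not taken from the kernel
source), a tasklet draining a queue that another tasklet fills only
needs the plain lock, since the other tasklet can only race with it
from a different CPU:

```c
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical queue shared by two tasklets. */
static DEFINE_SPINLOCK(queue_lock);
static LIST_HEAD(pending_queue);

struct pending_item {
	struct list_head list;
	int data;
};

/* The other tasklet may be running on another CPU right now, so both
 * must take the lock; plain spin_lock() is enough, because tasklets
 * never interrupt each other on the same CPU. */
static void consumer_tasklet_fn(unsigned long unused)
{
	struct pending_item *item;

	spin_lock(&queue_lock);
	while (!list_empty(&pending_queue)) {
		item = list_first_entry(&pending_queue,
					struct pending_item, list);
		list_del(&item->list);
		kfree(item);
	}
	spin_unlock(&queue_lock);
}
```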

Locking Between Softirqs
------------------------

Often a softirq might want to share data with itself or a tasklet/timer.

The Same Softirq
~~~~~~~~~~~~~~~~

The same softirq can run on the other CPUs: you can use a per-CPU array
(see `Per-CPU Data <#per-cpu-data>`__) for better performance. If you're
going so far as to use a softirq, you probably care about scalable
performance enough to justify the extra complexity.

You'll need to use spin_lock() and
spin_unlock() for shared data.

Different Softirqs
~~~~~~~~~~~~~~~~~~

You'll need to use spin_lock() and
spin_unlock() for shared data, whether it be a timer,
tasklet, different softirq or the same or another softirq: any of them
could be running on a different CPU.

Hard IRQ Context
================

Hardware interrupts usually communicate with a tasklet or softirq.
Frequently this involves putting work in a queue, which the softirq will
take out.
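That hand-off usually looks something like the following sketch (the
helper functions ``get_next_item()`` and ``process_one()`` are
hypothetical placeholders, not real kernel APIs): the irq handler
queues work under a lock and schedules a tasklet, which drains the
queue later with interrupts disabled.

```c
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/list.h>

/* Hypothetical work queue shared between an irq handler and a tasklet. */
static DEFINE_SPINLOCK(work_lock);
static LIST_HEAD(work_items);
static struct tasklet_struct drain_tasklet;

/* Hard irq handler: queue the work and kick the tasklet. The softirq
 * cannot run while we are in the handler, so spin_lock() is enough. */
static irqreturn_t my_irq_handler(int irq, void *dev)
{
	struct list_head *item = get_next_item(dev);	/* hypothetical */

	spin_lock(&work_lock);
	list_add_tail(item, &work_items);
	spin_unlock(&work_lock);

	tasklet_schedule(&drain_tasklet);
	return IRQ_HANDLED;
}

/* Tasklet (softirq context): the irq handler could fire while we hold
 * the lock, so interrupts must be disabled around the critical region. */
static void drain_fn(unsigned long unused)
{
	spin_lock_irq(&work_lock);
	while (!list_empty(&work_items))
		process_one(&work_items);		/* hypothetical */
	spin_unlock_irq(&work_lock);
}
```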

Locking Between Hard IRQ and Softirqs/Tasklets
----------------------------------------------

If a hardware irq handler shares data with a softirq, you have two
concerns. Firstly, the softirq processing can be interrupted by a
hardware interrupt, and secondly, the critical region could be entered
by a hardware interrupt on another CPU. This is where
spin_lock_irq() is used. It is defined to disable
interrupts on that CPU, then grab the lock.
spin_unlock_irq() does the reverse.

The irq handler does not need to use spin_lock_irq(), because
the softirq cannot run while the irq handler is running: it can use
spin_lock(), which is slightly faster. The only exception
would be if a different hardware irq handler uses the same lock:
spin_lock_irq() will stop that from interrupting us.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes local_irq_disable()
(``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
being run.

spin_lock_irqsave() (``include/linux/spinlock.h``) is a
variant which saves whether interrupts were on or off in a flags word,
which is passed to spin_unlock_irqrestore(). This means
that the same code can be used inside a hard irq handler (where
interrupts are already off) and in softirqs (where the irq disabling is
required).

Note that softirqs (and hence tasklets and timers) are run on return
from hardware interrupts, so spin_lock_irq() also stops
these. In that sense, spin_lock_irqsave() is the most
general and powerful locking function.

Locking Between Two Hard IRQ Handlers
-------------------------------------

It is rare to have to share data between two IRQ handlers, but if you
do, spin_lock_irqsave() should be used: it is
architecture-specific whether all interrupts are disabled inside irq
handlers themselves.

Cheat Sheet For Locking
=======================

Pete Zaitcev gives the following summary:

- If you are in a process context (any syscall) and want to lock other
  processes out, use a mutex. You can take a mutex and sleep
  (``copy_from_user()`` or ``kmalloc(x,GFP_KERNEL)``).

- Otherwise (== data can be touched in an interrupt), use
  spin_lock_irqsave() and
  spin_unlock_irqrestore().

- Avoid holding a spinlock for more than 5 lines of code and across any
  function call (except accessors like readb()).
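A minimal sketch of the second rule (kernel-style code; the lock and
state names are invented for illustration):

```c
#include <linux/spinlock.h>

/* Hypothetical state that an interrupt handler can also touch. */
static DEFINE_SPINLOCK(dev_lock);
static int dev_state;

/* Safe to call from any context: spin_lock_irqsave() remembers
 * whether interrupts were already off, and irqrestore() puts them
 * back exactly as they were. */
static void set_dev_state(int new_state)
{
	unsigned long flags;

	spin_lock_irqsave(&dev_lock, flags);
	dev_state = new_state;
	spin_unlock_irqrestore(&dev_lock, flags);
}
```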

Table of Minimum Requirements
-----------------------------

The following table lists the **minimum** locking requirements between
various contexts. In some cases, the same context can only be running on
one CPU at a time, so no locking is required for that context (e.g. a
particular thread can only run on one CPU at a time, but if it needs to
share data with another thread, locking is required).

Remember the advice above: you can always use
spin_lock_irqsave(), which is a superset of all other
spinlock primitives.

============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
.              IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
IRQ Handler A  None
IRQ Handler B  SLIS          None
Softirq A      SLI           SLI           SL
Softirq B      SLI           SLI           SL        SL
Tasklet A      SLI           SLI           SL        SL        None
Tasklet B      SLI           SLI           SL        SL        SL        None
Timer A        SLI           SLI           SL        SL        SL        SL        None
Timer B        SLI           SLI           SL        SL        SL        SL        SL      None
User Context A SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    None
User Context B SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    MLI            None
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============

Table: Table of Locking Requirements

+--------+----------------------------+
| SLIS   | spin_lock_irqsave          |
+--------+----------------------------+
| SLI    | spin_lock_irq              |
+--------+----------------------------+
| SL     | spin_lock                  |
+--------+----------------------------+
| SLBH   | spin_lock_bh               |
+--------+----------------------------+
| MLI    | mutex_lock_interruptible   |
+--------+----------------------------+

Table: Legend for Locking Requirements Table

The trylock Functions
=====================

There are functions that try to acquire a lock only once and immediately
return a value telling about success or failure to acquire the lock.
They can be used if you need no access to the data protected with the
lock when some other thread is holding the lock. You should acquire the
lock later if you then need access to the data protected with the lock.

spin_trylock() does not spin but returns non-zero if it
acquires the spinlock on the first try or 0 if not. This function can be
used in all contexts like spin_lock(): you must have
disabled the contexts that might interrupt you and acquire the spin
lock.

mutex_trylock() does not suspend your task but returns
non-zero if it could lock the mutex on the first try or 0 if not. This
function cannot be safely used in hardware or software interrupt
contexts despite not sleeping.

Common Examples
===============

Let's step through a simple example: a cache of number to name mappings.
The cache keeps a count of how often each of the objects is used, and
when it gets full, throws out the least used one.

All In User Context
-------------------

For our first example, we assume that all operations are in user context
(ie. from system calls), so we can sleep. This means we can use a mutex
to protect the cache and all the objects within it. Here's the code::

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/mutex.h>
    #include <asm/errno.h>

    struct object
    {
            struct list_head list;
            int id;
            char name[32];
            int popularity;
    };

    /* Protects the cache, cache_num, and the objects within it */
    static DEFINE_MUTEX(cache_lock);
    static LIST_HEAD(cache);
    static unsigned int cache_num = 0;
    #define MAX_CACHE_SIZE 10

    /* Must be holding cache_lock */
    static struct object *__cache_find(int id)
    {
            struct object *i;

            list_for_each_entry(i, &cache, list)
                    if (i->id == id) {
                            i->popularity++;
                            return i;
                    }
            return NULL;
    }

    /* Must be holding cache_lock */
    static void __cache_delete(struct object *obj)
    {
            BUG_ON(!obj);
            list_del(&obj->list);
            kfree(obj);
            cache_num--;
    }

    /* Must be holding cache_lock */
    static void __cache_add(struct object *obj)
    {
            list_add(&obj->list, &cache);
            if (++cache_num > MAX_CACHE_SIZE) {
                    struct object *i, *outcast = NULL;
                    list_for_each_entry(i, &cache, list) {
                            if (!outcast || i->popularity < outcast->popularity)
                                    outcast = i;
                    }
                    __cache_delete(outcast);
            }
    }

    int cache_add(int id, const char *name)
    {
            struct object *obj;

            if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                    return -ENOMEM;

            strscpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            mutex_lock(&cache_lock);
            __cache_add(obj);
            mutex_unlock(&cache_lock);
            return 0;
    }

    void cache_delete(int id)
    {
            mutex_lock(&cache_lock);
            __cache_delete(__cache_find(id));
            mutex_unlock(&cache_lock);
    }

    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            mutex_lock(&cache_lock);
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            mutex_unlock(&cache_lock);
            return ret;
    }

Note that we always make sure we have the cache_lock when we add,
delete, or look up the cache: both the cache infrastructure itself and
the contents of the objects are protected by the lock. In this case it's
easy, since we copy the data for the user, and never let them access the
objects directly.

There is a slight (and common) optimization here: in
cache_add() we set up the fields of the object before
grabbing the lock. This is safe, as no-one else can access it until we
put it in the cache.

Accessing From Interrupt Context
--------------------------------

Now consider the case where cache_find() can be called
from interrupt context: either a hardware interrupt or a softirq. An
example would be a timer which deletes objects from the cache.
5038c2ecf20Sopenharmony_ci 5048c2ecf20Sopenharmony_ciThe change is shown below, in standard patch format: the ``-`` are lines 5058c2ecf20Sopenharmony_ciwhich are taken away, and the ``+`` are lines which are added. 5068c2ecf20Sopenharmony_ci 5078c2ecf20Sopenharmony_ci:: 5088c2ecf20Sopenharmony_ci 5098c2ecf20Sopenharmony_ci --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100 5108c2ecf20Sopenharmony_ci +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100 5118c2ecf20Sopenharmony_ci @@ -12,7 +12,7 @@ 5128c2ecf20Sopenharmony_ci int popularity; 5138c2ecf20Sopenharmony_ci }; 5148c2ecf20Sopenharmony_ci 5158c2ecf20Sopenharmony_ci -static DEFINE_MUTEX(cache_lock); 5168c2ecf20Sopenharmony_ci +static DEFINE_SPINLOCK(cache_lock); 5178c2ecf20Sopenharmony_ci static LIST_HEAD(cache); 5188c2ecf20Sopenharmony_ci static unsigned int cache_num = 0; 5198c2ecf20Sopenharmony_ci #define MAX_CACHE_SIZE 10 5208c2ecf20Sopenharmony_ci @@ -55,6 +55,7 @@ 5218c2ecf20Sopenharmony_ci int cache_add(int id, const char *name) 5228c2ecf20Sopenharmony_ci { 5238c2ecf20Sopenharmony_ci struct object *obj; 5248c2ecf20Sopenharmony_ci + unsigned long flags; 5258c2ecf20Sopenharmony_ci 5268c2ecf20Sopenharmony_ci if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL) 5278c2ecf20Sopenharmony_ci return -ENOMEM; 5288c2ecf20Sopenharmony_ci @@ -63,30 +64,33 @@ 5298c2ecf20Sopenharmony_ci obj->id = id; 5308c2ecf20Sopenharmony_ci obj->popularity = 0; 5318c2ecf20Sopenharmony_ci 5328c2ecf20Sopenharmony_ci - mutex_lock(&cache_lock); 5338c2ecf20Sopenharmony_ci + spin_lock_irqsave(&cache_lock, flags); 5348c2ecf20Sopenharmony_ci __cache_add(obj); 5358c2ecf20Sopenharmony_ci - mutex_unlock(&cache_lock); 5368c2ecf20Sopenharmony_ci + spin_unlock_irqrestore(&cache_lock, flags); 5378c2ecf20Sopenharmony_ci return 0; 5388c2ecf20Sopenharmony_ci } 5398c2ecf20Sopenharmony_ci 5408c2ecf20Sopenharmony_ci void cache_delete(int id) 5418c2ecf20Sopenharmony_ci { 5428c2ecf20Sopenharmony_ci - mutex_lock(&cache_lock); 
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_delete(__cache_find(id));
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
     }

     int cache_find(int id, char *name)
     {
             struct object *obj;
             int ret = -ENOENT;
    +        unsigned long flags;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj) {
                     ret = 0;
                     strcpy(name, obj->name);
             }
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return ret;
     }

Note that spin_lock_irqsave() will turn off interrupts if they are on;
otherwise (if we are already in an interrupt handler) it does nothing.
Hence these functions are safe to call from any context.

Unfortunately, cache_add() calls kmalloc() with the ``GFP_KERNEL``
flag, which is only legal in user context. I have assumed that
cache_add() is still only called in user context, otherwise this should
become a parameter to cache_add().

Exposing Objects Outside This File
----------------------------------

If our objects contained more information, it might not be sufficient to
copy the information in and out: other parts of the code might want to
keep pointers to these objects, for example, rather than looking up the
id every time. This produces two problems.

The first problem is that we use the ``cache_lock`` to protect objects:
we'd need to make this non-static so the rest of the code can use it.
This makes locking trickier, as it is no longer all in one place.

The second problem is the lifetime problem: if another structure keeps a
pointer to an object, it presumably expects that pointer to remain
valid. Unfortunately, this is only guaranteed while you hold the lock,
otherwise someone might call cache_delete(), and even worse, add
another object, re-using the same address.

As there is only one lock, you can't hold it forever: no-one else would
get any work done.

The solution to this problem is to use a reference count: everyone who
has a pointer to the object increases it when they first get the object,
and drops the reference count when they're finished with it. Whoever
drops it to zero knows it is unused, and can actually delete it.
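The get/put discipline can be sketched in userspace C. This is an
analogy only, not kernel code: the names ``object_get``/``object_put``
mirror the kernel example, but a ``pthread_mutex_t`` stands in for the
spinlock, and the object layout is invented for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

struct object {
	unsigned int refcnt;	/* protected by obj_lock */
	int id;
};

static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference: one for every pointer held. */
static void object_get(struct object *obj)
{
	pthread_mutex_lock(&obj_lock);
	obj->refcnt++;
	pthread_mutex_unlock(&obj_lock);
}

/* Drop a reference; whoever drops it to zero frees the object. */
static void object_put(struct object *obj)
{
	int dead;

	pthread_mutex_lock(&obj_lock);
	dead = (--obj->refcnt == 0);
	pthread_mutex_unlock(&obj_lock);
	if (dead)
		free(obj);
}
```

Note that the free happens after the lock is dropped: once the count hit
zero under the lock, no one else holds a pointer, so no one can race
with the free.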

Here is the code::

    --- cache.c.interrupt	2003-12-09 14:25:43.000000000 +1100
    +++ cache.c.refcnt	2003-12-09 14:33:05.000000000 +1100
    @@ -7,6 +7,7 @@
     struct object
     {
             struct list_head list;
    +        unsigned int refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -17,6 +18,35 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    +static void __object_put(struct object *obj)
    +{
    +        if (--obj->refcnt == 0)
    +                kfree(obj);
    +}
    +
    +static void __object_get(struct object *obj)
    +{
    +        obj->refcnt++;
    +}
    +
    +void object_put(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_put(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
    +void object_get(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_get(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
     /* Must be holding cache_lock */
     static struct object *__cache_find(int id)
     {
    @@ -35,6 +65,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    +        __object_put(obj);
             cache_num--;
     }

    @@ -63,6 +94,7 @@
             strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    +        obj->refcnt = 1; /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -79,18 +111,15 @@
             spin_unlock_irqrestore(&cache_lock, flags);
     }

    -int cache_find(int id, char *name)
    +struct object *cache_find(int id)
     {
             struct object *obj;
    -        int ret = -ENOENT;
             unsigned long flags;

             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
    -        if (obj) {
    -                ret = 0;
    -                strcpy(name, obj->name);
    -        }
    +        if (obj)
    +                __object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
    -        return ret;
    +        return obj;
     }

We encapsulate the reference counting in the standard 'get' and 'put'
functions. Now we can return the object itself from cache_find(), which
has the advantage that the user can now sleep holding the object (eg.
to copy_to_user() the name to userspace).

The other point to note is that I said a reference should be held for
every pointer to the object: thus the reference count is 1 when first
inserted into the cache. In some versions the framework does not hold a
reference count, but they are more complicated.

Using Atomic Operations For The Reference Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In practice, :c:type:`atomic_t` would usually be used for refcnt. There
are a number of atomic operations defined in ``include/asm/atomic.h``:
these are guaranteed to be seen atomically from all CPUs in the system,
so no lock is required. In this case, it is simpler than using
spinlocks, although for anything non-trivial using spinlocks is
clearer. The atomic_inc() and atomic_dec_and_test() are used instead of
the standard increment and decrement operators, and the lock is no
longer used to protect the reference count itself.

::

    --- cache.c.refcnt	2003-12-09 15:00:35.000000000 +1100
    +++ cache.c.refcnt-atomic	2003-12-11 15:49:42.000000000 +1100
    @@ -7,7 +7,7 @@
     struct object
     {
             struct list_head list;
    -        unsigned int refcnt;
    +        atomic_t refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -18,33 +18,15 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    -static void __object_put(struct object *obj)
    -{
    -        if (--obj->refcnt == 0)
    -                kfree(obj);
    -}
    -
    -static void __object_get(struct object *obj)
    -{
    -        obj->refcnt++;
    -}
    -
     void object_put(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_put(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        if (atomic_dec_and_test(&obj->refcnt))
    +                kfree(obj);
     }

     void object_get(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        atomic_inc(&obj->refcnt);
     }

     /* Must be holding cache_lock */
    @@ -65,7 +47,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    -        __object_put(obj);
    +        object_put(obj);
             cache_num--;
     }

    @@ -94,7 +76,7 @@
             strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    -        obj->refcnt = 1; /* The cache holds a reference */
    +        atomic_set(&obj->refcnt, 1); /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -119,7 +101,7 @@
             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj)
    -                __object_get(obj);
    +                object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
             return obj;
     }

Protecting The Objects Themselves
---------------------------------

In these examples, we assumed that the objects (except the reference
counts) never changed once they are created. If we wanted to allow the
name to change, there are three possibilities:

- You can make ``cache_lock`` non-static, and tell people to grab that
  lock before changing the name in any object.

- You can provide a cache_obj_rename() which grabs this lock and
  changes the name for the caller, and tell everyone to use that
  function.

- You can make the ``cache_lock`` protect only the cache itself, and
  use another lock to protect the name.

Theoretically, you can make the locks as fine-grained as one lock for
every field, for every object. In practice, the most common variants
are:

- One lock which protects the infrastructure (the ``cache`` list in
  this example) and all the objects. This is what we have done so far.

- One lock which protects the infrastructure (including the list
  pointers inside the objects), and one lock inside the object which
  protects the rest of that object.

- Multiple locks to protect the infrastructure (eg. one lock per hash
  chain), possibly with a separate per-object lock.
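The last variant (one lock per hash chain) can be sketched in userspace
C. Everything here is invented for illustration, with pthread mutexes
standing in for the kernel's spinlocks: the point is only that an
operation takes the lock of its own chain, so operations on different
chains never contend.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_BUCKETS 16

struct node {
	struct node *next;
	int id;
};

struct bucket {
	pthread_mutex_t lock;	/* protects this chain only */
	struct node *chain;
};

static struct bucket table[NR_BUCKETS];

static void table_init(void)
{
	for (int i = 0; i < NR_BUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].chain = NULL;
	}
}

static struct bucket *hash_bucket(int id)
{
	return &table[(unsigned int)id % NR_BUCKETS];
}

/* Insert under the chain's own lock: other chains stay uncontended. */
static void table_add(struct node *n)
{
	struct bucket *b = hash_bucket(n->id);

	pthread_mutex_lock(&b->lock);
	n->next = b->chain;
	b->chain = n;
	pthread_mutex_unlock(&b->lock);
}

static struct node *table_find(int id)
{
	struct bucket *b = hash_bucket(id);
	struct node *n;

	pthread_mutex_lock(&b->lock);
	for (n = b->chain; n; n = n->next)
		if (n->id == id)
			break;
	pthread_mutex_unlock(&b->lock);
	return n;
}
```

The same lifetime caveat as before applies: returning the node pointer
after dropping the lock is only safe if nodes are reference counted, or
are never freed.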

Here is the "lock-per-object" implementation:

::

    --- cache.c.refcnt-atomic	2003-12-11 15:50:54.000000000 +1100
    +++ cache.c.perobjectlock	2003-12-11 17:15:03.000000000 +1100
    @@ -6,11 +6,17 @@

     struct object
     {
    +        /* These two protected by cache_lock. */
             struct list_head list;
    +        int popularity;
    +
             atomic_t refcnt;
    +
    +        /* Doesn't change once created. */
             int id;
    +
    +        spinlock_t lock; /* Protects the name */
             char name[32];
    -        int popularity;
     };

     static DEFINE_SPINLOCK(cache_lock);
    @@ -77,6 +84,7 @@
             obj->id = id;
             obj->popularity = 0;
             atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
    +        spin_lock_init(&obj->lock);

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);

Note that I decide that the popularity count should be protected by the
``cache_lock`` rather than the per-object lock: this is because it
(like the :c:type:`struct list_head <list_head>` inside the object) is
logically part of the infrastructure.
This way, I don't need to grab the lock of every object in
__cache_add() when seeking the least popular.

I also decided that the id member is unchangeable, so I don't need to
grab each object lock in __cache_find() to examine the id: the object
lock is only used by a caller who wants to read or write the name
field.

Note also that I added a comment describing what data was protected by
which locks. This is extremely important, as it describes the runtime
behavior of the code, and can be hard to gain from just reading. And as
Alan Cox says, “Lock data, not code”.

Common Problems
===============

Deadlock: Simple and Advanced
-----------------------------

There is a coding bug where a piece of code tries to grab a spinlock
twice: it will spin forever, waiting for the lock to be released
(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
trivial to diagnose: not a
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

For a slightly more complex case, imagine you have a region shared by a
softirq and user context.
If you use a spin_lock() call to protect it, it is possible that the
user context will be interrupted by the softirq while it holds the
lock, and the softirq will then spin forever trying to get the same
lock.

Both of these are called deadlock, and as shown above, it can occur
even with a single CPU (although not on UP compiles, since spinlocks
vanish on kernel compiles with ``CONFIG_SMP``\ =n. You'll still get
data corruption in the second example).

This complete lockup is easy to diagnose: on SMP boxes the watchdog
timer or compiling with ``DEBUG_SPINLOCK`` set
(``include/linux/spinlock.h``) will show this up immediately when it
happens.

A more complex problem is the so-called 'deadly embrace', involving two
or more locks. Say you have a hash table: each entry in the table is a
spinlock, and a chain of hashed objects. Inside a softirq handler, you
sometimes want to alter an object from one place in the hash to
another: you grab the spinlock of the old hash chain and the spinlock
of the new hash chain, and delete the object from the old one, and
insert it in the new one.

There are two problems here. First, if your code ever tries to move the
object to the same chain, it will deadlock with itself as it tries to
lock it twice.
Secondly, if the same softirq on another CPU is trying to move another
object in the reverse direction, the following could happen:

+-----------------------+-----------------------+
| CPU 1                 | CPU 2                 |
+=======================+=======================+
| Grab lock A -> OK     | Grab lock B -> OK     |
+-----------------------+-----------------------+
| Grab lock B -> spin   | Grab lock A -> spin   |
+-----------------------+-----------------------+

Table: Consequences

The two CPUs will spin forever, waiting for the other to give up their
lock. It will look, smell, and feel like a crash.

Preventing Deadlock
-------------------

Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.

The best locks are encapsulated: they never get exposed in headers, and
are never held around calls to non-trivial functions outside the same
file. You can read through this code and see that it will never
deadlock, because it never tries to grab another lock while it has that
one.
People using your code don't even need to know you are using a lock.

A classic problem here is when you provide callbacks or hooks: if you
call these with the lock held, you risk simple deadlock, or a deadly
embrace (who knows what the callback will do?). Remember, the other
programmers are out to get you, so don't do this.

Overzealous Prevention Of Deadlocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deadlocks are problematic, but not as bad as data corruption. Code
which grabs a read lock, searches a list, fails to find what it wants,
drops the read lock, grabs a write lock and inserts the object has a
race condition.

If you don't see why, please stay the fuck away from my code.

Racing Timers: A Kernel Pastime
-------------------------------

Timers can produce their own special problems with races. Consider a
collection of objects (list, hash, etc) where each object has a timer
which is due to destroy it.

If you want to destroy the entire collection (say on module removal),
you might do the following::

        /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
           HUNGARIAN NOTATION */
        spin_lock_bh(&list_lock);

        while (list) {
                struct foo *next = list->next;
                del_timer(&list->timer);
                kfree(list);
                list = next;
        }

        spin_unlock_bh(&list_lock);

Sooner or later, this will crash on SMP, because a timer can have just
gone off before the spin_lock_bh(), and it will only get the lock after
we spin_unlock_bh(), and then try to free the element (which has
already been freed!).

This can be avoided by checking the result of del_timer(): if it
returns 1, the timer has been deleted.
9948c2ecf20Sopenharmony_ciIf 0, it means (in this case) that it is currently running, so we can 9958c2ecf20Sopenharmony_cido:: 9968c2ecf20Sopenharmony_ci 9978c2ecf20Sopenharmony_ci retry: 9988c2ecf20Sopenharmony_ci spin_lock_bh(&list_lock); 9998c2ecf20Sopenharmony_ci 10008c2ecf20Sopenharmony_ci while (list) { 10018c2ecf20Sopenharmony_ci struct foo *next = list->next; 10028c2ecf20Sopenharmony_ci if (!del_timer(&list->timer)) { 10038c2ecf20Sopenharmony_ci /* Give timer a chance to delete this */ 10048c2ecf20Sopenharmony_ci spin_unlock_bh(&list_lock); 10058c2ecf20Sopenharmony_ci goto retry; 10068c2ecf20Sopenharmony_ci } 10078c2ecf20Sopenharmony_ci kfree(list); 10088c2ecf20Sopenharmony_ci list = next; 10098c2ecf20Sopenharmony_ci } 10108c2ecf20Sopenharmony_ci 10118c2ecf20Sopenharmony_ci spin_unlock_bh(&list_lock); 10128c2ecf20Sopenharmony_ci 10138c2ecf20Sopenharmony_ci 10148c2ecf20Sopenharmony_ciAnother common problem is deleting timers which restart themselves (by 10158c2ecf20Sopenharmony_cicalling add_timer() at the end of their timer function). 10168c2ecf20Sopenharmony_ciBecause this is a fairly common case which is prone to races, you should 10178c2ecf20Sopenharmony_ciuse del_timer_sync() (``include/linux/timer.h``) to 10188c2ecf20Sopenharmony_cihandle this case. It returns the number of times the timer had to be 10198c2ecf20Sopenharmony_cideleted before we finally stopped it from adding itself back in. 10208c2ecf20Sopenharmony_ci 10218c2ecf20Sopenharmony_ciLocking Speed 10228c2ecf20Sopenharmony_ci============= 10238c2ecf20Sopenharmony_ci 10248c2ecf20Sopenharmony_ciThere are three main things to worry about when considering speed of 10258c2ecf20Sopenharmony_cisome code which does locking. First is concurrency: how many things are 10268c2ecf20Sopenharmony_cigoing to be waiting while someone else is holding a lock. Second is the 10278c2ecf20Sopenharmony_citime taken to actually acquire and release an uncontended lock. 
Third is using fewer, or smarter locks. I'm assuming that the lock is
used fairly often: otherwise, you wouldn't be concerned about
efficiency.

Concurrency depends on how long the lock is usually held: you should
hold the lock for as long as needed, but no longer. In the cache
example, we always create the object without the lock held, and then
grab the lock only when we are ready to insert it in the list.

Acquisition times depend on how much damage the lock operations do to
the pipeline (pipeline stalls) and how likely it is that this CPU was
the last one to grab the lock (ie. is the lock cache-hot for this
CPU): on a machine with more CPUs, this likelihood drops fast.
Consider a 700MHz Intel Pentium III: an instruction takes about 0.7ns,
an atomic increment takes about 58ns, a lock which is cache-hot on
this CPU takes 160ns, and a cacheline transfer from another CPU takes
an additional 170 to 360ns. (These figures are from Paul McKenney's
`Linux Journal RCU article
<http://www.linuxjournal.com/article.php?sid=6993>`__).

These two aims conflict: holding a lock for a short time might be done
by splitting locks into parts (such as in our final per-object-lock
example), but this increases the number of lock acquisitions, and the
results are often slower than having a single lock. This is another
reason to advocate locking simplicity.

The third concern is addressed below: there are some methods to reduce
the amount of locking which needs to be done.

Read/Write Lock Variants
------------------------

Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
:c:type:`struct rw_semaphore <rw_semaphore>`. These divide users into
two classes: the readers and the writers. If you are only reading the
data, you can get a read lock, but to write to the data you need the
write lock. Many people can hold a read lock, but a writer must be the
sole holder.

If your code divides neatly along reader/writer lines (as our cache
code does), and the lock is held by readers for significant lengths of
time, using these locks can help. They are slightly slower than the
normal locks though, so in practice ``rwlock_t`` is not usually
worthwhile.

Avoiding Locks: Read Copy Update
--------------------------------

There is a special method of read/write locking called Read Copy
Update. Using RCU, the readers can avoid taking a lock altogether: as
we expect our cache to be read more often than updated (otherwise the
cache is a waste of time), it is a candidate for this optimization.

How do we get rid of read locks?
Getting rid of read locks means that writers may be changing the list
underneath the readers. That is actually quite simple: we can read a
linked list while an element is being added if the writer adds the
element very carefully. For example, adding ``new`` to a single linked
list called ``list``::

        new->next = list->next;
        wmb();
        list->next = new;

The wmb() is a write memory barrier. It ensures that the first
operation (setting the new element's ``next`` pointer) is complete and
will be seen by all CPUs before the second operation is (putting the
new element into the list). This is important, since modern compilers
and modern CPUs can both reorder instructions unless told otherwise: we
want a reader to either not see the new element at all, or see the new
element with the ``next`` pointer correctly pointing at the rest of the
list.

Fortunately, there is a function to do this for standard
:c:type:`struct list_head <list_head>` lists: list_add_rcu()
(``include/linux/list.h``).

Removing an element from the list is even simpler: we replace the
pointer to the old element with a pointer to its successor, and readers
will either see it, or skip over it.

::

        list->next = old->next;

There is list_del_rcu() (``include/linux/list.h``) which
does this (the normal version poisons the old object, which we don't
want).

The reader must also be careful: some CPUs can look through the ``next``
pointer to start reading the contents of the next element early, but
don't realize that the pre-fetched contents are wrong when the ``next``
pointer changes underneath them. Once again, there is a
list_for_each_entry_rcu() (``include/linux/list.h``)
to help you. Of course, writers can just use
list_for_each_entry(), since there cannot be two
simultaneous writers.

Our final dilemma is this: when can we actually destroy the removed
element? Remember, a reader might be stepping through this element in
the list right now: if we free this element and the ``next`` pointer
changes, the reader will jump off into garbage and crash. We need to
wait until we know that all the readers who were traversing the list
when we deleted the element are finished. We use
call_rcu() to register a callback which will actually
destroy the object once all pre-existing readers are finished.
Alternatively, synchronize_rcu() may be used to block
until all pre-existing readers are finished.

But how does Read Copy Update know when the readers are finished? The
method is this: firstly, the readers always traverse the list inside
rcu_read_lock()/rcu_read_unlock() pairs:
these simply disable preemption so the reader won't go to sleep while
reading the list.

RCU then waits until every other CPU has slept at least once: since
readers cannot sleep, we know that any readers which were traversing the
list during the deletion are finished, and the callback is triggered.
The real Read Copy Update code is a little more optimized than this, but
this is the fundamental idea.

::

    --- cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    +++ cache.c.rcupdate    2003-12-11 17:55:14.000000000 +1100
    @@ -1,15 +1,18 @@
     #include <linux/list.h>
     #include <linux/slab.h>
     #include <linux/string.h>
    +#include <linux/rcupdate.h>
     #include <linux/mutex.h>
     #include <asm/errno.h>

     struct object
     {
    -        /* These two protected by cache_lock. */
    +        /* This is protected by RCU */
             struct list_head list;
             int popularity;

    +        struct rcu_head rcu;
    +
             atomic_t refcnt;

             /* Doesn't change once created. */
    @@ -40,7 +43,7 @@
     {
             struct object *i;

    -        list_for_each_entry(i, &cache, list) {
    +        list_for_each_entry_rcu(i, &cache, list) {
                     if (i->id == id) {
                             i->popularity++;
                             return i;
    @@ -49,19 +52,25 @@
             return NULL;
     }

    +/* Final discard done once we know no readers are looking. */
    +static void cache_delete_rcu(void *arg)
    +{
    +        object_put(arg);
    +}
    +
     /* Must be holding cache_lock */
     static void __cache_delete(struct object *obj)
     {
             BUG_ON(!obj);
    -        list_del(&obj->list);
    -        object_put(obj);
    +        list_del_rcu(&obj->list);
             cache_num--;
    +        call_rcu(&obj->rcu, cache_delete_rcu);
     }

     /* Must be holding cache_lock */
     static void __cache_add(struct object *obj)
     {
    -        list_add(&obj->list, &cache);
    +        list_add_rcu(&obj->list, &cache);
             if (++cache_num > MAX_CACHE_SIZE) {
                     struct object *i, *outcast = NULL;
                     list_for_each_entry(i, &cache, list) {
    @@ -104,12 +114,11 @@
     struct object *cache_find(int id)
     {
             struct object *obj;
    -        unsigned long flags;

    -        spin_lock_irqsave(&cache_lock, flags);
    +        rcu_read_lock();
             obj = __cache_find(id);
             if (obj)
                     object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        rcu_read_unlock();
             return obj;
     }

Note that
the reader will alter the
popularity member in __cache_find(), and now it doesn't hold a lock. One
solution would be to make it an ``atomic_t``, but for this usage, we
don't really care about races: an approximate result is good enough, so
I didn't change it.

The result is that cache_find() requires no
synchronization with any other functions, so is almost as fast on SMP as
it would be on UP.

There is a further optimization possible here: remember our original
cache code, where there were no reference counts and the caller simply
held the lock whenever using the object? This is still possible: if you
hold the lock, no one can delete the object, so you don't need to get
and put the reference count.

Now, because the 'read lock' in RCU is simply disabling preemption, a
caller which always has preemption disabled between calling
cache_find() and object_put() does not
need to actually get and put the reference count: we could expose
__cache_find() by making it non-static, and such
callers could simply call that.

The benefit here is that the reference count is not written to: the
object is not altered in any way, which is much faster on SMP machines
due to caching.

Per-CPU Data
------------

Another technique for avoiding locking which is used fairly widely is to
duplicate information for each CPU. For example, if you wanted to keep a
count of a common condition, you could use a spin lock and a single
counter. Nice and simple.

If that was too slow (it's usually not, but if you've got a really big
machine to test on and can show that it is), you could instead use a
counter for each CPU, then none of them need an exclusive lock. See
DEFINE_PER_CPU(), get_cpu_var() and
put_cpu_var() (``include/linux/percpu.h``).

Of particular use for simple per-cpu counters is the ``local_t`` type,
and the cpu_local_inc() and related functions, which are
more efficient than simple code on some architectures
(``include/asm/local.h``).

Note that there is no simple, reliable way of getting an exact value of
such a counter without introducing more locks. This is not a problem
for some uses.

Data Which Is Mostly Used By An IRQ Handler
-------------------------------------------

If data is always accessed from within the same IRQ handler, you don't
need a lock at all: the kernel already guarantees that the irq handler
will not run simultaneously on multiple CPUs.

Manfred Spraul points out that you can still do this, even if the data
is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done like
this::

        spin_lock(&lock);
        disable_irq(irq);
        ...
        enable_irq(irq);
        spin_unlock(&lock);

The disable_irq() prevents the irq handler from running
(and waits for it to finish if it's currently running on other CPUs).
The spinlock prevents any other accesses happening at the same time.
Naturally, this is slower than just a spin_lock_irq()
call, so it only makes sense if this type of access happens extremely
rarely.

What Functions Are Safe To Call From Interrupts?
================================================

Many functions in the kernel sleep (ie. call schedule()) directly or
indirectly: you can never call them while holding a spinlock, or with
preemption disabled.
This also means you need to be in user context:
calling them from an interrupt is illegal.

Some Functions Which Sleep
--------------------------

The most common ones are listed below, but you usually have to read the
code to find out if other calls are safe. If everyone else who calls it
can sleep, you probably need to be able to sleep, too. In particular,
registration and deregistration functions usually expect to be called
from user context, and can sleep.

- Accesses to userspace:

  - copy_from_user()

  - copy_to_user()

  - get_user()

  - put_user()

- kmalloc(GFP_KERNEL)

- mutex_lock_interruptible() and
  mutex_lock()

  There is a mutex_trylock() which does not sleep.
  Still, it must not be used inside interrupt context since its
  implementation is not safe for that. mutex_unlock()
  will also never sleep. It cannot be used in interrupt context either
  since a mutex must be released by the same task that acquired it.

Some Functions Which Don't Sleep
--------------------------------

Some functions are safe to call from any context, or holding almost any
lock.

- printk()

- kfree()

- add_timer() and del_timer()

Mutex API reference
===================

.. kernel-doc:: include/linux/mutex.h
   :internal:

.. kernel-doc:: kernel/locking/mutex.c
   :export:

Futex API reference
===================

.. kernel-doc:: kernel/futex/core.c
   :internal:

Further reading
===============

- ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking
  tutorial in the kernel sources.

- Unix Systems for Modern Architectures: Symmetric Multiprocessing and
  Caching for Kernel Programmers:

  Curt Schimmel's very good introduction to kernel level locking (not
  written for Linux, but nearly everything applies). The book is
  expensive, but really worth every penny to understand SMP locking.
  [ISBN: 0201633388]

Thanks
======

Thanks to Telsa Gwynne for DocBooking, neatening and adding style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
James Morris, Robert Love, Paul McKenney and John Ashby for
proofreading, correcting, flaming and commenting.

Thanks to the cabal for having no influence on this document.

Glossary
========

preemption
  Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
  context inside the kernel would not preempt each other (ie. you had
  that CPU until you gave it up, except for interrupts). With the
  addition of ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user
  context, higher priority tasks can "cut in": spinlocks were changed to
  disable preemption, even on UP.

bh
  Bottom Half: for historical reasons, functions with '_bh' in them
  often now refer to any software interrupt, e.g. spin_lock_bh()
  blocks any software interrupt on the current CPU. Bottom halves are
  deprecated, and will eventually be replaced by tasklets. Only one
  bottom half will be running at any time.

Hardware Interrupt / Hardware IRQ
  Hardware interrupt request. in_irq() returns true in a
  hardware interrupt handler.

Interrupt Context
  Not user context: processing a hardware irq or software irq. Indicated
  by the in_interrupt() macro returning true.

SMP
  Symmetric Multi-Processor: kernels compiled for multiple-CPU machines
  (``CONFIG_SMP=y``).

Software Interrupt / softirq
  Software interrupt handler. in_irq() returns false;
  in_softirq() returns true. Tasklets and softirqs both
  fall into the category of 'software interrupts'.

  Strictly speaking a softirq is one of up to 32 enumerated software
  interrupts which can run on multiple CPUs at once. Sometimes used to
  refer to tasklets as well (ie. all software interrupts).

tasklet
  A dynamically-registrable software interrupt, which is guaranteed to
  only run on one CPU at a time.

timer
  A dynamically-registrable software interrupt, which is run at (or
  close to) a given time. When running, it is just like a tasklet (in
  fact, they are called from the ``TIMER_SOFTIRQ``).

UP
  Uni-Processor: Non-SMP. (``CONFIG_SMP=n``).

User Context
  The kernel executing on behalf of a particular process (ie. a system
  call or trap) or kernel thread. You can tell which process with the
  ``current`` macro. Not to be confused with userspace. Can be
  interrupted by software or hardware interrupts.

Userspace
  A process executing its own code outside the kernel.