.. _kernel_hacking_lock:

===========================
Unreliable Guide To Locking
===========================

:Author: Rusty Russell

Introduction
============

Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking
issues. This document describes the locking systems in the Linux Kernel
in 2.6.

With the wide availability of HyperThreading, and preemption in the
Linux Kernel, everyone hacking on the kernel needs to know the
fundamentals of concurrency and locking for SMP.

The Problem With Concurrency
============================

(Skip this if you know what a Race Condition is).

In a normal program, you can increment a counter like so:

::

          very_important_count++;

This is what one would expect to happen:


.. table:: Expected Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (6)      |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (7)                          |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (7)     |
  +------------------------------------+------------------------------------+

This is what might happen:

.. table:: Possible Results

  +------------------------------------+------------------------------------+
  | Instance 1                         | Instance 2                         |
  +====================================+====================================+
  | read very_important_count (5)      |                                    |
  +------------------------------------+------------------------------------+
  |                                    | read very_important_count (5)      |
  +------------------------------------+------------------------------------+
  | add 1 (6)                          |                                    |
  +------------------------------------+------------------------------------+
  |                                    | add 1 (6)                          |
  +------------------------------------+------------------------------------+
  | write very_important_count (6)     |                                    |
  +------------------------------------+------------------------------------+
  |                                    | write very_important_count (6)     |
  +------------------------------------+------------------------------------+


Race Conditions and Critical Regions
------------------------------------

This overlap, where the result depends on the relative timing of
multiple tasks, is called a race condition. The piece of code containing
the concurrency issue is called a critical region. And especially since
Linux started running on SMP machines, race conditions became one of the
major issues in kernel design and implementation.

Preemption can have the same effect, even if there is only one CPU: by
preempting one task during the critical region, we have exactly the same
race condition. In this case the thread which preempts might run the
critical region itself.

The solution is to recognize when these simultaneous accesses occur, and
use locks to make sure that only one instance can enter the critical
region at any time. There are many friendly primitives in the Linux
kernel to help you do this. And then there are the unfriendly
primitives, but I'll pretend they don't exist.
Locking in the Linux Kernel
===========================

If I could give you one piece of advice: never sleep with anyone crazier
than yourself. But if I had to give you advice on locking: **keep it
simple**.

Be reluctant to introduce new locks.

Strangely enough, this last one is the exact reverse of my advice when
you **have** slept with someone crazier than yourself. And you should
think about getting a big dog.

Two Main Types of Kernel Locks: Spinlocks and Mutexes
-----------------------------------------------------

There are two main types of kernel locks. The fundamental type is the
spinlock (``include/asm/spinlock.h``), which is a very simple
single-holder lock: if you can't get the spinlock, you keep trying
(spinning) until you can. Spinlocks are very small and fast, and can be
used anywhere.

The second type is a mutex (``include/linux/mutex.h``): it is like a
spinlock, but you may block holding a mutex. If you can't lock a mutex,
your task will suspend itself, and be woken up when the mutex is
released. This means the CPU can do something else while you are
waiting. There are many cases when you simply can't sleep (see
`What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__),
and so have to use a spinlock instead.

Neither type of lock is recursive: see
`Deadlock: Simple and Advanced <#deadlock>`__.
Locks and Uniprocessor Kernels
------------------------------

For kernels compiled without ``CONFIG_SMP``, and without
``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
design decision: when no-one else can run at the same time, there is no
reason to have a lock.

If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT``
is set, then spinlocks simply disable preemption, which is sufficient to
prevent any races. For most purposes, we can think of preemption as
equivalent to SMP, and not worry about it separately.

You should always test your locking code with ``CONFIG_SMP`` and
``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box,
because it will still catch some kinds of locking bugs.

Mutexes still exist, because they are required for synchronization
between user contexts, as we will see below.
Locking Only In User Context
----------------------------

If you have a data structure which is only ever accessed from user
context, then you can use a simple mutex (``include/linux/mutex.h``) to
protect it. This is the most trivial case: you initialize the mutex.
Then you can call mutex_lock_interruptible() to grab the
mutex, and mutex_unlock() to release it. There is also a
mutex_lock(), which should be avoided, because it will
not return if a signal is received.

Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
setsockopt() and getsockopt() calls, with
nf_register_sockopt(). Registration and de-registration
are only done on module load and unload (and boot time, where there is
no concurrency), and the list of registrations is only consulted for an
unknown setsockopt() or getsockopt() system
call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
since the setsockopt and getsockopt calls may well sleep.
Locking Between User Context and Softirqs
-----------------------------------------

If a softirq shares data with user context, you have two problems.
Firstly, the current user context can be interrupted by a softirq, and
secondly, the critical region could be entered from another CPU. This is
where spin_lock_bh() (``include/linux/spinlock.h``) is
used. It disables softirqs on that CPU, then grabs the lock.
spin_unlock_bh() does the reverse. (The '_bh' suffix is
a historical reference to "Bottom Halves", the old name for software
interrupts. It should really be called spin_lock_softirq() in a
perfect world).

Note that you can also use spin_lock_irq() or
spin_lock_irqsave() here, which stop hardware interrupts
as well: see `Hard IRQ Context <#hard-irq-context>`__.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes local_bh_disable()
(``include/linux/interrupt.h``), which protects you from the softirq
being run.
Locking Between User Context and Tasklets
-----------------------------------------

This is exactly the same as above, because tasklets are actually run
from a softirq.

Locking Between User Context and Timers
---------------------------------------

This, too, is exactly the same as above, because timers are actually run
from a softirq. From a locking point of view, tasklets and timers are
identical.

Locking Between Tasklets/Timers
-------------------------------

Sometimes a tasklet or timer might want to share data with another
tasklet or timer.

The Same Tasklet/Timer
~~~~~~~~~~~~~~~~~~~~~~

Since a tasklet is never run on two CPUs at once, you don't need to
worry about your tasklet being reentrant (running twice at once), even
on SMP.
Different Tasklets/Timers
~~~~~~~~~~~~~~~~~~~~~~~~~

If another tasklet/timer wants to share data with your tasklet or
timer, you will both need to use spin_lock() and
spin_unlock() calls. spin_lock_bh() is
unnecessary here, as you are already in a tasklet, and none will be run
on the same CPU.

Locking Between Softirqs
------------------------

Often a softirq might want to share data with itself or a tasklet/timer.
The Same Softirq
~~~~~~~~~~~~~~~~

The same softirq can run on the other CPUs: you can use a per-CPU array
(see `Per-CPU Data <#per-cpu-data>`__) for better performance. If you're
going so far as to use a softirq, you probably care about scalable
performance enough to justify the extra complexity.

You'll need to use spin_lock() and
spin_unlock() for shared data.

Different Softirqs
~~~~~~~~~~~~~~~~~~

You'll need to use spin_lock() and
spin_unlock() for shared data, whether it be a timer,
tasklet, different softirq or the same or another softirq: any of them
could be running on a different CPU.
Hard IRQ Context
================

Hardware interrupts usually communicate with a tasklet or softirq.
Frequently this involves putting work in a queue, which the softirq will
take out.

Locking Between Hard IRQ and Softirqs/Tasklets
----------------------------------------------

If a hardware irq handler shares data with a softirq, you have two
concerns. Firstly, the softirq processing can be interrupted by a
hardware interrupt, and secondly, the critical region could be entered
by a hardware interrupt on another CPU. This is where
spin_lock_irq() is used. It is defined to disable
interrupts on that cpu, then grab the lock.
spin_unlock_irq() does the reverse.

The irq handler does not need to use spin_lock_irq(), because
the softirq cannot run while the irq handler is running: it can use
spin_lock(), which is slightly faster. The only exception
would be if a different hardware irq handler uses the same lock:
spin_lock_irq() will stop that from interrupting us.

This works perfectly for UP as well: the spin lock vanishes, and this
macro simply becomes local_irq_disable()
(``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
being run.

spin_lock_irqsave() (``include/linux/spinlock.h``) is a
variant which saves whether interrupts were on or off in a flags word,
which is passed to spin_unlock_irqrestore(). This means
that the same code can be used inside a hard irq handler (where
interrupts are already off) and in softirqs (where the irq disabling is
required).

Note that softirqs (and hence tasklets and timers) are run on return
from hardware interrupts, so spin_lock_irq() also stops
these. In that sense, spin_lock_irqsave() is the most
general and powerful locking function.

Locking Between Two Hard IRQ Handlers
-------------------------------------

It is rare to have to share data between two IRQ handlers, but if you
do, spin_lock_irqsave() should be used: it is
architecture-specific whether all interrupts are disabled inside irq
handlers themselves.
Cheat Sheet For Locking
=======================

Pete Zaitcev gives the following summary:

-  If you are in a process context (any syscall) and want to lock other
   processes out, use a mutex. You can take a mutex and sleep
   (``copy_from_user*()`` or ``kmalloc(x,GFP_KERNEL)``).

-  Otherwise (== data can be touched in an interrupt), use
   spin_lock_irqsave() and
   spin_unlock_irqrestore().

-  Avoid holding a spinlock for more than 5 lines of code and across any
   function call (except accessors like readb()).
Table of Minimum Requirements
-----------------------------

The following table lists the **minimum** locking requirements between
various contexts. In some cases, the same context can only be running on
one CPU at a time, so no locking is required for that context (eg. a
particular thread can only run on one CPU at a time, but if it needs to
share data with another thread, locking is required).

Remember the advice above: you can always use
spin_lock_irqsave(), which is a superset of all other
spinlock primitives.
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
.              IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
IRQ Handler A  None
IRQ Handler B  SLIS          None
Softirq A      SLI           SLI           SL
Softirq B      SLI           SLI           SL        SL
Tasklet A      SLI           SLI           SL        SL        None
Tasklet B      SLI           SLI           SL        SL        SL        None
Timer A        SLI           SLI           SL        SL        SL        SL        None
Timer B        SLI           SLI           SL        SL        SL        SL        SL      None
User Context A SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    None
User Context B SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    MLI            None
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============

Table: Table of Locking Requirements

+--------+----------------------------+
| SLIS   | spin_lock_irqsave          |
+--------+----------------------------+
| SLI    | spin_lock_irq              |
+--------+----------------------------+
| SL     | spin_lock                  |
+--------+----------------------------+
| SLBH   | spin_lock_bh               |
+--------+----------------------------+
| MLI    | mutex_lock_interruptible   |
+--------+----------------------------+

Table: Legend for Locking Requirements Table

The trylock Functions
=====================

There are functions that try to acquire a lock only once and immediately
return a value telling about success or failure to acquire the lock.
They can be used if you need no access to the data protected with the
lock when some other thread is holding the lock. You should acquire the
lock later if you then need access to the data protected with the lock.

spin_trylock() does not spin but returns non-zero if it
acquires the spinlock on the first try or 0 if not. This function can be
used in all contexts like spin_lock(): you must have
disabled the contexts that might interrupt you and acquire the spin
lock.

mutex_trylock() does not suspend your task but returns
non-zero if it could lock the mutex on the first try or 0 if not. This
function cannot be safely used in hardware or software interrupt
contexts despite not sleeping.

Common Examples
===============

Let's step through a simple example: a cache of number to name mappings.
The cache keeps a count of how often each of the objects is used, and
when it gets full, throws out the least used one.

All In User Context
-------------------

For our first example, we assume that all operations are in user context
(ie. from system calls), so we can sleep. This means we can use a mutex
to protect the cache and all the objects within it. Here's the code::

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/mutex.h>
    #include <asm/errno.h>

    struct object
    {
            struct list_head list;
            int id;
            char name[32];
            int popularity;
    };

    /* Protects the cache, cache_num, and the objects within it */
    static DEFINE_MUTEX(cache_lock);
    static LIST_HEAD(cache);
    static unsigned int cache_num = 0;
    #define MAX_CACHE_SIZE 10

    /* Must be holding cache_lock */
    static struct object *__cache_find(int id)
    {
            struct object *i;

            list_for_each_entry(i, &cache, list)
                    if (i->id == id) {
                            i->popularity++;
                            return i;
                    }
            return NULL;
    }

    /* Must be holding cache_lock */
    static void __cache_delete(struct object *obj)
    {
            BUG_ON(!obj);
            list_del(&obj->list);
            kfree(obj);
            cache_num--;
    }

    /* Must be holding cache_lock */
    static void __cache_add(struct object *obj)
    {
            list_add(&obj->list, &cache);
            if (++cache_num > MAX_CACHE_SIZE) {
                    struct object *i, *outcast = NULL;
                    list_for_each_entry(i, &cache, list) {
                            if (!outcast || i->popularity < outcast->popularity)
                                    outcast = i;
                    }
                    __cache_delete(outcast);
            }
    }

    int cache_add(int id, const char *name)
    {
            struct object *obj;

            if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                    return -ENOMEM;

            strscpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            mutex_lock(&cache_lock);
            __cache_add(obj);
            mutex_unlock(&cache_lock);
            return 0;
    }

    void cache_delete(int id)
    {
            mutex_lock(&cache_lock);
            __cache_delete(__cache_find(id));
            mutex_unlock(&cache_lock);
    }

    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            mutex_lock(&cache_lock);
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            mutex_unlock(&cache_lock);
            return ret;
    }

Note that we always make sure we have the cache_lock when we add,
delete, or look up the cache: both the cache infrastructure itself and
the contents of the objects are protected by the lock. In this case it's
easy, since we copy the data for the user, and never let them access the
objects directly.

There is a slight (and common) optimization here: in
cache_add() we set up the fields of the object before
grabbing the lock. This is safe, as no-one else can access it until we
put it in cache.

Accessing From Interrupt Context
--------------------------------

Now consider the case where cache_find() can be called
from interrupt context: either a hardware interrupt or a softirq. An
example would be a timer which deletes objects from the cache.

The change is shown below, in standard patch format: the ``-`` are lines
which are taken away, and the ``+`` are lines which are added.

::

    --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
    +++ cache.c.interrupt   2003-12-09 14:07:49.000000000 +1100
    @@ -12,7 +12,7 @@
             int popularity;
     };

    -static DEFINE_MUTEX(cache_lock);
    +static DEFINE_SPINLOCK(cache_lock);
     static LIST_HEAD(cache);
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10
    @@ -55,6 +55,7 @@
     int cache_add(int id, const char *name)
     {
             struct object *obj;
    +        unsigned long flags;

             if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                     return -ENOMEM;
    @@ -63,30 +64,33 @@
             obj->id = id;
             obj->popularity = 0;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return 0;
     }

     void cache_delete(int id)
     {
    -        mutex_lock(&cache_lock);
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_delete(__cache_find(id));
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
     }

     int cache_find(int id, char *name)
     {
             struct object *obj;
             int ret = -ENOENT;
    +        unsigned long flags;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj) {
                     ret = 0;
                     strcpy(name, obj->name);
             }
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return ret;
     }

Note that spin_lock_irqsave() will turn off interrupts if they are
enabled, and does nothing if they are already disabled (as when we are
called from an interrupt handler); hence these functions are safe to
call from any context.

Unfortunately, cache_add() calls kmalloc()
with the ``GFP_KERNEL`` flag, which is only legal in user context. I
have assumed that cache_add() is still only called in
user context, otherwise this should become a parameter to
cache_add().
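
If cache_add() ever did need to run from interrupt context, the usual
kernel convention is to let the caller choose the allocation flags. A
minimal sketch of that interface change follows; note that the kernel
pieces (``gfp_t``, kmalloc(), the lock) are stubbed with userspace
stand-ins purely so the fragment is self-contained:

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel primitives (illustration only). */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0            /* allocation may sleep: user context only */
#define GFP_ATOMIC 1            /* allocation never sleeps: any context */
#define kmalloc(size, flags) malloc(size)
#define ENOMEM 12

struct object {
        int id;
        char name[32];
        int popularity;
};

/* The allocation flags become a parameter: user-context callers pass
 * GFP_KERNEL as before, interrupt-context callers pass GFP_ATOMIC. */
int cache_add(int id, const char *name, gfp_t gfp)
{
        struct object *obj;

        if ((obj = kmalloc(sizeof(*obj), gfp)) == NULL)
                return -ENOMEM;

        strncpy(obj->name, name, sizeof(obj->name) - 1);
        obj->name[sizeof(obj->name) - 1] = '\0';
        obj->id = id;
        obj->popularity = 0;
        /* ... spin_lock_irqsave(), __cache_add(obj), unlock as before ... */
        free(obj);      /* stand-in only: the real code keeps the object */
        return 0;
}
```

A timer or softirq handler would then call ``cache_add(id, name,
GFP_ATOMIC)``, while existing user-context callers keep ``GFP_KERNEL``.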

Exposing Objects Outside This File
----------------------------------

If our objects contained more information, it might not be sufficient to
copy the information in and out: other parts of the code might want to
keep pointers to these objects, for example, rather than looking up the
id every time. This produces two problems.

The first problem is that we use the ``cache_lock`` to protect objects:
we'd need to make this non-static so the rest of the code can use it.
This makes locking trickier, as it is no longer all in one place.

The second problem is the lifetime problem: if another structure keeps a
pointer to an object, it presumably expects that pointer to remain
valid. Unfortunately, this is only guaranteed while you hold the lock;
otherwise someone might call cache_delete() and, even
worse, add another object, re-using the same address.

As there is only one lock, you can't hold it forever: no-one else would
get any work done.

The solution to this problem is to use a reference count: everyone who
has a pointer to the object increases it when they first get the object,
and drops the reference count when they're finished with it. Whoever
drops it to zero knows it is unused, and can actually delete it.

Here is the code::

    --- cache.c.interrupt   2003-12-09 14:25:43.000000000 +1100
    +++ cache.c.refcnt  2003-12-09 14:33:05.000000000 +1100
    @@ -7,6 +7,7 @@
     struct object
     {
             struct list_head list;
    +        unsigned int refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -17,6 +18,35 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    +static void __object_put(struct object *obj)
    +{
    +        if (--obj->refcnt == 0)
    +                kfree(obj);
    +}
    +
    +static void __object_get(struct object *obj)
    +{
    +        obj->refcnt++;
    +}
    +
    +void object_put(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_put(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
    +void object_get(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_get(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
     /* Must be holding cache_lock */
     static struct object *__cache_find(int id)
     {
    @@ -35,6 +65,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    +        __object_put(obj);
             cache_num--;
     }

    @@ -63,6 +94,7 @@
             strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    +        obj->refcnt = 1; /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -79,18 +111,15 @@
             spin_unlock_irqrestore(&cache_lock, flags);
     }

    -int cache_find(int id, char *name)
    +struct object *cache_find(int id)
     {
             struct object *obj;
    -        int ret = -ENOENT;
             unsigned long flags;

             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
    -        if (obj) {
    -                ret = 0;
    -                strcpy(name, obj->name);
    -        }
    +        if (obj)
    +                __object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
    -        return ret;
    +        return obj;
     }

We encapsulate the reference counting in the standard 'get' and 'put'
functions. Now we can return the object itself from
cache_find(), which has the advantage that the user can
now sleep holding the object (eg. to copy_to_user() the
name to userspace).

The other point to note is that I said a reference should be held for
every pointer to the object: thus the reference count is 1 when first
inserted into the cache. In other designs the cache itself does not hold
a reference, but they are more complicated.
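
The resulting discipline for a caller can be sketched as below. This is
plain single-threaded userspace C, so the spinlock is omitted and
kfree() becomes free(); cache_add_demo() and the ``freed`` counter are
illustrative inventions, not part of the real interface:

```c
#include <stdlib.h>

struct object {
        unsigned int refcnt;
        int id;
};

static int freed;       /* illustration: counts how often we "kfree" */

static void object_get(struct object *obj)
{
        obj->refcnt++;          /* one reference per pointer held */
}

static void object_put(struct object *obj)
{
        if (--obj->refcnt == 0) {       /* last pointer gone: free it */
                freed++;
                free(obj);
        }
}

/* Stand-in for cache_add(): the cache's own pointer counts as one ref. */
static struct object *cache_add_demo(int id)
{
        struct object *obj = malloc(sizeof(*obj));

        obj->id = id;
        obj->refcnt = 1;        /* the cache holds a reference */
        return obj;
}
```

A user calls object_get() when cache_find() hands out the pointer and
object_put() when finished; the object survives cache_delete() until
that final put drops the count to zero.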

Using Atomic Operations For The Reference Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In practice, :c:type:`atomic_t` would usually be used for refcnt. There are a
number of atomic operations defined in ``include/linux/atomic.h``: these
are guaranteed to be seen atomically from all CPUs in the system, so no
lock is required. In this case, it is simpler than using spinlocks,
although for anything non-trivial using spinlocks is clearer.
atomic_inc() and atomic_dec_and_test()
are used instead of the standard increment and decrement operators, and
the lock is no longer used to protect the reference count itself.
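
The same lock-free put() can be written in userspace C11 with
``<stdatomic.h>``. The subtlety, mirrored in the kernel's
atomic_dec_and_test(), is that the fetch-and-subtract returns the *old*
value, so exactly one thread ever observes the count dropping to zero
and may free the object without any lock held (``freed`` and
make_object() are invented for illustration):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct object {
        atomic_uint refcnt;
        int id;
};

static int freed;       /* illustration: counts frees */

static void object_get(struct object *obj)
{
        atomic_fetch_add(&obj->refcnt, 1);
}

/* Analogue of the kernel's atomic_dec_and_test() pattern: no lock is
 * taken, yet only the caller that saw the old value 1 frees the object. */
static void object_put(struct object *obj)
{
        if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
                freed++;
                free(obj);
        }
}

static struct object *make_object(int id)
{
        struct object *obj = malloc(sizeof(*obj));

        obj->id = id;
        atomic_init(&obj->refcnt, 1);
        return obj;
}
```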

::

    --- cache.c.refcnt  2003-12-09 15:00:35.000000000 +1100
    +++ cache.c.refcnt-atomic   2003-12-11 15:49:42.000000000 +1100
    @@ -7,7 +7,7 @@
     struct object
     {
             struct list_head list;
    -        unsigned int refcnt;
    +        atomic_t refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -18,33 +18,15 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    -static void __object_put(struct object *obj)
    -{
    -        if (--obj->refcnt == 0)
    -                kfree(obj);
    -}
    -
    -static void __object_get(struct object *obj)
    -{
    -        obj->refcnt++;
    -}
    -
     void object_put(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_put(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        if (atomic_dec_and_test(&obj->refcnt))
    +                kfree(obj);
     }

     void object_get(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        atomic_inc(&obj->refcnt);
     }

     /* Must be holding cache_lock */
    @@ -65,7 +47,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    -        __object_put(obj);
    +        object_put(obj);
             cache_num--;
     }

    @@ -94,7 +76,7 @@
             strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    -        obj->refcnt = 1; /* The cache holds a reference */
    +        atomic_set(&obj->refcnt, 1); /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -119,7 +101,7 @@
             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj)
    -                __object_get(obj);
    +                object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
             return obj;
     }

Protecting The Objects Themselves
---------------------------------

In these examples, we assumed that the objects (except the reference
counts) never change once they are created. If we wanted to allow the
name to change, there are three possibilities:

-  You can make ``cache_lock`` non-static, and tell people to grab that
   lock before changing the name in any object.

-  You can provide a cache_obj_rename() which grabs this
   lock and changes the name for the caller, and tell everyone to use
   that function.

-  You can make the ``cache_lock`` protect only the cache itself, and
   use another lock to protect the name.

Theoretically, you can make the locks as fine-grained as one lock for
every field, for every object. In practice, the most common variants
are:

-  One lock which protects the infrastructure (the ``cache`` list in
   this example) and all the objects. This is what we have done so far.

-  One lock which protects the infrastructure (including the list
   pointers inside the objects), and one lock inside the object which
   protects the rest of that object.

-  Multiple locks to protect the infrastructure (eg. one lock per hash
   chain), possibly with a separate per-object lock.
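
The third variant, one lock per hash chain, might look like the sketch
below. It uses pthread mutexes rather than spinlocks so it compiles
outside the kernel, and ``NR_CHAINS`` and the helper names are invented
for illustration:

```c
#include <pthread.h>
#include <stddef.h>

#define NR_CHAINS 16

/* One lock per chain: lookups on different chains never contend, but
 * anything spanning two chains must take two locks, with all the
 * ordering care that implies. */
static pthread_mutex_t chain_lock[NR_CHAINS];
static void *chain_head[NR_CHAINS];     /* the chains themselves */

static void chains_init(void)
{
        int i;

        for (i = 0; i < NR_CHAINS; i++)
                pthread_mutex_init(&chain_lock[i], NULL);
}

static unsigned int hash_id(int id)
{
        return (unsigned int)id % NR_CHAINS;
}

/* Only the one chain this id hashes to is locked during the walk. */
void *chain_find(int id)
{
        unsigned int h = hash_id(id);
        void *obj;

        pthread_mutex_lock(&chain_lock[h]);
        obj = chain_head[h];    /* ... walk the h'th chain for id here ... */
        pthread_mutex_unlock(&chain_lock[h]);
        return obj;
}
```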

Here is the "lock-per-object" implementation:

::

    --- cache.c.refcnt-atomic   2003-12-11 15:50:54.000000000 +1100
    +++ cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    @@ -6,11 +6,17 @@

     struct object
     {
    +        /* These two protected by cache_lock. */
             struct list_head list;
    +        int popularity;
    +
             atomic_t refcnt;
    +
    +        /* Doesn't change once created. */
             int id;
    +
    +        spinlock_t lock; /* Protects the name */
             char name[32];
    -        int popularity;
     };

     static DEFINE_SPINLOCK(cache_lock);
    @@ -77,6 +84,7 @@
             obj->id = id;
             obj->popularity = 0;
             atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
    +        spin_lock_init(&obj->lock);

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);

Note that I decided that the popularity count should be protected by the
``cache_lock`` rather than the per-object lock: this is because it (like
the :c:type:`struct list_head <list_head>` inside the object)
is logically part of the infrastructure. This way, I don't need to grab
the lock of every object in __cache_add() when seeking
the least popular.

I also decided that the id member is unchangeable, so I don't need to
grab each object lock in __cache_find() to examine the
id: the object lock is only used by a caller who wants to read or write
the name field.

Note also that I added a comment describing what data was protected by
which locks. This is extremely important, as it describes the runtime
behavior of the code, which can be hard to deduce just from reading the
source. And as Alan Cox says, “Lock data, not code”.

Common Problems
===============

Deadlock: Simple and Advanced
-----------------------------

There is a coding bug where a piece of code tries to grab a spinlock
twice: it will spin forever, waiting for the lock to be released
(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
trivial to diagnose: not a
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

For a slightly more complex case, imagine you have a region shared by a
softirq and user context. If you use a spin_lock() call
to protect it, it is possible that the user context will be interrupted
by the softirq while it holds the lock, and the softirq will then spin
forever trying to get the same lock.

Both of these are called deadlock, and as shown above, it can occur even
with a single CPU (although not on UP compiles, since spinlocks vanish
on kernel compiles with ``CONFIG_SMP=n``; you'll still get data
corruption in the second example).

This complete lockup is easy to diagnose: on SMP boxes the watchdog
timer or compiling with ``DEBUG_SPINLOCK`` set
(``include/linux/spinlock.h``) will show this up immediately when it
happens.

A more complex problem is the so-called 'deadly embrace', involving two
or more locks. Say you have a hash table: each entry in the table is a
spinlock, and a chain of hashed objects. Inside a softirq handler, you
sometimes want to move an object from one place in the hash to another:
you grab the spinlock of the old hash chain and the spinlock of the new
hash chain, and delete the object from the old one, and insert it in the
new one.

There are two problems here. First, if your code ever tries to move the
object to the same chain, it will deadlock with itself as it tries to
grab the same lock twice. Secondly, if the same softirq on another CPU
is trying to move another object in the reverse direction, the following
could happen:

+-----------------------+-----------------------+
| CPU 1                 | CPU 2                 |
+=======================+=======================+
| Grab lock A -> OK     | Grab lock B -> OK     |
+-----------------------+-----------------------+
| Grab lock B -> spin   | Grab lock A -> spin   |
+-----------------------+-----------------------+

Table: Consequences

The two CPUs will spin forever, waiting for the other to give up its
lock. It will look, smell, and feel like a crash.

Preventing Deadlock
-------------------

Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.

The best locks are encapsulated: they never get exposed in headers, and
are never held around calls to non-trivial functions outside the same
file. You can read through this code and see that it will never
deadlock, because it never tries to grab another lock while it has that
one. People using your code don't even need to know you are using a
lock.

A classic problem here is when you provide callbacks or hooks: if you
call these with the lock held, you risk simple deadlock, or a deadly
embrace (who knows what the callback will do?). Remember, the other
programmers are out to get you, so don't do this.

Overzealous Prevention Of Deadlocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deadlocks are problematic, but not as bad as data corruption. Code which
grabs a read lock, searches a list, fails to find what it wants, drops
the read lock, grabs a write lock and inserts the object has a race
condition.

If you don't see why, please stay the fuck away from my code.
9628c2ecf20Sopenharmony_ci
9638c2ecf20Sopenharmony_ciRacing Timers: A Kernel Pastime
9648c2ecf20Sopenharmony_ci-------------------------------
9658c2ecf20Sopenharmony_ci
9668c2ecf20Sopenharmony_ciTimers can produce their own special problems with races. Consider a
9678c2ecf20Sopenharmony_cicollection of objects (list, hash, etc) where each object has a timer
9688c2ecf20Sopenharmony_ciwhich is due to destroy it.
9698c2ecf20Sopenharmony_ci
9708c2ecf20Sopenharmony_ciIf you want to destroy the entire collection (say on module removal),
9718c2ecf20Sopenharmony_ciyou might do the following::
9728c2ecf20Sopenharmony_ci
9738c2ecf20Sopenharmony_ci            /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
9748c2ecf20Sopenharmony_ci               HUNGARIAN NOTATION */
9758c2ecf20Sopenharmony_ci            spin_lock_bh(&list_lock);
9768c2ecf20Sopenharmony_ci
9778c2ecf20Sopenharmony_ci            while (list) {
9788c2ecf20Sopenharmony_ci                    struct foo *next = list->next;
9798c2ecf20Sopenharmony_ci                    del_timer(&list->timer);
9808c2ecf20Sopenharmony_ci                    kfree(list);
9818c2ecf20Sopenharmony_ci                    list = next;
9828c2ecf20Sopenharmony_ci            }
9838c2ecf20Sopenharmony_ci
9848c2ecf20Sopenharmony_ci            spin_unlock_bh(&list_lock);
9858c2ecf20Sopenharmony_ci
9868c2ecf20Sopenharmony_ci
9878c2ecf20Sopenharmony_ciSooner or later, this will crash on SMP, because a timer can have just
9888c2ecf20Sopenharmony_cigone off before the spin_lock_bh(), and it will only get
9898c2ecf20Sopenharmony_cithe lock after we spin_unlock_bh(), and then try to free
9908c2ecf20Sopenharmony_cithe element (which has already been freed!).
9918c2ecf20Sopenharmony_ci
9928c2ecf20Sopenharmony_ciThis can be avoided by checking the result of
9938c2ecf20Sopenharmony_cidel_timer(): if it returns 1, the timer has been deleted.
9948c2ecf20Sopenharmony_ciIf 0, it means (in this case) that it is currently running, so we can
9958c2ecf20Sopenharmony_cido::
9968c2ecf20Sopenharmony_ci
9978c2ecf20Sopenharmony_ci            retry:
9988c2ecf20Sopenharmony_ci                    spin_lock_bh(&list_lock);
9998c2ecf20Sopenharmony_ci
10008c2ecf20Sopenharmony_ci                    while (list) {
10018c2ecf20Sopenharmony_ci                            struct foo *next = list->next;
10028c2ecf20Sopenharmony_ci                            if (!del_timer(&list->timer)) {
10038c2ecf20Sopenharmony_ci                                    /* Give timer a chance to delete this */
10048c2ecf20Sopenharmony_ci                                    spin_unlock_bh(&list_lock);
10058c2ecf20Sopenharmony_ci                                    goto retry;
10068c2ecf20Sopenharmony_ci                            }
10078c2ecf20Sopenharmony_ci                            kfree(list);
10088c2ecf20Sopenharmony_ci                            list = next;
10098c2ecf20Sopenharmony_ci                    }
10108c2ecf20Sopenharmony_ci
10118c2ecf20Sopenharmony_ci                    spin_unlock_bh(&list_lock);
10128c2ecf20Sopenharmony_ci
10138c2ecf20Sopenharmony_ci
10148c2ecf20Sopenharmony_ciAnother common problem is deleting timers which restart themselves (by
10158c2ecf20Sopenharmony_cicalling add_timer() at the end of their timer function).
10168c2ecf20Sopenharmony_ciBecause this is a fairly common case which is prone to races, you should
10178c2ecf20Sopenharmony_ciuse del_timer_sync() (``include/linux/timer.h``) to
10188c2ecf20Sopenharmony_cihandle this case. It returns the number of times the timer had to be
10198c2ecf20Sopenharmony_cideleted before we finally stopped it from adding itself back in.
10208c2ecf20Sopenharmony_ci
Locking Speed
=============

There are three main things to worry about when considering the speed
of some code which does locking. First is concurrency: how many things
are going to be waiting while someone else is holding a lock. Second is
the time taken to actually acquire and release an uncontended lock.
Third is using fewer, or smarter locks. I'm assuming that the lock is
used fairly often: otherwise, you wouldn't be concerned about
efficiency.

Concurrency depends on how long the lock is usually held: you should
hold the lock for as long as needed, but no longer. In the cache
example, we always create the object without the lock held, and then
grab the lock only when we are ready to insert it in the list.

Acquisition times depend on how much damage the lock operations do to
the pipeline (pipeline stalls) and how likely it is that this CPU was
the last one to grab the lock (ie. is the lock cache-hot for this CPU):
on a machine with more CPUs, this likelihood drops fast. Consider a
700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
increment takes about 58ns, a lock which is cache-hot on this CPU takes
160ns, and a cacheline transfer from another CPU takes an additional 170
to 360ns. (These figures are from Paul McKenney's `Linux Journal RCU
article <http://www.linuxjournal.com/article.php?sid=6993>`__).

These two aims conflict: holding a lock for a short time might be done
by splitting locks into parts (such as in our final per-object-lock
example), but this increases the number of lock acquisitions, and the
results are often slower than having a single lock. This is another
reason to advocate locking simplicity.

The third concern is addressed below: there are some methods to reduce
the amount of locking which needs to be done.
Read/Write Lock Variants
------------------------

Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
:c:type:`struct rw_semaphore <rw_semaphore>`. These divide
users into two classes: the readers and the writers. If you are only
reading the data, you can get a read lock, but to write to the data you
need the write lock. Many people can hold a read lock, but a writer must
be the sole holder.

If your code divides neatly along reader/writer lines (as our cache code
does), and the lock is held by readers for significant lengths of time,
using these locks can help. They are slightly slower than the normal
locks though, so in practice ``rwlock_t`` is not usually worthwhile.

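A minimal sketch of the reader/writer split, using a hypothetical pair of statistics counters (``stats_lock``, ``hits`` and both functions are illustrative names, not part of the cache example):

```c
#include <linux/spinlock.h>

static DEFINE_RWLOCK(stats_lock);
static unsigned long hits;

unsigned long read_hits(void)
{
        unsigned long ret;

        read_lock(&stats_lock);         /* many readers may hold this */
        ret = hits;
        read_unlock(&stats_lock);
        return ret;
}

void record_hit(void)
{
        write_lock(&stats_lock);        /* writer must be the sole holder */
        hits++;
        write_unlock(&stats_lock);
}
```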
Avoiding Locks: Read Copy Update
--------------------------------

There is a special method of read/write locking called Read Copy Update.
Using RCU, the readers can avoid taking a lock altogether: as we expect
our cache to be read more often than updated (otherwise the cache is a
waste of time), it is a candidate for this optimization.

How do we get rid of read locks? Getting rid of read locks means that
writers may be changing the list underneath the readers. That is
actually quite simple: we can read a linked list while an element is
being added if the writer adds the element very carefully. For example,
adding ``new`` to a single linked list called ``list``::

            new->next = list->next;
            wmb();
            list->next = new;


The wmb() is a write memory barrier. It ensures that the
first operation (setting the new element's ``next`` pointer) is complete
and will be seen by all CPUs before the second operation (putting the
new element into the list) happens. This is important, since modern
compilers and modern CPUs can both reorder instructions unless told
otherwise: we want a reader to either not see the new element at all, or
see the new element with the ``next`` pointer correctly pointing at the
rest of the list.

Fortunately, there is a function to do this for standard
:c:type:`struct list_head <list_head>` lists:
list_add_rcu() (``include/linux/list.h``).

Removing an element from the list is even simpler: we replace the
pointer to the old element with a pointer to its successor, and readers
will either see it, or skip over it.

::

            list->next = old->next;


There is list_del_rcu() (``include/linux/list.h``) which
does this (the normal version poisons the old object, which we don't
want).
The reader must also be careful: some CPUs can look through the ``next``
pointer to start reading the contents of the next element early, but
don't realize that the pre-fetched contents are wrong when the ``next``
pointer changes underneath them. Once again, there is a
list_for_each_entry_rcu() (``include/linux/list.h``)
to help you. Of course, writers can just use
list_for_each_entry(), since there cannot be two
simultaneous writers.

Our final dilemma is this: when can we actually destroy the removed
element? Remember, a reader might be stepping through this element in
the list right now: if we free this element and the ``next`` pointer
changes, the reader will jump off into garbage and crash. We need to
wait until we know that all the readers who were traversing the list
when we deleted the element are finished. We use
call_rcu() to register a callback which will actually
destroy the object once all pre-existing readers are finished.
Alternatively, synchronize_rcu() may be used to block
until all pre-existing readers are finished.

But how does Read Copy Update know when the readers are finished? The
method is this: firstly, the readers always traverse the list inside
rcu_read_lock()/rcu_read_unlock() pairs:
these simply disable preemption so the reader won't go to sleep while
reading the list.

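As a sketch, a lock-free reader traversing the cache from the running example looks like this (``cache`` and ``struct object`` are the names used earlier; the counting function itself is hypothetical):

```c
/* Hypothetical reader: count the cache entries without taking a lock.
 * Must not sleep between rcu_read_lock() and rcu_read_unlock(). */
static int cache_count(void)
{
        struct object *i;
        int n = 0;

        rcu_read_lock();
        list_for_each_entry_rcu(i, &cache, list)
                n++;
        rcu_read_unlock();
        return n;
}
```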
RCU then waits until every other CPU has slept at least once: since
readers cannot sleep, we know that any readers which were traversing the
list during the deletion are finished, and the callback is triggered.
The real Read Copy Update code is a little more optimized than this, but
this is the fundamental idea.

::

    --- cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    +++ cache.c.rcupdate    2003-12-11 17:55:14.000000000 +1100
    @@ -1,15 +1,18 @@
     #include <linux/list.h>
     #include <linux/slab.h>
     #include <linux/string.h>
    +#include <linux/rcupdate.h>
     #include <linux/mutex.h>
     #include <asm/errno.h>

     struct object
     {
    -        /* These two protected by cache_lock. */
    +        /* This is protected by RCU */
             struct list_head list;
             int popularity;

    +        struct rcu_head rcu;
    +
             atomic_t refcnt;

             /* Doesn't change once created. */
    @@ -40,7 +43,7 @@
     {
             struct object *i;

    -        list_for_each_entry(i, &cache, list) {
    +        list_for_each_entry_rcu(i, &cache, list) {
                     if (i->id == id) {
                             i->popularity++;
                             return i;
    @@ -49,19 +52,25 @@
             return NULL;
     }

    +/* Final discard done once we know no readers are looking. */
    +static void cache_delete_rcu(void *arg)
    +{
    +        object_put(arg);
    +}
    +
     /* Must be holding cache_lock */
     static void __cache_delete(struct object *obj)
     {
             BUG_ON(!obj);
    -        list_del(&obj->list);
    -        object_put(obj);
    +        list_del_rcu(&obj->list);
             cache_num--;
    +        call_rcu(&obj->rcu, cache_delete_rcu);
     }

     /* Must be holding cache_lock */
     static void __cache_add(struct object *obj)
     {
    -        list_add(&obj->list, &cache);
    +        list_add_rcu(&obj->list, &cache);
             if (++cache_num > MAX_CACHE_SIZE) {
                     struct object *i, *outcast = NULL;
                     list_for_each_entry(i, &cache, list) {
    @@ -104,12 +114,11 @@
     struct object *cache_find(int id)
     {
             struct object *obj;
    -        unsigned long flags;

    -        spin_lock_irqsave(&cache_lock, flags);
    +        rcu_read_lock();
             obj = __cache_find(id);
             if (obj)
                     object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        rcu_read_unlock();
             return obj;
     }

Note that the reader will alter the popularity member in
__cache_find(), and now it doesn't hold a lock. One
solution would be to make it an ``atomic_t``, but for this usage, we
don't really care about races: an approximate result is good enough, so
I didn't change it.

The result is that cache_find() requires no
synchronization with any other functions, so it is almost as fast on
SMP as it would be on UP.

There is a further optimization possible here: remember our original
cache code, where there were no reference counts and the caller simply
held the lock whenever using the object? This is still possible: if you
hold the lock, no one can delete the object, so you don't need to get
and put the reference count.

Now, because the 'read lock' in RCU is simply disabling preemption, a
caller which always has preemption disabled between calling
cache_find() and object_put() does not
need to actually get and put the reference count: we could expose
__cache_find() by making it non-static, and such
callers could simply call that.

The benefit here is that the reference count is not written to: the
object is not altered in any way, which is much faster on SMP machines
due to caching.

Per-CPU Data
------------

Another technique for avoiding locking which is used fairly widely is to
duplicate information for each CPU. For example, if you wanted to keep a
count of a common condition, you could use a spin lock and a single
counter. Nice and simple.

If that was too slow (it's usually not, but if you've got a really big
machine to test on and can show that it is), you could instead use a
counter for each CPU, then none of them need an exclusive lock. See
DEFINE_PER_CPU(), get_cpu_var() and
put_cpu_var() (``include/linux/percpu.h``).

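A hedged sketch of such a per-CPU counter (``event_count`` and both helper functions are made-up names for illustration; as noted below, summing the per-CPU values only gives an approximate total):

```c
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, event_count);

void count_event(void)
{
        get_cpu_var(event_count)++;     /* disables preemption... */
        put_cpu_var(event_count);       /* ...and re-enables it */
}

unsigned long approx_event_total(void)
{
        unsigned long sum = 0;
        int cpu;

        /* Racy by design: good enough for statistics. */
        for_each_possible_cpu(cpu)
                sum += per_cpu(event_count, cpu);
        return sum;
}
```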
Of particular use for simple per-cpu counters is the ``local_t`` type,
and the cpu_local_inc() and related functions, which are
more efficient than simple code on some architectures
(``include/asm/local.h``).

Note that there is no simple, reliable way of getting an exact value of
such a counter, without introducing more locks. This is not a problem
for some uses.

Data Which Is Mostly Used By An IRQ Handler
-------------------------------------------

If data is always accessed from within the same IRQ handler, you don't
need a lock at all: the kernel already guarantees that the irq handler
will not run simultaneously on multiple CPUs.

Manfred Spraul points out that you can still do this, even if the data
is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done like
so::

        spin_lock(&lock);
        disable_irq(irq);
        ...
        enable_irq(irq);
        spin_unlock(&lock);

The disable_irq() prevents the irq handler from running
(and waits for it to finish if it's currently running on other CPUs).
The spinlock prevents any other accesses happening at the same time.
Naturally, this is slower than just a spin_lock_irq()
call, so it only makes sense if this type of access happens extremely
rarely.

What Functions Are Safe To Call From Interrupts?
================================================

Many functions in the kernel sleep (ie. call schedule()) directly or
indirectly: you can never call them while holding a spinlock, or with
preemption disabled. This also means you need to be in user context:
calling them from an interrupt is illegal.

Some Functions Which Sleep
--------------------------

The most common ones are listed below, but you usually have to read the
code to find out if other calls are safe. If everyone else who calls it
can sleep, you probably need to be able to sleep, too. In particular,
registration and deregistration functions usually expect to be called
from user context, and can sleep.

-  Accesses to userspace:

   -  copy_from_user()

   -  copy_to_user()

   -  get_user()

   -  put_user()

-  kmalloc(GFP_KERNEL)

-  mutex_lock_interruptible() and
   mutex_lock()

   There is a mutex_trylock() which does not sleep.
   Still, it must not be used inside interrupt context since its
   implementation is not safe for that. mutex_unlock()
   will also never sleep. It cannot be used in interrupt context either
   since a mutex must be released by the same task that acquired it.

Some Functions Which Don't Sleep
--------------------------------

Some functions are safe to call from any context, or holding almost any
lock.

-  printk()

-  kfree()

-  add_timer() and del_timer()

Mutex API reference
===================

.. kernel-doc:: include/linux/mutex.h
   :internal:

.. kernel-doc:: kernel/locking/mutex.c
   :export:

Futex API reference
===================

.. kernel-doc:: kernel/futex/core.c
   :internal:

Further reading
===============

-  ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking
   tutorial in the kernel sources.

-  Unix Systems for Modern Architectures: Symmetric Multiprocessing and
   Caching for Kernel Programmers:

   Curt Schimmel's very good introduction to kernel level locking (not
   written for Linux, but nearly everything applies). The book is
   expensive, but really worth every penny to understand SMP locking.
   [ISBN: 0201633388]

Thanks
======

Thanks to Telsa Gwynne for DocBooking, neatening and adding style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
James Morris, Robert Love, Paul McKenney, John Ashby for proofreading,
correcting, flaming, commenting.

Thanks to the cabal for having no influence on this document.

Glossary
========

preemption
  Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
  context inside the kernel would not preempt each other (ie. you had that
  CPU until you gave it up, except for interrupts). With the addition of
  ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user context, higher
  priority tasks can "cut in": spinlocks were changed to disable
  preemption, even on UP.

bh
  Bottom Half: for historical reasons, functions with '_bh' in them often
  now refer to any software interrupt, e.g. spin_lock_bh()
  blocks any software interrupt on the current CPU. Bottom halves are
  deprecated, and will eventually be replaced by tasklets. Only one bottom
  half will be running at any time.

Hardware Interrupt / Hardware IRQ
  Hardware interrupt request. in_irq() returns true in a
  hardware interrupt handler.

Interrupt Context
  Not user context: processing a hardware irq or software irq. Indicated
  by the in_interrupt() macro returning true.

SMP
  Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
  (``CONFIG_SMP=y``).

Software Interrupt / softirq
  Software interrupt handler. in_irq() returns false;
  in_softirq() returns true. Tasklets and softirqs both
  fall into the category of 'software interrupts'.

  Strictly speaking a softirq is one of up to 32 enumerated software
  interrupts which can run on multiple CPUs at once. Sometimes used to
  refer to tasklets as well (ie. all software interrupts).

tasklet
  A dynamically-registrable software interrupt, which is guaranteed to
  only run on one CPU at a time.

timer
  A dynamically-registrable software interrupt, which is run at (or close
  to) a given time. When running, it is just like a tasklet (in fact, they
  are called from the ``TIMER_SOFTIRQ``).

UP
  Uni-Processor: Non-SMP. (``CONFIG_SMP=n``).

User Context
  The kernel executing on behalf of a particular process (ie. a system
  call or trap) or kernel thread. (You can tell which process with the
  ``current`` macro.) Not to be confused with userspace. Can be
  interrupted by software or hardware interrupts.

Userspace
  A process executing its own code outside the kernel.