=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================

:Author: David S. Miller

This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

Atomic Type And Operations
==========================

The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice::

        typedef struct { int counter; } atomic_t;
        typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate.  Please see
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.

The first operations to implement for atomic_t's are the initializers and
plain writes. ::

        #define ATOMIC_INIT(i)          { (i) }
        #define atomic_set(v, i)        ((v)->counter = (i))

The first macro is used in definitions, such as::

        static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to be correct reflecting the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.

The second interface can be used at runtime, as in::

        struct foo { atomic_t counter; };
        ...

        struct foo *k;

        k = kmalloc(sizeof(*k), GFP_KERNEL);
        if (!k)
                return -ENOMEM;
        atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to be correct reflecting either the value that has
been set with this operation or set with another operation.  A proper implicit
or explicit memory barrier is needed before the value set with the operation
is guaranteed to be readable with atomic_read from another thread.

Next, we have::

        #define atomic_read(v)  ((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

.. warning::

   ``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!

   Some architectures may choose to use the volatile keyword, barriers, or
   inline assembly to guarantee some degree of immediacy for atomic_read()
   and atomic_set().  This is not uniformly guaranteed, and may change in
   the future, so all users of atomic_t should treat atomic_read() and
   atomic_set() as simple C statements that may be reordered or optimized
   away entirely by the compiler or processor, and explicitly invoke the
   appropriate compiler and/or memory barrier for each use case.  Failure
   to do so will result in code that may suddenly break when used with
   different architectures or compiler optimizations, or even changes in
   unrelated code which changes how the compiler optimizes the section
   accessing atomic_t variables.

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The READ_ONCE()
and WRITE_ONCE() macros should be used to prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence on
the one hand, or that might create unsolicited accesses on the other.

For example consider the following code::

        while (a > 0)
                do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following::

        if (a > 0)
                for (;;)
                        do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following::

        while (READ_ONCE(a) > 0)
                do_something();

Alternatively, you could place a barrier() call in the loop.
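
The barrier() alternative can be sketched in userspace as follows. This is
only an illustration, assuming a GCC- or Clang-compatible compiler: the
``barrier()`` definition is the common empty-asm idiom with a memory clobber,
and ``a``, ``calls``, ``do_something()`` and ``wait_until_done()`` are
hypothetical names that do not come from the kernel:

```c
/* Compiler-only barrier: cached copies of memory must be discarded. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int a = 3;	/* hypothetical shared variable */
static int calls;	/* counts loop iterations, for illustration only */

void do_something(void)
{
	calls++;
	a--;		/* stands in for a store made by some other context */
}

/* Loops until a drops to zero; returns the number of iterations. */
int wait_until_done(void)
{
	while (a > 0) {
		do_something();
		barrier();	/* a must be reloaded for the next test */
	}
	return calls;
}
```

Like READ_ONCE(), barrier() constrains only the compiler here; it provides no
ordering between CPUs.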

For another example, consider the following code::

        tmp_a = a;
        do_something_with(tmp_a);
        do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::

        tmp_a = a;
        do_something_with(tmp_a);
        tmp_a = a;
        do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.
To prevent the compiler from attacking your
code in this manner, write the following::

        tmp_a = READ_ONCE(a);
        do_something_with(tmp_a);
        do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::

        if (a)
                b = 9;
        else
                b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::

        b = 42;
        if (a)
                b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like::

        if (a)
                WRITE_ONCE(b, 9);
        else
                WRITE_ONCE(b, 42);

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

.. warning::

   ``READ_ONCE()`` OR ``WRITE_ONCE()`` DO NOT IMPLY A BARRIER!
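
For readers who want a feel for what these macros do, here is a simplified
userspace sketch for scalar types. It assumes GCC or Clang (``__typeof__`` is
a compiler extension), and the ``MY_`` prefixed macros and ``pick_b()`` are
hypothetical names; the kernel's real definitions are more elaborate. The
volatile cast forces exactly one access per use and nothing more:

```c
/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(); scalar types only. */
#define MY_READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static int a;	/* hypothetical flag, set once before other CPUs run */
static int b;

/* Stores exactly one value to b; no speculative "b = 42" can be added. */
int pick_b(void)
{
	if (MY_READ_ONCE(a))
		MY_WRITE_ONCE(b, 9);
	else
		MY_WRITE_ONCE(b, 42);
	return MY_READ_ONCE(b);
}
```

As the warning above says, nothing in this sketch orders accesses between
CPUs; it only restrains the compiler.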

Now, we move on to the atomic operation interfaces typically implemented with
the help of assembly code. ::

        void atomic_add(int i, atomic_t *v);
        void atomic_sub(int i, atomic_t *v);
        void atomic_inc(atomic_t *v);
        void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.

Next, we have::

        int atomic_inc_return(atomic_t *v);
        int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.
It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if a smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.

Let's move on::

        int atomic_add_return(int i, atomic_t *v);
        int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next::

        int atomic_inc_and_test(atomic_t *v);
        int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

Again, these primitives provide explicit memory barrier semantics around
the atomic operation::

        int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  This primitive must
provide explicit memory barrier semantics around the operation::

        int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A boolean
is returned which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.

Then::

        int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg must provide explicit memory barriers around the operation. ::

        int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values.
Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of \*v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.

Finally::

        int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non zero.  If v is equal to u then it returns zero.  This is done as
an atomic operation.

atomic_add_unless must provide explicit memory barriers around the
operation unless it fails (returns 0).
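
One way to picture atomic_add_unless() is as a compare-and-exchange loop.
The sketch below is a userspace model written with C11 ``<stdatomic.h>``, not
kernel code: ``my_add_unless()`` is a hypothetical name, and the default
sequentially consistent C11 operations are used as an approximation of the
barriers required when the operation succeeds:

```c
#include <stdatomic.h>

/* Adds a to *v unless *v == u.  Returns non-zero if the add happened. */
int my_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value of *v. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return 1;
	}
	return 0;
}
```

The loop retries only when another thread changed the value between the load
and the exchange, so the test against u and the addition appear as one atomic
step.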

atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)


If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces is
defined which accomplish this::

        void smp_mb__before_atomic(void);
        void smp_mb__after_atomic(void);

Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.

For example, smp_mb__before_atomic() can be used like so::

        obj->dead = 1;
        smp_mb__before_atomic();
        atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.
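
The same ordering can be modelled in userspace C11, which may help when
reasoning about it. This is an analogy under stated assumptions rather than
kernel code: ``atomic_thread_fence(memory_order_seq_cst)`` stands in for
smp_mb__before_atomic(), a relaxed ``atomic_fetch_sub_explicit()`` stands in
for atomic_dec(), and ``struct my_obj`` and ``my_obj_kill()`` are
hypothetical:

```c
#include <stdatomic.h>

struct my_obj {
	atomic_int dead;
	atomic_int ref_count;
};

/* Marks the object dead, then drops one reference. */
void my_obj_kill(struct my_obj *obj)
{
	atomic_store_explicit(&obj->dead, 1, memory_order_relaxed);
	/* Full fence: the store above is ordered before the decrement. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Relaxed RMW: atomic, but provides no ordering of its own. */
	atomic_fetch_sub_explicit(&obj->ref_count, 1, memory_order_relaxed);
}
```

Removing the fence in this model leaves both operations relaxed, so another
thread could observe the decremented count while still reading ``dead`` as 0.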

A missing memory barrier in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object::

        static void obj_list_add(struct obj *obj, struct list_head *head)
        {
                obj->active = 1;
                list_add(&obj->list, head);
        }

        static void obj_list_del(struct obj *obj)
        {
                list_del(&obj->list);
                obj->active = 0;
        }

        static void obj_destroy(struct obj *obj)
        {
                BUG_ON(obj->active);
                kfree(obj);
        }

        struct obj *obj_list_peek(struct list_head *head)
        {
                if (!list_empty(head)) {
                        struct obj *obj;

                        obj = list_entry(head->next, struct obj, list);
                        atomic_inc(&obj->refcnt);
                        return obj;
                }
                return NULL;
        }

        void obj_poke(void)
        {
                struct obj *obj;

                spin_lock(&global_list_lock);
                obj = obj_list_peek(&global_list);
                spin_unlock(&global_list_lock);

                if (obj) {
                        obj->ops->poke(obj);
                        if (atomic_dec_and_test(&obj->refcnt))
                                obj_destroy(obj);
                }
        }

        void obj_timeout(struct obj *obj)
        {
                spin_lock(&global_list_lock);
                obj_list_del(obj);
                spin_unlock(&global_list_lock);

                if (atomic_dec_and_test(&obj->refcnt))
                        obj_destroy(obj);
        }

.. note::

   This is a simplification of the ARP queue management in the generic
   neighbour discovery code of the networking stack.  Olaf Kirch found a
   bug wrt. memory barriers in kfree_skb() that exposed the atomic_t
   memory barrier requirements quite clearly.

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().
The error sequence looks like this::

        cpu 0                           cpu 1
        obj_poke()                      obj_timeout()
        obj = obj_list_peek();
        ... gains ref to obj, refcnt=2
                                        obj_list_del(obj);
                                        obj->active = 0 ...
                                        ... visibility delayed ...
                                        atomic_dec_and_test()
                                        ... refcnt drops to 1 ...
        atomic_dec_and_test()
        ... refcount drops to 0 ...
        obj_destroy()
        BUG() triggers since obj->active
        still seen as one
                                        obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24-bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme, that allows the full 32-bit
counter to be realized.
Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.


Atomic Bitmask
==============

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu. ::

        void set_bit(unsigned long nr, volatile unsigned long *addr);
        void clear_bit(unsigned long nr, volatile unsigned long *addr);
        void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces.
::

        int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
        int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
        int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.


.. warning::

   It is incredibly important that the value be a boolean, i.e. "0" or "1".
   Do not try to be fancy and save a few instructions by declaring the
   above to return "long" and just returning something like "old_val &
   mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up are the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example::

        obj->dead = 1;
        if (test_and_set_bit(0, &obj->flags))
                /* ... */;
        obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally there is the basic operation::

        int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

Which returns a boolean indicating if bit "nr" is set in the bitmask
pointed to by "addr".
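
To make the boolean return requirement concrete, here is a userspace model
of test_and_set_bit() and test_bit() built on C11 atomics. It is a sketch
only: the ``my_`` functions and ``MY_BITS_PER_LONG`` are hypothetical names,
and the sequentially consistent ``atomic_fetch_or()`` is used as an
approximation of the required barrier semantics. Note the explicit
``!= 0``; on 64-bit, returning ``old & mask`` directly as an int would lose
any bit in the upper 32 bits:

```c
#include <limits.h>
#include <stdatomic.h>

#define MY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically sets bit nr; returns 1 if it was already set, else 0. */
int my_test_and_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % MY_BITS_PER_LONG);
	unsigned long old;

	old = atomic_fetch_or(&addr[nr / MY_BITS_PER_LONG], mask);
	return (old & mask) != 0;	/* normalize to 0 or 1 */
}

/* Returns 1 if bit nr is set in the bitmask, else 0. */
int my_test_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % MY_BITS_PER_LONG);

	return (atomic_load(&addr[nr / MY_BITS_PER_LONG]) & mask) != 0;
}
```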

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided::

        void smp_mb__before_atomic(void);
        void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers::

        /* All memory operations before this call will
         * be globally visible before the clear_bit().
         */
        smp_mb__before_atomic();
        clear_bit( ... );

        /* The clear_bit() will be visible before all
         * subsequent memory operations.
         */
        smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they are to provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers.
:: 5368c2ecf20Sopenharmony_ci 5378c2ecf20Sopenharmony_ci int test_and_set_bit_lock(unsigned long nr, unsigned long *addr); 5388c2ecf20Sopenharmony_ci void clear_bit_unlock(unsigned long nr, unsigned long *addr); 5398c2ecf20Sopenharmony_ci void __clear_bit_unlock(unsigned long nr, unsigned long *addr); 5408c2ecf20Sopenharmony_ci 5418c2ecf20Sopenharmony_ciThe __clear_bit_unlock version is non-atomic, however it still implements 5428c2ecf20Sopenharmony_ciunlock barrier semantics. This can be useful if the lock itself is protecting 5438c2ecf20Sopenharmony_cithe other bits in the word. 5448c2ecf20Sopenharmony_ci 5458c2ecf20Sopenharmony_ciFinally, there are non-atomic versions of the bitmask operations 5468c2ecf20Sopenharmony_ciprovided. They are used in contexts where some other higher-level SMP 5478c2ecf20Sopenharmony_cilocking scheme is being used to protect the bitmask, and thus less 5488c2ecf20Sopenharmony_ciexpensive non-atomic operations may be used in the implementation. 5498c2ecf20Sopenharmony_ciThey have names similar to the above bitmask operation interfaces, 5508c2ecf20Sopenharmony_ciexcept that two underscores are prefixed to the interface name. :: 5518c2ecf20Sopenharmony_ci 5528c2ecf20Sopenharmony_ci void __set_bit(unsigned long nr, volatile unsigned long *addr); 5538c2ecf20Sopenharmony_ci void __clear_bit(unsigned long nr, volatile unsigned long *addr); 5548c2ecf20Sopenharmony_ci void __change_bit(unsigned long nr, volatile unsigned long *addr); 5558c2ecf20Sopenharmony_ci int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr); 5568c2ecf20Sopenharmony_ci int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr); 5578c2ecf20Sopenharmony_ci int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr); 5588c2ecf20Sopenharmony_ci 5598c2ecf20Sopenharmony_ciThese non-atomic variants also do not require any special memory 5608c2ecf20Sopenharmony_cibarrier semantics. 
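
As a rough illustration of how the _lock/_unlock bitops described above
support a bit spinlock, here is a userspace sketch using C11 atomics. All
``model_`` and ``MODEL_`` names are hypothetical, the sketch assumes "nr" is
smaller than the number of bits in an unsigned long, and acquire/release
orderings stand in for the lock barrier semantics. It is not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption for this sketch: bit 0 of the word serves as the lock bit. */
#define MODEL_LOCK_BIT 0UL

/*
 * Hypothetical model of test_and_set_bit_lock(): set the bit with
 * acquire ordering and return its previous value.
 */
static inline int model_test_and_set_bit_lock(unsigned long nr,
                                              _Atomic unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_or_explicit(addr, mask,
                                                 memory_order_acquire);

    return (old & mask) != 0;
}

/*
 * Hypothetical model of clear_bit_unlock(): clear the bit with
 * release ordering so earlier accesses are visible before the unlock.
 */
static inline void model_clear_bit_unlock(unsigned long nr,
                                          _Atomic unsigned long *addr)
{
    atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Returns 1 if the lock bit was taken by this caller, 0 if it was already set. */
static inline int model_bit_spin_trylock(_Atomic unsigned long *addr)
{
    return !model_test_and_set_bit_lock(MODEL_LOCK_BIT, addr);
}
```

No additional barrier calls appear around the lock and unlock operations,
because the ordering is carried by the operations themselves.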

The routines xchg() and cmpxchg() must provide exactly the same
memory-barrier semantics as the atomic and bit operations returning
values.

.. note::

    If someone wants to use xchg(), cmpxchg() and their variants,
    linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
    code is in arch/* and can take care of itself.

Spinlocks and rwlocks have memory barrier expectations as well.
The rules to follow are simple:

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock(). There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler. ::

    int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero,
atomically acquire the given spinlock and perform the decrement
of the counter to zero. If it does not drop to zero, do nothing
with the spinlock.
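
For the xchg() and cmpxchg() requirement stated earlier in this section, the
value-returning behavior can be modeled in userspace roughly as follows. The
``model_`` names are hypothetical, the operand type is fixed to long for
simplicity, and ``memory_order_seq_cst`` is only an approximation of the
kernel's ordering rules.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of xchg(): store "val" and return the previous value. */
static inline long model_xchg(_Atomic long *ptr, long val)
{
    return atomic_exchange_explicit(ptr, val, memory_order_seq_cst);
}

/*
 * Hypothetical model of cmpxchg(): store "new_val" only if *ptr equals
 * "old". The value found at *ptr is returned either way, so the caller
 * knows the store happened when the return value equals "old".
 */
static inline long model_cmpxchg(_Atomic long *ptr, long old, long new_val)
{
    long expected = old;

    atomic_compare_exchange_strong_explicit(ptr, &expected, new_val,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    /*
     * On failure "expected" now holds the current value; on success
     * it still equals "old".
     */
    return expected;
}
```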

It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation::

    long cas(long *mem, long old, long new);

"cas" stands for "compare and swap". It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like::

    void example_atomic_inc(long *counter)
    {
        long old, new, ret;

        while (1) {
            old = *counter;
            new = old + 1;

            ret = cas(counter, old, new);
            if (ret == old)
                break;
        }
    }

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::

    int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
    {
        long old, new, ret;
        int went_to_zero;

        went_to_zero = 0;
        while (1) {
            old = atomic_read(atomic);
            new = old - 1;
            if (new == 0) {
                went_to_zero = 1;
                spin_lock(lock);
            }
            ret = cas(atomic, old, new);
            if (ret == old)
                break;
            if (went_to_zero) {
                spin_unlock(lock);
                went_to_zero = 0;
            }
        }

        return went_to_zero;
    }

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock is acquired.

.. note::

    This also means that for the case where the counter is not
    dropping to zero, there are no memory ordering requirements.
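
The pseudo-C above can be turned into a compilable userspace sketch with C11
atomics. This is a model under assumptions, not the code in
lib/dec_and_lock.c: ``struct model_lock``, ``model_lock_acquire()``,
``model_lock_release()`` and ``model_dec_and_lock()`` are hypothetical names,
an ``atomic_flag`` stands in for spinlock_t, and the C11 compare-exchange
call replaces the abstract cas().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for spinlock_t. */
struct model_lock {
    atomic_flag held;
};

static inline void model_lock_acquire(struct model_lock *l)
{
    /* Spin until the flag was previously clear; acquire ordering. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static inline void model_lock_release(struct model_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Returns 1 with the lock held if the counter dropped to zero,
 * otherwise decrements the counter and returns 0 without touching
 * the lock.
 */
static inline int model_dec_and_lock(_Atomic long *counter,
                                     struct model_lock *lock)
{
    long old = atomic_load_explicit(counter, memory_order_relaxed);

    for (;;) {
        int to_zero = (old == 1);

        /* Take the lock before attempting the final decrement. */
        if (to_zero)
            model_lock_acquire(lock);

        /* On failure "old" is refreshed with the current value. */
        if (atomic_compare_exchange_strong_explicit(counter, &old, old - 1,
                                                    memory_order_seq_cst,
                                                    memory_order_relaxed))
            return to_zero;

        /* Lost a race with another updater: drop the lock and retry. */
        if (to_zero)
            model_lock_release(lock);
    }
}
```

As in the pseudo-C, the lock is taken before the compare-and-swap that would
publish the zero value, so in this model the counter is not observed at zero
by a thread that holds the lock while another thread is still about to
acquire it.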