	JFFS2 LOCKING DOCUMENTATION
	---------------------------

This document attempts to describe the existing locking rules for
JFFS2. It is not expected to remain perfectly up to date, but ought to
be fairly close.


	alloc_sem
	---------

The alloc_sem is a per-filesystem mutex, used primarily to ensure
contiguous allocation of space on the medium. It is automatically
obtained during space allocations (jffs2_reserve_space()) and freed
upon write completion (jffs2_complete_reservation()). Note that
the garbage collector will obtain this right at the beginning of
jffs2_garbage_collect_pass() and release it at the end, thereby
preventing any other write activity on the file system during a
garbage collect pass.

When writing new nodes, the alloc_sem must be held until the new nodes
have been properly linked into the data structures for the inode to
which they belong. This is for the benefit of NAND flash - adding new
nodes to an inode may obsolete old ones, and by holding the alloc_sem
until this happens we ensure that any data in the write-buffer at the
time this happens are part of the new node, not just something that
was written afterwards.
Hence, we can ensure the newly-obsoleted nodes
don't actually get erased until the write-buffer has been flushed to
the medium.

With the introduction of NAND flash support and the write-buffer,
the alloc_sem is also used to protect the wbuf-related members of the
jffs2_sb_info structure. Atomically reading the wbuf_len member to see
if the wbuf is currently holding any data is permitted, though.

Ordering constraints: See f->sem.


	File Mutex f->sem
	---------------------

This is the JFFS2-internal equivalent of the inode mutex i->i_sem.
It protects the contents of the jffs2_inode_info private inode data,
including the linked list of node fragments (but see the notes below on
erase_completion_lock), etc.

The reason that the i_sem itself isn't used for this purpose is to
avoid deadlocks with garbage collection -- the VFS will lock the i_sem
before calling a function which may need to allocate space. The
allocation may trigger garbage-collection, which may need to move a
node belonging to the inode which was locked in the first place by the
VFS.
If the garbage collection code were to attempt to lock the i_sem
of the inode from which it's garbage-collecting a physical node, this
would lead to deadlock, unless we played games with unlocking the i_sem
before calling the space allocation functions.

Instead of playing such games, we just have an extra internal
mutex, which is obtained by the garbage collection code and also
by the normal file system code _after_ allocation of space.

Ordering constraints:

	1. Never attempt to allocate space or lock alloc_sem with
	   any f->sem held.
	2. Never attempt to lock two file mutexes in one thread.
	   No ordering rules have been made for doing so.
	3. Never lock a page cache page with f->sem held.


	erase_completion_lock spinlock
	------------------------------

This is used to serialise access to the eraseblock lists, to the
per-eraseblock lists of physical jffs2_raw_node_ref structures, and
(NB) the per-inode list of physical nodes. The latter is a special
case - see below.

As the MTD API no longer permits erase-completion callback functions
to be called from bottom-half (timer) context (on the basis that nobody
ever actually implemented such a thing), it's now sufficient to use
a simple spin_lock() rather than spin_lock_bh().

Note that the per-inode list of physical nodes (f->nodes) is a special
case. Any changes to _valid_ nodes (i.e. (->flash_offset & 1) == 0) in
the list are protected by the file mutex f->sem. But the erase code
may remove _obsolete_ nodes from the list while holding only the
erase_completion_lock. So you can walk the list only while holding the
erase_completion_lock, and can drop the lock temporarily mid-walk as
long as the pointer you're holding is to a _valid_ node, not an
obsolete one.

The erase_completion_lock is also used to protect the c->gc_task
pointer when the garbage collection thread exits. The code to kill the
GC thread locks it, sends the signal, then unlocks it - while the GC
thread itself locks it, zeroes c->gc_task, then unlocks on the exit path.


	inocache_lock spinlock
	----------------------

This spinlock protects the hashed list (c->inocache_list) of the
in-core jffs2_inode_cache objects (each inode in JFFS2 has a
corresponding jffs2_inode_cache object). So, the inocache_lock
has to be locked while walking the c->inocache_list hash buckets.

This spinlock also covers allocation of new inode numbers, which is
currently just '++c->highest_ino', but might one day get more complicated
if we need to deal with wrapping after 4 milliard inode numbers are used.
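
The inode-number allocation described above can be sketched in user
space as follows. This is only an illustration, not the kernel code: a
pthread mutex stands in for the inocache_lock spinlock, the increment is
read as '++c->highest_ino', and the names sb_info_sketch,
sb_info_sketch_init() and alloc_ino_sketch() are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the relevant part of jffs2_sb_info. */
struct sb_info_sketch {
	pthread_mutex_t inocache_lock;	/* stands in for the inocache_lock spinlock */
	uint32_t highest_ino;		/* highest inode number handed out so far */
};

/* Initialise the sketch; the next number handed out is first_ino + 1. */
void sb_info_sketch_init(struct sb_info_sketch *c, uint32_t first_ino)
{
	pthread_mutex_init(&c->inocache_lock, NULL);
	c->highest_ino = first_ino;
}

/* Allocate the next inode number with the lock held. */
uint32_t alloc_ino_sketch(struct sb_info_sketch *c)
{
	uint32_t ino;

	pthread_mutex_lock(&c->inocache_lock);
	ino = ++c->highest_ino;		/* no wrap handling in this sketch */
	pthread_mutex_unlock(&c->inocache_lock);

	return ino;
}
```

Because the increment happens entirely under the lock, two concurrent
callers can never be handed the same number.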

Note, the f->sem guarantees that the corresponding jffs2_inode_cache
will not be removed. So, it is allowed to access it without locking
the inocache_lock spinlock.

Ordering constraints:

	If both erase_completion_lock and inocache_lock are needed, the
	c->erase_completion_lock has to be acquired first.


	erase_free_sem
	--------------

This mutex is only used by the erase code which frees obsolete node
references and the jffs2_garbage_collect_deletion_dirent() function.
The latter function on NAND flash must read _obsolete_ nodes to
determine whether the 'deletion dirent' under consideration can be
discarded or whether it is still required to show that an inode has
been unlinked. Because reading from the flash may sleep, the
erase_completion_lock cannot be held, so an alternative, more
heavyweight lock was required to prevent the erase code from freeing
the jffs2_raw_node_ref structures in question while the garbage
collection code is looking at them.

Suggestions for alternative solutions to this problem would be welcomed.


	wbuf_sem
	--------

This read/write semaphore protects against concurrent access to the
write-behind buffer ('wbuf') used for flash chips where we must write
in blocks. It protects both the contents of the wbuf and the metadata
which indicates which flash region (if any) is currently covered by
the buffer.

Ordering constraints:
	Lock wbuf_sem last, after the alloc_sem and/or f->sem.


	c->xattr_sem
	------------

This read/write semaphore protects against concurrent access to the
xattr-related objects, which include data in the superblock and ic->xref.
On read-only paths the write semaphore excludes more than necessary;
holding the read semaphore is enough. But you must hold the write
semaphore when updating, creating or deleting any xattr-related object.

Once xattr_sem is released, there is no assurance that those objects
still exist. Thus, a sequence of operations often has to be retried
when it turns out that such an object must be updated while only the
read semaphore is held. For example, do_jffs2_getxattr() first holds the
read semaphore to scan xref and xdatum. But if it needs to load a
name/value pair from the medium, it releases the read semaphore and
retries the scan holding the write semaphore.

Ordering constraints:
	Lock xattr_sem last, after the alloc_sem.