	JFFS2 LOCKING DOCUMENTATION
	---------------------------

This document attempts to describe the existing locking rules for
JFFS2. It is not expected to remain perfectly up to date, but ought to
be fairly close.


	alloc_sem
	---------

The alloc_sem is a per-filesystem mutex, used primarily to ensure
contiguous allocation of space on the medium. It is automatically
obtained during space allocations (jffs2_reserve_space()) and freed
upon write completion (jffs2_complete_reservation()). Note that
the garbage collector will obtain this right at the beginning of
jffs2_garbage_collect_pass() and release it at the end, thereby
preventing any other write activity on the file system during a
garbage collect pass.

When writing new nodes, the alloc_sem must be held until the new nodes
have been properly linked into the data structures for the inode to
which they belong. This is for the benefit of NAND flash - adding new
nodes to an inode may obsolete old ones, and by holding the alloc_sem
until this happens we ensure that any data in the write-buffer at the
time this happens are part of the new node, not just something that
was written afterwards.
Hence, we can ensure the newly-obsoleted nodes
don't actually get erased until the write-buffer has been flushed to
the medium.

With the introduction of NAND flash support and the write-buffer,
the alloc_sem is also used to protect the wbuf-related members of the
jffs2_sb_info structure. Atomically reading the wbuf_len member to see
if the wbuf is currently holding any data is permitted, though.

Ordering constraints: See f->sem.


	File Mutex f->sem
	---------------------

This is the JFFS2-internal equivalent of the inode mutex i->i_sem.
It protects the contents of the jffs2_inode_info private inode data,
including the linked list of node fragments (but see the notes below on
erase_completion_lock), etc.

The reason that the i_sem itself isn't used for this purpose is to
avoid deadlocks with garbage collection -- the VFS will lock the i_sem
before calling a function which may need to allocate space. The
allocation may trigger garbage-collection, which may need to move a
node belonging to the inode which was locked in the first place by the
VFS.
If the garbage collection code were to attempt to lock the i_sem
of the inode from which it's garbage-collecting a physical node, this
would lead to deadlock, unless we played games with unlocking the i_sem
before calling the space allocation functions.

Instead of playing such games, we just have an extra internal
mutex, which is obtained by the garbage collection code and also
by the normal file system code _after_ allocation of space.

Ordering constraints:

	1. Never attempt to allocate space or lock alloc_sem with
	   any f->sem held.
	2. Never attempt to lock two file mutexes in one thread.
	   No ordering rules have been made for doing so.
	3. Never lock a page cache page with f->sem held.


	erase_completion_lock spinlock
	------------------------------

This is used to serialise access to the eraseblock lists, to the
per-eraseblock lists of physical jffs2_raw_node_ref structures, and
(NB) the per-inode list of physical nodes. The latter is a special
case - see below.

As the MTD API no longer permits erase-completion callback functions
to be called from bottom-half (timer) context (on the basis that nobody
ever actually implemented such a thing), it's now sufficient to use
a simple spin_lock() rather than spin_lock_bh().

Note that the per-inode list of physical nodes (f->nodes) is a special
case. Any changes to _valid_ nodes (i.e. (->flash_offset & 1) == 0) in
the list are protected by the file mutex f->sem. But the erase code
may remove _obsolete_ nodes from the list while holding only the
erase_completion_lock. So you can walk the list only while holding the
erase_completion_lock, and can drop the lock temporarily mid-walk as
long as the pointer you're holding is to a _valid_ node, not an
obsolete one.

The erase_completion_lock is also used to protect the c->gc_task
pointer when the garbage collection thread exits. The code to kill the
GC thread locks it, sends the signal, then unlocks it - while the GC
thread itself locks it, zeroes c->gc_task, then unlocks on the exit path.


	inocache_lock spinlock
	----------------------

This spinlock protects the hashed list (c->inocache_list) of the
in-core jffs2_inode_cache objects (each inode in JFFS2 has a
corresponding jffs2_inode_cache object). So, the inocache_lock
has to be locked while walking the c->inocache_list hash buckets.

This spinlock also covers allocation of new inode numbers, which is
currently just 'c->highest_ino++', but might one day get more complicated
if we need to deal with wrapping after 4 milliard inode numbers are used.
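
The inode-number allocation described above can be illustrated with a
small userspace sketch. It is not the kernel code: struct sb_sketch and
alloc_ino() are hypothetical names, and a pthread mutex stands in for
the inocache_lock spinlock. The assertion marks the wrap-around case
that the text says is not yet handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant members of jffs2_sb_info. */
struct sb_sketch {
	pthread_mutex_t inocache_lock;	/* models the inocache_lock spinlock */
	uint32_t highest_ino;		/* highest inode number handed out so far */
};

/* Hand out the next inode number; the increment happens under the lock. */
static uint32_t alloc_ino(struct sb_sketch *c)
{
	uint32_t ino;

	pthread_mutex_lock(&c->inocache_lock);
	/* Wrapping is not handled in this sketch; it is only detected. */
	assert(c->highest_ino != UINT32_MAX);
	c->highest_ino++;
	ino = c->highest_ino;
	pthread_mutex_unlock(&c->inocache_lock);

	return ino;
}
```

Because both the increment and the read of the new value sit inside the
locked region, two concurrent callers can never be handed the same
number.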

Note, the f->sem guarantees that the corresponding jffs2_inode_cache
will not be removed. So, it is allowed to access it without locking
the inocache_lock spinlock.

Ordering constraints:

	If both erase_completion_lock and inocache_lock are needed, the
	c->erase_completion_lock has to be acquired first.


	erase_free_sem
	--------------

This mutex is only used by the erase code which frees obsolete node
references and the jffs2_garbage_collect_deletion_dirent() function.
The latter function on NAND flash must read _obsolete_ nodes to
determine whether the 'deletion dirent' under consideration can be
discarded or whether it is still required to show that an inode has
been unlinked. Because reading from the flash may sleep, the
erase_completion_lock cannot be held, so an alternative, more
heavyweight lock was required to prevent the erase code from freeing
the jffs2_raw_node_ref structures in question while the garbage
collection code is looking at them.

Suggestions for alternative solutions to this problem would be welcomed.


	wbuf_sem
	--------

This read/write semaphore protects against concurrent access to the
write-behind buffer ('wbuf') used for flash chips where we must write
in blocks. It protects both the contents of the wbuf and the metadata
which indicates which flash region (if any) is currently covered by
the buffer.

Ordering constraints:
	Lock wbuf_sem last, after the alloc_sem and/or f->sem.


	c->xattr_sem
	------------

This read/write semaphore protects against concurrent access to the
xattr related objects, which include data in the superblock and ic->xref.
On read-only paths, taking the semaphore for writing excludes more than
necessary; holding it for reading is enough. But you must hold it for
writing when updating, creating or deleting any xattr related object.

Once xattr_sem is released, there is no assurance that those objects
still exist. Thus, a sequence of operations often has to be retried
when it turns out that such an object must be updated while only the
read semaphore is held. For example, do_jffs2_getxattr() first holds the
read semaphore to scan xref and xdatum. But if it needs to load a
name/value pair from the medium, it releases the read semaphore and
retries the process holding the write semaphore.
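
The read-then-retry pattern described above can be sketched in
userspace as follows. This is a model under stated assumptions, not the
code of do_jffs2_getxattr(): struct xd_sketch, load_from_medium() and
getxattr_sketch() are hypothetical names, and a pthread rwlock stands
in for xattr_sem. The point it shows is that there is no atomic
upgrade, so the state has to be checked again after the write lock is
taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an xattr datum. */
struct xd_sketch {
	bool loaded;		/* true once the name/value pair is in memory */
	const char *value;	/* only meaningful when loaded is true */
};

static pthread_rwlock_t xattr_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Pretend to read the pair from the medium; needs the write lock. */
static void load_from_medium(struct xd_sketch *xd)
{
	xd->value = "value-from-flash";
	xd->loaded = true;
}

/* Return the value; *retries counts restarts under the write lock. */
static const char *getxattr_sketch(struct xd_sketch *xd, int *retries)
{
	const char *value;
	bool writer = false;

	assert(xd != NULL && retries != NULL);
	*retries = 0;
	pthread_rwlock_rdlock(&xattr_sem);
	for (;;) {
		/* Real code would repeat its whole scan here after a
		 * relock, since objects may have gone away meanwhile. */
		if (xd->loaded)
			break;		/* nothing to modify */
		if (writer) {
			load_from_medium(xd);
			break;
		}
		/* No atomic upgrade: drop the read lock, take the
		 * write lock, then start over. */
		pthread_rwlock_unlock(&xattr_sem);
		pthread_rwlock_wrlock(&xattr_sem);
		writer = true;
		(*retries)++;
	}
	value = xd->value;
	pthread_rwlock_unlock(&xattr_sem);

	return value;
}
```

A first lookup of an unloaded datum restarts once under the write lock;
later lookups complete under the read lock alone.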

Ordering constraints:
	Lock xattr_sem last, after the alloc_sem.
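
Taken together, the ordering constraints for alloc_sem, f->sem and
wbuf_sem stated in the sections above imply the nesting alloc_sem,
then f->sem, then wbuf_sem. The following userspace sketch models that
nesting for a single write. It is an illustration only: the pthread
locks, the 'held' bookkeeping and write_node_sketch() are all
hypothetical, and xattr_sem is left out because its ordering against
f->sem and wbuf_sem is not stated here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Userspace stand-ins for the kernel locks; names are illustrative. */
static pthread_mutex_t alloc_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t f_sem = PTHREAD_MUTEX_INITIALIZER; /* one inode */
static pthread_rwlock_t wbuf_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Per-thread record of which of the locks this thread holds. */
enum { HELD_ALLOC = 1, HELD_FSEM = 2, HELD_WBUF = 4 };
static _Thread_local unsigned held;

static void lock_alloc_sem(void)
{
	/* Never take alloc_sem with any f->sem (or wbuf_sem) held. */
	assert((held & (HELD_FSEM | HELD_WBUF)) == 0);
	pthread_mutex_lock(&alloc_sem);
	held |= HELD_ALLOC;
}

static void lock_f_sem(void)
{
	/* Never hold two file mutexes; wbuf_sem comes after f->sem. */
	assert((held & (HELD_FSEM | HELD_WBUF)) == 0);
	pthread_mutex_lock(&f_sem);
	held |= HELD_FSEM;
}

static void lock_wbuf_sem(void)
{
	assert((held & HELD_WBUF) == 0);
	pthread_rwlock_wrlock(&wbuf_sem);
	held |= HELD_WBUF;
}

static void unlock_wbuf_sem(void)
{
	pthread_rwlock_unlock(&wbuf_sem);
	held &= ~(unsigned)HELD_WBUF;
}

static void unlock_f_sem(void)
{
	pthread_mutex_unlock(&f_sem);
	held &= ~(unsigned)HELD_FSEM;
}

static void unlock_alloc_sem(void)
{
	pthread_mutex_unlock(&alloc_sem);
	held &= ~(unsigned)HELD_ALLOC;
}

/* Model one node write; returns 1 once the node has been "linked". */
static int write_node_sketch(void)
{
	int linked;

	lock_alloc_sem();	/* models jffs2_reserve_space() */
	lock_f_sem();		/* f->sem only after space is reserved */
	lock_wbuf_sem();	/* wbuf_sem last */
	/* ... data would be copied into the write buffer here ... */
	unlock_wbuf_sem();
	linked = 1;		/* new node linked in under f->sem */
	unlock_f_sem();
	unlock_alloc_sem();	/* models jffs2_complete_reservation() */

	return linked;
}
```

The assertions fire if a caller tries to take the locks in any other
order, which makes the sketch usable as a quick check when reasoning
about a new code path.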