	JFFS2 LOCKING DOCUMENTATION
	---------------------------

This document attempts to describe the existing locking rules for
JFFS2. It is not expected to remain perfectly up to date, but ought to
be fairly close.


	alloc_sem
	---------

The alloc_sem is a per-filesystem mutex, used primarily to ensure
contiguous allocation of space on the medium. It is automatically
obtained during space allocations (jffs2_reserve_space()) and freed
upon write completion (jffs2_complete_reservation()). Note that
the garbage collector will obtain this right at the beginning of
jffs2_garbage_collect_pass() and release it at the end, thereby
preventing any other write activity on the file system during a
garbage collect pass.

When writing new nodes, the alloc_sem must be held until the new nodes
have been properly linked into the data structures for the inode to
which they belong. This is for the benefit of NAND flash - adding new
nodes to an inode may obsolete old ones, and by holding the alloc_sem
until this happens we ensure that any data in the write-buffer at the
time this happens are part of the new node, not just something that
was written afterwards. Hence, we can ensure the newly-obsoleted nodes
don't actually get erased until the write-buffer has been flushed to
the medium.
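
As a rough illustration of the rule above, the following userspace
sketch models alloc_sem with a pthread mutex. It is not kernel code:
struct model_sb, model_reserve_space(), model_complete_reservation()
and model_write_node() are hypothetical stand-ins for jffs2_sb_info,
jffs2_reserve_space(), jffs2_complete_reservation() and a write path,
and the real functions take further arguments. The point it shows is
only that the mutex is taken at reservation time and dropped after the
new node has been linked in.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace model of the per-filesystem state. */
struct model_sb {
	pthread_mutex_t alloc_sem;	/* stands in for c->alloc_sem */
	uint32_t next_free;		/* next free offset on the "medium" */
	uint32_t linked_nodes;		/* nodes linked into inode structures */
};

/* Takes alloc_sem and hands out a contiguous region of 'len' bytes. */
static uint32_t model_reserve_space(struct model_sb *c, uint32_t len)
{
	uint32_t ofs;

	pthread_mutex_lock(&c->alloc_sem);
	ofs = c->next_free;
	c->next_free += len;
	return ofs;			/* alloc_sem is still held here */
}

/* Drops alloc_sem; call only after the new node has been linked in. */
static void model_complete_reservation(struct model_sb *c)
{
	pthread_mutex_unlock(&c->alloc_sem);
}

/* Writes one node: reserve, write, link, and only then release. */
static uint32_t model_write_node(struct model_sb *c, uint32_t len)
{
	uint32_t ofs = model_reserve_space(c, len);

	/* ... the node would be written to the medium at 'ofs' ... */
	c->linked_nodes++;		/* link into the inode's structures */
	model_complete_reservation(c);
	return ofs;
}
```

Because the mutex spans both the allocation and the linking step, two
writers in this model can never interleave their regions on the medium.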

With the introduction of NAND flash support and the write-buffer,
the alloc_sem is also used to protect the wbuf-related members of the
jffs2_sb_info structure. Atomically reading the wbuf_len member to see
if the wbuf is currently holding any data is permitted, though.

Ordering constraints: See f->sem.


	File Mutex f->sem
	-----------------

This is the JFFS2-internal equivalent of the inode mutex i->i_sem.
It protects the contents of the jffs2_inode_info private inode data,
including the linked list of node fragments (but see the notes below on
erase_completion_lock), etc.

The reason that the i_sem itself isn't used for this purpose is to
avoid deadlocks with garbage collection -- the VFS will lock the i_sem
before calling a function which may need to allocate space. The
allocation may trigger garbage-collection, which may need to move a
node belonging to the inode which was locked in the first place by the
VFS. If the garbage collection code were to attempt to lock the i_sem
of the inode from which it's garbage-collecting a physical node, this
would lead to deadlock, unless we played games with unlocking the i_sem
before calling the space allocation functions.

Instead of playing such games, we just have an extra internal
mutex, which is obtained by the garbage collection code and also
by the normal file system code _after_ allocation of space.

Ordering constraints:

	1. Never attempt to allocate space or lock alloc_sem with
	   any f->sem held.
	2. Never attempt to lock two file mutexes in one thread.
	   No ordering rules have been made for doing so.
	3. Never lock a page cache page with f->sem held.
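
Rules 1 and 2 can be sketched in userspace as follows. This is a model
under stated assumptions, not kernel code: the structures, the
per-thread counter f_sems_held and all model_* functions are
hypothetical, pthread mutexes stand in for the kernel mutexes, and
rule 3 (page cache pages) is not modelled at all. The assertions only
make the ordering visible; the kernel does not check it this way.

```c
#include <assert.h>
#include <pthread.h>

struct model_fs { pthread_mutex_t alloc_sem; };
struct model_inode { pthread_mutex_t sem; int nr_nodes; };

/* Per-thread count of file mutexes held; used only to check the rules. */
static _Thread_local int f_sems_held;

static void model_lock_alloc_sem(struct model_fs *c)
{
	assert(f_sems_held == 0);	/* rule 1: no f->sem may be held */
	pthread_mutex_lock(&c->alloc_sem);
}

static void model_lock_f_sem(struct model_inode *f)
{
	assert(f_sems_held == 0);	/* rule 2: one file mutex per thread */
	pthread_mutex_lock(&f->sem);
	f_sems_held++;
}

static void model_unlock_f_sem(struct model_inode *f)
{
	f_sems_held--;
	pthread_mutex_unlock(&f->sem);
}

/* A write path in the permitted order: alloc_sem first, then f->sem. */
static int model_add_node(struct model_fs *c, struct model_inode *f)
{
	int nr;

	model_lock_alloc_sem(c);	/* space allocation comes first */
	model_lock_f_sem(f);		/* f->sem only after allocation */
	nr = ++f->nr_nodes;
	model_unlock_f_sem(f);
	pthread_mutex_unlock(&c->alloc_sem);
	return nr;
}
```

Swapping the two lock calls in model_add_node() would trip the
assertion in model_lock_alloc_sem(), which is the situation rule 1
forbids.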


	erase_completion_lock spinlock
	------------------------------

This is used to serialise access to the eraseblock lists, to the
per-eraseblock lists of physical jffs2_raw_node_ref structures, and
(NB) the per-inode list of physical nodes. The latter is a special
case - see below.

As the MTD API no longer permits erase-completion callback functions
to be called from bottom-half (timer) context (on the basis that nobody
ever actually implemented such a thing), it's now sufficient to use
a simple spin_lock() rather than spin_lock_bh().

Note that the per-inode list of physical nodes (f->nodes) is a special
case. Any changes to _valid_ nodes (i.e. (->flash_offset & 1) == 0) in
the list are protected by the file mutex f->sem. But the erase code
may remove _obsolete_ nodes from the list while holding only the
erase_completion_lock. So you can walk the list only while holding the
erase_completion_lock, and can drop the lock temporarily mid-walk as
long as the pointer you're holding is to a _valid_ node, not an
obsolete one.
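
The walking rule can be sketched as below. This is a simplified
userspace model, not the kernel implementation: struct model_ref is a
hypothetical cut-down jffs2_raw_node_ref, the list is NULL-terminated
for simplicity, a pthread mutex stands in for the spinlock, bit 0 of
flash_offset marks an obsolete node as described above, and the caller
is assumed to hold f->sem so that valid nodes cannot change.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct model_ref {
	struct model_ref *next_in_ino;	/* next node of the same inode */
	uint32_t flash_offset;		/* bit 0 set: node is obsolete */
};

static pthread_mutex_t erase_completion_lock = PTHREAD_MUTEX_INITIALIZER;

static int model_ref_valid(const struct model_ref *ref)
{
	return (ref->flash_offset & 1) == 0;
}

/* Example callback: counts how often slow work was performed. */
static int model_work_done;
static void model_count_work(struct model_ref *ref)
{
	(void)ref;
	model_work_done++;
}

/*
 * Walks the list and returns the number of valid nodes. If 'slow_work'
 * is given, it is called for each valid node with the lock dropped.
 */
static int model_walk_nodes(struct model_ref *head,
			    void (*slow_work)(struct model_ref *))
{
	struct model_ref *ref;
	int valid = 0;

	pthread_mutex_lock(&erase_completion_lock);
	for (ref = head; ref; ref = ref->next_in_ino) {
		if (!model_ref_valid(ref))
			continue;	/* never drop the lock on an obsolete node */
		valid++;
		if (slow_work) {
			/* 'ref' is valid, so the erase code will not remove it. */
			pthread_mutex_unlock(&erase_completion_lock);
			slow_work(ref);
			pthread_mutex_lock(&erase_completion_lock);
		}
	}
	pthread_mutex_unlock(&erase_completion_lock);
	return valid;
}
```

The lock is retaken before ref->next_in_ino is read, because the next
entry may be an obsolete node that the erase code is free to unlink.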

The erase_completion_lock is also used to protect the c->gc_task
pointer when the garbage collection thread exits. The code to kill the
GC thread locks it, sends the signal, then unlocks it - while the GC
thread itself locks it, zeroes c->gc_task, then unlocks on the exit path.
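
A minimal userspace sketch of this handshake follows. All names are
hypothetical, a pthread mutex stands in for the spinlock, and
incrementing a counter stands in for sending the signal. The NULL check
on the kill path is an assumption of the sketch; it is what makes
clearing the pointer under the lock useful.

```c
#include <pthread.h>
#include <stddef.h>

struct model_task { int pending_signals; };

struct model_gc_sb {
	pthread_mutex_t erase_completion_lock;
	struct model_task *gc_task;	/* NULL once the GC thread has exited */
};

/* Kill path: returns 1 if a signal was sent, 0 if the thread was gone. */
static int model_stop_gc(struct model_gc_sb *c)
{
	int sent = 0;

	pthread_mutex_lock(&c->erase_completion_lock);
	if (c->gc_task) {
		c->gc_task->pending_signals++;	/* stands in for the signal */
		sent = 1;
	}
	pthread_mutex_unlock(&c->erase_completion_lock);
	return sent;
}

/* Exit path of the GC thread: clear the pointer under the same lock. */
static void model_gc_exit(struct model_gc_sb *c)
{
	pthread_mutex_lock(&c->erase_completion_lock);
	c->gc_task = NULL;
	pthread_mutex_unlock(&c->erase_completion_lock);
}
```

Since both paths take the same lock, the kill path either sees a live
task or sees NULL; it never signals a task that has already cleared
the pointer.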


	inocache_lock spinlock
	----------------------

This spinlock protects the hashed list (c->inocache_list) of the
in-core jffs2_inode_cache objects (each inode in JFFS2 has a
corresponding jffs2_inode_cache object). So, the inocache_lock
has to be locked while walking the c->inocache_list hash buckets.

This spinlock also covers allocation of new inode numbers, which is
currently just '++c->highest_ino', but might one day get more complicated
if we need to deal with wrapping after 4 milliard inode numbers are used.
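
For illustration, inode number allocation under the lock might look
like the sketch below. It is a userspace model only: struct
model_ino_sb and model_alloc_ino() are hypothetical names, and a
pthread mutex stands in for the inocache_lock spinlock. Like the text
above, it makes no attempt to handle wrap-around.

```c
#include <pthread.h>
#include <stdint.h>

struct model_ino_sb {
	pthread_mutex_t inocache_lock;	/* stand-in for the spinlock */
	uint32_t highest_ino;		/* highest inode number handed out */
};

/* Returns a fresh inode number; the increment happens under the lock. */
static uint32_t model_alloc_ino(struct model_ino_sb *c)
{
	uint32_t ino;

	pthread_mutex_lock(&c->inocache_lock);
	ino = ++c->highest_ino;
	pthread_mutex_unlock(&c->inocache_lock);
	return ino;
}
```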

Note, the f->sem guarantees that the corresponding jffs2_inode_cache
will not be removed. So, it is allowed to access it without locking
the inocache_lock spinlock.

Ordering constraints:

	If both erase_completion_lock and inocache_lock are needed, the
	c->erase_completion_lock has to be acquired first.


	erase_free_sem
	--------------

This mutex is only used by the erase code which frees obsolete node
references and the jffs2_garbage_collect_deletion_dirent() function.
The latter function on NAND flash must read _obsolete_ nodes to
determine whether the 'deletion dirent' under consideration can be
discarded or whether it is still required to show that an inode has
been unlinked. Because reading from the flash may sleep, the
erase_completion_lock cannot be held, so an alternative, more
heavyweight lock was required to prevent the erase code from freeing
the jffs2_raw_node_ref structures in question while the garbage
collection code is looking at them.

Suggestions for alternative solutions to this problem would be welcomed.


	wbuf_sem
	--------

This read/write semaphore protects against concurrent access to the
write-behind buffer ('wbuf') used for flash chips where we must write
in blocks. It protects both the contents of the wbuf and the metadata
which indicates which flash region (if any) is currently covered by
the buffer.

Ordering constraints:
	Lock wbuf_sem last, after the alloc_sem and/or f->sem.


	c->xattr_sem
	------------

This read/write semaphore protects against concurrent access to the
xattr-related objects, which include data in the superblock and
ic->xref. On read-only paths the write semaphore excludes more than is
necessary; the read semaphore is enough. But you must hold the write
semaphore when updating, creating or deleting any xattr-related object.

Once xattr_sem is released, there is no guarantee that those objects
still exist. Thus, a sequence of operations often has to be retried
when an object turns out to need updating while only the read semaphore
is held. For example, do_jffs2_getxattr() first holds the read semaphore
to scan xref and xdatum. But if it is necessary to load the name/value
pair from the medium, it releases the read semaphore and retries the
operation with the write semaphore held.

Ordering constraints:
	Lock xattr_sem last, after the alloc_sem.