.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks, most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO, respectively (note that the latter implies clearing the
former as well), in the gfp mask when calling an allocator. GFP_NOFS and
GFP_NOIO, respectively, can be used as shortcuts. It turned out, though,
that this approach has led to abuses where the restricted gfp mask is
used "just in case" without deeper consideration. This causes problems
because excessive use of GFP_NOFS/GFP_NOIO can lead to memory
over-reclaim or other memory reclaim issues.

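The relationship between these masks can be modelled in a few lines of
userspace C. This is only a sketch: the bit values are made up for
illustration and are not the kernel's real values, and ``strip_fs``,
``strip_io`` and ``may_enter_fs`` are hypothetical helper names. Only the
way GFP_KERNEL, GFP_NOFS and GFP_NOIO are composed is meant to mirror
the kernel definitions.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define __GFP_IO      0x1u
#define __GFP_FS      0x2u
#define __GFP_RECLAIM 0x4u

/* Composition of the common masks. */
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_NOIO   (__GFP_RECLAIM)

/* Hypothetical helper: forbid reclaim from calling into the FS. */
static inline gfp_t strip_fs(gfp_t gfp)
{
	return gfp & ~__GFP_FS;
}

/* Hypothetical helper: forbid IO, which also has to forbid FS. */
static inline gfp_t strip_io(gfp_t gfp)
{
	return gfp & ~(__GFP_IO | __GFP_FS);
}

/* Hypothetical helper: may reclaim recurse into the filesystem? */
static inline bool may_enter_fs(gfp_t gfp)
{
	return (gfp & __GFP_FS) != 0;
}
```

In this model ``strip_fs(GFP_KERNEL)`` yields GFP_NOFS and
``strip_io(GFP_KERNEL)`` yields GFP_NOIO, which is why a NOIO allocation
can never enter the filesystem either.
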
New API
=======

Since 4.12 there is a generic scope API for both the NOFS and NOIO
contexts: ``memalloc_nofs_save`` and ``memalloc_nofs_restore``,
respectively ``memalloc_noio_save`` and ``memalloc_noio_restore``. These
allow a scope to be marked as a critical section from a filesystem or
I/O point of view. Any allocation from that scope will inherently drop
__GFP_FS or __GFP_IO, respectively, from the given mask, so no memory
allocation can recurse back into the FS/IO.

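How a scope ends up masking an allocation can be sketched with a small
userspace model, loosely following the helpers in
``include/linux/sched/mm.h``. The assumptions are that a plain global
``task_flags`` stands in for ``current->flags`` and that all bit values
are invented for illustration. This is not the kernel implementation.

```c
/* Illustrative bit values only; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define __GFP_IO      0x1u
#define __GFP_FS      0x2u
#define __GFP_RECLAIM 0x4u
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOFS 0x1u
#define PF_MEMALLOC_NOIO 0x2u

/* Stand-in for current->flags (assumption of this model). */
static unsigned int task_flags;

/* Enter a NOFS scope; return whether the flag was already set. */
static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the saved flag state. */
static inline void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* Enter a NOIO scope; return whether the flag was already set. */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO scope by putting back the saved flag state. */
static inline void memalloc_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Applied to every requested mask: the scope overrides the caller. */
static inline gfp_t current_gfp_context(gfp_t gfp)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

Inside a NOFS scope, ``current_gfp_context(GFP_KERNEL)`` in this model
returns a mask without __GFP_FS even though the caller asked for
GFP_KERNEL. Once the matching restore has run, the same call returns
GFP_KERNEL unchanged.
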
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any critical section with respect to reclaim is started, e.g. when
taking a lock shared with the reclaim context or when transaction
context nesting would be possible via reclaim. The restore function
should be called when the critical section ends. Ideally, all of this is
accompanied by an explanation of what the reclaim context is, for easier
maintenance.

Please note that proper pairing of the save/restore functions allows
nesting, so it is safe to call ``memalloc_noio_save`` or
``memalloc_noio_restore`` respectively from an existing NOIO or NOFS
scope.

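Why properly paired calls nest safely can be seen in a small userspace
model. As an assumption of this sketch, ``task_flags`` stands in for
``current->flags``, the flag values are invented, and ``in_noio_scope``,
``inner_helper`` and ``outer_still_noio_after_inner`` are hypothetical
names. The point is that save returns the previous state of the flag, so
an inner restore puts back "already set" instead of clearing the outer
scope.

```c
/* Illustrative flag values only; the kernel's real values differ. */
#define PF_MEMALLOC_NOFS 0x1u
#define PF_MEMALLOC_NOIO 0x2u

/* Stand-in for current->flags (assumption of this model). */
static unsigned int task_flags;

static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static inline void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

static inline void memalloc_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Hypothetical helper: is a NOIO scope currently active? */
static inline int in_noio_scope(void)
{
	return (task_flags & PF_MEMALLOC_NOIO) != 0;
}

/* Hypothetical callee that opens its own NOIO scope. */
static inline void inner_helper(void)
{
	unsigned int noio = memalloc_noio_save();

	/* ... allocations here cannot recurse into IO ... */
	memalloc_noio_restore(noio);
}

/* Returns 1 if the outer scope survived the nested save/restore. */
static inline int outer_still_noio_after_inner(void)
{
	unsigned int outer = memalloc_noio_save();
	int still_noio;

	inner_helper();
	still_noio = in_noio_scope();
	memalloc_noio_restore(outer);
	return still_noio;
}
```

The saved value must be handed back to the matching restore in reverse
order of the saves. Restoring with a value taken from a different scope
would break the nesting that this model relies on.
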
What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite
non-trivial to fix up. That means that calling ``vmalloc`` with
GFP_NOFS/GFP_NOIO is almost always a bug. The good news is that the
NOFS/NOIO semantics can be achieved with the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without any
problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended workaround is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.

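The wrapping pattern can be illustrated with a userspace model. Nothing
here is kernel code: ``model_vmalloc`` is a hypothetical stand-in that
uses ``malloc`` and records the mask its hardcoded GFP_KERNEL allocation
would effectively get, ``fs_alloc_buffer`` is an invented caller,
``task_flags`` stands in for ``current->flags``, and the bit values are
illustrative.

```c
#include <stdlib.h>

/* Illustrative bit values only; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define __GFP_IO      0x1u
#define __GFP_FS      0x2u
#define __GFP_RECLAIM 0x4u
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOFS 0x1u

/* Stand-in for current->flags (assumption of this model). */
static unsigned int task_flags;

/* Mask seen by the most recent internal allocation of the model. */
static gfp_t last_internal_gfp;

static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static inline void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

static inline gfp_t current_gfp_context(gfp_t gfp)
{
	if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}

/* Hypothetical stand-in for vmalloc with a hardcoded GFP_KERNEL. */
static inline void *model_vmalloc(size_t size)
{
	last_internal_gfp = current_gfp_context(GFP_KERNEL);
	return malloc(size);
}

/* Hypothetical filesystem helper wrapping the allocation in a scope. */
static inline void *fs_alloc_buffer(size_t size)
{
	unsigned int nofs;
	void *buf;

	/*
	 * Example comment: we are called with a transaction open, so
	 * reclaim must not re-enter the filesystem while we allocate.
	 */
	nofs = memalloc_nofs_save();
	buf = model_vmalloc(size);
	memalloc_nofs_restore(nofs);
	return buf;
}
```

The comment next to the save call is the part that matters for
maintenance: it records which reclaim recursion the scope protects
against, so a later reader can tell whether the scope is still needed.
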