.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks, most commonly those used for the
transaction context).

The traditional way to avoid this deadlock is to clear __GFP_FS or
__GFP_IO (clearing the latter implies clearing the former as well) in
the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO can be
used as shortcuts. In practice, though, this approach has been abused:
the restricted gfp mask is often used "just in case", without deeper
consideration. That causes problems, because excessive use of
GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
reclaim issues.

New API
========

Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save`` and ``memalloc_nofs_restore``, and ``memalloc_noio_save``
and ``memalloc_noio_restore``, respectively. They mark a scope as a critical
section from a filesystem or I/O point of view. Any allocation from that
scope implicitly drops __GFP_FS or __GFP_IO, respectively, from the given
mask, so no memory allocation can recurse back into the FS/IO.
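
The effect of a scope on the gfp mask can be pictured with a small
userspace model. This is only a sketch under stated assumptions: the
``MODEL_*`` constants use made-up bit values, and ``model_effective_gfp``
is a hypothetical stand-in for the filtering the allocator applies, not
the kernel's own code.

.. code-block:: c

   #include <assert.h>

   /* Illustrative bit values only; the kernel's real values differ. */
   #define MODEL_GFP_RECLAIM 0x1u
   #define MODEL_GFP_IO      0x2u
   #define MODEL_GFP_FS      0x4u
   #define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

   /* Hypothetical per-task scope bits, as set by the save functions. */
   #define MODEL_SCOPE_NOIO  0x1u
   #define MODEL_SCOPE_NOFS  0x2u

   /* Mask the allocator effectively uses for a request made inside a scope. */
   static unsigned int model_effective_gfp(unsigned int scope, unsigned int gfp)
   {
       if (scope & MODEL_SCOPE_NOIO)
           gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);  /* NOIO forbids FS as well */
       else if (scope & MODEL_SCOPE_NOFS)
           gfp &= ~MODEL_GFP_FS;
       return gfp;
   }

   /* Checks the three cases described in the text. */
   static void model_mask_demo(void)
   {
       /* Outside any scope the requested mask is used unchanged. */
       assert(model_effective_gfp(0, MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
       /* A NOFS scope drops only the FS bit. */
       assert(model_effective_gfp(MODEL_SCOPE_NOFS, MODEL_GFP_KERNEL) ==
              (MODEL_GFP_RECLAIM | MODEL_GFP_IO));
       /* A NOIO scope drops both the IO and the FS bits. */
       assert(model_effective_gfp(MODEL_SCOPE_NOIO, MODEL_GFP_KERNEL) ==
              MODEL_GFP_RECLAIM);
   }

The caller keeps passing its usual mask; the scope, not the call site,
decides which bits survive.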

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any critical section with respect to reclaim is started, e.g. before
taking a lock shared with the reclaim context, or when transaction
context nesting would be possible via reclaim. The restore function
should be called when the critical section ends. Ideally, all of this
comes with a comment explaining what the reclaim context is, for
easier maintenance.

Note that properly paired save/restore calls allow nesting, so it is
safe to call ``memalloc_noio_save`` and ``memalloc_noio_restore`` from
within an existing NOIO or NOFS scope.
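
Why pairing makes nesting safe can be sketched in userspace. The model
below is an assumption-laden illustration, not the kernel
implementation: ``model_task_flags`` stands in for the task's flag word,
the bit values are made up, and ``model_scope_save`` and
``model_scope_restore`` are hypothetical helpers that take the scope bit
as an argument (the real API has separate NOFS and NOIO functions). Each
save returns the previous state of its bit and each restore puts exactly
that state back.

.. code-block:: c

   #include <assert.h>

   /* Illustrative scope bits; the kernel's real flag values differ. */
   #define MODEL_PF_NOIO 0x1u
   #define MODEL_PF_NOFS 0x2u

   /* Stand-in for the flag word of the current task. */
   static unsigned int model_task_flags;

   /* Enter a scope; return the previous state of that scope bit. */
   static unsigned int model_scope_save(unsigned int bit)
   {
       unsigned int old = model_task_flags & bit;

       model_task_flags |= bit;
       return old;
   }

   /* Leave a scope by restoring the state returned by the paired save. */
   static void model_scope_restore(unsigned int bit, unsigned int old)
   {
       model_task_flags = (model_task_flags & ~bit) | old;
   }

   static void model_nesting_demo(void)
   {
       unsigned int outer_nofs, inner_noio, innermost_noio;

       outer_nofs = model_scope_save(MODEL_PF_NOFS);
       inner_noio = model_scope_save(MODEL_PF_NOIO);      /* NOIO inside NOFS */
       innermost_noio = model_scope_save(MODEL_PF_NOIO);  /* NOIO inside NOIO */

       model_scope_restore(MODEL_PF_NOIO, innermost_noio);
       /* The innermost restore does not end the enclosing NOIO scope. */
       assert(model_task_flags == (MODEL_PF_NOFS | MODEL_PF_NOIO));

       model_scope_restore(MODEL_PF_NOIO, inner_noio);
       /* Only the outer NOFS scope is left. */
       assert(model_task_flags == MODEL_PF_NOFS);

       model_scope_restore(MODEL_PF_NOFS, outer_nofs);
       assert(model_task_flags == 0);
   }

The model also shows why the value returned by save must be handed to
the matching restore: dropping it, or restoring out of order, would end
an enclosing scope early.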

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``__vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that NOFS/NOIO semantics can be
achieved with the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without
any problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended workaround is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.
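
The shape of such a wrapper might look like the following userspace
sketch. Everything prefixed with ``model_`` is a hypothetical stand-in:
``model_vmalloc`` is a ``malloc``-backed stub that merely records whether
a NOFS scope was active, and ``model_fs_alloc_table`` is an invented
filesystem helper. Kernel code would call ``memalloc_nofs_save``,
``vmalloc`` and ``memalloc_nofs_restore`` in the same order.

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>

   #define MODEL_PF_NOFS 0x1u              /* illustrative value */

   static unsigned int model_task_flags;   /* stand-in for the task's flags */
   static int model_vmalloc_saw_nofs;      /* scope state seen by the stub */

   static unsigned int model_nofs_save(void)
   {
       unsigned int old = model_task_flags & MODEL_PF_NOFS;

       model_task_flags |= MODEL_PF_NOFS;
       return old;
   }

   static void model_nofs_restore(unsigned int old)
   {
       model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
   }

   /* Stub allocator: notes whether it ran inside a NOFS scope. */
   static void *model_vmalloc(unsigned long size)
   {
       model_vmalloc_saw_nofs = (model_task_flags & MODEL_PF_NOFS) != 0;
       return malloc(size);
   }

   /* Hypothetical helper, called with a transaction lock held. */
   static void *model_fs_alloc_table(unsigned long size)
   {
       unsigned int nofs_flags;
       void *table;

       /*
        * The caller holds a lock that reclaim may also need, so this
        * allocation must not recurse into the filesystem.
        */
       nofs_flags = model_nofs_save();
       table = model_vmalloc(size);
       model_nofs_restore(nofs_flags);

       return table;
   }

   static void model_vmalloc_demo(void)
   {
       void *table = model_fs_alloc_table(64);

       assert(model_vmalloc_saw_nofs);   /* allocation ran inside the scope */
       assert(model_task_flags == 0);    /* scope ended with the helper */
       free(table);
   }

The comment above the save call is the important part for maintainers:
it states which resource makes the scope necessary.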