.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks - most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO respectively (note that clearing the latter implies clearing
the former as well) in the gfp mask when calling an allocator. GFP_NOFS
and GFP_NOIO respectively can be used as shortcuts. It turned out, though,
that this approach has led to abuses where the restricted gfp mask is used
"just in case" without deeper consideration. This causes problems because
excessive use of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or
other memory reclaim issues.
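
The relationship between the masks above can be sketched in plain C. This
is only a userspace model, not kernel code: every ``MODEL_`` and ``model_``
name is hypothetical and the bit values are chosen for illustration. It
assumes GFP_KERNEL permits reclaim, IO and FS activity, and that the
restricted masks are derived from it by clearing bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones are defined by the kernel. */
#define MODEL_GFP_RECLAIM 0x1u /* stand-in for __GFP_RECLAIM */
#define MODEL_GFP_IO      0x2u /* stand-in for __GFP_IO */
#define MODEL_GFP_FS      0x4u /* stand-in for __GFP_FS */

/* Unrestricted mask: reclaim may start IO and call into the FS. */
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
/* NOFS clears only the FS bit. */
#define MODEL_GFP_NOFS   (MODEL_GFP_KERNEL & ~MODEL_GFP_FS)
/* NOIO clears the IO bit and, by implication, the FS bit as well. */
#define MODEL_GFP_NOIO   (MODEL_GFP_KERNEL & ~(MODEL_GFP_IO | MODEL_GFP_FS))

/* May reclaim triggered by this allocation call back into the FS? */
static bool model_may_enter_fs(unsigned int gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/* May reclaim triggered by this allocation start IO? */
static bool model_may_start_io(unsigned int gfp)
{
	return (gfp & MODEL_GFP_IO) != 0;
}
```

In this model a NOFS allocation may still start IO, while a NOIO allocation
may do neither.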

New API
=======

Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save``, ``memalloc_nofs_restore`` and ``memalloc_noio_save``,
``memalloc_noio_restore`` respectively. These allow marking a scope as a
critical section from a filesystem or I/O point of view. Any allocation from
that scope will implicitly drop __GFP_FS or __GFP_IO respectively from the
given mask, so no memory allocation can recurse back into the FS/IO.

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any critical section with respect to reclaim is started - e.g. when taking
a lock shared with the reclaim context or when transaction context
nesting would be possible via reclaim. The restore function should be
called when the critical section ends. Ideally, all of this comes with an
explanation of what the reclaim context is, for easier maintenance.

Please note that the proper pairing of save/restore functions
allows nesting, so it is safe to call ``memalloc_nofs_save`` or
``memalloc_noio_save``, each paired with its restore function, from within
an existing NOIO or NOFS scope.
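
The save/restore pairing and its nesting behaviour can be illustrated with
a small userspace model. This is a sketch under stated assumptions, not the
kernel implementation: all ``model_`` and ``MODEL_`` names are hypothetical,
the bit values are illustrative, and ``model_task_flags`` merely stands in
for per-task state. The model assumes that save returns the previous scope
state and that restore puts exactly that state back.

```c
#include <assert.h>

#define MODEL_GFP_IO     0x2u /* stand-in for __GFP_IO */
#define MODEL_GFP_FS     0x4u /* stand-in for __GFP_FS */
#define MODEL_GFP_KERNEL (0x1u | MODEL_GFP_IO | MODEL_GFP_FS)

#define MODEL_PF_NOIO 0x1u /* hypothetical "inside NOIO scope" flag */
#define MODEL_PF_NOFS 0x2u /* hypothetical "inside NOFS scope" flag */

static unsigned int model_task_flags; /* stand-in for per-task flags */

/* Enter a NOFS scope; the return value records whether one was active. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by restoring the state recorded by the save call. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* The mask an allocation effectively runs with inside the current scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Nest a NOIO scope inside a NOFS scope; return the innermost mask. */
static unsigned int model_nested_demo(void)
{
	unsigned int outer = model_nofs_save();
	unsigned int inner = model_noio_save();
	unsigned int seen = model_effective_gfp(MODEL_GFP_KERNEL);

	model_noio_restore(inner); /* leave scopes in reverse order */
	model_nofs_restore(outer);
	return seen;
}
```

Because each restore only reinstates what its own save call observed,
leaving an inner scope never ends the outer one prematurely.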

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantics can be
achieved by the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without
any problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended way around that is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.
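
The difference between passing a restricted mask and wrapping the call in a
scope can be shown with a userspace model. This is a hedged sketch, not
kernel code: all ``model_`` and ``MODEL_`` names are hypothetical, and
``model_vmalloc`` only imitates an allocator that performs one internal
allocation with a hardcoded unrestricted mask, as described above.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_GFP_FS     0x4u /* stand-in for __GFP_FS */
#define MODEL_GFP_KERNEL (0x1u | 0x2u | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_KERNEL & ~MODEL_GFP_FS)
#define MODEL_PF_NOFS    0x2u /* hypothetical "inside NOFS scope" flag */

static unsigned int model_task_flags;    /* stand-in for per-task flags */
static bool model_fs_recursion_possible; /* set if any allocation may enter the FS */

static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Lowest-level allocation: the scope flag overrides the passed mask. */
static void model_alloc_pages(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	if (gfp & MODEL_GFP_FS)
		model_fs_recursion_possible = true;
}

/* Imitates an allocator with a hardcoded unrestricted inner allocation. */
static void model_vmalloc(unsigned int gfp)
{
	model_alloc_pages(MODEL_GFP_KERNEL); /* internal, ignores caller mask */
	model_alloc_pages(gfp);              /* honours caller mask */
}

/* Buggy pattern: the restricted mask does not reach the inner allocation. */
static bool model_try_with_mask(void)
{
	model_fs_recursion_possible = false;
	model_vmalloc(MODEL_GFP_NOFS);
	return model_fs_recursion_possible;
}

/* Recommended pattern: the scope covers every allocation made inside it. */
static bool model_try_with_scope(void)
{
	unsigned int old;

	model_fs_recursion_possible = false;
	/* Hypothetical reason: caller holds a lock that reclaim also takes. */
	old = model_nofs_save();
	model_vmalloc(MODEL_GFP_KERNEL);
	model_nofs_restore(old);
	return model_fs_recursion_possible;
}
```

In the model, only the scoped variant keeps every allocation, including the
hardcoded internal one, from recursing into the FS.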