.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks - most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO (note that clearing the latter implies clearing the former
as well) in the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO,
respectively, can be used as shortcuts. It turned out, though, that this
approach has led to abuses where the restricted gfp mask is used "just in
case" without deeper consideration. This causes problems because
excessive use of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or
other memory reclaim issues.

New API
=======

Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save``, ``memalloc_nofs_restore`` and ``memalloc_noio_save``,
``memalloc_noio_restore`` respectively. These functions allow a scope to be
marked as a critical section from a filesystem or I/O point of view. Any
allocation from that scope will inherently drop __GFP_FS or __GFP_IO,
respectively, from the given mask so no memory allocation can recurse back
into the FS/IO.

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any critical section with respect to the reclaim is started - e.g.
when taking a lock shared with the reclaim context or when a transaction
context nesting would be possible via reclaim. The restore function should
be called when the critical section ends. All of that should ideally come
with an explanation of what the reclaim context is, for easier maintenance.

Please note that the proper pairing of save/restore functions
allows nesting so it is safe to call ``memalloc_nofs_save`` or
``memalloc_noio_save`` from within an existing NOFS or NOIO
scope, as long as each call is paired with the matching restore.
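
The nesting rule above follows from the save functions handing back the
previous scope state, which the matching restore then reinstates. The
following is a simplified userspace model of that behaviour, not the kernel
implementation: every ``model_``/``MODEL_`` name and every bit value is an
assumption made for illustration, and ``model_task_flags`` merely stands in
for the per-task flags word.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)

#define MODEL_PF_NOIO     0x1u
#define MODEL_PF_NOFS     0x2u

/* Stand-in for the per-task flags word. */
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous NOFS state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by reinstating the state returned by the save. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Enter a NOIO scope and return the previous NOIO state. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave a NOIO scope by reinstating the state returned by the save. */
static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Mask an allocation would effectively use inside the current scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);	/* NOIO implies NOFS */
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Walk through nested scopes; returns the effective mask after leaving all. */
unsigned int model_nesting_demo(void)
{
	unsigned int outer, nested, noio;

	outer = model_nofs_save();
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_IO);

	/* A nested NOFS scope must not end the outer one when it is left. */
	nested = model_nofs_save();
	model_nofs_restore(nested);
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_IO);

	/* A NOIO scope inside a NOFS scope drops both bits. */
	noio = model_noio_save();
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == 0);
	model_noio_restore(noio);
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_IO);

	model_nofs_restore(outer);
	return model_effective_gfp(MODEL_GFP_KERNEL);
}
```

In this model the caller keeps passing the unrestricted mask everywhere and
the restriction is derived from the scope alone, which is why a callee that
knows nothing about the caller's locks still ends up with a safe mask.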

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantics can be
achieved by the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without
any problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended way around that is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.
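
A minimal sketch of such a wrapper might look as follows. The helpers
``example_alloc_table`` and ``example_free_table`` and the journal
transaction mentioned in the comment are hypothetical. The ``#else`` branch
contains userspace stand-ins (assumptions, not kernel code) so that the
sketch can also be built and exercised outside a kernel tree; in a kernel
build only the two real headers are used.

```c
#ifdef __KERNEL__
#include <linux/sched/mm.h>	/* memalloc_nofs_save(), memalloc_nofs_restore() */
#include <linux/vmalloc.h>	/* vmalloc(), vfree() */
#else
/* Userspace stand-ins so the sketch builds outside a kernel tree. */
#include <assert.h>
#include <stdlib.h>

static unsigned int standin_nofs_active;	/* models the task's NOFS state */
static unsigned int standin_last_alloc_nofs;	/* state seen by the last vmalloc() */

static unsigned int memalloc_nofs_save(void)
{
	unsigned int old = standin_nofs_active;

	standin_nofs_active = 1;
	return old;
}

static void memalloc_nofs_restore(unsigned int flags)
{
	assert(standin_nofs_active);	/* a restore without a save is a pairing bug */
	standin_nofs_active = flags;
}

static void *vmalloc(unsigned long size)
{
	standin_last_alloc_nofs = standin_nofs_active;
	return malloc(size);
}

static void vfree(const void *addr)
{
	free((void *)addr);
}
#endif

/* Hypothetical helper: allocate a table from a path that reclaim may re-enter. */
void *example_alloc_table(unsigned long size)
{
	unsigned int nofs_flags;
	void *table;

	/*
	 * Reclaim context (hypothetical): the caller holds a journal
	 * transaction that FS reclaim would also need, and no upper layer
	 * has marked the scope, so mark it around the allocation here.
	 */
	nofs_flags = memalloc_nofs_save();
	table = vmalloc(size);
	memalloc_nofs_restore(nofs_flags);

	return table;
}

/* Hypothetical counterpart releasing the table. */
void example_free_table(void *table)
{
	vfree(table);
}
```

Keeping the scope as narrow as the allocation itself, with the reason stated
in the comment, makes it easier to drop the wrapper later once an upper layer
marks the context properly.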