.. SPDX-License-Identifier: GPL-2.0

==========================================
WHAT IS Flash-Friendly File System (F2FS)?
==========================================

NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
been deployed in a variety of systems ranging from mobile devices to servers.
Since they are known to have characteristics that differ from those of
conventional rotating disks, a file system, the layer above the storage device,
should be designed from scratch to adapt to these differences.

F2FS is a file system designed to exploit NAND flash memory-based storage
devices, and it is based on the Log-structured File System (LFS). Its design
focuses on addressing the fundamental issues of LFS: the snowball effect of the
wandering tree and high cleaning overhead.

Since a NAND flash memory-based storage device shows different characteristics
depending on its internal geometry or its flash memory management scheme, namely
the Flash Translation Layer (FTL), F2FS and its tools support various parameters
not only for configuring the on-disk layout, but also for selecting allocation
and cleaning algorithms.

The following git tree provides the file system formatting tool (mkfs.f2fs),
a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).

- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git

For reporting bugs and sending patches, please use the following mailing list:

- linux-f2fs-devel@lists.sourceforge.net

Background and Design issues
============================

Log-structured File System (LFS)
--------------------------------
"A log-structured file system writes all modifications to disk sequentially in
a log-like structure, thereby speeding up both file writing and crash recovery.
The log is the only structure on disk; it contains indexing information so that
files can be read back from the log efficiently. In order to maintain large free
areas on disk for fast writing, we divide the log into segments and use a
segment cleaner to compress the live information from heavily fragmented
segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
implementation of a log-structured file system", ACM Trans. Computer Systems
10, 1, 26–52.

Wandering Tree Problem
----------------------
In LFS, when file data is updated and written to the end of the log, its direct
pointer block must be updated because the data location has changed. The
indirect pointer block is then updated because the direct pointer block has
changed. In this manner, the upper index structures such as the inode, inode
map, and checkpoint block are also updated recursively. This is called the
wandering tree problem [1], and to improve performance, the file system should
eliminate or relax this update propagation as much as possible.

[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
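
As a rough illustration of why the update snowballs, the following C sketch
models the index chain described above as an array of levels in which every
parent stores the physical address of its child. The names (struct lfs,
lfs_update_data()) are hypothetical and the chain is simplified; the point is
only to count how many blocks a single data update appends to the log.

.. code-block:: c

    /* Simplified index chain, bottom to top (hypothetical). */
    enum level {
        DATA, DIRECT_NODE, INDIRECT_NODE, INODE, INODE_MAP, CHECKPOINT,
        NR_LEVELS
    };

    struct lfs {
        unsigned log_head;             /* next free block at the end of the log */
        unsigned addr[NR_LEVELS];      /* current on-disk address of each level */
        unsigned child_ptr[NR_LEVELS]; /* child address recorded in each parent */
        unsigned writes;               /* blocks appended to the log so far */
    };

    /* Out-of-place write: append one block and return its new address. */
    static unsigned log_append(struct lfs *fs)
    {
        fs->writes++;
        return fs->log_head++;
    }

    /*
     * Rewrite one data block. Each parent records its child's physical
     * address, so moving the child forces the parent to be rewritten (and
     * moved) too, all the way up to the checkpoint.
     * Returns the number of blocks written for this one update.
     */
    unsigned lfs_update_data(struct lfs *fs)
    {
        unsigned before = fs->writes;
        int lvl;

        fs->addr[DATA] = log_append(fs);
        for (lvl = DIRECT_NODE; lvl < NR_LEVELS; lvl++) {
            fs->child_ptr[lvl] = fs->addr[lvl - 1]; /* parent content changes */
            fs->addr[lvl] = log_append(fs);         /* so the parent moves too */
        }
        return fs->writes - before;
    }

Starting from a zeroed struct lfs, each call to lfs_update_data() appends six
blocks: the data block plus one rewrite for every index level above it.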

Cleaning Overhead
-----------------
Since LFS is based on out-of-place writes, it produces many obsolete blocks
scattered across the whole storage. In order to provide new empty log space, it
needs to reclaim these obsolete blocks transparently to users. This job is
called the cleaning process.

The process consists of four operations as follows.

1. A victim segment is selected by referencing the segment usage table.
2. It loads the parent index structures of all the data in the victim, as
   identified by the segment summary blocks.
3. It checks the cross-reference between the data and its parent index structure.
4. It moves valid data selectively.

This cleaning job may cause unexpectedly long delays, so the most important goal
is to hide these latencies from users. It should also reduce the amount of
valid data to be moved, and move that data quickly.
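
Step 1, combined with the goal of moving as little valid data as possible,
suggests the simplest victim selection policy, greedy: pick the segment with the
fewest valid blocks. The C sketch below assumes a hypothetical in-memory segment
usage table holding a valid block count per segment, and assumes 512 blocks per
segment (a 2MB segment of 4KB blocks); it is an illustration, not the f2fs
implementation.

.. code-block:: c

    #include <stddef.h>

    #define BLKS_PER_SEG 512u /* assumed: 2MB segment / 4KB block */

    /* Hypothetical segment usage table entry. */
    struct seg_usage {
        unsigned valid_blocks; /* blocks still referenced by the file system */
        int is_free;           /* free segments are never cleaning victims */
    };

    /*
     * Greedy policy: return the index of the in-use segment with the fewest
     * valid blocks, i.e. the one whose cleaning moves the least live data.
     * Fully valid segments free nothing, so they are skipped.
     * Returns -1 if no segment qualifies.
     */
    long greedy_victim(const struct seg_usage *sut, size_t nsegs)
    {
        long victim = -1;
        unsigned best = BLKS_PER_SEG;
        size_t i;

        for (i = 0; i < nsegs; i++) {
            if (sut[i].is_free || sut[i].valid_blocks >= BLKS_PER_SEG)
                continue;
            if (sut[i].valid_blocks < best) {
                best = sut[i].valid_blocks;
                victim = (long)i;
            }
        }
        return victim;
    }

Greedy minimizes the immediate copying cost, but it ignores how soon the
remaining valid data in a segment is likely to be invalidated anyway.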

Key Features
============

Flash Awareness
---------------
- Enlarge the random write area for better performance, while still providing
  high spatial locality
- Align FS data structures to the operational units of the FTL on a best-effort
  basis

Wandering Tree Problem
----------------------
- Use the term “node” to represent inodes as well as various pointer blocks
- Introduce the Node Address Table (NAT), which contains the locations of all
  the “node” blocks; this cuts off the update propagation.
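
The following C sketch shows, under simplifying assumptions, how a NAT-style
indirection stops the propagation: parents refer to a node by its node ID (nid)
instead of its block address, so when a node block is rewritten out of place,
only that node's NAT entry changes. The table size, struct inode_blk, and the
helper names are hypothetical, and the sketch ignores how the NAT itself is
persisted.

.. code-block:: c

    #define NAT_ENTRIES 16u /* hypothetical, tiny table */

    /* Node Address Table: node ID (nid) -> current block address of that node. */
    struct nat {
        unsigned blkaddr[NAT_ENTRIES];
    };

    /* Hypothetical inode: refers to its direct node block by nid only. */
    struct inode_blk {
        unsigned direct_nid;
    };

    /* Resolve a nid to a block address; returns 0 on success, -1 on a bad nid. */
    int nat_lookup(const struct nat *nat, unsigned nid, unsigned *addr)
    {
        if (nid >= NAT_ENTRIES)
            return -1;
        *addr = nat->blkaddr[nid];
        return 0;
    }

    /*
     * A node block was rewritten out of place at new_addr. Only its NAT entry
     * changes; every parent still stores the same nid, so nothing above the
     * node has to be rewritten.
     */
    int nat_relocate(struct nat *nat, unsigned nid, unsigned new_addr)
    {
        if (nid >= NAT_ENTRIES)
            return -1;
        nat->blkaddr[nid] = new_addr;
        return 0;
    }

After the direct node with nid 3 moves from block 100 to block 250, the inode
still stores nid 3 and is left untouched; only the lookup result changes.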

Cleaning Overhead
-----------------
- Support a background cleaning process
- Support greedy and cost-benefit algorithms for victim selection policies
- Support multi-head logs for static/dynamic hot and cold data separation
- Introduce adaptive logging for efficient block allocation
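
For comparison with greedy selection, the classic cost-benefit policy from the
LFS paper cited above scores a segment with utilization u (the fraction of valid
blocks) as (1 - u) * age / (1 + u): the cost is reading the segment (1) plus
writing back its live data (u), and the benefit is the freed space (1 - u)
weighted by how long the data has stayed unmodified. The C sketch below applies
that formula; the struct and the units of age are hypothetical, and the cost
function f2fs actually uses may differ in detail.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical per-segment inputs for victim selection. */
    struct seg_info {
        double u;    /* fraction of valid blocks, 0.0 .. 1.0 */
        double age;  /* time since last modification, arbitrary units */
        int is_free;
    };

    /* Benefit-to-cost ratio: (free space generated * age) / (read + write cost). */
    double cost_benefit(double u, double age)
    {
        return (1.0 - u) * age / (1.0 + u);
    }

    /* Pick the in-use segment with the highest score; -1 if none qualifies. */
    long cb_victim(const struct seg_info *segs, size_t n)
    {
        long victim = -1;
        double best = -1.0;
        size_t i;

        for (i = 0; i < n; i++) {
            double score;

            if (segs[i].is_free || segs[i].u >= 1.0)
                continue;
            score = cost_benefit(segs[i].u, segs[i].age);
            if (score > best) {
                best = score;
                victim = (long)i;
            }
        }
        return victim;
    }

With segments {u = 0.2, age = 1}, {u = 0.5, age = 10} and {u = 0.1, age = 2},
greedy would pick the last one, while cost-benefit picks the second: it holds
more valid data, but that data is old and unlikely to be invalidated soon.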

Mount Options
=============


======================== ============================================================
background_gc=%s	 Turn on/off cleaning operations, namely garbage
			 collection, triggered in the background when the I/O
			 subsystem is idle. If background_gc=on, garbage
			 collection is turned on; if background_gc=off, garbage
			 collection is turned off. If background_gc=sync,
			 synchronous garbage collection runs in the background.
			 The default value for this option is on, so garbage
			 collection is on by default.
gc_merge		 When background_gc is on, this option can be enabled to
			 let the background GC thread handle foreground GC requests.
			 This can eliminate sluggishness caused by slow foreground
			 GC operations when GC is triggered from a process with
			 limited I/O and CPU resources.
nogc_merge		 Disable the GC merge feature.
disable_roll_forward	 Disable the roll-forward recovery routine.
norecovery		 Disable the roll-forward recovery routine, mounted
			 read-only (i.e., -o ro,disable_roll_forward).
discard/nodiscard	 Enable/disable real-time discard in f2fs. If discard is
			 enabled, f2fs will issue discard/TRIM commands when a
			 segment is cleaned.
no_heap			 Disable heap-style segment allocation, which finds free
			 segments for data from the beginning of the main area and
			 for nodes from the end of the main area.
nouser_xattr		 Disable Extended User Attributes. Note: xattr is enabled
			 by default if CONFIG_F2FS_FS_XATTR is selected.
noacl			 Disable POSIX Access Control List. Note: acl is enabled
			 by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
active_logs=%u		 Support configuring the number of active logs. In the
			 current design, f2fs supports only 2, 4, and 6 logs.
			 The default number is 6.
disable_ext_identify	 Disable the extension list configured by mkfs, so f2fs
			 is not aware of cold files such as media files.
inline_xattr		 Enable the inline xattrs feature.
noinline_xattr		 Disable the inline xattrs feature.
inline_xattr_size=%u	 Support configuring the inline xattr size; it depends on
			 the flexible inline xattr feature.
inline_data		 Enable the inline data feature: newly created small (<~3.4k)
			 files can be written into the inode block.
inline_dentry		 Enable the inline dir feature: data in newly created
			 directory entries can be written into the inode block.
			 The space of the inode block used to store inline
			 dentries is limited to ~3.4k.
noinline_dentry		 Disable the inline dentry feature.
flush_merge		 Merge concurrent cache_flush commands as much as possible
			 to eliminate redundant command issues. If the underlying
			 device handles the cache_flush command relatively slowly,
			 enabling this option is recommended.
nobarrier		 This option can be used if the underlying storage
			 guarantees that its cached data will be written to the
			 non-volatile area. If this option is set, no cache_flush
			 commands are issued, but f2fs still guarantees the write
			 ordering of all the data writes.
fastboot		 This option is used when a system wants to reduce mount
			 time as much as possible, even at the expense of normal
			 performance.
extent_cache		 Enable an extent cache based on an rb-tree. It can cache
			 many extents per inode, each mapping a contiguous logical
			 address range to a physical address range, which increases
			 the cache hit ratio. Set by default.
noextent_cache		 Explicitly disable the rb-tree based extent cache; see
			 the extent_cache mount option above.
noinline_data		 Disable the inline data feature. The inline data feature
			 is enabled by default.
data_flush		 Enable data flushing before checkpoint in order to
			 persist data of regular files and symlinks.
reserve_root=%d		 Support configuring reserved space which is used for
			 allocation by a privileged user with the specified uid or
			 gid. Unit: 4KB. The default limit is 0.2% of user blocks.
resuid=%d		 The user ID which may use the reserved blocks.
resgid=%d		 The group ID which may use the reserved blocks.
fault_injection=%d	 Enable fault injection in all supported types with the
			 specified injection rate.
fault_type=%d		 Support configuring the fault injection type, which should
			 be enabled together with the fault_injection option. Fault
			 type values are shown below; a single type or a
			 combination of types is supported.

			 ===================	  ===========
			 Type_Name		  Type_Value
			 ===================	  ===========
			 FAULT_KMALLOC		  0x000000001
			 FAULT_KVMALLOC		  0x000000002
			 FAULT_PAGE_ALLOC	  0x000000004
			 FAULT_PAGE_GET		  0x000000008
			 FAULT_ALLOC_BIO	  0x000000010
			 FAULT_ALLOC_NID	  0x000000020
			 FAULT_ORPHAN		  0x000000040
			 FAULT_BLOCK		  0x000000080
			 FAULT_DIR_DEPTH	  0x000000100
			 FAULT_EVICT_INODE	  0x000000200
			 FAULT_TRUNCATE		  0x000000400
			 FAULT_READ_IO		  0x000000800
			 FAULT_CHECKPOINT	  0x000001000
			 FAULT_DISCARD		  0x000002000
			 FAULT_WRITE_IO		  0x000004000
			 ===================	  ===========
mode=%s			 Control block allocation mode which supports "adaptive"
			 and "lfs". In "lfs" mode, there should be no random
			 writes towards the main area.
io_bits=%u		 Set the bit size of write IO requests. It should be set
			 with "mode=lfs".
usrquota		 Enable plain user disk quota accounting.
grpquota		 Enable plain group disk quota accounting.
prjquota		 Enable plain project quota accounting.
usrjquota=<file>	 Appoint specified file and type during mount, so that quota
grpjquota=<file>	 information can be properly updated during recovery flow,
prjjquota=<file>	 <quota file>: must be in root directory;
jqfmt=<quota type>	 <quota type>: [vfsold,vfsv0,vfsv1].
offusrjquota		 Turn off user journalled quota.
offgrpjquota		 Turn off group journalled quota.
offprjjquota		 Turn off project journalled quota.
quota			 Enable plain user disk quota accounting.
noquota			 Disable all plain disk quota options.
whint_mode=%s		 Control which write hints are passed down to the block
			 layer. This supports "off", "user-based", and
			 "fs-based". In "off" mode (default), f2fs does not pass
			 down hints. In "user-based" mode, f2fs tries to pass
			 down hints given by users. And in "fs-based" mode, f2fs
			 passes down hints with its own policy.
alloc_mode=%s		 Adjust block allocation policy, which supports "reuse"
			 and "default".
fsync_mode=%s		 Control the policy of fsync. Currently supports "posix",
			 "strict", and "nobarrier". In "posix" mode, which is the
			 default, fsync follows POSIX semantics and performs a
			 light operation to improve filesystem performance.
			 In "strict" mode, fsync is heavy and behaves in line
			 with xfs, ext4 and btrfs, so that xfstests generic/342
			 passes, but performance regresses. "nobarrier" is
			 based on "posix", but does not issue flush commands for
			 non-atomic files, like the "nobarrier" mount option.
test_dummy_encryption
test_dummy_encryption=%s
			 Enable dummy encryption, which provides a fake fscrypt
			 context. The fake fscrypt context is used by xfstests.
			 The argument may be either "v1" or "v2", in order to
			 select the corresponding fscrypt policy version.
checkpoint=%s[:%u[%]]	 Set to "disable" to turn off checkpointing. Set to "enable"
			 to re-enable checkpointing. It is enabled by default. While
			 disabled, any unmounting or unexpected shutdowns will cause
			 the filesystem contents to appear as they did when the
			 filesystem was mounted with that option.
			 While mounting with checkpoint=disable, the filesystem must
			 run garbage collection to ensure that all available space can
			 be used. If this takes too much time, the mount may return
			 EAGAIN. You may optionally add a value to indicate how much
			 of the disk you would be willing to temporarily give up to
			 avoid additional garbage collection. This can be given as a
			 number of blocks, or as a percent. For instance, mounting
			 with checkpoint=disable:100% would always succeed, but it may
			 hide up to all remaining free space. The actual space that
			 would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable.
			 This space is reclaimed once checkpoint=enable.
compress_algorithm=%s	 Control the compression algorithm. Currently f2fs supports
			 the "lzo", "lz4", "zstd" and "lzo-rle" algorithms.
compress_log_size=%u	 Support configuring the compression cluster size. The size
			 will be 4KB * (1 << %u); 16KB is the minimum size and also
			 the default size.
compress_extension=%s	 Support adding a specified extension, so that f2fs can
			 enable compression on the corresponding files. For example,
			 if all files with '.ext' have a high compression ratio, we
			 can add '.ext' to the compression extension list and enable
			 compression on these files by default rather than enabling
			 it via ioctl. For other files, we can still enable
			 compression via ioctl. Note that there is one reserved
			 special extension, '*', which can be set to enable
			 compression for all files.
inlinecrypt		 When possible, encrypt/decrypt the contents of encrypted
			 files using the blk-crypto framework rather than
			 filesystem-layer encryption. This allows the use of
			 inline encryption hardware. The on-disk format is
			 unaffected. For more details, see
			 Documentation/block/inline-encryption.rst.
atgc			 Enable age-threshold garbage collection, which provides
			 high effectiveness and efficiency for background GC.
======================== ============================================================
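
Two of the options above take numeric values that are easy to get wrong:
fault_type combines types by OR-ing the Type_Value bits from its table, and
compress_log_size selects a cluster of 4KB * (1 << %u) bytes. The C sketch below
computes both. The macro and function names are hypothetical helpers for this
example; only the bit values are copied from the fault_type table.

.. code-block:: c

    /* A few Type_Value bits copied from the fault_type table. */
    #define FAULT_KMALLOC    0x000000001u
    #define FAULT_PAGE_ALLOC 0x000000004u
    #define FAULT_READ_IO    0x000000800u
    #define FAULT_WRITE_IO   0x000004000u

    /* Combine several fault types into a single fault_type value. */
    unsigned fault_type_mask(const unsigned *types, unsigned n)
    {
        unsigned mask = 0;
        unsigned i;

        for (i = 0; i < n; i++)
            mask |= types[i];
        return mask;
    }

    /* Cluster size in bytes selected by compress_log_size=<log_size>. */
    unsigned long compress_cluster_bytes(unsigned log_size)
    {
        return 4096UL << log_size;
    }

For example, injecting only FAULT_KMALLOC and FAULT_PAGE_ALLOC faults
corresponds to fault_type=5, and compress_log_size=2 yields the 16KB minimum
(and default) cluster size.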

Debugfs Entries
===============

/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
f2fs. Each file shows the whole f2fs information.

/sys/kernel/debug/f2fs/status includes:

 - major file system information currently managed by f2fs
 - average SIT information about all segments
 - current memory footprint consumed by f2fs.

Sysfs Entries
=============

Information about mounted f2fs file systems can be found in
/sys/fs/f2fs. Each mounted filesystem will have a directory in
/sys/fs/f2fs based on its device name (e.g., /sys/fs/f2fs/sda).
The files in each per-device directory, /sys/fs/f2fs/<devname>, are
described in Documentation/ABI/testing/sysfs-fs-f2fs.
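
As a small how-to, the C sketch below builds the path of one per-device
attribute and reads a single decimal value from it, for example the unusable
attribute mentioned under the checkpoint mount option. The helper names are
hypothetical, and the sketch assumes the attribute holds one unsigned decimal
number; consult the ABI document for each file's actual format.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /*
     * Build "/sys/fs/f2fs/<devname>/<attr>" into buf.
     * Returns 0 on success, -1 if a name contains '/' or the path is truncated.
     */
    int f2fs_sysfs_path(char *buf, size_t len, const char *devname, const char *attr)
    {
        int n;

        if (strchr(devname, '/') || strchr(attr, '/'))
            return -1;
        n = snprintf(buf, len, "/sys/fs/f2fs/%s/%s", devname, attr);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Read one unsigned decimal value from a sysfs file; 0 on success, -1 on error. */
    int read_sysfs_ulong(const char *path, unsigned long *out)
    {
        FILE *f = fopen(path, "r");
        int ok;

        if (!f)
            return -1;
        ok = (fscanf(f, "%lu", out) == 1);
        fclose(f);
        return ok ? 0 : -1;
    }

On a system where /dev/sda is mounted as f2fs, calling
f2fs_sysfs_path(buf, sizeof(buf), "sda", "unusable") and then
read_sysfs_ulong(buf, &val) would fetch that value.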

Usage
=====

1. Download the userland tools and compile them.

2. Skip this step if f2fs was compiled statically into the kernel.
   Otherwise, insert the f2fs.ko module::

	# insmod f2fs.ko

3. Create a directory to use when mounting::

	# mkdir /mnt/f2fs

4. Format the block device, and then mount it as f2fs::

	# mkfs.f2fs -l label /dev/block_device
	# mount -t f2fs /dev/block_device /mnt/f2fs
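
Step 4 can also be done from a program. The C sketch below calls the Linux
mount(2) system call with "f2fs" as the filesystem type and passes options from
the Mount Options table as the comma-separated data string; generic options such
as ro are passed as mountflags (MS_RDONLY) instead. The function name
mount_f2fs() is hypothetical; the call needs CAP_SYS_ADMIN (typically root), and
the device must already be formatted with mkfs.f2fs.

.. code-block:: c

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mount.h>

    /*
     * Mount an already formatted f2fs device on dir. opts is a comma-separated
     * f2fs option string such as "background_gc=on,discard", or NULL for the
     * defaults. Returns 0 on success, -1 on failure (the reason goes to stderr).
     */
    int mount_f2fs(const char *dev, const char *dir, const char *opts)
    {
        if (mount(dev, dir, "f2fs", 0, opts) != 0) {
            fprintf(stderr, "mount %s on %s: %s\n", dev, dir, strerror(errno));
            return -1;
        }
        return 0;
    }

Calling mount_f2fs("/dev/block_device", "/mnt/f2fs", "background_gc=on,discard")
corresponds to "mount -t f2fs -o background_gc=on,discard /dev/block_device
/mnt/f2fs".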

mkfs.f2fs
---------
mkfs.f2fs formats a partition as an f2fs filesystem by building the basic
on-disk layout.

The quick options consist of:

===============    ===========================================================
``-l [label]``     Give a volume label, up to 512 Unicode characters.
``-a [0 or 1]``    Split the start location of each area for heap-based allocation.

                   1 is set by default, which performs this.
``-o [int]``       Set the overprovision ratio in percent over the volume size.

                   5 is set by default.
``-s [int]``       Set the number of segments per section.

                   1 is set by default.
``-z [int]``       Set the number of sections per zone.

                   1 is set by default.
``-e [str]``       Set the basic extension list, e.g. "mp3,gif,mov".
``-t [0 or 1]``    Issue discard commands (1) or not (0).

                   1 is set by default, which conducts discard.
===============    ===========================================================

Note: please refer to the manpage of mkfs.f2fs(8) to get the full option list.

fsck.f2fs
---------
fsck.f2fs is a tool to check the consistency of an f2fs-formatted
partition; it examines whether the filesystem metadata and user-made data
are cross-referenced correctly.
Note that the initial version of the tool does not fix any inconsistency.

The quick options consist of::

  -d debug level [default:0]

Note: please refer to the manpage of fsck.f2fs(8) to get the full option list.

dump.f2fs
---------
dump.f2fs is used to debug the on-disk data structures of the f2fs filesystem.
It shows the on-disk information of the inode identified by a given inode
number, and can dump all the SSA and SIT entries into the predefined files
./dump_ssa and ./dump_sit, respectively.

The options consist of::

  -d debug level [default:0]
  -i inode no (hex)
  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]

Examples::

    # dump.f2fs -i [ino] /dev/sdx
    # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
    # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)

Note: please refer to the manpage of dump.f2fs(8) to get the full option list.
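
The -s and -a options take a segment range written as #1~#2, where 0~-1 selects
all segments. As an illustration of that documented syntax only (this is not
dump.f2fs's actual parser), the C sketch below parses such a range and resolves
an end of -1 to the last segment; parse_seg_range() and its total_segs argument
are hypothetical.

.. code-block:: c

    #include <stdio.h>

    /*
     * Parse "start~end" (decimal). An end of -1 means "through the last
     * segment", so "0~-1" covers all total_segs segments.
     * Returns 0 and fills *start/*end on success, -1 on malformed input.
     */
    int parse_seg_range(const char *arg, unsigned total_segs,
                        unsigned *start, unsigned *end)
    {
        long s, e;
        char extra;

        /* exactly two numbers separated by '~'; trailing text is rejected */
        if (sscanf(arg, "%ld~%ld%c", &s, &e, &extra) != 2)
            return -1;
        if (e == -1)
            e = (long)total_segs - 1;
        if (s < 0 || e < s || e >= (long)total_segs)
            return -1;
        *start = (unsigned)s;
        *end = (unsigned)e;
        return 0;
    }

With 100 segments, "0~-1" resolves to segments 0 through 99, while "20~10" or
a lone "5" is rejected.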

sload.f2fs
----------
sload.f2fs provides a way to insert files and directories into an existing disk
image. This tool is useful when building f2fs images from compiled files.

Note: please refer to the manpage of sload.f2fs(8) to get the full option list.

resize.f2fs
-----------
resize.f2fs lets a user resize an f2fs-formatted disk image while preserving
all the files and directories stored in the image.

Note: please refer to the manpage of resize.f2fs(8) to get the full option list.

defrag.f2fs
-----------
defrag.f2fs can be used to defragment data and filesystem metadata that are
scattered across the disk. This can improve write speed by providing more
free consecutive space.

Note: please refer to the manpage of defrag.f2fs(8) to get the full option list.

f2fs_io
-------
f2fs_io is a simple tool for issuing various filesystem APIs as well as
f2fs-specific ones, which is very useful for QA tests.

Note: please refer to the manpage of f2fs_io(8) to get the full option list.

Design
======

On-disk Layout
--------------

F2FS divides the whole volume into a number of segments, each of which is fixed
at 2MB in size. A section is composed of consecutive segments, and a zone
consists of a set of sections. By default, the section and zone sizes are both
set to one segment, but users can easily modify the sizes with mkfs.
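
The size relationships can be made concrete with a little arithmetic. The C
sketch below assumes a 4KB block size and the fixed 2MB segment; segs_per_sec
and secs_per_zone correspond to the mkfs.f2fs -s and -z options. The macro and
function names are hypothetical.

.. code-block:: c

    #define F2FS_BLKSIZE  4096UL              /* assumed 4KB block */
    #define SEGMENT_BYTES (2UL * 1024 * 1024) /* segments are fixed at 2MB */

    /* 2MB / 4KB = 512 blocks in every segment. */
    unsigned long blocks_per_segment(void)
    {
        return SEGMENT_BYTES / F2FS_BLKSIZE;
    }

    /* A section is segs_per_sec consecutive segments (mkfs.f2fs -s). */
    unsigned long section_bytes(unsigned segs_per_sec)
    {
        return segs_per_sec * SEGMENT_BYTES;
    }

    /* A zone is secs_per_zone sections (mkfs.f2fs -z). */
    unsigned long zone_bytes(unsigned segs_per_sec, unsigned secs_per_zone)
    {
        return secs_per_zone * section_bytes(segs_per_sec);
    }

With the defaults (-s 1 -z 1), a section and a zone are each one 2MB segment;
with -s 4 -z 2, a section is 8MB and a zone is 16MB.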

F2FS splits the entire volume into six areas, and all the areas except the
superblock consist of multiple segments as described below::

                                            align with the zone size <-|
                 |-> align with the segment size
     _________________________________________________________________________
    |            |            |   Segment   |    Node     |   Segment  |      |
    | Superblock | Checkpoint |    Info.    |   Address   |   Summary  | Main |
    |    (SB)    |   (CP)     | Table (SIT) | Table (NAT) | Area (SSA) |      |
    |____________|_____2______|______N______|______N______|______N_____|__N___|
                                                                       .      .
                                                             .                .
                                                 .                            .
                                    ._________________________________________.
                                    |_Segment_|_..._|_Segment_|_..._|_Segment_|
                                    .           .
                                    ._________._________
                                    |_section_|__...__|_
                                    .            .
                                    .________.
                                    |__zone__|

- Superblock (SB)
   It is located at the beginning of the partition, and two copies of it exist
   to avoid file system crashes. It contains basic partition information and
   some default parameters of f2fs.

- Checkpoint (CP)
   It contains file system information, bitmaps for valid NAT/SIT sets, orphan
   inode lists, and summary entries of the current active segments.

- Segment Information Table (SIT)
   It contains per-segment information such as the valid block count and a
   bitmap recording the validity of every block.

- Node Address Table (NAT)
   It is composed of a block address table for all the node blocks stored in
   the Main area.

- Segment Summary Area (SSA)
   It contains summary entries, which hold the owner information of all the
   data and node blocks stored in the Main area.

- Main Area
   It contains file and directory data, including their indices.

In order to avoid misalignment between the file system and flash-based storage,
F2FS aligns the start block address of CP with the segment size. It also aligns
the start block address of the Main area with the zone size by reserving some
segments in the SSA area.
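
The zone alignment of the Main area amounts to rounding its start address up
to the next zone boundary and padding the SSA area with whole segments. The
sketch below is illustrative only: the helper name ssa_padding_segments() is
hypothetical, and it assumes the SSA end is already segment-aligned and that a
zone holds a whole number of segments (512 blocks per 2MB segment with 4KB
blocks).

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many extra segments must be reserved at the end
 * of the SSA area so that the Main area starts on a zone boundary.
 * Assumes ssa_end_blk is segment-aligned and blocks_per_zone is a multiple
 * of blocks_per_seg.
 */
static uint32_t ssa_padding_segments(uint64_t ssa_end_blk,
                                     uint32_t blocks_per_seg,
                                     uint32_t blocks_per_zone)
{
    /* Round the Main area start up to the next zone boundary. */
    uint64_t main_blkaddr = (ssa_end_blk + blocks_per_zone - 1) /
                            blocks_per_zone * blocks_per_zone;

    /* The gap is filled with reserved SSA segments. */
    return (uint32_t)((main_blkaddr - ssa_end_blk) / blocks_per_seg);
}
```

For example, with 512-block segments and 4-segment zones, an SSA area ending
at block 4608 needs 3 padding segments so that Main starts at block 6144.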

Refer to the following survey for additional technical details:
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey

File System Metadata Structure
------------------------------

F2FS adopts a checkpointing scheme to maintain file system consistency. At
mount time, F2FS first tries to find the last valid checkpoint data by scanning
the CP area. To reduce the scanning time, F2FS keeps only two copies of CP, and
one of them always holds the last valid data; this is called the shadow copy
mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.

For file system consistency, each CP records which NAT and SIT copies are
valid, as shown below::

  +--------+----------+---------+
  |   CP   |    SIT   |   NAT   |
  +--------+----------+---------+
  .         .          .          .
  .            .              .              .
  .               .                 .                 .
  +-------+-------+--------+--------+--------+--------+
  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
  +-------+-------+--------+--------+--------+--------+
     |             ^                          ^
     |             |                          |
     `----------------------------------------'
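
The mount-time choice between the two CP copies can be sketched as "take the
newer copy whose contents verify". This is a simplified model, not the kernel
code: struct cp_pack, its fields, and both helpers are hypothetical, and the
bit ordering used for the NAT/SIT copy bitmap is chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one checkpoint pack. */
struct cp_pack {
    uint64_t version;  /* incremented on every checkpoint */
    int      valid;    /* nonzero if the pack's checksums verified */
};

/*
 * Index (0 or 1) of the checkpoint to use at mount time, or -1 if neither
 * copy is usable. The newer valid copy wins; the other one is the shadow
 * that the next checkpoint overwrites.
 */
static int pick_checkpoint(const struct cp_pack cp[2])
{
    if (cp[0].valid && cp[1].valid)
        return cp[1].version > cp[0].version ? 1 : 0;
    if (cp[0].valid)
        return 0;
    if (cp[1].valid)
        return 1;
    return -1;
}

/* Bit i of the CP's NAT (or SIT) bitmap selects which copy of block i is
 * current (illustrative bit order). */
static int table_copy_for_block(const uint8_t *bitmap, uint32_t i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}
```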

Index Structure
---------------

The key data structure for managing data locations is a "node". Similar to
traditional file structures, F2FS has three types of node: inode, direct node,
and indirect node. F2FS assigns 4KB to an inode block, which contains 923 data
block indices, two direct node pointers, two indirect node pointers, and one
double indirect node pointer, as described below. One direct node block
contains 1018 data block indices, and one indirect node block also contains
1018 node block indices. Thus, one inode block (i.e., a file) covers::

  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) ~= 3.94TB.

   Inode block (4KB)
     |- data (923)
     |- direct node (2)
     |          `- data (1018)
     |- indirect node (2)
     |            `- direct node (1018)
     |                       `- data (1018)
     `- double indirect node (1)
                         `- indirect node (1018)
                                      `- direct node (1018)
                                                 `- data (1018)
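
The size bound above can be checked mechanically. The sketch below only
reproduces the arithmetic from the formula; the macro names are illustrative
and the function name f2fs_max_file_blocks() is hypothetical.

```c
#include <stdint.h>

/* Pointer counts quoted in the text above (names are illustrative). */
#define ADDRS_PER_INODE 923ULL   /* data block indices in the inode */
#define ADDRS_PER_BLOCK 1018ULL  /* entries in a direct/indirect node */

/* Number of 4KB data blocks addressable from one inode. */
static uint64_t f2fs_max_file_blocks(void)
{
    uint64_t direct    = 2 * ADDRS_PER_BLOCK;                    /* 2 direct */
    uint64_t indirect  = 2 * ADDRS_PER_BLOCK * ADDRS_PER_BLOCK;  /* 2 indirect */
    uint64_t dindirect = ADDRS_PER_BLOCK * ADDRS_PER_BLOCK *
                         ADDRS_PER_BLOCK;                        /* 1 double */

    return ADDRS_PER_INODE + direct + indirect + dindirect;
}
```

This yields 1,057,053,439 blocks, i.e. 4,329,690,886,144 bytes, which is about
3.94 binary terabytes.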

Note that all node blocks are mapped by the NAT, which means the location of
each node is translated through the NAT table. Because parents refer to their
children by node ID rather than by block address, a node rewritten to a new
location only needs its NAT entry updated. In this way F2FS cuts off the
propagation of node updates caused by leaf data writes, which addresses the
wandering tree problem.
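
A toy model of this indirection: when a node moves, only its NAT slot changes,
and the parent that stores the child's node ID is left untouched. The tiny NAT
array, struct indirect_node and both helpers are hypothetical.

```c
#include <stdint.h>

#define NR_NODES 8  /* tiny hypothetical NAT for illustration */

/* NAT: node ID -> current on-disk block address of that node. */
static uint32_t nat[NR_NODES];

/* A parent (inode or indirect node) refers to children by node ID. */
struct indirect_node {
    uint32_t child_nid[2];
};

/* Out-of-place rewrite of a node: only its NAT entry changes. */
static void relocate_node(uint32_t nid, uint32_t new_blkaddr)
{
    nat[nid] = new_blkaddr;
}

/* Resolve the block address of a parent's i-th child through the NAT. */
static uint32_t child_blkaddr(const struct indirect_node *parent, int i)
{
    return nat[parent->child_nid[i]];
}
```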

Directory Structure
-------------------

A directory entry occupies 11 bytes and consists of the following attributes.

- hash          hash value of the file name
- ino           inode number
- len           length of the file name
- type          file type such as directory, symlink, etc.

A dentry block consists of 214 dentry slots and file names. A bitmap in the
block records whether each dentry is valid or not. A dentry block occupies 4KB
with the following composition.

::

  Dentry Block(4KB) = bitmap (27 bytes) + reserved (3 bytes) +
                      dentries (11 * 214 bytes) + file name (8 * 214 bytes)

                         [Bucket]
             +--------------------------------+
             |dentry block 1 | dentry block 2 |
             +--------------------------------+
             .               .
       .                             .
  .       [Dentry Block Structure: 4KB]       .
  +--------+----------+----------+------------+
  | bitmap | reserved | dentries | file names |
  +--------+----------+----------+------------+
  [Dentry Block: 4KB] .   .
                 .               .
            .                          .
            +------+------+-----+------+
            | hash | ino  | len | type |
            +------+------+-----+------+
            [Dentry Structure: 11 bytes]
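
The byte counts above can be expressed as packed C structures whose sizes the
compiler verifies. This is a sketch loosely modeled on the kernel's dentry
definitions; the struct and field names here are illustrative, and the kernel
uses little-endian on-disk types.

```c
#include <stdint.h>

#define NR_DENTRY_IN_BLOCK 214  /* dentry slots per block */
#define SLOT_LEN           8    /* bytes of file name per slot */

/* 11-byte directory entry; packed so the compiler adds no padding. */
struct dentry {
    uint32_t hash;  /* hash value of the file name */
    uint32_t ino;   /* inode number */
    uint16_t len;   /* length of the file name */
    uint8_t  type;  /* file type */
} __attribute__((packed));

/* 4KB dentry block: validity bitmap, reserved bytes, entries, names. */
struct dentry_block {
    uint8_t       bitmap[(NR_DENTRY_IN_BLOCK + 7) / 8];  /* 27 bytes */
    uint8_t       reserved[3];
    struct dentry dentry[NR_DENTRY_IN_BLOCK];            /* 2354 bytes */
    uint8_t       filename[NR_DENTRY_IN_BLOCK][SLOT_LEN]; /* 1712 bytes */
} __attribute__((packed));

_Static_assert(sizeof(struct dentry) == 11, "dentry must be 11 bytes");
_Static_assert(sizeof(struct dentry_block) == 4096, "block must be 4KB");
```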

F2FS implements multi-level hash tables for the directory structure. Each level
has a hash table with a dedicated number of hash buckets, as shown below. Note
that "A(2B)" means a bucket includes 2 data blocks.

::

    ----------------------
    A : bucket
    B : block
    N : MAX_DIR_HASH_DEPTH
    ----------------------

    level #0   | A(2B)
               |
    level #1   | A(2B) - A(2B)
               |
    level #2   | A(2B) - A(2B) - A(2B) - A(2B)
        .      |   .       .       .       .
    level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
        .      |   .       .       .       .
    level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

The number of blocks and buckets is determined by::

                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
  # of blocks in level #n = |
                            `- 4, Otherwise

                             ,- 2^(n + dir_level),
                             |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
  # of buckets in level #n = |
                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
                                      Otherwise
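
The two piecewise formulas translate directly into C. The value 63 for
MAX_DIR_HASH_DEPTH is an assumption taken from the kernel headers and should be
checked against your tree; the helper names are illustrative. Integer division
makes MAX_DIR_HASH_DEPTH / 2 equal to 31, so the bucket count saturates at
2^30.

```c
#include <stdint.h>

#define MAX_DIR_HASH_DEPTH 63  /* assumed value; check your kernel headers */

/* Number of blocks in each bucket of level n. */
static unsigned int bucket_blocks(unsigned int n)
{
    return n < MAX_DIR_HASH_DEPTH / 2 ? 2 : 4;
}

/* Number of buckets in level n for a given dir_level setting. */
static uint64_t dir_buckets(unsigned int n, unsigned int dir_level)
{
    if (n + dir_level < MAX_DIR_HASH_DEPTH / 2)
        return 1ULL << (n + dir_level);
    return 1ULL << ((MAX_DIR_HASH_DEPTH / 2) - 1);
}
```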

When F2FS looks up a file name in a directory, it first calculates a hash value
of the file name. Then, F2FS scans the hash table in level #0 to find the
dentry containing the file name and its inode number. If it is not found, F2FS
scans the next hash table in level #1. In this way, F2FS scans the hash tables
of each level incrementally from 1 to N. In each level F2FS needs to scan only
one bucket, determined by the following equation, which gives
O(log(# of files)) complexity::

  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
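
Because every level is laid out after the previous one in the directory file,
the first block of the bucket to scan is found by skipping all blocks of the
lower levels and then indexing into level n. This is a sketch of that
computation with illustrative helper names; it repeats the formulas above so
that it compiles on its own.

```c
#include <stdint.h>

#define MAX_DIR_HASH_DEPTH 63  /* assumed value; check your kernel headers */

static unsigned int bucket_blocks(unsigned int n)
{
    return n < MAX_DIR_HASH_DEPTH / 2 ? 2 : 4;
}

static uint64_t dir_buckets(unsigned int n, unsigned int dir_level)
{
    if (n + dir_level < MAX_DIR_HASH_DEPTH / 2)
        return 1ULL << (n + dir_level);
    return 1ULL << ((MAX_DIR_HASH_DEPTH / 2) - 1);
}

/* First directory block of the bucket that `hash` maps to at level n. */
static uint64_t lookup_block_index(uint32_t hash, unsigned int n,
                                   unsigned int dir_level)
{
    uint64_t bidx = 0;

    /* Skip every block belonging to levels 0..n-1. */
    for (unsigned int i = 0; i < n; i++)
        bidx += dir_buckets(i, dir_level) * bucket_blocks(i);

    /* Only one bucket per level is scanned. */
    uint64_t bucket = hash % dir_buckets(n, dir_level);
    return bidx + bucket * bucket_blocks(n);
}
```

For instance, at level #2 with dir_level 0, levels #0 and #1 occupy 6 blocks,
so hash 7 maps to bucket 3 and starts at block 6 + 3 * 2 = 12.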

When a file is created, F2FS looks for enough consecutive empty slots to hold
the file name; since each slot stores 8 bytes of name, a name of len bytes
needs ceil(len / 8) slots. F2FS searches for the empty slots in the hash tables
of all levels from 1 to N, in the same way as the lookup operation.
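
Within one dentry block, finding room for a new name reduces to searching the
validity bitmap for a run of free slots. A minimal sketch; the helper names and
the bit order are illustrative assumptions.

```c
#include <stdint.h>

#define NR_DENTRY_IN_BLOCK 214
#define SLOT_LEN           8

static int slot_used(const uint8_t *bitmap, unsigned int i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/* Number of 8-byte name slots needed for a name of name_len bytes. */
static unsigned int slots_for_name(unsigned int name_len)
{
    return (name_len + SLOT_LEN - 1) / SLOT_LEN;
}

/* First index of `need` consecutive free slots in one dentry block,
 * or -1 if the block has no such run. */
static int find_free_slots(const uint8_t *bitmap, unsigned int need)
{
    unsigned int run = 0;

    if (need == 0 || need > NR_DENTRY_IN_BLOCK)
        return -1;
    for (unsigned int i = 0; i < NR_DENTRY_IN_BLOCK; i++) {
        run = slot_used(bitmap, i) ? 0 : run + 1;
        if (run == need)
            return (int)(i + 1 - need);
    }
    return -1;
}
```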

The following figure shows two example cases of a directory holding children::

       --------------> Dir <--------------
       |                                 |
    child                             child

    child - child                     [hole] - child

    child - child - child             [hole] - [hole] - child

   Case 1:                           Case 2:
   Number of children = 6,           Number of children = 3,
   File size = 7                     File size = 7

Default Block Allocation
------------------------

At runtime, F2FS manages six active logs inside the "Main" area: Hot/Warm/Cold
node and Hot/Warm/Cold data.

- Hot node      contains direct node blocks of directories.
- Warm node     contains direct node blocks except hot node blocks.
- Cold node     contains indirect node blocks.
- Hot data      contains dentry blocks.
- Warm data     contains data blocks except hot and cold data blocks.
- Cold data     contains multimedia data or migrated data blocks.
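
The list above is effectively a classification function. The sketch encodes it
directly; the block-kind enum and the two flags are hypothetical inputs, not
kernel interfaces.

```c
/* Hypothetical block descriptors used only for this illustration. */
enum blk_kind { DIRECT_NODE, INDIRECT_NODE, DENTRY_BLOCK, FILE_DATA };

enum log_type { HOT_NODE, WARM_NODE, COLD_NODE, HOT_DATA, WARM_DATA, COLD_DATA };

/*
 * Pick the active log for a block, following the list above.
 * owner_is_dir: the node belongs to a directory.
 * is_cold: the data is multimedia or is being migrated by the cleaner.
 */
static enum log_type pick_log(enum blk_kind kind, int owner_is_dir, int is_cold)
{
    switch (kind) {
    case DIRECT_NODE:
        return owner_is_dir ? HOT_NODE : WARM_NODE;
    case INDIRECT_NODE:
        return COLD_NODE;
    case DENTRY_BLOCK:
        return HOT_DATA;
    case FILE_DATA:
    default:
        return is_cold ? COLD_DATA : WARM_DATA;
    }
}
```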

LFS has two schemes for free space management: threaded log and
copy-and-compaction. The copy-and-compaction scheme, known as cleaning, is well
suited to devices with very good sequential write performance, since free
segments are always available for writing new data. However, it suffers from
cleaning overhead under high utilization. Conversely, the threaded log scheme
suffers from random writes, but needs no cleaning process. F2FS adopts a hybrid
scheme in which copy-and-compaction is used by default, but the policy is
dynamically switched to the threaded log scheme according to the file system
status.
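
A rough sketch of the hybrid policy: append into clean segments while enough
are free, otherwise fill invalid blocks of a dirty segment (the threaded log,
which F2FS calls slack space recycling). The state struct, the switch-over
threshold and the helper names are all hypothetical; the kernel's actual
heuristics are more involved.

```c
#include <stdint.h>

#define BLKS_PER_SEG 512  /* 2MB segment / 4KB block */

/* Hypothetical allocator state for illustration only. */
struct alloc_state {
    uint32_t free_segments;       /* segments with no valid blocks */
    uint32_t low_free_threshold;  /* hypothetical switch-over point */
};

enum alloc_mode { APPEND_TO_FREE_SEGMENT, THREADED_LOG };

/* Default to appending into clean segments (copy-and-compaction) and fall
 * back to the threaded log when free segments run low. */
static enum alloc_mode choose_mode(const struct alloc_state *s)
{
    return s->free_segments > s->low_free_threshold ? APPEND_TO_FREE_SEGMENT
                                                    : THREADED_LOG;
}

/* Threaded log: reuse the first invalid block of a dirty segment. */
static int threaded_log_slot(const uint8_t valid_map[BLKS_PER_SEG / 8])
{
    for (int i = 0; i < BLKS_PER_SEG; i++)
        if (!((valid_map[i / 8] >> (i % 8)) & 1))
            return i;
    return -1;
}
```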

In order to align F2FS with the underlying flash-based storage, F2FS allocates
segments in units of sections. F2FS expects the section size to be the same as
the garbage collection unit of the FTL. Furthermore, with respect to the
mapping granularity of the FTL, F2FS allocates the sections of the active logs
from different zones as much as possible, since the FTL may otherwise write
data from several active logs into one allocation unit according to its mapping
granularity.
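
The segment/section/zone hierarchy is plain integer division, and "different
zones" can be checked by comparing zone numbers. In this sketch, struct
geometry and the helpers are hypothetical; real values are fixed at mkfs time.

```c
#include <stdint.h>

/* Hypothetical geometry, set at mkfs time in a real system. */
struct geometry {
    uint32_t segs_per_sec;   /* segments per section */
    uint32_t secs_per_zone;  /* sections per zone */
};

static uint32_t seg_to_sec(const struct geometry *g, uint32_t segno)
{
    return segno / g->segs_per_sec;
}

static uint32_t sec_to_zone(const struct geometry *g, uint32_t secno)
{
    return secno / g->secs_per_zone;
}

/* 1 if section `candidate` lies in a zone that none of the other active
 * logs' current sections occupy, 0 otherwise. */
static int zone_is_unshared(const struct geometry *g, uint32_t candidate,
                            const uint32_t *active_secs, int nr_active)
{
    uint32_t zone = sec_to_zone(g, candidate);

    for (int i = 0; i < nr_active; i++)
        if (sec_to_zone(g, active_secs[i]) == zone)
            return 0;
    return 1;
}
```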

Cleaning process
----------------

F2FS does cleaning both on demand and in the background. On-demand cleaning is
triggered when there are not enough free segments to serve VFS calls. The
background cleaner runs as a kernel thread and triggers the cleaning job when
the system is idle.

F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
In the greedy algorithm, F2FS selects the victim segment having the smallest
number of valid blocks. In the cost-benefit algorithm, F2FS selects a victim
segment according to the segment age and the number of valid blocks, in order
to address the log block thrashing problem of the greedy algorithm. F2FS adopts
the greedy algorithm for the on-demand cleaner, while the background cleaner
adopts the cost-benefit algorithm.
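
The two policies differ only in how each candidate segment is scored. The
cost-benefit score below is the classic LFS ratio (1 - u) * age / (1 + u),
where u is the segment utilization, used here as an illustration; the kernel's
exact scaling and age normalization may differ. struct seg_info and the helper
names are hypothetical.

```c
#include <stdint.h>

#define BLKS_PER_SEG 512u  /* 2MB segment / 4KB block */

struct seg_info {
    uint32_t valid_blocks;  /* from the SIT */
    uint64_t age;           /* time since last modification (any unit) */
};

/* Greedy: fewest valid blocks, i.e. the cheapest segment to clean now. */
static int pick_greedy(const struct seg_info *s, int n)
{
    int best = -1;

    for (int i = 0; i < n; i++)
        if (best < 0 || s[i].valid_blocks < s[best].valid_blocks)
            best = i;
    return best;
}

/* Cost-benefit: maximize (1 - u) * age / (1 + u). */
static int pick_cost_benefit(const struct seg_info *s, int n)
{
    int best = -1;
    double best_score = -1.0;

    for (int i = 0; i < n; i++) {
        double u = (double)s[i].valid_blocks / BLKS_PER_SEG;
        double score = (1.0 - u) * (double)s[i].age / (1.0 + u);

        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

With a young segment holding 100 valid blocks and an old one holding 200,
greedy picks the young one, whereas cost-benefit prefers the old, stable one
and leaves recently written (likely soon-invalidated) data alone.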

In order to identify whether the data in the victim segment are valid or not,
F2FS manages a bitmap. Each bit represents the validity of a block, and the
bitmap is a bit stream covering all blocks in the Main area.
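
A minimal sketch of such a validity bitmap, including the per-segment count the
cleaner relies on. The helper names and bit order are illustrative assumptions.

```c
#include <stdint.h>

#define BLKS_PER_SEG 512u  /* 2MB segment / 4KB block */

/* One validity bit per block in the Main area (illustrative bit order). */
static int blk_valid(const uint8_t *bitmap, uint64_t blk)
{
    return (bitmap[blk / 8] >> (blk % 8)) & 1;
}

static void blk_set_valid(uint8_t *bitmap, uint64_t blk, int valid)
{
    if (valid)
        bitmap[blk / 8] |= (uint8_t)(1u << (blk % 8));
    else
        bitmap[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}

/* Valid blocks in segment segno; the cleaner migrates only these. */
static uint32_t seg_valid_blocks(const uint8_t *bitmap, uint32_t segno)
{
    uint32_t count = 0;
    uint64_t first = (uint64_t)segno * BLKS_PER_SEG;

    for (uint32_t i = 0; i < BLKS_PER_SEG; i++)
        count += (uint32_t)blk_valid(bitmap, first + i);
    return count;
}
```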

Write-hint Policy
-----------------

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given by users.

===================== ======================== ===================
User                  F2FS                     Block
===================== ======================== ===================
N/A                   META                     WRITE_LIFE_NOT_SET
N/A                   HOT_NODE                 "
N/A                   WARM_NODE                "
N/A                   COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
===================== ======================== ===================

3) whint_mode=fs-based. F2FS passes down hints according to its own policy.

===================== ======================== ===================
User                  F2FS                     Block
===================== ======================== ===================
N/A                   META                     WRITE_LIFE_MEDIUM
N/A                   HOT_NODE                 WRITE_LIFE_NOT_SET
N/A                   WARM_NODE                "
N/A                   COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
===================== ======================== ===================
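
For user data writes, the three tables reduce to a small mapping from the
user's hint (which applications can set, for example, with fcntl() and
F_SET_RW_HINT) to the F2FS data type and the hint passed to the block layer.
The sketch models only the user-hint data rows; the enum values and helper
names are illustrative and do not match kernel numbering.

```c
/* Hint names follow the tables; numeric values are arbitrary here. */
enum rw_hint {
    WRITE_LIFE_NOT_SET, WRITE_LIFE_NONE, WRITE_LIFE_SHORT,
    WRITE_LIFE_MEDIUM, WRITE_LIFE_LONG, WRITE_LIFE_EXTREME
};

enum data_type { HOT_DATA, WARM_DATA, COLD_DATA };
enum whint_mode { WHINT_OFF, WHINT_USER, WHINT_FS };

/* Data log chosen from the user's hint. */
static enum data_type user_hint_to_type(enum rw_hint h)
{
    if (h == WRITE_LIFE_EXTREME)
        return COLD_DATA;
    if (h == WRITE_LIFE_SHORT)
        return HOT_DATA;
    return WARM_DATA;
}

/* Hint passed to the block layer for a data write, per the tables. */
static enum rw_hint data_block_hint(enum whint_mode mode, enum rw_hint user,
                                    int direct_io)
{
    if (mode == WHINT_OFF)
        return WRITE_LIFE_NOT_SET;
    if (direct_io)
        return user;  /* direct I/O passes the user's hint through */

    switch (user_hint_to_type(user)) {
    case COLD_DATA:
        return WRITE_LIFE_EXTREME;
    case HOT_DATA:
        return WRITE_LIFE_SHORT;
    default:  /* WARM_DATA differs between the two modes */
        return mode == WHINT_FS ? WRITE_LIFE_LONG : WRITE_LIFE_NOT_SET;
    }
}
```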

Fallocate(2) Policy
-------------------

By default, F2FS follows the POSIX rule below.

Allocating disk space
    The default operation (i.e., mode is zero) of fallocate() allocates
    the disk space within the range specified by offset and len.  The
    file size (as reported by stat(2)) will be changed if offset+len is
    greater than the file size.  Any subregion within the range specified
    by offset and len that did not contain data before the call will be
    initialized to zero.  This default behavior closely resembles the
    behavior of the posix_fallocate(3) library function, and is intended
    as a method of optimally implementing that function.

However, if F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) before
fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses whose
contents may be zero or random data. Because the blocks of a pinned file are
not migrated by the cleaner, the addresses stay valid, which is useful in the
following scenario:

 1. create(fd)
 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
 3. fallocate(fd, 0, 0, size)
 4. address = fibmap(fd, offset)
 5. open(blkdev)
 6. write(blkdev, address)
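
Steps 1 to 4 of this scenario map onto ordinary Linux system calls, sketched
below. The F2FS_IOC_SET_PIN_FILE value is assumed to match the kernel UAPI
definition and should be checked against your headers; FIBMAP usually requires
CAP_SYS_RAWIO. The helper name prealloc_pinned() is hypothetical, and steps 5
and 6 (raw writes to the block device) are intentionally left out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef F2FS_IOC_SET_PIN_FILE
/* Assumed to match the kernel UAPI definition in <linux/f2fs.h>. */
#define F2FS_IOC_SET_PIN_FILE _IOW(0xf5, 13, uint32_t)
#endif
#ifndef FIBMAP
#define FIBMAP _IO(0x00, 1)  /* as defined in <linux/fs.h> */
#endif

/* Close fd without clobbering the errno of the call that failed. */
static int fail_close(int fd)
{
    int saved = errno;

    close(fd);
    errno = saved;
    return -1;
}

/* Create and pin a file, preallocate `size` bytes, and store the physical
 * block backing logical block 0 in *phys_blk. Returns 0 or -1 (errno set). */
static int prealloc_pinned(const char *path, off_t size, int *phys_blk)
{
    uint32_t pin = 1;
    int fd = open(path, O_CREAT | O_RDWR, 0600);        /* 1. create */

    if (fd < 0)
        return -1;
    if (ioctl(fd, F2FS_IOC_SET_PIN_FILE, &pin) < 0)     /* 2. pin */
        return fail_close(fd);
    if (fallocate(fd, 0, 0, size) < 0)                  /* 3. preallocate */
        return fail_close(fd);

    *phys_blk = 0;  /* in: logical block 0; out: physical block number */
    if (ioctl(fd, FIBMAP, phys_blk) < 0)                /* 4. fibmap */
        return fail_close(fd);
    return close(fd);
}
```

On a file system other than F2FS the pin ioctl is expected to fail, so callers
should treat -1 as "not supported here" and check errno.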

Compression implementation
--------------------------

- A new term, cluster, is defined as the basic unit of compression; a file can
  be logically divided into multiple clusters. One cluster includes 4 << n
  (n >= 0) logical pages, the compression unit is also the cluster size, and
  each cluster can be compressed or not.

- In the cluster metadata layout, one special block address is used to
  indicate whether a cluster is compressed or normal; for a compressed
  cluster, the following metadata maps the cluster to [1, (4 << n) - 1]
  physical blocks, where f2fs stores the compress header and the compressed
  data.

- In order to eliminate write amplification during overwrite, F2FS only
  supports compression on write-once files. Data can be compressed only when
  all logical blocks in a cluster contain valid data and the compression ratio
  of the cluster data is lower than the specified threshold.

- To enable compression on a regular inode, there are three ways:

  * chattr +c file
  * chattr +c dir; touch dir/file
  * mount with -o compress_extension=ext; touch file.ext
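
The cluster arithmetic can be sketched as follows. Since one address slot of a
compressed cluster holds the special marker, header plus compressed data must
fit in at most (4 << n) - 1 blocks; otherwise the sketch stores the cluster
uncompressed. COMPRESS_HDR_SIZE is a hypothetical value and the helper names
are illustrative.

```c
#include <stdint.h>

#define BLK_SIZE 4096u
/* Hypothetical header size (data length, checksum, reserved fields). */
#define COMPRESS_HDR_SIZE 24u

/* Pages per cluster for a given log size n: 4 << n. */
static uint32_t cluster_pages(uint32_t n)
{
    return 4u << n;
}

/* Cluster index that logical page `page` belongs to. */
static uint64_t page_to_cluster(uint64_t page, uint32_t n)
{
    return page / cluster_pages(n);
}

/* Physical blocks needed for a compressed cluster of clen bytes, or 0 if
 * compression would not save at least one block. */
static uint32_t compressed_blocks(uint32_t clen, uint32_t n)
{
    uint32_t need = (COMPRESS_HDR_SIZE + clen + BLK_SIZE - 1) / BLK_SIZE;

    return need <= cluster_pages(n) - 1 ? need : 0;
}
```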

Compress metadata layout::

                                [Dnode Structure]
                +-----------------------------------------------+
                | cluster 1 | cluster 2 | ......... | cluster N |
                +-----------------------------------------------+
                .           .                       .           .
        .                       .                .                      .
    .         Compressed Cluster       .        .        Normal Cluster            .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
            .                             .
            .                                           .
        .                                                           .
        +-------------+-------------+----------+----------------------------+
        | data length | data chksum | reserved |      compressed data       |
        +-------------+-------------+----------+----------------------------+
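
Reading the layout above, a dnode consumer decides per cluster whether the
first address slot holds the compression marker and, if so, counts the slots
that point at compressed data. The marker values below are assumptions modeled
on the kernel's special block addresses, and the helper name is hypothetical.

```c
#include <stdint.h>

typedef uint32_t block_t;

/* Assumed marker values, modeled on the kernel's special addresses. */
#define NULL_ADDR     ((block_t)0)
#define COMPRESS_ADDR ((block_t)-2)

/*
 * Inspect the cluster_size address slots of one cluster. Returns 1 for a
 * compressed cluster and stores the number of non-NULL data slots in
 * *nr_blocks; returns 0 for a normal cluster.
 */
static int cluster_is_compressed(const block_t *addrs, uint32_t cluster_size,
                                 uint32_t *nr_blocks)
{
    if (addrs[0] != COMPRESS_ADDR)
        return 0;

    *nr_blocks = 0;
    for (uint32_t i = 1; i < cluster_size; i++)
        if (addrs[i] != NULL_ADDR)
            (*nr_blocks)++;
    return 1;
}
```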

NVMe Zoned Namespace devices
----------------------------

- ZNS defines a per-zone capacity, which can be equal to or less than the
  zone-size. Zone-capacity is the number of usable blocks in the zone.
  F2FS checks whether the zone-capacity is less than the zone-size; if it is,
  any segment that starts after the zone-capacity is marked as not-free in
  the free segment bitmap at initial mount time. These segments are marked
  as permanently used, so they are never allocated for writes and
  consequently never need to be garbage collected. If the zone-capacity is
  not aligned to the default segment size (2MB), a segment can start before
  the zone-capacity and span across the zone-capacity boundary. Such spanning
  segments are also considered usable segments, but all blocks past the
  zone-capacity are considered unusable in these segments.
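
The three cases (fully usable, spanning, and past the capacity) can be captured
in one helper that returns the usable blocks of a segment. This is a sketch
with a hypothetical name; it treats a segment starting exactly at the
zone-capacity as unusable.

```c
#include <stdint.h>

/* Usable blocks of the segment starting at block offset seg_start inside
 * its zone, given the zone-capacity in blocks. */
static uint32_t usable_blks_in_seg(uint32_t seg_start, uint32_t blks_per_seg,
                                   uint32_t zone_cap_blks)
{
    if (seg_start >= zone_cap_blks)
        return 0;                             /* permanently marked used */
    if (seg_start + blks_per_seg > zone_cap_blks)
        return zone_cap_blks - seg_start;     /* spans the boundary */
    return blks_per_seg;                      /* fully usable */
}
```

With 512-block segments and a zone-capacity of 1300 blocks, the segments
starting at blocks 0 and 512 are fully usable, the one at 1024 has 276 usable
blocks, and the one at 1536 has none.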