.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list:   linux-ext4@vger.kernel.org
Web site:       http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128-byte inodes, it can be
    converted to use 256-byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others.  When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not enable write barriers by default.  So it is useful to specify
    explicitly whether barriers are enabled via the '-o barrier=[0|1]'
    mount option for both ext3 and ext4 filesystems for a fair
    comparison.  When tuning ext3 for best benchmark numbers, it is
    often worthwhile to try changing the data journaling mode; '-o
    data=writeback' can be faster for some workloads.  (Note however that
    running mounted with data=writeback can potentially leave stale data
    exposed in recently written files in case of an unclean shutdown,
    which could be a security exposure in some situations.)  Configuring
    the filesystem with a large journal can also be helpful for
    metadata-intensive workloads.
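For a fair ext3 versus ext4 benchmark, the mount command line can be
generated so that the barrier setting and data mode are always stated
explicitly.  The following Python sketch only builds the argument list and
does not run anything; the helper name ``mount_command`` is hypothetical,
and the ``barrier=`` spelling follows the Options section of this document.

```python
from typing import List, Optional


def mount_command(fstype: str, device: str, mountpoint: str,
                  barrier: bool, data_mode: Optional[str] = None) -> List[str]:
    """Build a mount(8) argument list with the barrier setting made explicit."""
    options = ["barrier=1" if barrier else "barrier=0"]
    if data_mode is not None:
        # One of "journal", "ordered" or "writeback" (see the Options section).
        options.append("data=" + data_mode)
    return ["mount", "-t", fstype, "-o", ",".join(options), device, mountpoint]


# Same barrier setting for both filesystems under test.
for fs in ("ext3", "ext4"):
    print(" ".join(mount_command(fs, "/dev/hda1", "/wherever", barrier=True)))
```

Keeping the option string in one place makes it harder to compare the two
filesystems under different barrier settings by accident.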

Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
==================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem.  It is enabled by
flipping the +F inode attribute of an empty directory.  The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence.  For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used.  By default, the charset adopted is the latest version of
Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
form.  The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte-per-byte comparison.
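The comparison described above can be approximated in userspace.  The
following Python sketch is an illustration only: the kernel uses its own
UTF-8 tables, whereas ``str.casefold()`` and ``unicodedata.normalize()``
follow the Unicode version bundled with the Python interpreter, which is
not necessarily 12.1.0.  The helper names ``fold_name`` and ``names_match``
are hypothetical.

```python
import unicodedata


def fold_name(name: str) -> bytes:
    """Build a comparison key: canonical decomposition (NFD) plus casefolding."""
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()
    # Casefolding can yield sequences that decompose further, so normalize again.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    """Compare the folded, normalized names byte per byte."""
    return fold_name(a) == fold_name(b)


print(names_match("README.txt", "readme.TXT"))
# Precomposed e-acute versus "E" followed by a combining acute accent.
print(names_match("caf\u00e9", "CAFE\u0301"))
```

Both calls report a match: the first differs only in case, the second
differs in case and in whether the accent is stored precomposed or as a
combining character.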

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written to the disk.  The Unicode normalization format used by the
kernel is thus an internal representation, and is not exposed to
userspace or to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with the DX feature.  On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings, we need to address what happens when a program
tries to create a file with an invalid name.  The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling or disabling
strict mode.  When ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but case-insensitive lookups won't work.
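The strict and non-strict behaviors can be illustrated with a small Python
sketch.  It is a model of the description above, not the kernel
implementation: the function name ``lookup_key`` is hypothetical, raising
``ValueError`` stands in for whatever error the filesystem would return,
and "invalid" is taken here to mean "not decodable as UTF-8", which may be
narrower than the kernel's definition.

```python
import unicodedata


def lookup_key(raw_name: bytes, strict: bool) -> bytes:
    """Return the key used to compare names in a case-insensitive directory."""
    try:
        text = raw_name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Strict mode: names that are invalid in the encoding are refused.
            raise ValueError("file name is not valid UTF-8")
        # Non-strict mode: fall back to the opaque byte sequence, so only an
        # exact byte-per-byte match will find this entry.
        return raw_name
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


# Valid names fold to the same key regardless of case.
print(lookup_key(b"Report.TXT", strict=False) == lookup_key(b"report.txt", strict=False))
# An invalid name is still usable, but it no longer matches case-insensitively.
print(lookup_key(b"\xffReport", strict=False) == lookup_key(b"\xffreport", strict=False))
```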

Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions.  This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal.  It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location.  The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting.  Note that if the filesystem was
        not unmounted cleanly, skipping the journal replay will lead to the
        filesystem containing inconsistencies that can lead to any number of
        problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system.  Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered (*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved, data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec (*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds.  The default value is 5 seconds.  This means that if
        you lose your power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data-safety.  Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds).  Setting it
        to very large values will improve performance.  Note that due to
        delayed allocation even older data can be lost on power failure since
        writeback of those data begins only after the time set in
        /proc/sys/vm/dirty_expire_centisecs.

  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd code.
        barrier=0 disables, barrier=1 enables.  This also requires an IO stack
        which can support barriers, and if jbd gets an error on a barrier
        write, it will disable barriers again with a warning.  Write barriers
        enforce proper on-disk ordering of journal commits, making volatile
        disk write caches safe to use, at some performance penalty.  If your
        disks are battery-backed in one way or another, disabling barriers may
        safely improve performance.  The mount options "barrier" and
        "nobarrier" can also be used to enable or disable barriers, for
        consistency with other ext4 mount options.

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache.  The default value is 32 blocks.

  nouser_xattr
        Disables Extended User Attributes.  See the attr(5) manual page for
        more information about extended attributes.

  noacl
        This option disables POSIX Access Control List support. If ACL support
        is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
        is enabled by default on mount. See the acl(5) manual page for more
        information about ACLs.

  bsddf (*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs.  (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.
  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell the filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.

  delalloc (*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question.  This allows ext4 to make better allocation decisions
        more efficiently.

  nodelalloc
        Disable delayed allocation.  Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        wait for the I/O to complete, waiting a small amount of time to see if
        any other transactions can piggyback on the synchronous write doesn't
        cost much and can be a huge throughput win.  The algorithm
        used is designed to automatically tune for the speed of the disk, by
        measuring the amount of time (on average) that it takes to finish
        committing a transaction.  Call this time the "commit time".  If the
        time that the transaction has been running is less than the commit
        time, ext4 will try sleeping for the commit time to see if other
        operations will join the transaction.  The commit time is capped by
        the max_batch_time, which defaults to 15000us (15ms).  This
        optimization can be turned off entirely by setting max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time.  It defaults to zero microseconds.  Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.

  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation.  This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd).  If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force any delayed allocation blocks to be allocated such that at
        the next journal commit, in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed.  This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.

  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background.  This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is unmounted.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table.  This
        minimizes the impact on system performance while the file system's
        inode table is being initialized.

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed.  This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs.  This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures.  This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use DIO read locking. If the
        dioread_nolock option is specified, ext4 will allocate an uninitialized
        extent before the buffer write and convert the extent to initialized
        after IO completes. This approach allows ext4 code to avoid using the
        inode mutex, which improves scalability on high-speed storage. However,
        this does not work with data journaling, and the dioread_nolock option
        will be ignored with a kernel warning. Note that the dioread_nolock
        code path is only used for extent-based files.  Because of the
        restrictions this option comprises, it is off by default (i.e.
        dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory-constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer.  (For example, if there is only 512 MB of memory
        available, a 176 MB directory may seriously cramp the system's style.)

  i_version
        Enable 64-bit inode version support. This option is off by default.

  dax
        Use direct access (no page cache).  See
        Documentation/filesystems/dax.rst.  Note that this option is
        incompatible with data=journal.

  inlinecrypt
        When possible, encrypt/decrypt the contents of encrypted files using the
        blk-crypto framework rather than filesystem-layer encryption. This
        allows the use of inline encryption hardware. The on-disk format is
        unaffected. For more details, see
        Documentation/block/inline-encryption.rst.
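Two of the options above involve small calculations: the value to pass to
``stripe=`` and the wait window governed by ``min_batch_time`` and
``max_batch_time``.  The following Python sketch mirrors the descriptions
given for those options; it is not the kernel's code.  The function names,
the 4 KiB default block size and the example array geometry are assumptions
made for illustration.

```python
def stripe_blocks(data_disks: int, chunk_size_kib: int,
                  block_size_bytes: int = 4096) -> int:
    """stripe= value: data disks * RAID chunk size in filesystem blocks."""
    chunk_bytes = chunk_size_kib * 1024
    if chunk_bytes % block_size_bytes:
        raise ValueError("chunk size must be a multiple of the block size")
    return data_disks * (chunk_bytes // block_size_bytes)


def batch_wait_usec(avg_commit_usec: int, running_usec: int,
                    min_batch_time: int = 0, max_batch_time: int = 15000) -> int:
    """How long a synchronous writer would sleep to let others join, in usec."""
    if max_batch_time == 0:
        return 0  # the optimization is turned off entirely
    # The measured commit time is raised to min_batch_time, then capped.
    commit_time = min(max(avg_commit_usec, min_batch_time), max_batch_time)
    # Sleep only if the transaction has been running for less than that.
    return commit_time if running_usec < commit_time else 0


# Hypothetical 4-disk RAID5 (3 data disks), 64 KiB chunks, 4 KiB blocks.
print("stripe=%d" % stripe_blocks(3, 64))   # prints "stripe=48"
# Slow disk: the 50 ms average commit time is capped at max_batch_time.
print(batch_wait_usec(50000, 1000))         # prints 15000
```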

Data Mode
=========
There are 3 different data modes:

* writeback mode

  In data=writeback mode, ext4 does not journal data at all.  This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling.  A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash.  This mode will
  typically provide the best ext4 performance.

* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction.  When it's time to write the new metadata
  out to disk, the associated data blocks are written first.  In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling.  All new data is
  written to the journal first, and then to its final location.  In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state.  This mode is the slowest except when data needs to be read
  from and written to disk at the same time, where it outperforms all other
  modes.  Enabling this mode will disable delayed allocation and O_DIRECT
  support.
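Whatever data mode is in use, an application that replaces a file does not
have to depend on the auto_da_alloc heuristic described under Options: it
can call fsync() on the new file before rename().  A minimal Python sketch
of that pattern follows.  The helper name ``replace_file`` and the ".new"
suffix are illustrative choices, and syncing the containing directory is an
extra precaution not discussed in this document.

```python
import os


def replace_file(path: str, data: bytes) -> None:
    """Replace path with data so that a crash leaves the old or the new file."""
    tmp_path = path + ".new"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's buffer to the kernel
        os.fsync(f.fileno())      # force the data blocks to disk
    os.rename(tmp_path, path)     # atomically switch the name to the new file
    # Also sync the directory so that the rename itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as ``replace_file("foo", b"new contents")`` in place
of the open/write/close/rename sequence without fsync().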
43562306a36Sopenharmony_ci
43662306a36Sopenharmony_ci/proc entries
43762306a36Sopenharmony_ci=============
43862306a36Sopenharmony_ci
43962306a36Sopenharmony_ciInformation about mounted ext4 file systems can be found in
44062306a36Sopenharmony_ci/proc/fs/ext4.  Each mounted filesystem will have a directory in
44162306a36Sopenharmony_ci/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
44262306a36Sopenharmony_ci/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
44362306a36Sopenharmony_ciin table below.
44462306a36Sopenharmony_ci
44562306a36Sopenharmony_ciFiles in /proc/fs/ext4/<devname>
44662306a36Sopenharmony_ci
44762306a36Sopenharmony_ci  mb_groups
44862306a36Sopenharmony_ci        details of multiblock allocator buddy cache of free blocks

/sys entries
============

Information about mounted ext4 file systems can be found in
/sys/fs/ext4.  Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0).  The files in each per-device directory are shown
in the table below.

Files in /sys/fs/ext4/<devname>:

(see also Documentation/ABI/testing/sysfs-fs-ext4)

  delayed_allocation_blocks
        This file is read-only and shows the number of blocks that are dirty in
        the page cache, but which do not have their location in the filesystem
        allocated yet.

  inode_goal
        Tuning parameter which (if non-zero) controls the goal inode used by
        the inode allocator in preference to all other allocation heuristics.
        This is intended for debugging use only, and should be 0 on production
        systems.

  inode_readahead_blks
        Tuning parameter which controls the maximum number of inode table
        blocks that ext4's inode table readahead algorithm will pre-read into
        the buffer cache.

  lifetime_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was created.

  max_writeback_mb_bump
        The maximum number of megabytes the writeback code will try to write
        out before moving on to another inode.

  mb_group_prealloc
        The multiblock allocator will round up allocation requests to a
        multiple of this tuning parameter if the stripe size is not set in the
        ext4 superblock.

  mb_max_to_scan
        The maximum number of extents the multiblock allocator will search to
        find the best extent.

  mb_min_to_scan
        The minimum number of extents the multiblock allocator will search to
        find the best extent.

  mb_order2_req
        Tuning parameter which controls the minimum size for requests (as a
        power of 2) where the buddy cache is used.

  mb_stats
        Controls whether the multiblock allocator should collect statistics,
        which are shown during the unmount. 1 means to collect statistics, 0
        means not to collect statistics.

  mb_stream_req
        Files which have fewer blocks than this tunable parameter will have
        their blocks allocated out of a block group specific preallocation
        pool, so that small files are packed closely together.  Each large file
        will have its blocks allocated out of its own unique preallocation
        pool.

  session_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was mounted.

  reserved_clusters
        This file is read-write and contains the number of reserved clusters
        in the file system, which are used in specific situations to avoid
        costly zeroout, unexpected ENOSPC, or possible data loss.  The default
        is 2% of the clusters or 4096 clusters, whichever is smaller.  The
        value can be changed, but it can never exceed the number of clusters
        in the file system.  If there is not enough space for the reserved
        clusters when the file system is mounted, the mount will _not_ fail.
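
The per-device files above can be read like any other text file.  The
following sketch collects them into a dictionary and reproduces the
``reserved_clusters`` default described above (2% of the clusters, capped at
4096).  ``read_ext4_sysfs`` and ``default_reserved_clusters`` are hypothetical
helper names, and the integer rounding of the 2% figure is an assumption.

```python
import os


def read_ext4_sysfs(devname, root="/sys/fs/ext4"):
    """Read every readable file under <root>/<devname> into a dict.

    Numeric contents are converted to int; anything else is kept as text.
    """
    values = {}
    devdir = os.path.join(root, devname)
    for name in sorted(os.listdir(devdir)):
        path = os.path.join(devdir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # some entries may not be readable
        values[name] = int(raw) if raw.isdigit() else raw
    return values


def default_reserved_clusters(total_clusters):
    """Smaller of 2% of the clusters and 4096 (rounding down is assumed)."""
    return min(total_clusters // 50, 4096)
```

For a filesystem with 100000 clusters the 2% figure (2000) is the smaller
value; for one with 1000000 clusters the 4096 cap applies.  Writing a tunable
is the reverse operation: open the file for writing and write the new value
as text, which normally requires root privileges.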

Ioctls
======

Ext4 implements various ioctls which can be used by applications to access
ext4-specific functionality. An incomplete list of these ioctls is shown in the
table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
well as ioctls that may have been ext4-specific originally but are now supported
by some other filesystem(s) too (``FS_IOC_*``).
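
As a rough illustration of how an application might issue one of these
ioctls, the sketch below reads an inode's flag bitfield with
``FS_IOC_GETFLAGS`` and decodes a few bits.  It assumes the generic ioctl
number encoding used on x86 and ARM (some architectures differ), and
``decode_flags`` and ``get_inode_flags`` are hypothetical helper names.  Only
a handful of flag bits are listed.

```python
import fcntl
import os
import struct

# Generic request-number layout (asm-generic/ioctl.h); an assumption that
# does not hold on every architecture.
_IOC_READ = 2


def _IOR(type_char, nr, size):
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


_LONG_SIZE = struct.calcsize("l")
FS_IOC_GETFLAGS = _IOR("f", 1, _LONG_SIZE)

# A few of the inode flag bits; the full set is defined in ext4.h.
FLAG_NAMES = {
    0x00000010: "immutable",
    0x00000020: "append-only",
    0x00000040: "nodump",
    0x00000080: "noatime",
    0x00001000: "index",
    0x00080000: "extents",
}


def decode_flags(flags):
    """Return the names of the known bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


def get_inode_flags(path):
    """Issue FS_IOC_GETFLAGS on path and return the raw bitfield."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(_LONG_SIZE)       # the kernel fills in an int
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        return struct.unpack_from("I", buf)[0]
    finally:
        os.close(fd)
```

A call such as ``decode_flags(get_inode_flags(path))`` then lists the
recognised attributes of a file.  Filesystems that do not implement the ioctl
make ``fcntl.ioctl`` raise ``OSError``.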

Table of Ext4 ioctls

  FS_IOC_GETFLAGS
        Get additional attributes associated with inode.  The ioctl argument is
        an integer bitfield, with bit values described in ext4.h.

  FS_IOC_SETFLAGS
        Set additional attributes associated with inode.  The ioctl argument is
        an integer bitfield, with bit values described in ext4.h.

  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
        Get the inode i_generation number stored for each inode. The
        i_generation number is normally changed only when a new inode is
        created and it is particularly useful for network filesystems. The
        '_OLD' version of this ioctl is an alias for FS_IOC_GETVERSION.

  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
        Set the inode i_generation number stored for each inode. The '_OLD'
        version of this ioctl is an alias for FS_IOC_SETVERSION.

  EXT4_IOC_GROUP_EXTEND
        This ioctl has the same purpose as the resize mount option. It allows
        the filesystem to be resized to the end of the last existing block
        group; further resizing has to be done with resize2fs, either online
        or offline. The argument points to an unsigned long number
        representing the filesystem's new block count.

  EXT4_IOC_MOVE_EXT
        Move the block extents from orig_fd (the file descriptor this ioctl is
        issued on) to donor_fd (the one specified in the move_extent structure
        passed as an argument to this ioctl). Then, exchange inode metadata
        between orig_fd and donor_fd.  This is especially useful for online
        defragmentation, because the allocator has the opportunity to allocate
        moved blocks better, ideally into one contiguous extent.

  EXT4_IOC_GROUP_ADD
        Add a new group descriptor to an existing or new group descriptor
        block. The new group descriptor is described by the
        ext4_new_group_input structure, which is passed as an argument to this
        ioctl. This is especially useful in conjunction with
        EXT4_IOC_GROUP_EXTEND, which allows online resize of the filesystem to
        the end of the last existing block group.  These two ioctls are used
        together by userspace online resize tools (e.g. resize2fs).

  EXT4_IOC_MIGRATE
        This ioctl operates on the filesystem itself.  It converts (migrates)
        an ext3 indirect-block-mapped inode to an ext4 extent-mapped inode by
        walking through the indirect block mapping of the original inode and
        converting contiguous block ranges into ext4 extents of a temporary
        inode. Then, the inodes are swapped. This ioctl might help when
        migrating from ext3 to ext4; however, the suggested approach is to
        create a fresh ext4 filesystem and copy the data from a backup. Note
        that the filesystem has to support extents for this ioctl to work.

  EXT4_IOC_ALLOC_DA_BLKS
        Force all of the delayed-allocation blocks to be allocated to preserve
        application-expected ext3 behaviour. Note that this will also start
        triggering a write of the data blocks, but this behaviour may change in
        the future as it is not necessary and has been done this way only for
        the sake of simplicity.

  EXT4_IOC_RESIZE_FS
        Resize the filesystem to a new size.  The number of blocks of the
        resized filesystem is passed in via a 64-bit integer argument.  The
        kernel allocates bitmaps and the inode table, so the userspace tool
        just passes the new number of blocks.

  EXT4_IOC_SWAP_BOOT
        Swap i_data and associated attributes (like i_blocks, i_size,
        i_flags, ...) of the specified inode with those of inode
        EXT4_BOOT_LOADER_INO (#5). This is typically used to store a boot
        loader in a secure part of the filesystem, where it can't be changed
        by a normal user by accident.  The data blocks of the previous boot
        loader will be associated with the given inode.
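
For ioctls that take a structure, such as EXT4_IOC_MOVE_EXT, the argument has
to be laid out exactly as the kernel expects.  The sketch below packs a
move_extent argument; the field layout is taken from the structure
definition in ext4.h as the editor understands it and should be checked
against the kernel headers in use, and ``pack_move_extent`` and ``moved_len``
are hypothetical helper names.  The generic ioctl number encoding (x86, ARM)
is assumed.

```python
import struct

# Assumed layout of struct move_extent:
#   __u32 reserved;     must be zero
#   __u32 donor_fd;     donor file descriptor
#   __u64 orig_start;   logical start offset in blocks for orig
#   __u64 donor_start;  logical start offset in blocks for donor
#   __u64 len;          number of blocks to move
#   __u64 moved_len;    number of blocks actually moved (filled by kernel)
MOVE_EXTENT = struct.Struct("=IIQQQQ")


def _IOWR(type_char, nr, size):
    # Read and write direction bits set, generic encoding assumed.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


EXT4_IOC_MOVE_EXT = _IOWR("f", 15, MOVE_EXTENT.size)


def pack_move_extent(donor_fd, orig_start, donor_start, length):
    """Build the mutable buffer passed as the ioctl argument."""
    return bytearray(
        MOVE_EXTENT.pack(0, donor_fd, orig_start, donor_start, length, 0)
    )


def moved_len(buf):
    """Extract moved_len from a buffer returned by the ioctl."""
    return MOVE_EXTENT.unpack(bytes(buf))[5]
```

The buffer would be passed as ``fcntl.ioctl(orig_fd, EXT4_IOC_MOVE_EXT, buf)``
with both files open on the same ext4 filesystem.  That call is not shown
here because it modifies file layout and needs a prepared donor file.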

References
==========

kernel source:	<file:fs/ext4/>
		<file:fs/jbd2/>

programs:	http://e2fsprogs.sourceforge.net/

useful links:	https://fedoraproject.org/wiki/ext3-devel
		http://www.bullopensource.org/ext4/
		http://ext4.wiki.kernel.org/index.php/Main_Page
		https://fedoraproject.org/wiki/Features/Ext4