.. SPDX-License-Identifier: GPL-2.0


The Second Extended Filesystem
==============================

ext2 was originally released in January 1993.  Written by Rémy Card,
Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
Extended Filesystem.  It is currently still (April 2001) the predominant
filesystem in use by Linux.  There are also implementations available
for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.

Options
=======

Most defaults are determined by the filesystem superblock, and can be
set using tune2fs(8). Kernel-determined defaults are indicated by (*).

====================    ===     ================================================
bsddf                   (*)     Makes ``df`` act like BSD.
minixdf                         Makes ``df`` act like Minix.

check=none, nocheck     (*)     Don't do extra checking of bitmaps on mount
                                (check=normal and check=strict options removed)

dax                             Use direct access (no page cache).  See
                                Documentation/filesystems/dax.txt.

debug                           Extra debugging information is sent to the
                                kernel syslog.  Useful for developers.

errors=continue                 Keep going on a filesystem error.
errors=remount-ro               Remount the filesystem read-only on an error.
errors=panic                    Panic and halt the machine if an error occurs.

grpid, bsdgroups                Give objects the same group ID as their parent.
nogrpid, sysvgroups             New objects have the group ID of their creator.

nouid32                         Use 16-bit UIDs and GIDs.

oldalloc                        Enable the old block allocator. Orlov should
                                have better performance, we'd like to get some
                                feedback if it's the contrary for you.
orlov                   (*)     Use the Orlov block allocator.
                                (See http://lwn.net/Articles/14633/ and
                                http://lwn.net/Articles/14446/.)

resuid=n                        The user ID which may use the reserved blocks.
resgid=n                        The group ID which may use the reserved blocks.

sb=n                            Use alternate superblock at this location.

user_xattr                      Enable "user." POSIX Extended Attributes
                                (requires CONFIG_EXT2_FS_XATTR).
nouser_xattr                    Don't support "user." extended attributes.

acl                             Enable POSIX Access Control Lists support
                                (requires CONFIG_EXT2_FS_POSIX_ACL).
noacl                           Don't support POSIX ACLs.

nobh                            Do not attach buffer_heads to file pagecache.

quota, usrquota                 Enable user disk quota support
                                (requires CONFIG_QUOTA).

grpquota                        Enable group disk quota support
                                (requires CONFIG_QUOTA).
====================    ===     ================================================

The noquota option is silently ignored by ext2.


Specification
=============

ext2 shares many properties with traditional Unix filesystems.  It has
the concepts of blocks, inodes and directories.  It has space in the
specification for Access Control Lists (ACLs), fragments, undeletion and
compression though these are not yet implemented (some are available as
separate patches).  There is also a versioning mechanism to allow new
features (such as journalling) to be added in a maximally compatible
manner.

Blocks
------

The space in the device or file is split up into blocks.  These are
a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
which is decided when the filesystem is created.  Smaller blocks mean
less wasted space per file, but require slightly more accounting overhead,
and also impose other limits on the size of files and the filesystem.

Block Groups
------------

Blocks are clustered into block groups in order to reduce fragmentation
and minimise the amount of head seeking when reading a large amount
of consecutive data.
Information about each block group is kept in a
descriptor table stored in the block(s) immediately after the superblock.
Two blocks near the start of each group are reserved for the block usage
bitmap and the inode usage bitmap which show which blocks and inodes
are in use.  Since each bitmap is limited to a single block, this means
that the maximum number of blocks in a block group is 8 times the block
size in bytes (for example, 32768 blocks with 4096-byte blocks).

The block(s) following the bitmaps in each block group are designated
as the inode table for that block group and the remainder are the data
blocks.  The block allocation algorithm attempts to allocate data blocks
in the same block group as the inode which contains them.

The Superblock
--------------

The superblock contains all the information about the configuration of
the filing system.  The primary copy of the superblock is stored at an
offset of 1024 bytes from the start of the device, and it is essential
to mounting the filesystem.  Since it is so important, backup copies of
the superblock are stored in block groups throughout the filesystem.
The first version of ext2 (revision 0) stores a copy at the start of
every block group, along with backups of the group descriptor block(s).
Because this can consume a considerable amount of space for large
filesystems, later revisions can optionally reduce the number of backup
copies by only putting backups in specific groups (this is the sparse
superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.

The information in the superblock contains fields such as the total
number of inodes and blocks in the filesystem and how many are free,
how many inodes and blocks are in each block group, when the filesystem
was mounted (and if it was cleanly unmounted), when it was modified,
what version of the filesystem it is (see the Revisions section below)
and which OS created it.

If the filesystem is revision 1 or higher, then there are extra fields,
such as a volume name, a unique identification number, the inode size,
and space for optional filesystem features to store configuration info.

All fields in the superblock (as in all other ext2 structures) are stored
on the disc in little endian format, so a filesystem is portable between
machines without having to know what machine it was created on.

Inodes
------

The inode (index node) is a fundamental concept in the ext2 filesystem.
Each object in the filesystem is represented by an inode.
The inode
structure contains pointers to the filesystem blocks which contain the
data held in the object and all of the metadata about an object except
its name.  The metadata about an object includes the permissions, owner,
group, flags, size, number of blocks used, access time, change time,
modification time, deletion time, number of links, fragments, version
(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).

There are some reserved fields which are currently unused in the inode
structure and several which are overloaded.  One field is reserved for the
directory ACL if the inode is a directory and alternately for the top 32
bits of the file size if the inode is a regular file (allowing file sizes
larger than 2GB).  The translator field is unused under Linux, but is used
by the HURD to reference the inode of a program which will be used to
interpret this object.  Most of the remaining reserved fields have been
used up for both Linux and the HURD for larger owner and group fields.
The HURD also has a larger mode field so it uses another of the remaining
fields to store the extra mode bits.

There are pointers to the first 12 blocks which contain the file's data
in the inode.
There is a pointer to an indirect block (which contains
pointers to the next set of blocks), a pointer to a doubly-indirect
block (which contains pointers to indirect blocks) and a pointer to a
trebly-indirect block (which contains pointers to doubly-indirect blocks).

The flags field contains some ext2-specific flags which aren't catered
for by the standard chmod flags.  These flags can be listed with lsattr
and changed with the chattr command, and allow specific filesystem
behaviour on a per-file basis.  There are flags for secure deletion,
undeletable, compression, synchronous updates, immutability, append-only,
dumpable, no-atime, indexed directories, and data-journaling.  Not all
of these are supported yet.

Directories
-----------

A directory is a filesystem object and has an inode just like a file.
It is a specially formatted file containing records which associate
each name with an inode number.  Later revisions of the filesystem also
encode the type of the object (file, directory, symlink, device, fifo,
socket) to avoid the need to check the inode itself for this information
(support for taking advantage of this feature does not yet exist in
Glibc 2.2).

The inode allocation code tries to assign inodes which are in the same
block group as the directory in which they are first created.
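
As a rough illustration of the directory records described above, the
following Python sketch walks a single directory block.  The record
layout used here (a 32-bit inode number, a 16-bit record length, an 8-bit
name length and an 8-bit file type, followed by the name) is an assumption
taken from the kernel's ``struct ext2_dir_entry_2`` rather than from this
document, and the helper names ``parse_dir_block`` and ``make_record`` are
hypothetical.

```python
import struct

# File type codes stored in directory records when the FILETYPE
# feature is enabled (assumed from the kernel's ext2 headers).
FILE_TYPES = {0: "unknown", 1: "file", 2: "directory", 3: "chardev",
              4: "blockdev", 5: "fifo", 6: "socket", 7: "symlink"}

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type


def parse_dir_block(block):
    """Yield (inode, name, type) for each in-use record in one block."""
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len, file_type = HEADER.unpack_from(block, offset)
        # rec_len links each record to the next one; reject values that
        # cannot be valid so the walk always makes progress.
        if (rec_len < HEADER.size or offset + rec_len > len(block)
                or HEADER.size + name_len > rec_len):
            raise ValueError("corrupt directory record at offset %d" % offset)
        if inode != 0:  # inode 0 marks an unused record
            start = offset + HEADER.size
            name = block[start:start + name_len].decode("utf-8", "replace")
            yield inode, name, FILE_TYPES.get(file_type, "unknown")
        offset += rec_len


def make_record(inode, name, file_type, rec_len):
    """Build one padded record (hypothetical helper for the demo below)."""
    raw = name.encode("utf-8")
    header = HEADER.pack(inode, rec_len, len(raw), file_type)
    return (header + raw).ljust(rec_len, b"\0")


# A hand-built 1024-byte block; the last record is stretched to fill it.
block = (make_record(2, ".", 2, 12) + make_record(2, "..", 2, 12)
         + make_record(11, "lost+found", 2, 1000))
entries = list(parse_dir_block(block))
for inode, name, kind in entries:
    print(inode, name, kind)
```

Because each record only carries the length needed to reach the next
one, finding a name means walking the records in order, which is why
lookups in large directories are slow.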

The current implementation of ext2 uses a singly-linked list to store
the filenames in the directory; a pending enhancement uses hashing of the
filenames to allow lookup without the need to scan the entire directory.

The current implementation never removes empty directory blocks once they
have been allocated to hold more files.

Special files
-------------

Symbolic links are also filesystem objects with inodes.  They deserve
special mention because the data for them is stored within the inode
itself if the symlink is less than 60 bytes long.  It uses the fields
which would normally be used to store the pointers to data blocks.
This is a worthwhile optimisation as it avoids allocating a full
block for the symlink, and most symlinks are less than 60 characters long.

Character and block special devices never have data blocks assigned to
them.  Instead, their device number is stored in the inode, again reusing
the fields which would be used to point to the data blocks.

Reserved Space
--------------

In ext2, there is a mechanism for reserving a certain number of blocks
for a particular user (normally the super-user).
This is intended to
allow for the system to continue functioning even if non-privileged users
fill up all the space available to them (this is independent of filesystem
quotas).  It also keeps the filesystem from filling up entirely which
helps combat fragmentation.

Filesystem check
----------------

At boot time, most systems run a consistency check (e2fsck) on their
filesystems.  The superblock of the ext2 filesystem contains several
fields which indicate whether fsck should actually run (since checking
the filesystem at boot can take a long time if it is large).  fsck will
run if the filesystem was not cleanly unmounted, if the maximum mount
count has been exceeded or if the maximum time between checks has been
exceeded.

Feature Compatibility
---------------------

The compatibility feature mechanism used in ext2 is sophisticated.
It safely allows features to be added to the filesystem, without
unnecessarily sacrificing compatibility with older versions of the
filesystem code.  The feature compatibility mechanism is not supported by
the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
revision 1.
There are three 32-bit fields, one for compatible features
(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
incompatible (INCOMPAT) features.

These feature flags have specific meanings for the kernel as follows:

A COMPAT flag indicates that a feature is present in the filesystem,
but the on-disk format is 100% compatible with older on-disk formats, so
a kernel which didn't know anything about this feature could read/write
the filesystem without any chance of corrupting the filesystem (or even
making it inconsistent).  This is essentially just a flag which says
"this filesystem has a (hidden) feature" that the kernel or e2fsck may
want to be aware of (more on e2fsck and feature flags later).  The ext3
HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
a regular file with data blocks in it so the kernel does not need to
take any special notice of it if it doesn't understand ext3 journaling.

An RO_COMPAT flag indicates that the on-disk format is 100% compatible
with older on-disk formats for reading (i.e. the feature does not change
the visible on-disk format).  However, an old kernel writing to such a
filesystem would/could corrupt the filesystem, so this is prevented.
The
most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
sparse groups allow file data blocks where superblock/group descriptor
backups used to live, and ext2_free_blocks() refuses to free these blocks,
which would lead to inconsistent bitmaps.  An old kernel would also
get an error if it tried to free a series of blocks which crossed a group
boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.

An INCOMPAT flag indicates the on-disk format has changed in some
way that makes it unreadable by older kernels, or would otherwise
cause a problem if an old kernel tried to mount it.  FILETYPE is an
INCOMPAT flag because older kernels would think a filename was longer
than 256 characters, which would lead to corrupt directory listings.
The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
doesn't understand compression, you would just get garbage back from
read() instead of it automatically decompressing your data.  The ext3
RECOVER flag is needed to prevent a kernel which does not understand the
ext3 journal from mounting the filesystem without replaying the journal.

e2fsck needs to be stricter than the kernel in its handling of these
flags.
If it doesn't understand ANY of the COMPAT,
RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
because it has no way of verifying whether a given feature is valid
or not.  Allowing e2fsck to succeed on a filesystem with an unknown
feature would give the user a false sense of security.  Refusing to check
a filesystem with unknown features is a good incentive for the user to
update to the latest e2fsck.  This also means that anyone adding feature
flags to ext2 also needs to update e2fsck to verify these features.

Metadata
--------

It is frequently claimed that the ext2 implementation of writing
asynchronous metadata is faster than the ffs synchronous metadata
scheme but less reliable.  Both methods are equally resolvable by their
respective fsck programs.

If you're exceptionally paranoid, there are 3 ways of making metadata
writes synchronous on ext2:

- per-file if you have the program source: use the O_SYNC flag to open()
- per-file if you don't have the source: use "chattr +S" on the file
- per-filesystem: add the "sync" option to mount (or in /etc/fstab)

The first and last are not ext2 specific but do force the metadata to
be written synchronously.  See also Journaling below.
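
The first option in the list above might look like the following from
Python, which exposes the flag as ``os.O_SYNC`` on Unix platforms.  This
is a minimal sketch: the function name ``write_synchronously`` and the
file name are hypothetical, and the temporary directory is used only so
that the example cleans up after itself.

```python
import os
import tempfile


def write_synchronously(path, data):
    """Write data to path using a descriptor opened with O_SYNC.

    With O_SYNC each write() is expected to return only once the data
    and the metadata needed to read it back have been written out.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.dat")
    count = write_synchronously(target, b"important record\n")
    with open(target, "rb") as f:
        contents = f.read()
    print(count, contents)
```

Synchronous writes trade throughput for durability, so this is usually
applied to the few files that need it rather than to a whole filesystem.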

Limitations
-----------

There are various limits imposed by the on-disk layout of ext2.  Other
limits are imposed by the current implementation of the kernel code.
Many of the limits are determined at the time the filesystem is first
created, and depend upon the block size chosen.  The ratio of inodes to
data blocks is fixed at filesystem creation time, so the only way to
increase the number of inodes is to increase the size of the filesystem.
No tools currently exist which can change the ratio of inodes to blocks.

Most of these limits could be overcome with slight changes in the on-disk
format and using a compatibility flag to signal the format change (at
the expense of some compatibility).

=====================  =======  =======  =======  ========
Filesystem block size  1kB      2kB      4kB      8kB
=====================  =======  =======  =======  ========
File size limit        16GB     256GB    2048GB   2048GB
Filesystem size limit  2047GB   8192GB   16384GB  32768GB
=====================  =======  =======  =======  ========

There is a 2.4 kernel limit of 2048GB for a single block device, so no
filesystem larger than that can be created at this time.
There is also
an upper limit on the block size imposed by the page size of the kernel,
so 8kB blocks are only allowed on Alpha systems (and other architectures
which support larger pages).

There is an upper limit of 32000 subdirectories in a single directory.

There is a "soft" upper limit of about 10-15k files in a single directory
with the current linear linked-list directory implementation.  This limit
stems from performance problems when creating and deleting (and also
finding) files in such large directories.  Using a hashed directory index
(under development) allows 100k-1M+ files in a single directory without
performance problems (although RAM size becomes an issue at this point).

The (meaningless) absolute upper limit of files in a single directory
(imposed by the file size, the realistic limit is obviously much less)
is over 130 trillion files.  It would be higher except there are not
enough 4-character names to make up unique directory entries, so they
have to be 8 character filenames; even then we are fairly close to
running out of unique filenames.

Journaling
----------

A journaling extension to the ext2 code has been developed by Stephen
Tweedie.
It avoids the risks of metadata corruption and the need to
wait for e2fsck to complete after a crash, without requiring a change
to the on-disk ext2 layout.  In a nutshell, the journal is a regular
file which stores whole metadata (and optionally data) blocks that have
been modified, prior to writing them into the filesystem.  This means
it is possible to add a journal to an existing ext2 filesystem without
the need for data conversion.

When changes are made to the filesystem (e.g. a file is renamed), they are
stored in a transaction in the journal and can either be complete or
incomplete at the time of a crash.  If a transaction is complete at the
time of a crash (or in the normal case where the system does not crash),
then any blocks in that transaction are guaranteed to represent a valid
filesystem state, and are copied into the filesystem.  If a transaction
is incomplete at the time of the crash, then there is no guarantee of
consistency for the blocks in that transaction so they are discarded
(which means any filesystem changes they represent are also lost).
Check Documentation/filesystems/ext4/ if you want to read more about
ext4 and journaling.
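
The complete/incomplete rule described above can be sketched with a toy
model in Python.  This is not the on-disk journal format: blocks are
modelled as dictionary entries, a transaction is a dictionary with a
hypothetical ``complete`` marker, and stopping at the first incomplete
transaction is an assumption of this sketch.

```python
def replay_journal(disk, journal):
    """Return the block contents after recovery from a crash.

    disk maps block numbers to contents; journal is an ordered list of
    transactions, each holding the whole blocks it modified.
    """
    recovered = dict(disk)  # leave the caller's copy untouched
    for transaction in journal:
        if not transaction["complete"]:
            # No consistency guarantee: discard this transaction, and
            # with it the filesystem changes it represented.
            break
        # A complete transaction is a valid state, so copy its blocks.
        recovered.update(transaction["blocks"])
    return recovered


disk = {10: "dir: a.txt", 11: "inode 12"}
journal = [
    {"complete": True, "blocks": {10: "dir: b.txt"}},
    {"complete": False, "blocks": {11: "inode 12 (half-written)"}},
]
recovered = replay_journal(disk, journal)
print(recovered)
```

In this example the rename recorded in the first transaction survives
the crash, while the half-written inode update is dropped and block 11
keeps its old contents.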

References
==========

=======================  ===============================================
The kernel source        file:/usr/src/linux/fs/ext2/
e2fsprogs (e2fsck)       http://e2fsprogs.sourceforge.net/
Design & Implementation  http://e2fsprogs.sourceforge.net/ext2intro.html
Journaling (ext3)        ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
Filesystem Resizing      http://ext2resize.sourceforge.net/
Compression [1]_         http://e2compr.sourceforge.net/
=======================  ===============================================

Implementations for:

=======================  ===========================================================
Windows 95/98/NT/2000    http://www.chrysocome.net/explore2fs
Windows 95 [1]_          http://www.yipton.net/content.html#FSDEXT2
DOS client [1]_          ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
OS/2 [2]_                ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
RISC OS client           http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
=======================  ===========================================================

.. [1] no longer actively developed/supported (as of Apr 2001)
.. [2] no longer actively developed/supported (as of Mar 2009)