119ea8026Sopenharmony_ci## The design of littlefs
219ea8026Sopenharmony_ci
319ea8026Sopenharmony_ciA little fail-safe filesystem designed for microcontrollers.
419ea8026Sopenharmony_ci
519ea8026Sopenharmony_ci```
619ea8026Sopenharmony_ci   | | |     .---._____
719ea8026Sopenharmony_ci  .-----.   |          |
819ea8026Sopenharmony_ci--|o    |---| littlefs |
919ea8026Sopenharmony_ci--|     |---|          |
1019ea8026Sopenharmony_ci  '-----'   '----------'
1119ea8026Sopenharmony_ci   | | |
1219ea8026Sopenharmony_ci```
1319ea8026Sopenharmony_ci
1419ea8026Sopenharmony_cilittlefs was originally built as an experiment to learn about filesystem design
1519ea8026Sopenharmony_ciin the context of microcontrollers. The question was: How would you build a
1619ea8026Sopenharmony_cifilesystem that is resilient to power-loss and flash wear without using
1719ea8026Sopenharmony_ciunbounded memory?
1819ea8026Sopenharmony_ci
1919ea8026Sopenharmony_ciThis document covers the high-level design of littlefs, how it is different
2019ea8026Sopenharmony_cithan other filesystems, and the design decisions that got us here. For the
2119ea8026Sopenharmony_cilow-level details covering every bit on disk, check out [SPEC.md](SPEC.md).
2219ea8026Sopenharmony_ci
2319ea8026Sopenharmony_ci## The problem
2419ea8026Sopenharmony_ci
2519ea8026Sopenharmony_ciThe embedded systems littlefs targets are usually 32-bit microcontrollers with
2619ea8026Sopenharmony_ciaround 32 KiB of RAM and 512 KiB of ROM. These are often paired with SPI NOR
2719ea8026Sopenharmony_ciflash chips with about 4 MiB of flash storage. These devices are too small for
2819ea8026Sopenharmony_ciLinux and most existing filesystems, requiring code written specifically with
2919ea8026Sopenharmony_cisize in mind.
3019ea8026Sopenharmony_ci
3119ea8026Sopenharmony_ciFlash itself is an interesting piece of technology with its own quirks and
3219ea8026Sopenharmony_cinuance. Unlike other forms of storage, writing to flash requires two
3319ea8026Sopenharmony_cioperations: erasing and programming. Programming (setting bits to 0) is
3419ea8026Sopenharmony_cirelatively cheap and can be very granular. Erasing however (setting bits to 1),
3519ea8026Sopenharmony_cirequires an expensive and destructive operation which gives flash its name.
3619ea8026Sopenharmony_ci[Wikipedia][wikipedia-flash] has more information on how exactly flash works.
3719ea8026Sopenharmony_ci
3819ea8026Sopenharmony_ciTo make the situation more annoying, it's very common for these embedded
3919ea8026Sopenharmony_cisystems to lose power at any time. Usually, microcontroller code is simple and
4019ea8026Sopenharmony_cireactive, with no concept of a shutdown routine. This presents a big challenge
4119ea8026Sopenharmony_cifor persistent storage, where an unlucky power loss can corrupt the storage and
4219ea8026Sopenharmony_cileave a device unrecoverable.
4319ea8026Sopenharmony_ci
4419ea8026Sopenharmony_ciThis leaves us with three major requirements for an embedded filesystem.
4519ea8026Sopenharmony_ci
4619ea8026Sopenharmony_ci1. **Power-loss resilience** - On these systems, power can be lost at any time.
4719ea8026Sopenharmony_ci   If a power loss corrupts any persistent data structures, this can cause the
4819ea8026Sopenharmony_ci   device to become unrecoverable. An embedded filesystem must be designed to
4919ea8026Sopenharmony_ci   recover from a power loss during any write operation.
5019ea8026Sopenharmony_ci
5119ea8026Sopenharmony_ci1. **Wear leveling** - Writing to flash is destructive. If a filesystem
5219ea8026Sopenharmony_ci   repeatedly writes to the same block, eventually that block will wear out.
5319ea8026Sopenharmony_ci   Filesystems that don't take wear into account can easily burn through blocks
5419ea8026Sopenharmony_ci   used to store frequently updated metadata and cause a device's early death.
5519ea8026Sopenharmony_ci
5619ea8026Sopenharmony_ci1. **Bounded RAM/ROM** - If the above requirements weren't enough, these
5719ea8026Sopenharmony_ci   systems also have very limited amounts of memory. This prevents many
5819ea8026Sopenharmony_ci   existing filesystem designs, which can lean on relatively large amounts of
5919ea8026Sopenharmony_ci   RAM to temporarily store filesystem metadata.
6019ea8026Sopenharmony_ci
6119ea8026Sopenharmony_ci   For ROM, this means we need to keep our design simple and reuse code paths
6219ea8026Sopenharmony_ci   were possible. For RAM we have a stronger requirement, all RAM usage is
6319ea8026Sopenharmony_ci   bounded. This means RAM usage does not grow as the filesystem changes in
6419ea8026Sopenharmony_ci   size or number of files. This creates a unique challenge as even presumably
6519ea8026Sopenharmony_ci   simple operations, such as traversing the filesystem, become surprisingly
6619ea8026Sopenharmony_ci   difficult.
6719ea8026Sopenharmony_ci
6819ea8026Sopenharmony_ci## Existing designs?
6919ea8026Sopenharmony_ci
7019ea8026Sopenharmony_ciSo, what's already out there? There are, of course, many different filesystems,
7119ea8026Sopenharmony_cihowever they often share and borrow feature from each other. If we look at
7219ea8026Sopenharmony_cipower-loss resilience and wear leveling, we can narrow these down to a handful
7319ea8026Sopenharmony_ciof designs.
7419ea8026Sopenharmony_ci
7519ea8026Sopenharmony_ci1. First we have the non-resilient, block based filesystems, such as [FAT] and
7619ea8026Sopenharmony_ci   [ext2]. These are the earliest filesystem designs and often the most simple.
7719ea8026Sopenharmony_ci   Here storage is divided into blocks, with each file being stored in a
7819ea8026Sopenharmony_ci   collection of blocks. Without modifications, these filesystems are not
7919ea8026Sopenharmony_ci   power-loss resilient, so updating a file is a simple as rewriting the blocks
8019ea8026Sopenharmony_ci   in place.
8119ea8026Sopenharmony_ci
8219ea8026Sopenharmony_ci   ```
8319ea8026Sopenharmony_ci               .--------.
8419ea8026Sopenharmony_ci               |  root  |
8519ea8026Sopenharmony_ci               |        |
8619ea8026Sopenharmony_ci               |        |
8719ea8026Sopenharmony_ci               '--------'
8819ea8026Sopenharmony_ci               .-'    '-.
8919ea8026Sopenharmony_ci              v          v
9019ea8026Sopenharmony_ci         .--------.  .--------.
9119ea8026Sopenharmony_ci         |   A    |  |   B    |
9219ea8026Sopenharmony_ci         |        |  |        |
9319ea8026Sopenharmony_ci         |        |  |        |
9419ea8026Sopenharmony_ci         '--------'  '--------'
9519ea8026Sopenharmony_ci         .-'         .-'    '-.
9619ea8026Sopenharmony_ci        v           v          v
9719ea8026Sopenharmony_ci   .--------.  .--------.  .--------.
9819ea8026Sopenharmony_ci   |   C    |  |   D    |  |   E    |
9919ea8026Sopenharmony_ci   |        |  |        |  |        |
10019ea8026Sopenharmony_ci   |        |  |        |  |        |
10119ea8026Sopenharmony_ci   '--------'  '--------'  '--------'
10219ea8026Sopenharmony_ci   ```
10319ea8026Sopenharmony_ci
10419ea8026Sopenharmony_ci   Because of their simplicity, these filesystems are usually both the fastest
10519ea8026Sopenharmony_ci   and smallest. However the lack of power resilience is not great, and the
10619ea8026Sopenharmony_ci   binding relationship of storage location and data removes the filesystem's
10719ea8026Sopenharmony_ci   ability to manage wear.
10819ea8026Sopenharmony_ci
10919ea8026Sopenharmony_ci2. In a completely different direction, we have logging filesystems, such as
11019ea8026Sopenharmony_ci   [JFFS], [YAFFS], and [SPIFFS], storage location is not bound to a piece of
11119ea8026Sopenharmony_ci   data, instead the entire storage is used for a circular log which is
11219ea8026Sopenharmony_ci   appended with every change made to the filesystem. Writing appends new
11319ea8026Sopenharmony_ci   changes, while reading requires traversing the log to reconstruct a file.
11419ea8026Sopenharmony_ci   Some logging filesystems cache files to avoid the read cost, but this comes
11519ea8026Sopenharmony_ci   at a tradeoff of RAM.
11619ea8026Sopenharmony_ci
11719ea8026Sopenharmony_ci   ```
11819ea8026Sopenharmony_ci                                                             v
11919ea8026Sopenharmony_ci   .--------.--------.--------.--------.--------.--------.--------.--------.
12019ea8026Sopenharmony_ci   |        C        | new B  | new A  |                 |   A    |   B    |
12119ea8026Sopenharmony_ci   |                 |        |        |->               |        |        |
12219ea8026Sopenharmony_ci   |                 |        |        |                 |        |        |
12319ea8026Sopenharmony_ci   '--------'--------'--------'--------'--------'--------'--------'--------'
12419ea8026Sopenharmony_ci   ```
12519ea8026Sopenharmony_ci
12619ea8026Sopenharmony_ci   Logging filesystem are beautifully elegant. With a checksum, we can easily
12719ea8026Sopenharmony_ci   detect power-loss and fall back to the previous state by ignoring failed
12819ea8026Sopenharmony_ci   appends. And if that wasn't good enough, their cyclic nature means that
12919ea8026Sopenharmony_ci   logging filesystems distribute wear across storage perfectly.
13019ea8026Sopenharmony_ci
13119ea8026Sopenharmony_ci   The main downside is performance. If we look at garbage collection, the
13219ea8026Sopenharmony_ci   process of cleaning up outdated data from the end of the log, I've yet to
13319ea8026Sopenharmony_ci   see a pure logging filesystem that does not have one of these two costs:
13419ea8026Sopenharmony_ci
13519ea8026Sopenharmony_ci   1. _O(n²)_ runtime
13619ea8026Sopenharmony_ci   2. _O(n)_ RAM
13719ea8026Sopenharmony_ci
13819ea8026Sopenharmony_ci   SPIFFS is a very interesting case here, as it uses the fact that repeated
13919ea8026Sopenharmony_ci   programs to NOR flash is both atomic and masking. This is a very neat
14019ea8026Sopenharmony_ci   solution, however it limits the type of storage you can support.
14119ea8026Sopenharmony_ci
14219ea8026Sopenharmony_ci3. Perhaps the most common type of filesystem, a journaling filesystem is the
14319ea8026Sopenharmony_ci   offspring that happens when you mate a block based filesystem with a logging
14419ea8026Sopenharmony_ci   filesystem. [ext4] and [NTFS] are good examples. Here, we take a normal
14519ea8026Sopenharmony_ci   block based filesystem and add a bounded log where we note every change
14619ea8026Sopenharmony_ci   before it occurs.
14719ea8026Sopenharmony_ci
14819ea8026Sopenharmony_ci   ```
14919ea8026Sopenharmony_ci                             journal
15019ea8026Sopenharmony_ci                            .--------.--------.
15119ea8026Sopenharmony_ci               .--------.   | C'| D'|     | E'|
15219ea8026Sopenharmony_ci               |  root  |-->|   |   |->   |   |
15319ea8026Sopenharmony_ci               |        |   |   |   |     |   |
15419ea8026Sopenharmony_ci               |        |   '--------'--------'
15519ea8026Sopenharmony_ci               '--------'
15619ea8026Sopenharmony_ci               .-'    '-.
15719ea8026Sopenharmony_ci              v          v
15819ea8026Sopenharmony_ci         .--------.  .--------.
15919ea8026Sopenharmony_ci         |   A    |  |   B    |
16019ea8026Sopenharmony_ci         |        |  |        |
16119ea8026Sopenharmony_ci         |        |  |        |
16219ea8026Sopenharmony_ci         '--------'  '--------'
16319ea8026Sopenharmony_ci         .-'         .-'    '-.
16419ea8026Sopenharmony_ci        v           v          v
16519ea8026Sopenharmony_ci   .--------.  .--------.  .--------.
16619ea8026Sopenharmony_ci   |   C    |  |   D    |  |   E    |
16719ea8026Sopenharmony_ci   |        |  |        |  |        |
16819ea8026Sopenharmony_ci   |        |  |        |  |        |
16919ea8026Sopenharmony_ci   '--------'  '--------'  '--------'
17019ea8026Sopenharmony_ci   ```
17119ea8026Sopenharmony_ci
17219ea8026Sopenharmony_ci
17319ea8026Sopenharmony_ci   This sort of filesystem takes the best from both worlds. Performance can be
17419ea8026Sopenharmony_ci   as fast as a block based filesystem (though updating the journal does have
17519ea8026Sopenharmony_ci   a small cost), and atomic updates to the journal allow the filesystem to
17619ea8026Sopenharmony_ci   recover in the event of a power loss.
17719ea8026Sopenharmony_ci
17819ea8026Sopenharmony_ci   Unfortunately, journaling filesystems have a couple of problems. They are
17919ea8026Sopenharmony_ci   fairly complex, since there are effectively two filesystems running in
18019ea8026Sopenharmony_ci   parallel, which comes with a code size cost. They also offer no protection
18119ea8026Sopenharmony_ci   against wear because of the strong relationship between storage location
18219ea8026Sopenharmony_ci   and data.
18319ea8026Sopenharmony_ci
18419ea8026Sopenharmony_ci4. Last but not least we have copy-on-write (COW) filesystems, such as
18519ea8026Sopenharmony_ci   [btrfs] and [ZFS]. These are very similar to other block based filesystems,
18619ea8026Sopenharmony_ci   but instead of updating block inplace, all updates are performed by creating
18719ea8026Sopenharmony_ci   a copy with the changes and replacing any references to the old block with
18819ea8026Sopenharmony_ci   our new block. This recursively pushes all of our problems upwards until we
18919ea8026Sopenharmony_ci   reach the root of our filesystem, which is often stored in a very small log.
19019ea8026Sopenharmony_ci
19119ea8026Sopenharmony_ci   ```
19219ea8026Sopenharmony_ci               .--------.                  .--------.
19319ea8026Sopenharmony_ci               |  root  |       write      |new root|
19419ea8026Sopenharmony_ci               |        |        ==>       |        |
19519ea8026Sopenharmony_ci               |        |                  |        |
19619ea8026Sopenharmony_ci               '--------'                  '--------'
19719ea8026Sopenharmony_ci               .-'    '-.                    |    '-.
19819ea8026Sopenharmony_ci              |  .-------|------------------'        v
19919ea8026Sopenharmony_ci              v v        v                       .--------.
20019ea8026Sopenharmony_ci         .--------.  .--------.                  | new B  |
20119ea8026Sopenharmony_ci         |   A    |  |   B    |                  |        |
20219ea8026Sopenharmony_ci         |        |  |        |                  |        |
20319ea8026Sopenharmony_ci         |        |  |        |                  '--------'
20419ea8026Sopenharmony_ci         '--------'  '--------'                  .-'    |
20519ea8026Sopenharmony_ci         .-'         .-'    '-.    .------------|------'
20619ea8026Sopenharmony_ci        |           |          |  |             v
20719ea8026Sopenharmony_ci        v           v          v  v        .--------.
20819ea8026Sopenharmony_ci   .--------.  .--------.  .--------.      | new D  |
20919ea8026Sopenharmony_ci   |   C    |  |   D    |  |   E    |      |        |
21019ea8026Sopenharmony_ci   |        |  |        |  |        |      |        |
21119ea8026Sopenharmony_ci   |        |  |        |  |        |      '--------'
21219ea8026Sopenharmony_ci   '--------'  '--------'  '--------'
21319ea8026Sopenharmony_ci   ```
21419ea8026Sopenharmony_ci
21519ea8026Sopenharmony_ci   COW filesystems are interesting. They offer very similar performance to
21619ea8026Sopenharmony_ci   block based filesystems while managing to pull off atomic updates without
21719ea8026Sopenharmony_ci   storing data changes directly in a log. They even disassociate the storage
21819ea8026Sopenharmony_ci   location of data, which creates an opportunity for wear leveling.
21919ea8026Sopenharmony_ci
22019ea8026Sopenharmony_ci   Well, almost. The unbounded upwards movement of updates causes some
22119ea8026Sopenharmony_ci   problems. Because updates to a COW filesystem don't stop until they've
22219ea8026Sopenharmony_ci   reached the root, an update can cascade into a larger set of writes than
22319ea8026Sopenharmony_ci   would be needed for the original data. On top of this, the upward motion
22419ea8026Sopenharmony_ci   focuses these writes into the block, which can wear out much earlier than
22519ea8026Sopenharmony_ci   the rest of the filesystem.
22619ea8026Sopenharmony_ci
22719ea8026Sopenharmony_ci## littlefs
22819ea8026Sopenharmony_ci
22919ea8026Sopenharmony_ciSo what does littlefs do?
23019ea8026Sopenharmony_ci
23119ea8026Sopenharmony_ciIf we look at existing filesystems, there are two interesting design patterns
23219ea8026Sopenharmony_cithat stand out, but each have their own set of problems. Logging, which
23319ea8026Sopenharmony_ciprovides independent atomicity, has poor runtime performance. And COW data
23419ea8026Sopenharmony_cistructures, which perform well, push the atomicity problem upwards.
23519ea8026Sopenharmony_ci
23619ea8026Sopenharmony_ciCan we work around these limitations?
23719ea8026Sopenharmony_ci
23819ea8026Sopenharmony_ciConsider logging. It has either a _O(n²)_ runtime or _O(n)_ RAM cost. We
23919ea8026Sopenharmony_cican't avoid these costs, _but_ if we put an upper bound on the size we can at
24019ea8026Sopenharmony_cileast prevent the theoretical cost from becoming problem. This relies on the
24119ea8026Sopenharmony_cisuper secret computer science hack where you can pretend any algorithmic
24219ea8026Sopenharmony_cicomplexity is _O(1)_ by bounding the input.
24319ea8026Sopenharmony_ci
24419ea8026Sopenharmony_ciIn the case of COW data structures, we can try twisting the definition a bit.
24519ea8026Sopenharmony_ciLet's say that our COW structure doesn't copy after a single write, but instead
24619ea8026Sopenharmony_cicopies after _n_ writes. This doesn't change most COW properties (assuming you
24719ea8026Sopenharmony_cican write atomically!), but what it does do is prevent the upward motion of
24819ea8026Sopenharmony_ciwear. This sort of copy-on-bounded-writes (CObW) still focuses wear, but at
24919ea8026Sopenharmony_cieach level we divide the propagation of wear by _n_. With a sufficiently
25019ea8026Sopenharmony_cilarge _n_ (> branching factor) wear propagation is no longer a problem.
25119ea8026Sopenharmony_ci
25219ea8026Sopenharmony_ciSee where this is going? Separate, logging and COW are imperfect solutions and
25319ea8026Sopenharmony_cihave weaknesses that limit their usefulness. But if we merge the two they can
25419ea8026Sopenharmony_cimutually solve each other's limitations.
25519ea8026Sopenharmony_ci
25619ea8026Sopenharmony_ciThis is the idea behind littlefs. At the sub-block level, littlefs is built
25719ea8026Sopenharmony_ciout of small, two block logs that provide atomic updates to metadata anywhere
25819ea8026Sopenharmony_cion the filesystem. At the super-block level, littlefs is a CObW tree of blocks
25919ea8026Sopenharmony_cithat can be evicted on demand.
26019ea8026Sopenharmony_ci
26119ea8026Sopenharmony_ci```
26219ea8026Sopenharmony_ci                    root
26319ea8026Sopenharmony_ci                   .--------.--------.
26419ea8026Sopenharmony_ci                   | A'| B'|         |
26519ea8026Sopenharmony_ci                   |   |   |->       |
26619ea8026Sopenharmony_ci                   |   |   |         |
26719ea8026Sopenharmony_ci                   '--------'--------'
26819ea8026Sopenharmony_ci                .----'   '--------------.
26919ea8026Sopenharmony_ci       A       v                 B       v
27019ea8026Sopenharmony_ci      .--------.--------.       .--------.--------.
27119ea8026Sopenharmony_ci      | C'| D'|         |       | E'|new|         |
27219ea8026Sopenharmony_ci      |   |   |->       |       |   | E'|->       |
27319ea8026Sopenharmony_ci      |   |   |         |       |   |   |         |
27419ea8026Sopenharmony_ci      '--------'--------'       '--------'--------'
27519ea8026Sopenharmony_ci      .-'   '--.                  |   '------------------.
27619ea8026Sopenharmony_ci     v          v              .-'                        v
27719ea8026Sopenharmony_ci.--------.  .--------.        v                       .--------.
27819ea8026Sopenharmony_ci|   C    |  |   D    |   .--------.       write       | new E  |
27919ea8026Sopenharmony_ci|        |  |        |   |   E    |        ==>        |        |
28019ea8026Sopenharmony_ci|        |  |        |   |        |                   |        |
28119ea8026Sopenharmony_ci'--------'  '--------'   |        |                   '--------'
28219ea8026Sopenharmony_ci                         '--------'                   .-'    |
28319ea8026Sopenharmony_ci                         .-'    '-.    .-------------|------'
28419ea8026Sopenharmony_ci                        v          v  v              v
28519ea8026Sopenharmony_ci                   .--------.  .--------.       .--------.
28619ea8026Sopenharmony_ci                   |   F    |  |   G    |       | new F  |
28719ea8026Sopenharmony_ci                   |        |  |        |       |        |
28819ea8026Sopenharmony_ci                   |        |  |        |       |        |
28919ea8026Sopenharmony_ci                   '--------'  '--------'       '--------'
29019ea8026Sopenharmony_ci```
29119ea8026Sopenharmony_ci
29219ea8026Sopenharmony_ciThere are still some minor issues. Small logs can be expensive in terms of
29319ea8026Sopenharmony_cistorage, in the worst case a small log costs 4x the size of the original data.
29419ea8026Sopenharmony_ciCObW structures require an efficient block allocator since allocation occurs
29519ea8026Sopenharmony_cievery _n_ writes. And there is still the challenge of keeping the RAM usage
29619ea8026Sopenharmony_ciconstant.
29719ea8026Sopenharmony_ci
29819ea8026Sopenharmony_ci## Metadata pairs
29919ea8026Sopenharmony_ci
30019ea8026Sopenharmony_ciMetadata pairs are the backbone of littlefs. These are small, two block logs
30119ea8026Sopenharmony_cithat allow atomic updates anywhere in the filesystem.
30219ea8026Sopenharmony_ci
30319ea8026Sopenharmony_ciWhy two blocks? Well, logs work by appending entries to a circular buffer
30419ea8026Sopenharmony_cistored on disk. But remember that flash has limited write granularity. We can
30519ea8026Sopenharmony_ciincrementally program new data onto erased blocks, but we need to erase a full
30619ea8026Sopenharmony_ciblock at a time. This means that in order for our circular buffer to work, we
30719ea8026Sopenharmony_cineed more than one block.
30819ea8026Sopenharmony_ci
30919ea8026Sopenharmony_ciWe could make our logs larger than two blocks, but the next challenge is how
31019ea8026Sopenharmony_cido we store references to these logs? Because the blocks themselves are erased
31119ea8026Sopenharmony_ciduring writes, using a data structure to track these blocks is complicated.
31219ea8026Sopenharmony_ciThe simple solution here is to store a two block addresses for every metadata
31319ea8026Sopenharmony_cipair. This has the added advantage that we can change out blocks in the
31419ea8026Sopenharmony_cimetadata pair independently, and we don't reduce our block granularity for
31519ea8026Sopenharmony_ciother operations.
31619ea8026Sopenharmony_ci
31719ea8026Sopenharmony_ciIn order to determine which metadata block is the most recent, we store a
31819ea8026Sopenharmony_cirevision count that we compare using [sequence arithmetic][wikipedia-sna]
31919ea8026Sopenharmony_ci(very handy for avoiding problems with integer overflow). Conveniently, this
32019ea8026Sopenharmony_cirevision count also gives us a rough idea of how many erases have occurred on
32119ea8026Sopenharmony_cithe block.
32219ea8026Sopenharmony_ci
32319ea8026Sopenharmony_ci```
32419ea8026Sopenharmony_cimetadata pair pointer: {block 0, block 1}
32519ea8026Sopenharmony_ci                           |        '--------------------.
32619ea8026Sopenharmony_ci                            '-.                           |
32719ea8026Sopenharmony_cidisk                           v                          v
32819ea8026Sopenharmony_ci.--------.--------.--------.--------.--------.--------.--------.--------.
32919ea8026Sopenharmony_ci|                 |        |metadata|                 |metadata|        |
33019ea8026Sopenharmony_ci|                 |        |block 0 |                 |block 1 |        |
33119ea8026Sopenharmony_ci|                 |        |        |                 |        |        |
33219ea8026Sopenharmony_ci'--------'--------'--------'--------'--------'--------'--------'--------'
33319ea8026Sopenharmony_ci                               '--.                  .----'
33419ea8026Sopenharmony_ci                                   v                v
33519ea8026Sopenharmony_ci             metadata pair .----------------.----------------.
33619ea8026Sopenharmony_ci                           |   revision 11  |   revision 12  |
33719ea8026Sopenharmony_ci             block 1 is    |----------------|----------------|
33819ea8026Sopenharmony_ci             most recent   |       A        |       A''      |
33919ea8026Sopenharmony_ci                           |----------------|----------------|
34019ea8026Sopenharmony_ci                           |    checksum    |    checksum    |
34119ea8026Sopenharmony_ci                           |----------------|----------------|
34219ea8026Sopenharmony_ci                           |       B        |       A'''     | <- most recent A
34319ea8026Sopenharmony_ci                           |----------------|----------------|
34419ea8026Sopenharmony_ci                           |       A''      |    checksum    |
34519ea8026Sopenharmony_ci                           |----------------|----------------|
34619ea8026Sopenharmony_ci                           |    checksum    |       |        |
34719ea8026Sopenharmony_ci                           |----------------|       v        |
34819ea8026Sopenharmony_ci                           '----------------'----------------'
34919ea8026Sopenharmony_ci```
35019ea8026Sopenharmony_ci
35119ea8026Sopenharmony_ciSo how do we atomically update our metadata pairs? Atomicity (a type of
35219ea8026Sopenharmony_cipower-loss resilience) requires two parts: redundancy and error detection.
35319ea8026Sopenharmony_ciError detection can be provided with a checksum, and in littlefs's case we
35419ea8026Sopenharmony_ciuse a 32-bit [CRC][wikipedia-crc]. Maintaining redundancy, on the other hand,
35519ea8026Sopenharmony_cirequires multiple stages.
35619ea8026Sopenharmony_ci
35719ea8026Sopenharmony_ci1. If our block is not full and the program size is small enough to let us
35819ea8026Sopenharmony_ci   append more entries, we can simply append the entries to the log. Because
35919ea8026Sopenharmony_ci   we don't overwrite the original entries (remember rewriting flash requires
36019ea8026Sopenharmony_ci   an erase), we still have the original entries if we lose power during the
36119ea8026Sopenharmony_ci   append.
36219ea8026Sopenharmony_ci
36319ea8026Sopenharmony_ci   ```
36419ea8026Sopenharmony_ci                                    commit A
36519ea8026Sopenharmony_ci   .----------------.----------------.    .----------------.----------------.
36619ea8026Sopenharmony_ci   |   revision 1   |   revision 0   | => |   revision 1   |   revision 0   |
36719ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
36819ea8026Sopenharmony_ci   |       |        |                |    |       A        |                |
36919ea8026Sopenharmony_ci   |       v        |                |    |----------------|                |
37019ea8026Sopenharmony_ci   |                |                |    |    checksum    |                |
37119ea8026Sopenharmony_ci   |                |                |    |----------------|                |
37219ea8026Sopenharmony_ci   |                |                |    |       |        |                |
37319ea8026Sopenharmony_ci   |                |                |    |       v        |                |
37419ea8026Sopenharmony_ci   |                |                |    |                |                |
37519ea8026Sopenharmony_ci   |                |                |    |                |                |
37619ea8026Sopenharmony_ci   |                |                |    |                |                |
37719ea8026Sopenharmony_ci   |                |                |    |                |                |
37819ea8026Sopenharmony_ci   '----------------'----------------'    '----------------'----------------'
37919ea8026Sopenharmony_ci   ```
38019ea8026Sopenharmony_ci
38119ea8026Sopenharmony_ci   Note that littlefs doesn't maintain a checksum for each entry. Many logging
38219ea8026Sopenharmony_ci   filesystems do this, but it limits what you can update in a single atomic
38319ea8026Sopenharmony_ci   operation. What we can do instead is group multiple entries into a commit
38419ea8026Sopenharmony_ci   that shares a single checksum. This lets us update multiple unrelated pieces
38519ea8026Sopenharmony_ci   of metadata as long as they reside on the same metadata pair.
38619ea8026Sopenharmony_ci
38719ea8026Sopenharmony_ci   ```
38819ea8026Sopenharmony_ci                                 commit B and A'
38919ea8026Sopenharmony_ci   .----------------.----------------.    .----------------.----------------.
39019ea8026Sopenharmony_ci   |   revision 1   |   revision 0   | => |   revision 1   |   revision 0   |
39119ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
39219ea8026Sopenharmony_ci   |       A        |                |    |       A        |                |
39319ea8026Sopenharmony_ci   |----------------|                |    |----------------|                |
39419ea8026Sopenharmony_ci   |    checksum    |                |    |    checksum    |                |
39519ea8026Sopenharmony_ci   |----------------|                |    |----------------|                |
39619ea8026Sopenharmony_ci   |       |        |                |    |       B        |                |
39719ea8026Sopenharmony_ci   |       v        |                |    |----------------|                |
39819ea8026Sopenharmony_ci   |                |                |    |       A'       |                |
39919ea8026Sopenharmony_ci   |                |                |    |----------------|                |
40019ea8026Sopenharmony_ci   |                |                |    |    checksum    |                |
40119ea8026Sopenharmony_ci   |                |                |    |----------------|                |
40219ea8026Sopenharmony_ci   '----------------'----------------'    '----------------'----------------'
40319ea8026Sopenharmony_ci   ```
40419ea8026Sopenharmony_ci
40519ea8026Sopenharmony_ci2. If our block _is_ full of entries, we need to somehow remove outdated
40619ea8026Sopenharmony_ci   entries to make space for new ones. This process is called garbage
40719ea8026Sopenharmony_ci   collection, but because littlefs has multiple garbage collectors, we
40819ea8026Sopenharmony_ci   also call this specific case compaction.
40919ea8026Sopenharmony_ci
41019ea8026Sopenharmony_ci   Compared to other filesystems, littlefs's garbage collector is relatively
41119ea8026Sopenharmony_ci   simple. We want to avoid RAM consumption, so we use a sort of brute force
41219ea8026Sopenharmony_ci   solution where for each entry we check to see if a newer entry has been
41319ea8026Sopenharmony_ci   written. If the entry is the most recent we append it to our new block. This
41419ea8026Sopenharmony_ci   is where having two blocks becomes important, if we lose power we still have
41519ea8026Sopenharmony_ci   everything in our original block.
41619ea8026Sopenharmony_ci
41719ea8026Sopenharmony_ci   During this compaction step we also erase the metadata block and increment
41819ea8026Sopenharmony_ci   the revision count. Because we can commit multiple entries at once, we can
41919ea8026Sopenharmony_ci   write all of these changes to the second block without worrying about power
42019ea8026Sopenharmony_ci   loss. It's only when the commit's checksum is written that the compacted
42119ea8026Sopenharmony_ci   entries and revision count become committed and readable.
42219ea8026Sopenharmony_ci
42319ea8026Sopenharmony_ci   ```
42419ea8026Sopenharmony_ci                           commit B', need to compact
42519ea8026Sopenharmony_ci   .----------------.----------------.    .----------------.----------------.
42619ea8026Sopenharmony_ci   |   revision 1   |   revision 0   | => |   revision 1   |   revision 2   |
42719ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
42819ea8026Sopenharmony_ci   |       A        |                |    |       A        |       A'       |
42919ea8026Sopenharmony_ci   |----------------|                |    |----------------|----------------|
43019ea8026Sopenharmony_ci   |    checksum    |                |    |    checksum    |       B'       |
43119ea8026Sopenharmony_ci   |----------------|                |    |----------------|----------------|
43219ea8026Sopenharmony_ci   |       B        |                |    |       B        |    checksum    |
43319ea8026Sopenharmony_ci   |----------------|                |    |----------------|----------------|
43419ea8026Sopenharmony_ci   |       A'       |                |    |       A'       |       |        |
43519ea8026Sopenharmony_ci   |----------------|                |    |----------------|       v        |
43619ea8026Sopenharmony_ci   |    checksum    |                |    |    checksum    |                |
43719ea8026Sopenharmony_ci   |----------------|                |    |----------------|                |
43819ea8026Sopenharmony_ci   '----------------'----------------'    '----------------'----------------'
43919ea8026Sopenharmony_ci   ```
44019ea8026Sopenharmony_ci
44119ea8026Sopenharmony_ci3. If our block is full of entries _and_ we can't find any garbage, then what?
44219ea8026Sopenharmony_ci   At this point, most logging filesystems would return an error indicating no
44319ea8026Sopenharmony_ci   more space is available, but because we have small logs, overflowing a log
44419ea8026Sopenharmony_ci   isn't really an error condition.
44519ea8026Sopenharmony_ci
44619ea8026Sopenharmony_ci   Instead, we split our original metadata pair into two metadata pairs, each
44719ea8026Sopenharmony_ci   containing half of the entries, connected by a tail pointer. Instead of
44819ea8026Sopenharmony_ci   increasing the size of the log and dealing with the scalability issues
44919ea8026Sopenharmony_ci   associated with larger logs, we form a linked list of small bounded logs.
45019ea8026Sopenharmony_ci   This is a tradeoff as this approach does use more storage space, but at the
45119ea8026Sopenharmony_ci   benefit of improved scalability.
45219ea8026Sopenharmony_ci
45319ea8026Sopenharmony_ci   Despite writing to two metadata pairs, we can still maintain power
45419ea8026Sopenharmony_ci   resilience during this split step by first preparing the new metadata pair,
45519ea8026Sopenharmony_ci   and then inserting the tail pointer during the commit to the original
45619ea8026Sopenharmony_ci   metadata pair.
45719ea8026Sopenharmony_ci
45819ea8026Sopenharmony_ci   ```
45919ea8026Sopenharmony_ci                         commit C and D, need to split
46019ea8026Sopenharmony_ci   .----------------.----------------.    .----------------.----------------.
46119ea8026Sopenharmony_ci   |   revision 1   |   revision 2   | => |   revision 3   |   revision 2   |
46219ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
46319ea8026Sopenharmony_ci   |       A        |       A'       |    |       A'       |       A'       |
46419ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
46519ea8026Sopenharmony_ci   |    checksum    |       B'       |    |       B'       |       B'       |
46619ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------|
46719ea8026Sopenharmony_ci   |       B        |    checksum    |    |      tail    ---------------------.
46819ea8026Sopenharmony_ci   |----------------|----------------|    |----------------|----------------| |
46919ea8026Sopenharmony_ci   |       A'       |       |        |    |    checksum    |                | |
47019ea8026Sopenharmony_ci   |----------------|       v        |    |----------------|                | |
47119ea8026Sopenharmony_ci   |    checksum    |                |    |       |        |                | |
47219ea8026Sopenharmony_ci   |----------------|                |    |       v        |                | |
47319ea8026Sopenharmony_ci   '----------------'----------------'    '----------------'----------------' |
47419ea8026Sopenharmony_ci                                                   .----------------.---------'
47519ea8026Sopenharmony_ci                                                  v                v
47619ea8026Sopenharmony_ci                                          .----------------.----------------.
47719ea8026Sopenharmony_ci                                          |   revision 1   |   revision 0   |
47819ea8026Sopenharmony_ci                                          |----------------|----------------|
47919ea8026Sopenharmony_ci                                          |       C        |                |
48019ea8026Sopenharmony_ci                                          |----------------|                |
48119ea8026Sopenharmony_ci                                          |       D        |                |
48219ea8026Sopenharmony_ci                                          |----------------|                |
48319ea8026Sopenharmony_ci                                          |    checksum    |                |
48419ea8026Sopenharmony_ci                                          |----------------|                |
48519ea8026Sopenharmony_ci                                          |       |        |                |
48619ea8026Sopenharmony_ci                                          |       v        |                |
48719ea8026Sopenharmony_ci                                          |                |                |
48819ea8026Sopenharmony_ci                                          |                |                |
48919ea8026Sopenharmony_ci                                          '----------------'----------------'
49019ea8026Sopenharmony_ci   ```
49119ea8026Sopenharmony_ci
49219ea8026Sopenharmony_ciThere is another complexity the crops up when dealing with small logs. The
49319ea8026Sopenharmony_ciamortized runtime cost of garbage collection is not only dependent on its
49419ea8026Sopenharmony_cione time cost (_O(n&sup2;)_ for littlefs), but also depends on how often
49519ea8026Sopenharmony_cigarbage collection occurs.
49619ea8026Sopenharmony_ci
49719ea8026Sopenharmony_ciConsider two extremes:
49819ea8026Sopenharmony_ci
49919ea8026Sopenharmony_ci1. Log is empty, garbage collection occurs once every _n_ updates
50019ea8026Sopenharmony_ci2. Log is full, garbage collection occurs **every** update
50119ea8026Sopenharmony_ci
50219ea8026Sopenharmony_ciClearly we need to be more aggressive than waiting for our metadata pair to
50319ea8026Sopenharmony_cibe full. As the metadata pair approaches fullness the frequency of compactions
50419ea8026Sopenharmony_cigrows very rapidly.
50519ea8026Sopenharmony_ci
50619ea8026Sopenharmony_ciLooking at the problem generically, consider a log with ![n] bytes for each
50719ea8026Sopenharmony_cientry, ![d] dynamic entries (entries that are outdated during garbage
50819ea8026Sopenharmony_cicollection), and ![s] static entries (entries that need to be copied during
50919ea8026Sopenharmony_cigarbage collection). If we look at the amortized runtime complexity of updating
51019ea8026Sopenharmony_cithis log we get this formula:
51119ea8026Sopenharmony_ci
51219ea8026Sopenharmony_ci![cost = n + n (s / d+1)][metadata-formula1]
51319ea8026Sopenharmony_ci
51419ea8026Sopenharmony_ciIf we let ![r] be the ratio of static space to the size of our log in bytes, we
51519ea8026Sopenharmony_cifind an alternative representation of the number of static and dynamic entries:
51619ea8026Sopenharmony_ci
51719ea8026Sopenharmony_ci![s = r (size/n)][metadata-formula2]
51819ea8026Sopenharmony_ci
51919ea8026Sopenharmony_ci![d = (1 - r) (size/n)][metadata-formula3]
52019ea8026Sopenharmony_ci
52119ea8026Sopenharmony_ciSubstituting these in for ![d] and ![s] gives us a nice formula for the cost of
52219ea8026Sopenharmony_ciupdating an entry given how full the log is:
52319ea8026Sopenharmony_ci
52419ea8026Sopenharmony_ci![cost = n + n (r (size/n) / ((1-r) (size/n) + 1))][metadata-formula4]
52519ea8026Sopenharmony_ci
52619ea8026Sopenharmony_ciAssuming 100 byte entries in a 4 KiB log, we can graph this using the entry
52719ea8026Sopenharmony_cisize to find a multiplicative cost:
52819ea8026Sopenharmony_ci
52919ea8026Sopenharmony_ci![Metadata pair update cost graph][metadata-cost-graph]
53019ea8026Sopenharmony_ci
53119ea8026Sopenharmony_ciSo at 50% usage, we're seeing an average of 2x cost per update, and at 75%
53219ea8026Sopenharmony_ciusage, we're already at an average of 4x cost per update.
53319ea8026Sopenharmony_ci
53419ea8026Sopenharmony_ciTo avoid this exponential growth, instead of waiting for our metadata pair
53519ea8026Sopenharmony_cito be full, we split the metadata pair once we exceed 50% capacity. We do this
53619ea8026Sopenharmony_cilazily, waiting until we need to compact before checking if we fit in our 50%
53719ea8026Sopenharmony_cilimit. This limits the overhead of garbage collection to 2x the runtime cost,
53819ea8026Sopenharmony_cigiving us an amortized runtime complexity of _O(1)_.
53919ea8026Sopenharmony_ci
54019ea8026Sopenharmony_ci---
54119ea8026Sopenharmony_ci
54219ea8026Sopenharmony_ciIf we look at metadata pairs and linked-lists of metadata pairs at a high
54319ea8026Sopenharmony_cilevel, they have fairly nice runtime costs. Assuming _n_ metadata pairs,
54419ea8026Sopenharmony_cieach containing _m_ metadata entries, the _lookup_ cost for a specific
54519ea8026Sopenharmony_cientry has a worst case runtime complexity of _O(nm)_. For _updating_ a specific
54619ea8026Sopenharmony_cientry, the worst case complexity is _O(nm&sup2;)_, with an amortized complexity
54719ea8026Sopenharmony_ciof only _O(nm)_.
54819ea8026Sopenharmony_ci
54919ea8026Sopenharmony_ciHowever, splitting at 50% capacity does mean that in the best case our
55019ea8026Sopenharmony_cimetadata pairs will only be 1/2 full. If we include the overhead of the second
55119ea8026Sopenharmony_ciblock in our metadata pair, each metadata entry has an effective storage cost
55219ea8026Sopenharmony_ciof 4x the original size. I imagine users would not be happy if they found
55319ea8026Sopenharmony_cithat they can only use a quarter of their original storage. Metadata pairs
55419ea8026Sopenharmony_ciprovide a mechanism for performing atomic updates, but we need a separate
55519ea8026Sopenharmony_cimechanism for storing the bulk of our data.
55619ea8026Sopenharmony_ci
55719ea8026Sopenharmony_ci## CTZ skip-lists
55819ea8026Sopenharmony_ci
55919ea8026Sopenharmony_ciMetadata pairs provide efficient atomic updates but unfortunately have a large
56019ea8026Sopenharmony_cistorage cost. But we can work around this storage cost by only using the
56119ea8026Sopenharmony_cimetadata pairs to store references to more dense, copy-on-write (COW) data
56219ea8026Sopenharmony_cistructures.
56319ea8026Sopenharmony_ci
56419ea8026Sopenharmony_ci[Copy-on-write data structures][wikipedia-cow], also called purely functional
56519ea8026Sopenharmony_cidata structures, are a category of data structures where the underlying
56619ea8026Sopenharmony_cielements are immutable.  Making changes to the data requires creating new
56719ea8026Sopenharmony_cielements containing a copy of the updated data and replacing any references
56819ea8026Sopenharmony_ciwith references to the new elements. Generally, the performance of a COW data
56919ea8026Sopenharmony_cistructure depends on how many old elements can be reused after replacing parts
57019ea8026Sopenharmony_ciof the data.
57119ea8026Sopenharmony_ci
57219ea8026Sopenharmony_cilittlefs has several requirements of its COW structures. They need to be
57319ea8026Sopenharmony_ciefficient to read and write, but most frustrating, they need to be traversable
57419ea8026Sopenharmony_ciwith a constant amount of RAM. Notably this rules out
57519ea8026Sopenharmony_ci[B-trees][wikipedia-B-tree], which can not be traversed with constant RAM, and
57619ea8026Sopenharmony_ci[B+-trees][wikipedia-B+-tree], which are not possible to update with COW
57719ea8026Sopenharmony_cioperations.
57819ea8026Sopenharmony_ci
57919ea8026Sopenharmony_ci---
58019ea8026Sopenharmony_ci
58119ea8026Sopenharmony_ciSo, what can we do? First let's consider storing files in a simple COW
58219ea8026Sopenharmony_cilinked-list. Appending a block, which is the basis for writing files, means we
58319ea8026Sopenharmony_cihave to update the last block to point to our new block. This requires a COW
58419ea8026Sopenharmony_cioperation, which means we need to update the second-to-last block, and then the
58519ea8026Sopenharmony_cithird-to-last, and so on until we've copied out the entire file.
58619ea8026Sopenharmony_ci
58719ea8026Sopenharmony_ci```
58819ea8026Sopenharmony_ciA linked-list
58919ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
59019ea8026Sopenharmony_ci| data 0 |->| data 1 |->| data 2 |->| data 4 |->| data 5 |->| data 6 |
59119ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
59219ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
59319ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'  '--------'
59419ea8026Sopenharmony_ci```
59519ea8026Sopenharmony_ci
59619ea8026Sopenharmony_ciTo avoid a full copy during appends, we can store the data backwards. Appending
59719ea8026Sopenharmony_ciblocks just requires adding the new block and no other blocks need to be
59819ea8026Sopenharmony_ciupdated. If we update a block in the middle, we still need to copy the
59919ea8026Sopenharmony_cifollowing blocks, but can reuse any blocks before it. Since most file writes
60019ea8026Sopenharmony_ciare linear, this design gambles that appends are the most common type of data
60119ea8026Sopenharmony_ciupdate.
60219ea8026Sopenharmony_ci
60319ea8026Sopenharmony_ci```
60419ea8026Sopenharmony_ciA backwards linked-list
60519ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
60619ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 4 |<-| data 5 |<-| data 6 |
60719ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
60819ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
60919ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'  '--------'
61019ea8026Sopenharmony_ci```
61119ea8026Sopenharmony_ci
61219ea8026Sopenharmony_ciHowever, a backwards linked-list does have a rather glaring problem. Iterating
61319ea8026Sopenharmony_ciover a file _in order_ has a runtime cost of _O(n&sup2;)_. A quadratic runtime
61419ea8026Sopenharmony_cijust to read a file! That's awful.
61519ea8026Sopenharmony_ci
61619ea8026Sopenharmony_ciFortunately we can do better. Instead of a singly linked list, littlefs
61719ea8026Sopenharmony_ciuses a multilayered linked-list often called a
61819ea8026Sopenharmony_ci[skip-list][wikipedia-skip-list]. However, unlike the most common type of
61919ea8026Sopenharmony_ciskip-list, littlefs's skip-lists are strictly deterministic built around some
62019ea8026Sopenharmony_ciinteresting properties of the count-trailing-zeros (CTZ) instruction.
62119ea8026Sopenharmony_ci
62219ea8026Sopenharmony_ciThe rules CTZ skip-lists follow are that for every _n_&zwj;th block where _n_
62319ea8026Sopenharmony_ciis divisible by 2&zwj;_&#739;_, that block contains a pointer to block
62419ea8026Sopenharmony_ci_n_-2&zwj;_&#739;_. This means that each block contains anywhere from 1 to
62519ea8026Sopenharmony_cilog&#8322;_n_ pointers that skip to different preceding elements of the
62619ea8026Sopenharmony_ciskip-list.
62719ea8026Sopenharmony_ci
62819ea8026Sopenharmony_ciThe name comes from heavy use of the [CTZ instruction][wikipedia-ctz], which
62919ea8026Sopenharmony_cilets us calculate the power-of-two factors efficiently. For a give block _n_,
63019ea8026Sopenharmony_cithat block contains ctz(_n_)+1 pointers.
63119ea8026Sopenharmony_ci
63219ea8026Sopenharmony_ci```
63319ea8026Sopenharmony_ciA backwards CTZ skip-list
63419ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
63519ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 3 |<-| data 4 |<-| data 5 |
63619ea8026Sopenharmony_ci|        |<-|        |--|        |<-|        |--|        |  |        |
63719ea8026Sopenharmony_ci|        |<-|        |--|        |--|        |--|        |  |        |
63819ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'  '--------'
63919ea8026Sopenharmony_ci```
64019ea8026Sopenharmony_ci
64119ea8026Sopenharmony_ciThe additional pointers let us navigate the data-structure on disk much more
64219ea8026Sopenharmony_ciefficiently than in a singly linked list.
64319ea8026Sopenharmony_ci
64419ea8026Sopenharmony_ciConsider a path from data block 5 to data block 1. You can see how data block 3
64519ea8026Sopenharmony_ciwas completely skipped:
64619ea8026Sopenharmony_ci```
64719ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
64819ea8026Sopenharmony_ci| data 0 |  | data 1 |<-| data 2 |  | data 3 |  | data 4 |<-| data 5 |
64919ea8026Sopenharmony_ci|        |  |        |  |        |<-|        |--|        |  |        |
65019ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
65119ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'  '--------'
65219ea8026Sopenharmony_ci```
65319ea8026Sopenharmony_ci
65419ea8026Sopenharmony_ciThe path to data block 0 is even faster, requiring only two jumps:
65519ea8026Sopenharmony_ci```
65619ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
65719ea8026Sopenharmony_ci| data 0 |  | data 1 |  | data 2 |  | data 3 |  | data 4 |<-| data 5 |
65819ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |  |        |
65919ea8026Sopenharmony_ci|        |<-|        |--|        |--|        |--|        |  |        |
66019ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'  '--------'
66119ea8026Sopenharmony_ci```
66219ea8026Sopenharmony_ci
66319ea8026Sopenharmony_ciWe can find the runtime complexity by looking at the path to any block from
66419ea8026Sopenharmony_cithe block containing the most pointers. Every step along the path divides
66519ea8026Sopenharmony_cithe search space for the block in half, giving us a runtime of _O(log n)_.
66619ea8026Sopenharmony_ciTo get _to_ the block with the most pointers, we can perform the same steps
66719ea8026Sopenharmony_cibackwards, which puts the runtime at _O(2 log n)_ = _O(log n)_. An interesting
66819ea8026Sopenharmony_cinote is that this optimal path occurs naturally if we greedily choose the
66919ea8026Sopenharmony_cipointer that covers the most distance without passing our target.
67019ea8026Sopenharmony_ci
67119ea8026Sopenharmony_ciSo now we have a [COW] data structure that is cheap to append with a runtime
67219ea8026Sopenharmony_ciof _O(1)_, and can be read with a worst case runtime of _O(n log n)_. Given
67319ea8026Sopenharmony_cithat this runtime is also divided by the amount of data we can store in a
67419ea8026Sopenharmony_ciblock, this cost is fairly reasonable.
67519ea8026Sopenharmony_ci
67619ea8026Sopenharmony_ci---
67719ea8026Sopenharmony_ci
67819ea8026Sopenharmony_ciThis is a new data structure, so we still have several questions. What is the
67919ea8026Sopenharmony_cistorage overhead? Can the number of pointers exceed the size of a block? How do
68019ea8026Sopenharmony_ciwe store a CTZ skip-list in our metadata pairs?
68119ea8026Sopenharmony_ci
68219ea8026Sopenharmony_ciTo find the storage overhead, we can look at the data structure as multiple
68319ea8026Sopenharmony_cilinked-lists. Each linked-list skips twice as many blocks as the previous,
68419ea8026Sopenharmony_cior from another perspective, each linked-list uses half as much storage as
68519ea8026Sopenharmony_cithe previous. As we approach infinity, the storage overhead forms a geometric
68619ea8026Sopenharmony_ciseries. Solving this tells us that on average our storage overhead is only
68719ea8026Sopenharmony_ci2 pointers per block.
68819ea8026Sopenharmony_ci
68919ea8026Sopenharmony_ci![lim,n->inf((1/n)sum,i,0->n(ctz(i)+1)) = sum,i,0->inf(1/2^i) = 2][ctz-formula1]
69019ea8026Sopenharmony_ci
69119ea8026Sopenharmony_ciBecause our file size is limited the word width we use to store sizes, we can
69219ea8026Sopenharmony_cialso solve for the maximum number of pointers we would ever need to store in a
69319ea8026Sopenharmony_ciblock. If we set the overhead of pointers equal to the block size, we get the
69419ea8026Sopenharmony_cifollowing equation. Note that both a smaller block size (![B][bigB]) and larger
69519ea8026Sopenharmony_ciword width (![w]) result in more storage overhead.
69619ea8026Sopenharmony_ci
69719ea8026Sopenharmony_ci![B = (w/8)ceil(log2(2^w / (B-2w/8)))][ctz-formula2]
69819ea8026Sopenharmony_ci
69919ea8026Sopenharmony_ciSolving the equation for ![B][bigB] gives us the minimum block size for some
70019ea8026Sopenharmony_cicommon word widths:
70119ea8026Sopenharmony_ci
70219ea8026Sopenharmony_ci1. 32-bit CTZ skip-list => minimum block size of 104 bytes
70319ea8026Sopenharmony_ci2. 64-bit CTZ skip-list => minimum block size of 448 bytes
70419ea8026Sopenharmony_ci
70519ea8026Sopenharmony_cilittlefs uses a 32-bit word width, so our blocks can only overflow with
70619ea8026Sopenharmony_cipointers if they are smaller than 104 bytes. This is an easy requirement, as
70719ea8026Sopenharmony_ciin practice, most block sizes start at 512 bytes. As long as our block size
70819ea8026Sopenharmony_ciis larger than 104 bytes, we can avoid the extra logic needed to handle
70919ea8026Sopenharmony_cipointer overflow.
71019ea8026Sopenharmony_ci
71119ea8026Sopenharmony_ciThis last question is how do we store CTZ skip-lists? We need a pointer to the
71219ea8026Sopenharmony_cihead block, the size of the skip-list, the index of the head block, and our
71319ea8026Sopenharmony_cioffset in the head block. But it's worth noting that each size maps to a unique
71419ea8026Sopenharmony_ciindex + offset pair. So in theory we can store only a single pointer and size.
71519ea8026Sopenharmony_ci
71619ea8026Sopenharmony_ciHowever, calculating the index + offset pair from the size is a bit
71719ea8026Sopenharmony_cicomplicated. We can start with a summation that loops through all of the blocks
71819ea8026Sopenharmony_ciup until our given size. Let ![B][bigB] be the block size in bytes, ![w] be the
71919ea8026Sopenharmony_ciword width in bits, ![n] be the index of the block in the skip-list, and
72019ea8026Sopenharmony_ci![N][bigN] be the file size in bytes:
72119ea8026Sopenharmony_ci
72219ea8026Sopenharmony_ci![N = sum,i,0->n(B-(w/8)(ctz(i)+1))][ctz-formula3]
72319ea8026Sopenharmony_ci
72419ea8026Sopenharmony_ciThis works quite well, but requires _O(n)_ to compute, which brings the full
72519ea8026Sopenharmony_ciruntime of reading a file up to _O(n&sup2; log n)_. Fortunately, that summation
72619ea8026Sopenharmony_cidoesn't need to touch the disk, so the practical impact is minimal.
72719ea8026Sopenharmony_ci
72819ea8026Sopenharmony_ciHowever, despite the integration of a bitwise operation, we can actually reduce
72919ea8026Sopenharmony_cithis equation to a _O(1)_ form.  While browsing the amazing resource that is
73019ea8026Sopenharmony_cithe [On-Line Encyclopedia of Integer Sequences (OEIS)][oeis], I managed to find
73119ea8026Sopenharmony_ci[A001511], which matches the iteration of the CTZ instruction,
73219ea8026Sopenharmony_ciand [A005187], which matches its partial summation. Much to my
73319ea8026Sopenharmony_cisurprise, these both result from simple equations, leading us to a rather
73419ea8026Sopenharmony_ciunintuitive property that ties together two seemingly unrelated bitwise
73519ea8026Sopenharmony_ciinstructions:
73619ea8026Sopenharmony_ci
73719ea8026Sopenharmony_ci![sum,i,0->n(ctz(i)+1) = 2n-popcount(n)][ctz-formula4]
73819ea8026Sopenharmony_ci
73919ea8026Sopenharmony_ciwhere:
74019ea8026Sopenharmony_ci
74119ea8026Sopenharmony_ci1. ctz(![x]) = the number of trailing bits that are 0 in ![x]
74219ea8026Sopenharmony_ci2. popcount(![x]) = the number of bits that are 1 in ![x]
74319ea8026Sopenharmony_ci
74419ea8026Sopenharmony_ciInitial tests of this surprising property seem to hold. As ![n] approaches
74519ea8026Sopenharmony_ciinfinity, we end up with an average overhead of 2 pointers, which matches our
74619ea8026Sopenharmony_ciassumption from earlier. During iteration, the popcount function seems to
74719ea8026Sopenharmony_cihandle deviations from this average. Of course, just to make sure I wrote a
74819ea8026Sopenharmony_ciquick script that verified this property for all 32-bit integers.
74919ea8026Sopenharmony_ci
75019ea8026Sopenharmony_ciNow we can substitute into our original equation to find a more efficient
75119ea8026Sopenharmony_ciequation for file size:
75219ea8026Sopenharmony_ci
75319ea8026Sopenharmony_ci![N = Bn - (w/8)(2n-popcount(n))][ctz-formula5]
75419ea8026Sopenharmony_ci
75519ea8026Sopenharmony_ciUnfortunately, the popcount function is non-injective, so we can't solve this
75619ea8026Sopenharmony_ciequation for our index. But what we can do is solve for an ![n'] index that
75719ea8026Sopenharmony_ciis greater than ![n] with error bounded by the range of the popcount function.
75819ea8026Sopenharmony_ciWe can repeatedly substitute ![n'] into the original equation until the error
75919ea8026Sopenharmony_ciis smaller than our integer resolution. As it turns out, we only need to
76019ea8026Sopenharmony_ciperform this substitution once, which gives us this formula for our index:
76119ea8026Sopenharmony_ci
76219ea8026Sopenharmony_ci![n = floor((N-(w/8)popcount(N/(B-2w/8))) / (B-2w/8))][ctz-formula6]
76319ea8026Sopenharmony_ci
76419ea8026Sopenharmony_ciNow that we have our index ![n], we can just plug it back into the above
76519ea8026Sopenharmony_ciequation to find the offset. We run into a bit of a problem with integer
76619ea8026Sopenharmony_cioverflow, but we can avoid this by rearranging the equation a bit:
76719ea8026Sopenharmony_ci
76819ea8026Sopenharmony_ci![off = N - (B-2w/8)n - (w/8)popcount(n)][ctz-formula7]
76919ea8026Sopenharmony_ci
77019ea8026Sopenharmony_ciOur solution requires quite a bit of math, but computers are very good at math.
77119ea8026Sopenharmony_ciNow we can find both our block index and offset from a size in _O(1)_, letting
77219ea8026Sopenharmony_cius store CTZ skip-lists with only a pointer and size.
77319ea8026Sopenharmony_ci
77419ea8026Sopenharmony_ciCTZ skip-lists give us a COW data structure that is easily traversable in
77519ea8026Sopenharmony_ci_O(n)_, can be appended in _O(1)_, and can be read in _O(n log n)_. All of
77619ea8026Sopenharmony_cithese operations work in a bounded amount of RAM and require only two words of
77719ea8026Sopenharmony_cistorage overhead per block. In combination with metadata pairs, CTZ skip-lists
77819ea8026Sopenharmony_ciprovide power resilience and compact storage of data.
77919ea8026Sopenharmony_ci
78019ea8026Sopenharmony_ci```
78119ea8026Sopenharmony_ci                                    .--------.
78219ea8026Sopenharmony_ci                                   .|metadata|
78319ea8026Sopenharmony_ci                                   ||        |
78419ea8026Sopenharmony_ci                                   ||        |
78519ea8026Sopenharmony_ci                                   |'--------'
78619ea8026Sopenharmony_ci                                   '----|---'
78719ea8026Sopenharmony_ci                                        v
78819ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.
78919ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 3 |
79019ea8026Sopenharmony_ci|        |<-|        |--|        |  |        |
79119ea8026Sopenharmony_ci|        |  |        |  |        |  |        |
79219ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'
79319ea8026Sopenharmony_ci
79419ea8026Sopenharmony_ciwrite data to disk, create copies
79519ea8026Sopenharmony_ci=>
79619ea8026Sopenharmony_ci                                    .--------.
79719ea8026Sopenharmony_ci                                   .|metadata|
79819ea8026Sopenharmony_ci                                   ||        |
79919ea8026Sopenharmony_ci                                   ||        |
80019ea8026Sopenharmony_ci                                   |'--------'
80119ea8026Sopenharmony_ci                                   '----|---'
80219ea8026Sopenharmony_ci                                        v
80319ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.
80419ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 3 |
80519ea8026Sopenharmony_ci|        |<-|        |--|        |  |        |
80619ea8026Sopenharmony_ci|        |  |        |  |        |  |        |
80719ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'
80819ea8026Sopenharmony_ci     ^ ^           ^
80919ea8026Sopenharmony_ci     | |           |    .--------.  .--------.  .--------.  .--------.
81019ea8026Sopenharmony_ci     | |           '----| new    |<-| new    |<-| new    |<-| new    |
81119ea8026Sopenharmony_ci     | '----------------| data 2 |<-| data 3 |--| data 4 |  | data 5 |
81219ea8026Sopenharmony_ci     '------------------|        |--|        |--|        |  |        |
81319ea8026Sopenharmony_ci                        '--------'  '--------'  '--------'  '--------'
81419ea8026Sopenharmony_ci
81519ea8026Sopenharmony_cicommit to metadata pair
81619ea8026Sopenharmony_ci=>
81719ea8026Sopenharmony_ci                                                            .--------.
81819ea8026Sopenharmony_ci                                                           .|new     |
81919ea8026Sopenharmony_ci                                                           ||metadata|
82019ea8026Sopenharmony_ci                                                           ||        |
82119ea8026Sopenharmony_ci                                                           |'--------'
82219ea8026Sopenharmony_ci                                                           '----|---'
82319ea8026Sopenharmony_ci                                                                |
82419ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.                  |
82519ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 3 |                  |
82619ea8026Sopenharmony_ci|        |<-|        |--|        |  |        |                  |
82719ea8026Sopenharmony_ci|        |  |        |  |        |  |        |                  |
82819ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'                  |
82919ea8026Sopenharmony_ci     ^ ^           ^                                            v
83019ea8026Sopenharmony_ci     | |           |    .--------.  .--------.  .--------.  .--------.
83119ea8026Sopenharmony_ci     | |           '----| new    |<-| new    |<-| new    |<-| new    |
83219ea8026Sopenharmony_ci     | '----------------| data 2 |<-| data 3 |--| data 4 |  | data 5 |
83319ea8026Sopenharmony_ci     '------------------|        |--|        |--|        |  |        |
83419ea8026Sopenharmony_ci                        '--------'  '--------'  '--------'  '--------'
83519ea8026Sopenharmony_ci```
83619ea8026Sopenharmony_ci
83719ea8026Sopenharmony_ci## The block allocator
83819ea8026Sopenharmony_ci
83919ea8026Sopenharmony_ciSo we now have the framework for an atomic, wear leveling filesystem. Small two
84019ea8026Sopenharmony_ciblock metadata pairs provide atomic updates, while CTZ skip-lists provide
84119ea8026Sopenharmony_cicompact storage of data in COW blocks.
84219ea8026Sopenharmony_ci
84319ea8026Sopenharmony_ciBut now we need to look at the [elephant] in the room. Where do all these
84419ea8026Sopenharmony_ciblocks come from?
84519ea8026Sopenharmony_ci
84619ea8026Sopenharmony_ciDeciding which block to use next is the responsibility of the block allocator.
84719ea8026Sopenharmony_ciIn filesystem design, block allocation is often a second-class citizen, but in
84819ea8026Sopenharmony_cia COW filesystem its role becomes much more important as it is needed for
84919ea8026Sopenharmony_cinearly every write to the filesystem.
85019ea8026Sopenharmony_ci
85119ea8026Sopenharmony_ciNormally, block allocation involves some sort of free list or bitmap stored on
85219ea8026Sopenharmony_cithe filesystem that is updated with free blocks. However, with power
85319ea8026Sopenharmony_ciresilience, keeping these structures consistent becomes difficult. It doesn't
85419ea8026Sopenharmony_cihelp that any mistake in updating these structures can result in lost blocks
85519ea8026Sopenharmony_cithat are impossible to recover.
85619ea8026Sopenharmony_ci
85719ea8026Sopenharmony_cilittlefs takes a cautious approach. Instead of trusting a free list on disk,
85819ea8026Sopenharmony_cilittlefs relies on the fact that the filesystem on disk is a mirror image of
85919ea8026Sopenharmony_cithe free blocks on the disk. The block allocator operates much like a garbage
86019ea8026Sopenharmony_cicollector in a scripting language, scanning for unused blocks on demand.
86119ea8026Sopenharmony_ci
86219ea8026Sopenharmony_ci```
86319ea8026Sopenharmony_ci          .----.
86419ea8026Sopenharmony_ci          |root|
86519ea8026Sopenharmony_ci          |    |
86619ea8026Sopenharmony_ci          '----'
86719ea8026Sopenharmony_ci   v-------'  '-------v
86819ea8026Sopenharmony_ci.----.    .    .    .----.
86919ea8026Sopenharmony_ci| A  |    .    .    | B  |
87019ea8026Sopenharmony_ci|    |    .    .    |    |
87119ea8026Sopenharmony_ci'----'    .    .    '----'
87219ea8026Sopenharmony_ci.    .    .    .  v--'  '------------v---------v
87319ea8026Sopenharmony_ci.    .    .    .----.    .         .----.    .----.
87419ea8026Sopenharmony_ci.    .    .    | C  |    .         | D  |    | E  |
87519ea8026Sopenharmony_ci.    .    .    |    |    .         |    |    |    |
87619ea8026Sopenharmony_ci.    .    .    '----'    .         '----'    '----'
87719ea8026Sopenharmony_ci.    .    .    .    .    .         .    .    .    .
87819ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.----.----.
87919ea8026Sopenharmony_ci| A  |    |root| C  | B  |         | D  |    | E  |         |
88019ea8026Sopenharmony_ci|    |    |    |    |    |         |    |    |    |         |
88119ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'----'----'
88219ea8026Sopenharmony_ci        ^                   ^    ^                   ^    ^
88319ea8026Sopenharmony_ci         '-------------------'----'-------------------'----'-- free blocks
88419ea8026Sopenharmony_ci```
88519ea8026Sopenharmony_ci
88619ea8026Sopenharmony_ciWhile this approach may sound complicated, the decision to not maintain a free
88719ea8026Sopenharmony_cilist greatly simplifies the overall design of littlefs. Unlike programming
88819ea8026Sopenharmony_cilanguages, there are only a handful of data structures we need to traverse.
88919ea8026Sopenharmony_ciAnd block deallocation, which occurs nearly as often as block allocation,
89019ea8026Sopenharmony_ciis simply a noop. This "drop it on the floor" strategy greatly reduces the
89119ea8026Sopenharmony_cicomplexity of managing on disk data structures, especially when handling
89219ea8026Sopenharmony_cihigh-risk error conditions.
89319ea8026Sopenharmony_ci
89419ea8026Sopenharmony_ci---
89519ea8026Sopenharmony_ci
89619ea8026Sopenharmony_ciOur block allocator needs to find free blocks efficiently. You could traverse
89719ea8026Sopenharmony_cithrough every block on storage and check each one against our filesystem tree;
89819ea8026Sopenharmony_cihowever, the runtime would be abhorrent. We need to somehow collect multiple
89919ea8026Sopenharmony_ciblocks per traversal.
90019ea8026Sopenharmony_ci
90119ea8026Sopenharmony_ciLooking at existing designs, some larger filesystems that use a similar "drop
90219ea8026Sopenharmony_ciit on the floor" strategy store a bitmap of the entire storage in [RAM]. This
90319ea8026Sopenharmony_ciworks well because bitmaps are surprisingly compact. We can't use the same
90419ea8026Sopenharmony_cistrategy here, as it violates our constant RAM requirement, but we may be able
90519ea8026Sopenharmony_cito modify the idea into a workable solution.
90619ea8026Sopenharmony_ci
90719ea8026Sopenharmony_ci```
90819ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.----.----.
90919ea8026Sopenharmony_ci| A  |    |root| C  | B  |         | D  |    | E  |         |
91019ea8026Sopenharmony_ci|    |    |    |    |    |         |    |    |    |         |
91119ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'----'----'
91219ea8026Sopenharmony_ci  1    0    1    1    1    0    0    1    0    1    0    0
91319ea8026Sopenharmony_ci \---------------------------+----------------------------/
91419ea8026Sopenharmony_ci                             v
91519ea8026Sopenharmony_ci               bitmap: 0xb94 (0b101110010100)
91619ea8026Sopenharmony_ci```
91719ea8026Sopenharmony_ci
91819ea8026Sopenharmony_ciThe block allocator in littlefs is a compromise between a disk-sized bitmap and
91919ea8026Sopenharmony_cia brute force traversal. Instead of a bitmap the size of storage, we keep track
92019ea8026Sopenharmony_ciof a small, fixed-size bitmap called the lookahead buffer. During block
92119ea8026Sopenharmony_ciallocation, we take blocks from the lookahead buffer. If the lookahead buffer
92219ea8026Sopenharmony_ciis empty, we scan the filesystem for more free blocks, populating our lookahead
92319ea8026Sopenharmony_cibuffer. In each scan we use an increasing offset, circling the storage as
92419ea8026Sopenharmony_ciblocks are allocated.
92519ea8026Sopenharmony_ci
92619ea8026Sopenharmony_ciHere's what it might look like to allocate 4 blocks on a decently busy
92719ea8026Sopenharmony_cifilesystem with a 32 bit lookahead and a total of 128 blocks (512 KiB
92819ea8026Sopenharmony_ciof storage if blocks are 4 KiB):
92919ea8026Sopenharmony_ci```
93019ea8026Sopenharmony_ciboot...         lookahead:
93119ea8026Sopenharmony_ci                fs blocks: fffff9fffffffffeffffffffffff0000
93219ea8026Sopenharmony_ciscanning...     lookahead: fffff9ff
93319ea8026Sopenharmony_ci                fs blocks: fffff9fffffffffeffffffffffff0000
93419ea8026Sopenharmony_cialloc = 21      lookahead: fffffdff
93519ea8026Sopenharmony_ci                fs blocks: fffffdfffffffffeffffffffffff0000
93619ea8026Sopenharmony_cialloc = 22      lookahead: ffffffff
93719ea8026Sopenharmony_ci                fs blocks: fffffffffffffffeffffffffffff0000
93819ea8026Sopenharmony_ciscanning...     lookahead:         fffffffe
93919ea8026Sopenharmony_ci                fs blocks: fffffffffffffffeffffffffffff0000
94019ea8026Sopenharmony_cialloc = 63      lookahead:         ffffffff
94119ea8026Sopenharmony_ci                fs blocks: ffffffffffffffffffffffffffff0000
94219ea8026Sopenharmony_ciscanning...     lookahead:         ffffffff
94319ea8026Sopenharmony_ci                fs blocks: ffffffffffffffffffffffffffff0000
94419ea8026Sopenharmony_ciscanning...     lookahead:                 ffffffff
94519ea8026Sopenharmony_ci                fs blocks: ffffffffffffffffffffffffffff0000
94619ea8026Sopenharmony_ciscanning...     lookahead:                         ffff0000
94719ea8026Sopenharmony_ci                fs blocks: ffffffffffffffffffffffffffff0000
94819ea8026Sopenharmony_cialloc = 112     lookahead:                         ffff8000
94919ea8026Sopenharmony_ci                fs blocks: ffffffffffffffffffffffffffff8000
95019ea8026Sopenharmony_ci```
95119ea8026Sopenharmony_ci
95219ea8026Sopenharmony_ciThis lookahead approach has a runtime complexity of _O(n&sup2;)_ to completely
95319ea8026Sopenharmony_ciscan storage; however, bitmaps are surprisingly compact, and in practice only
95419ea8026Sopenharmony_cione or two passes are usually needed to find free blocks. Additionally, the
95519ea8026Sopenharmony_ciperformance of the allocator can be optimized by adjusting the block size or
95619ea8026Sopenharmony_cisize of the lookahead buffer, trading either write granularity or RAM for
95719ea8026Sopenharmony_ciallocator performance.
95819ea8026Sopenharmony_ci
95919ea8026Sopenharmony_ci## Wear leveling
96019ea8026Sopenharmony_ci
96119ea8026Sopenharmony_ciThe block allocator has a secondary role: wear leveling.
96219ea8026Sopenharmony_ci
96319ea8026Sopenharmony_ciWear leveling is the process of distributing wear across all blocks in the
96419ea8026Sopenharmony_cistorage to prevent the filesystem from experiencing an early death due to
96519ea8026Sopenharmony_ciwear on a single block in the storage.
96619ea8026Sopenharmony_ci
96719ea8026Sopenharmony_cilittlefs has two methods of protecting against wear:
96819ea8026Sopenharmony_ci1. Detection and recovery from bad blocks
96919ea8026Sopenharmony_ci2. Evenly distributing wear across dynamic blocks
97019ea8026Sopenharmony_ci
97119ea8026Sopenharmony_ci---
97219ea8026Sopenharmony_ci
97319ea8026Sopenharmony_ciRecovery from bad blocks doesn't actually have anything to do with the block
97419ea8026Sopenharmony_ciallocator itself. Instead, it relies on the ability of the filesystem to detect
97519ea8026Sopenharmony_ciand evict bad blocks when they occur.
97619ea8026Sopenharmony_ci
97719ea8026Sopenharmony_ciIn littlefs, it is fairly straightforward to detect bad blocks at write time.
97819ea8026Sopenharmony_ciAll writes must be sourced by some form of data in RAM, so immediately after we
97919ea8026Sopenharmony_ciwrite to a block, we can read the data back and verify that it was written
98019ea8026Sopenharmony_cicorrectly. If we find that the data on disk does not match the copy we have in
98119ea8026Sopenharmony_ciRAM, a write error has occurred and we most likely have a bad block.
98219ea8026Sopenharmony_ci
98319ea8026Sopenharmony_ciOnce we detect a bad block, we need to recover from it. In the case of write
98419ea8026Sopenharmony_cierrors, we have a copy of the corrupted data in RAM, so all we need to do is
98519ea8026Sopenharmony_cievict the bad block, allocate a new, hopefully good block, and repeat the write
98619ea8026Sopenharmony_cithat previously failed.
98719ea8026Sopenharmony_ci
98819ea8026Sopenharmony_ciThe actual act of evicting the bad block and replacing it with a new block is
98919ea8026Sopenharmony_cileft up to the filesystem's copy-on-bounded-writes (CObW) data structures. One
99019ea8026Sopenharmony_ciproperty of CObW data structures is that any block can be replaced during a
99119ea8026Sopenharmony_ciCOW operation. The bounded-writes part is normally triggered by a counter, but
99219ea8026Sopenharmony_cinothing prevents us from triggering a COW operation as soon as we find a bad
99319ea8026Sopenharmony_ciblock.
99419ea8026Sopenharmony_ci
99519ea8026Sopenharmony_ci```
99619ea8026Sopenharmony_ci     .----.
99719ea8026Sopenharmony_ci     |root|
99819ea8026Sopenharmony_ci     |    |
99919ea8026Sopenharmony_ci     '----'
100019ea8026Sopenharmony_ci   v--'  '----------------------v
100119ea8026Sopenharmony_ci.----.                        .----.
100219ea8026Sopenharmony_ci| A  |                        | B  |
100319ea8026Sopenharmony_ci|    |                        |    |
100419ea8026Sopenharmony_ci'----'                        '----'
100519ea8026Sopenharmony_ci.    .                      v---'  .
100619ea8026Sopenharmony_ci.    .                   .----.    .
100719ea8026Sopenharmony_ci.    .                   | C  |    .
100819ea8026Sopenharmony_ci.    .                   |    |    .
100919ea8026Sopenharmony_ci.    .                   '----'    .
101019ea8026Sopenharmony_ci.    .                   .    .    .
101119ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
101219ea8026Sopenharmony_ci| A  |root|              | C  | B  |              |
101319ea8026Sopenharmony_ci|    |    |              |    |    |              |
101419ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
101519ea8026Sopenharmony_ci
101619ea8026Sopenharmony_ciupdate C
101719ea8026Sopenharmony_ci=>
101819ea8026Sopenharmony_ci     .----.
101919ea8026Sopenharmony_ci     |root|
102019ea8026Sopenharmony_ci     |    |
102119ea8026Sopenharmony_ci     '----'
102219ea8026Sopenharmony_ci   v--'  '----------------------v
102319ea8026Sopenharmony_ci.----.                        .----.
102419ea8026Sopenharmony_ci| A  |                        | B  |
102519ea8026Sopenharmony_ci|    |                        |    |
102619ea8026Sopenharmony_ci'----'                        '----'
102719ea8026Sopenharmony_ci.    .                      v---'  .
102819ea8026Sopenharmony_ci.    .                   .----.    .
102919ea8026Sopenharmony_ci.    .                   |bad |    .
103019ea8026Sopenharmony_ci.    .                   |blck|    .
103119ea8026Sopenharmony_ci.    .                   '----'    .
103219ea8026Sopenharmony_ci.    .                   .    .    .
103319ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
103419ea8026Sopenharmony_ci| A  |root|              |bad | B  |              |
103519ea8026Sopenharmony_ci|    |    |              |blck|    |              |
103619ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
103719ea8026Sopenharmony_ci
103819ea8026Sopenharmony_cioh no! bad block! relocate C
103919ea8026Sopenharmony_ci=>
104019ea8026Sopenharmony_ci     .----.
104119ea8026Sopenharmony_ci     |root|
104219ea8026Sopenharmony_ci     |    |
104319ea8026Sopenharmony_ci     '----'
104419ea8026Sopenharmony_ci   v--'  '----------------------v
104519ea8026Sopenharmony_ci.----.                        .----.
104619ea8026Sopenharmony_ci| A  |                        | B  |
104719ea8026Sopenharmony_ci|    |                        |    |
104819ea8026Sopenharmony_ci'----'                        '----'
104919ea8026Sopenharmony_ci.    .                      v---'  .
105019ea8026Sopenharmony_ci.    .                   .----.    .
105119ea8026Sopenharmony_ci.    .                   |bad |    .
105219ea8026Sopenharmony_ci.    .                   |blck|    .
105319ea8026Sopenharmony_ci.    .                   '----'    .
105419ea8026Sopenharmony_ci.    .                   .    .    .
105519ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
105619ea8026Sopenharmony_ci| A  |root|              |bad | B  |bad |         |
105719ea8026Sopenharmony_ci|    |    |              |blck|    |blck|         |
105819ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
105919ea8026Sopenharmony_ci                            --------->
106019ea8026Sopenharmony_cioh no! bad block! relocate C
106119ea8026Sopenharmony_ci=>
106219ea8026Sopenharmony_ci     .----.
106319ea8026Sopenharmony_ci     |root|
106419ea8026Sopenharmony_ci     |    |
106519ea8026Sopenharmony_ci     '----'
106619ea8026Sopenharmony_ci   v--'  '----------------------v
106719ea8026Sopenharmony_ci.----.                        .----.
106819ea8026Sopenharmony_ci| A  |                        | B  |
106919ea8026Sopenharmony_ci|    |                        |    |
107019ea8026Sopenharmony_ci'----'                        '----'
107119ea8026Sopenharmony_ci.    .                      v---'  .
107219ea8026Sopenharmony_ci.    .                   .----.    .    .----.
107319ea8026Sopenharmony_ci.    .                   |bad |    .    | C' |
107419ea8026Sopenharmony_ci.    .                   |blck|    .    |    |
107519ea8026Sopenharmony_ci.    .                   '----'    .    '----'
107619ea8026Sopenharmony_ci.    .                   .    .    .    .    .
107719ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
107819ea8026Sopenharmony_ci| A  |root|              |bad | B  |bad | C' |    |
107919ea8026Sopenharmony_ci|    |    |              |blck|    |blck|    |    |
108019ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
108119ea8026Sopenharmony_ci                            -------------->
108219ea8026Sopenharmony_cisuccessfully relocated C, update B
108319ea8026Sopenharmony_ci=>
108419ea8026Sopenharmony_ci     .----.
108519ea8026Sopenharmony_ci     |root|
108619ea8026Sopenharmony_ci     |    |
108719ea8026Sopenharmony_ci     '----'
108819ea8026Sopenharmony_ci   v--'  '----------------------v
108919ea8026Sopenharmony_ci.----.                        .----.
109019ea8026Sopenharmony_ci| A  |                        |bad |
109119ea8026Sopenharmony_ci|    |                        |blck|
109219ea8026Sopenharmony_ci'----'                        '----'
109319ea8026Sopenharmony_ci.    .                      v---'  .
109419ea8026Sopenharmony_ci.    .                   .----.    .    .----.
109519ea8026Sopenharmony_ci.    .                   |bad |    .    | C' |
109619ea8026Sopenharmony_ci.    .                   |blck|    .    |    |
109719ea8026Sopenharmony_ci.    .                   '----'    .    '----'
109819ea8026Sopenharmony_ci.    .                   .    .    .    .    .
109919ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
110019ea8026Sopenharmony_ci| A  |root|              |bad |bad |bad | C' |    |
110119ea8026Sopenharmony_ci|    |    |              |blck|blck|blck|    |    |
110219ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
110319ea8026Sopenharmony_ci
110419ea8026Sopenharmony_cioh no! bad block! relocate B
110519ea8026Sopenharmony_ci=>
110619ea8026Sopenharmony_ci     .----.
110719ea8026Sopenharmony_ci     |root|
110819ea8026Sopenharmony_ci     |    |
110919ea8026Sopenharmony_ci     '----'
111019ea8026Sopenharmony_ci   v--'  '----------------------v
111119ea8026Sopenharmony_ci.----.                        .----.         .----.
111219ea8026Sopenharmony_ci| A  |                        |bad |         |bad |
111319ea8026Sopenharmony_ci|    |                        |blck|         |blck|
111419ea8026Sopenharmony_ci'----'                        '----'         '----'
111519ea8026Sopenharmony_ci.    .                      v---'  .         .    .
111619ea8026Sopenharmony_ci.    .                   .----.    .    .----.    .
111719ea8026Sopenharmony_ci.    .                   |bad |    .    | C' |    .
111819ea8026Sopenharmony_ci.    .                   |blck|    .    |    |    .
111919ea8026Sopenharmony_ci.    .                   '----'    .    '----'    .
112019ea8026Sopenharmony_ci.    .                   .    .    .    .    .    .
112119ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
112219ea8026Sopenharmony_ci| A  |root|              |bad |bad |bad | C' |bad |
112319ea8026Sopenharmony_ci|    |    |              |blck|blck|blck|    |blck|
112419ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
112519ea8026Sopenharmony_ci                                 -------------->
112619ea8026Sopenharmony_cioh no! bad block! relocate B
112719ea8026Sopenharmony_ci=>
112819ea8026Sopenharmony_ci     .----.
112919ea8026Sopenharmony_ci     |root|
113019ea8026Sopenharmony_ci     |    |
113119ea8026Sopenharmony_ci     '----'
113219ea8026Sopenharmony_ci   v--'  '----------------------v
113319ea8026Sopenharmony_ci.----.    .----.              .----.
113419ea8026Sopenharmony_ci| A  |    | B' |              |bad |
113519ea8026Sopenharmony_ci|    |    |    |              |blck|
113619ea8026Sopenharmony_ci'----'    '----'              '----'
113719ea8026Sopenharmony_ci.    .    .  | .            .---'  .
113819ea8026Sopenharmony_ci.    .    .  '--------------v-------------v
113919ea8026Sopenharmony_ci.    .    .    .         .----.    .    .----.
114019ea8026Sopenharmony_ci.    .    .    .         |bad |    .    | C' |
114119ea8026Sopenharmony_ci.    .    .    .         |blck|    .    |    |
114219ea8026Sopenharmony_ci.    .    .    .         '----'    .    '----'
114319ea8026Sopenharmony_ci.    .    .    .         .    .    .    .    .
114419ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
114519ea8026Sopenharmony_ci| A  |root| B' |         |bad |bad |bad | C' |bad |
114619ea8026Sopenharmony_ci|    |    |    |         |blck|blck|blck|    |blck|
114719ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
114819ea8026Sopenharmony_ci------------>                    ------------------
114919ea8026Sopenharmony_cisuccessfully relocated B, update root
115019ea8026Sopenharmony_ci=>
115119ea8026Sopenharmony_ci     .----.
115219ea8026Sopenharmony_ci     |root|
115319ea8026Sopenharmony_ci     |    |
115419ea8026Sopenharmony_ci     '----'
115519ea8026Sopenharmony_ci   v--'  '--v
115619ea8026Sopenharmony_ci.----.    .----.
115719ea8026Sopenharmony_ci| A  |    | B' |
115819ea8026Sopenharmony_ci|    |    |    |
115919ea8026Sopenharmony_ci'----'    '----'
116019ea8026Sopenharmony_ci.    .    .   '---------------------------v
116119ea8026Sopenharmony_ci.    .    .    .                        .----.
116219ea8026Sopenharmony_ci.    .    .    .                        | C' |
116319ea8026Sopenharmony_ci.    .    .    .                        |    |
116419ea8026Sopenharmony_ci.    .    .    .                        '----'
116519ea8026Sopenharmony_ci.    .    .    .                        .    .
116619ea8026Sopenharmony_ci.----.----.----.----.----.----.----.----.----.----.
116719ea8026Sopenharmony_ci| A  |root| B' |         |bad |bad |bad | C' |bad |
116819ea8026Sopenharmony_ci|    |    |    |         |blck|blck|blck|    |blck|
116919ea8026Sopenharmony_ci'----'----'----'----'----'----'----'----'----'----'
117019ea8026Sopenharmony_ci```
117119ea8026Sopenharmony_ci
117219ea8026Sopenharmony_ciWe may find that the new block is also bad, but hopefully after repeating this
117319ea8026Sopenharmony_cicycle we'll eventually find a new block where a write succeeds. If we don't,
117419ea8026Sopenharmony_cithat means that all blocks in our storage are bad, and we've reached the end of
117519ea8026Sopenharmony_ciour device's usable life. At this point, littlefs will return an "out of space"
117619ea8026Sopenharmony_cierror. This is technically true, as there are no more good blocks, but as an
117719ea8026Sopenharmony_ciadded benefit it also matches the error condition expected by users of
117819ea8026Sopenharmony_cidynamically sized data.
117919ea8026Sopenharmony_ci
118019ea8026Sopenharmony_ci---
118119ea8026Sopenharmony_ci
118219ea8026Sopenharmony_ciRead errors, on the other hand, are quite a bit more complicated. We don't have
118319ea8026Sopenharmony_cia copy of the data lingering around in RAM, so we need a way to reconstruct the
118419ea8026Sopenharmony_cioriginal data even after it has been corrupted. One such mechanism for this is
118519ea8026Sopenharmony_ci[error-correction-codes (ECC)][wikipedia-ecc].
118619ea8026Sopenharmony_ci
118719ea8026Sopenharmony_ciECC is an extension to the idea of a checksum. Where a checksum such as CRC can
118819ea8026Sopenharmony_cidetect that an error has occurred in the data, ECC can detect and actually
118919ea8026Sopenharmony_cicorrect some amount of errors. However, there is a limit to how many errors ECC
119019ea8026Sopenharmony_cican detect: the [Hamming bound][wikipedia-hamming-bound]. As the number of
119119ea8026Sopenharmony_cierrors approaches the Hamming bound, we may still be able to detect errors, but
119219ea8026Sopenharmony_cican no longer fix the data. If we've reached this point the block is
119319ea8026Sopenharmony_ciunrecoverable.
119419ea8026Sopenharmony_ci
119519ea8026Sopenharmony_cilittlefs by itself does **not** provide ECC. The block nature and relatively
119619ea8026Sopenharmony_cilarge footprint of ECC does not work well with the dynamically sized data of
119719ea8026Sopenharmony_cifilesystems, correcting errors without RAM is complicated, and ECC fits better
119819ea8026Sopenharmony_ciwith the geometry of block devices. In fact, several NOR flash chips have extra
119919ea8026Sopenharmony_cistorage intended for ECC, and many NAND chips can even calculate ECC on the
120019ea8026Sopenharmony_cichip itself.
120119ea8026Sopenharmony_ci
120219ea8026Sopenharmony_ciIn littlefs, ECC is entirely optional. Read errors can instead be prevented
120319ea8026Sopenharmony_ciproactively by wear leveling. But it's important to note that ECC can be used
120419ea8026Sopenharmony_ciat the block device level to modestly extend the life of a device. littlefs
120519ea8026Sopenharmony_cirespects any errors reported by the block device, allowing a block device to
120619ea8026Sopenharmony_ciprovide additional aggressive error detection.
120719ea8026Sopenharmony_ci
120819ea8026Sopenharmony_ci---
120919ea8026Sopenharmony_ci
121019ea8026Sopenharmony_ciTo avoid read errors, we need to be proactive, as opposed to reactive as we
121119ea8026Sopenharmony_ciwere with write errors.
121219ea8026Sopenharmony_ci
121319ea8026Sopenharmony_ciOne way to do this is to detect when the number of errors in a block exceeds
121419ea8026Sopenharmony_cisome threshold, but is still recoverable. With ECC we can do this at write
121519ea8026Sopenharmony_citime, and treat the error as a write error, evicting the block before fatal
121619ea8026Sopenharmony_ciread errors have a chance to develop.
121719ea8026Sopenharmony_ci
121819ea8026Sopenharmony_ciA different, more generic strategy, is to proactively distribute wear across
121919ea8026Sopenharmony_ciall blocks in the storage, with the hope that no single block fails before the
122019ea8026Sopenharmony_cirest of storage is approaching the end of its usable life. This is called
122119ea8026Sopenharmony_ciwear leveling.
122219ea8026Sopenharmony_ci
122319ea8026Sopenharmony_ciGenerally, wear leveling algorithms fall into one of two categories:
122419ea8026Sopenharmony_ci
122519ea8026Sopenharmony_ci1. [Dynamic wear leveling][wikipedia-dynamic-wear-leveling], where we
122619ea8026Sopenharmony_ci   distribute wear over "dynamic" blocks. The can be accomplished by
122719ea8026Sopenharmony_ci   only considering unused blocks.
122819ea8026Sopenharmony_ci
122919ea8026Sopenharmony_ci2. [Static wear leveling][wikipedia-static-wear-leveling], where we
123019ea8026Sopenharmony_ci   distribute wear over both "dynamic" and "static" blocks. To make this work,
123119ea8026Sopenharmony_ci   we need to consider all blocks, including blocks that already contain data.
123219ea8026Sopenharmony_ci
123319ea8026Sopenharmony_ciAs a tradeoff for code size and complexity, littlefs (currently) only provides
123419ea8026Sopenharmony_cidynamic wear leveling. This is a best effort solution. Wear is not distributed
123519ea8026Sopenharmony_ciperfectly, but it is distributed among the free blocks and greatly extends the
123619ea8026Sopenharmony_cilife of a device.
123719ea8026Sopenharmony_ci
123819ea8026Sopenharmony_ciOn top of this, littlefs uses a statistical wear leveling algorithm. What this
123919ea8026Sopenharmony_cimeans is that we don’t actively track wear, instead we rely on a uniform
124019ea8026Sopenharmony_cidistribution of wear across storage to approximate a dynamic wear leveling
124119ea8026Sopenharmony_cialgorithm. Despite the long name, this is actually a simplification of dynamic
124219ea8026Sopenharmony_ciwear leveling.
124319ea8026Sopenharmony_ci
124419ea8026Sopenharmony_ciThe uniform distribution of wear is left up to the block allocator, which
124519ea8026Sopenharmony_cicreates a uniform distribution in two parts. The easy part is when the device
124619ea8026Sopenharmony_ciis powered, in which case we allocate the blocks linearly, circling the device.
124719ea8026Sopenharmony_ciThe harder part is what to do when the device loses power. We can't just
124819ea8026Sopenharmony_cirestart the allocator at the beginning of storage, as this would bias the wear.
124919ea8026Sopenharmony_ciInstead, we start the allocator as a random offset every time we mount the
125019ea8026Sopenharmony_cifilesystem. As long as this random offset is uniform, the combined allocation
125119ea8026Sopenharmony_cipattern is also a uniform distribution.
125219ea8026Sopenharmony_ci
125319ea8026Sopenharmony_ci![Cumulative wear distribution graph][wear-distribution-graph]
125419ea8026Sopenharmony_ci
125519ea8026Sopenharmony_ciInitially, this approach to wear leveling looks like it creates a difficult
125619ea8026Sopenharmony_cidependency on a power-independent random number generator, which must return
125719ea8026Sopenharmony_cidifferent random numbers on each boot. However, the filesystem is in a
125819ea8026Sopenharmony_cirelatively unique situation in that it is sitting on top of a large of amount
125919ea8026Sopenharmony_ciof entropy that persists across power loss.
126019ea8026Sopenharmony_ci
126119ea8026Sopenharmony_ciWe can actually use the data on disk to directly drive our random number
126219ea8026Sopenharmony_cigenerator. In practice, this is implemented by xoring the checksums of each
126319ea8026Sopenharmony_cimetadata pair, which is already calculated to fetch and mount the filesystem.
126419ea8026Sopenharmony_ci
126519ea8026Sopenharmony_ci```
126619ea8026Sopenharmony_ci            .--------. \                         probably random
126719ea8026Sopenharmony_ci           .|metadata| |                                ^
126819ea8026Sopenharmony_ci           ||        | +-> crc ----------------------> xor
126919ea8026Sopenharmony_ci           ||        | |                                ^
127019ea8026Sopenharmony_ci           |'--------' /                                |
127119ea8026Sopenharmony_ci           '---|--|-'                                   |
127219ea8026Sopenharmony_ci            .-'    '-------------------------.          |
127319ea8026Sopenharmony_ci           |                                  |         |
127419ea8026Sopenharmony_ci           |        .--------------> xor ------------> xor
127519ea8026Sopenharmony_ci           |        |                 ^       |         ^
127619ea8026Sopenharmony_ci           v       crc               crc      v        crc
127719ea8026Sopenharmony_ci      .--------. \  ^   .--------. \  ^   .--------. \  ^
127819ea8026Sopenharmony_ci     .|metadata|-|--|-->|metadata| |  |  .|metadata| |  |
127919ea8026Sopenharmony_ci     ||        | +--'  ||        | +--'  ||        | +--'
128019ea8026Sopenharmony_ci     ||        | |     ||        | |     ||        | |
128119ea8026Sopenharmony_ci     |'--------' /     |'--------' /     |'--------' /
128219ea8026Sopenharmony_ci     '---|--|-'        '----|---'        '---|--|-'
128319ea8026Sopenharmony_ci      .-'    '-.            |             .-'    '-.
128419ea8026Sopenharmony_ci     v          v           v            v          v
128519ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.
128619ea8026Sopenharmony_ci|  data  |  |  data  |  |  data  |  |  data  |  |  data  |
128719ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
128819ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
128919ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'
129019ea8026Sopenharmony_ci```
129119ea8026Sopenharmony_ci
129219ea8026Sopenharmony_ciNote that this random number generator is not perfect. It only returns unique
129319ea8026Sopenharmony_cirandom numbers when the filesystem is modified. This is exactly what we want
129419ea8026Sopenharmony_cifor distributing wear in the allocator, but means this random number generator
129519ea8026Sopenharmony_ciis not useful for general use.
129619ea8026Sopenharmony_ci
129719ea8026Sopenharmony_ci---
129819ea8026Sopenharmony_ci
129919ea8026Sopenharmony_ciTogether, bad block detection and dynamic wear leveling provide a best effort
130019ea8026Sopenharmony_cisolution for avoiding the early death of a filesystem due to wear. Importantly,
130119ea8026Sopenharmony_cilittlefs's wear leveling algorithm provides a key feature: You can increase the
130219ea8026Sopenharmony_cilife of a device simply by increasing the size of storage. And if more
130319ea8026Sopenharmony_ciaggressive wear leveling is desired, you can always combine littlefs with a
130419ea8026Sopenharmony_ci[flash translation layer (FTL)][wikipedia-ftl] to get a small power resilient
130519ea8026Sopenharmony_cifilesystem with static wear leveling.
130619ea8026Sopenharmony_ci
130719ea8026Sopenharmony_ci## Files
130819ea8026Sopenharmony_ci
130919ea8026Sopenharmony_ciNow that we have our building blocks out of the way, we can start looking at
131019ea8026Sopenharmony_ciour filesystem as a whole.
131119ea8026Sopenharmony_ci
131219ea8026Sopenharmony_ciThe first step: How do we actually store our files?
131319ea8026Sopenharmony_ci
131419ea8026Sopenharmony_ciWe've determined that CTZ skip-lists are pretty good at storing data compactly,
131519ea8026Sopenharmony_ciso following the precedent found in other filesystems we could give each file
131619ea8026Sopenharmony_cia skip-list stored in a metadata pair that acts as an inode for the file.
131719ea8026Sopenharmony_ci
131819ea8026Sopenharmony_ci
131919ea8026Sopenharmony_ci```
132019ea8026Sopenharmony_ci                                    .--------.
132119ea8026Sopenharmony_ci                                   .|metadata|
132219ea8026Sopenharmony_ci                                   ||        |
132319ea8026Sopenharmony_ci                                   ||        |
132419ea8026Sopenharmony_ci                                   |'--------'
132519ea8026Sopenharmony_ci                                   '----|---'
132619ea8026Sopenharmony_ci                                        v
132719ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.
132819ea8026Sopenharmony_ci| data 0 |<-| data 1 |<-| data 2 |<-| data 3 |
132919ea8026Sopenharmony_ci|        |<-|        |--|        |  |        |
133019ea8026Sopenharmony_ci|        |  |        |  |        |  |        |
133119ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'
133219ea8026Sopenharmony_ci```
133319ea8026Sopenharmony_ci
133419ea8026Sopenharmony_ciHowever, this doesn't work well when files are small, which is common for
133519ea8026Sopenharmony_ciembedded systems. Compared to PCs, _all_ data in an embedded system is small.
133619ea8026Sopenharmony_ci
133719ea8026Sopenharmony_ciConsider a small 4-byte file. With a two block metadata-pair and one block for
133819ea8026Sopenharmony_cithe CTZ skip-list, we find ourselves using a full 3 blocks. On most NOR flash
133919ea8026Sopenharmony_ciwith 4 KiB blocks, this is 12 KiB of overhead. A ridiculous 3072x increase.
134019ea8026Sopenharmony_ci
134119ea8026Sopenharmony_ci```
134219ea8026Sopenharmony_cifile stored as inode, 4 bytes costs ~12 KiB
134319ea8026Sopenharmony_ci
134419ea8026Sopenharmony_ci .----------------.                  \
134519ea8026Sopenharmony_ci.|    revision    |                  |
134619ea8026Sopenharmony_ci||----------------|    \             |
134719ea8026Sopenharmony_ci||    skiplist   ---.  +- metadata   |
134819ea8026Sopenharmony_ci||----------------| |  /  4x8 bytes  |
134919ea8026Sopenharmony_ci||    checksum    | |     32 bytes   |
135019ea8026Sopenharmony_ci||----------------| |                |
135119ea8026Sopenharmony_ci||       |        | |                +- metadata pair
135219ea8026Sopenharmony_ci||       v        | |                |  2x4 KiB
135319ea8026Sopenharmony_ci||                | |                |  8 KiB
135419ea8026Sopenharmony_ci||                | |                |
135519ea8026Sopenharmony_ci||                | |                |
135619ea8026Sopenharmony_ci||                | |                |
135719ea8026Sopenharmony_ci|'----------------' |                |
135819ea8026Sopenharmony_ci'----------------'  |                /
135919ea8026Sopenharmony_ci          .--------'
136019ea8026Sopenharmony_ci         v
136119ea8026Sopenharmony_ci .----------------.    \             \
136219ea8026Sopenharmony_ci |      data      |    +- data       |
136319ea8026Sopenharmony_ci |----------------|    /  4 bytes    |
136419ea8026Sopenharmony_ci |                |                  |
136519ea8026Sopenharmony_ci |                |                  |
136619ea8026Sopenharmony_ci |                |                  |
136719ea8026Sopenharmony_ci |                |                  +- data block
136819ea8026Sopenharmony_ci |                |                  |  4 KiB
136919ea8026Sopenharmony_ci |                |                  |
137019ea8026Sopenharmony_ci |                |                  |
137119ea8026Sopenharmony_ci |                |                  |
137219ea8026Sopenharmony_ci |                |                  |
137319ea8026Sopenharmony_ci |                |                  |
137419ea8026Sopenharmony_ci '----------------'                  /
137519ea8026Sopenharmony_ci```
137619ea8026Sopenharmony_ci
137719ea8026Sopenharmony_ciWe can make several improvements. First, instead of giving each file its own
137819ea8026Sopenharmony_cimetadata pair, we can store multiple files in a single metadata pair. One way
137919ea8026Sopenharmony_cito do this is to directly associate a directory with a metadata pair (or a
138019ea8026Sopenharmony_cilinked list of metadata pairs). This makes it easy for multiple files to share
138119ea8026Sopenharmony_cithe directory's metadata pair for logging and reduces the collective storage
138219ea8026Sopenharmony_cioverhead.
138319ea8026Sopenharmony_ci
138419ea8026Sopenharmony_ciThe strict binding of metadata pairs and directories also gives users
138519ea8026Sopenharmony_cidirect control over storage utilization depending on how they organize their
138619ea8026Sopenharmony_cidirectories.
138719ea8026Sopenharmony_ci
138819ea8026Sopenharmony_ci```
138919ea8026Sopenharmony_cimultiple files stored in metadata pair, 4 bytes costs ~4 KiB
139019ea8026Sopenharmony_ci
139119ea8026Sopenharmony_ci       .----------------.
139219ea8026Sopenharmony_ci      .|    revision    |
139319ea8026Sopenharmony_ci      ||----------------|
139419ea8026Sopenharmony_ci      ||    A name      |
139519ea8026Sopenharmony_ci      ||   A skiplist  -----.
139619ea8026Sopenharmony_ci      ||----------------|   |  \
139719ea8026Sopenharmony_ci      ||    B name      |   |  +- metadata
139819ea8026Sopenharmony_ci      ||   B skiplist  ---. |  |  4x8 bytes
139919ea8026Sopenharmony_ci      ||----------------| | |  /  32 bytes
140019ea8026Sopenharmony_ci      ||    checksum    | | |
140119ea8026Sopenharmony_ci      ||----------------| | |
140219ea8026Sopenharmony_ci      ||       |        | | |
140319ea8026Sopenharmony_ci      ||       v        | | |
140419ea8026Sopenharmony_ci      |'----------------' | |
140519ea8026Sopenharmony_ci      '----------------'  | |
140619ea8026Sopenharmony_ci         .----------------' |
140719ea8026Sopenharmony_ci        v                   v
140819ea8026Sopenharmony_ci.----------------.  .----------------.  \           \
140919ea8026Sopenharmony_ci|     A data     |  |     B data     |  +- data     |
141019ea8026Sopenharmony_ci|                |  |----------------|  /  4 bytes  |
141119ea8026Sopenharmony_ci|                |  |                |              |
141219ea8026Sopenharmony_ci|                |  |                |              |
141319ea8026Sopenharmony_ci|                |  |                |              |
141419ea8026Sopenharmony_ci|                |  |                |              + data block
141519ea8026Sopenharmony_ci|                |  |                |              | 4 KiB
141619ea8026Sopenharmony_ci|                |  |                |              |
141719ea8026Sopenharmony_ci|----------------|  |                |              |
141819ea8026Sopenharmony_ci|                |  |                |              |
141919ea8026Sopenharmony_ci|                |  |                |              |
142019ea8026Sopenharmony_ci|                |  |                |              |
142119ea8026Sopenharmony_ci'----------------'  '----------------'              /
142219ea8026Sopenharmony_ci```
142319ea8026Sopenharmony_ci
142419ea8026Sopenharmony_ciThe second improvement we can make is noticing that for very small files, our
142519ea8026Sopenharmony_ciattempts to use CTZ skip-lists for compact storage backfires. Metadata pairs
142619ea8026Sopenharmony_cihave a ~4x storage cost, so if our file is smaller than 1/4 the block size,
142719ea8026Sopenharmony_cithere's actually no benefit in storing our file outside of our metadata pair.
142819ea8026Sopenharmony_ci
142919ea8026Sopenharmony_ciIn this case, we can store the file directly in our directory's metadata pair.
143019ea8026Sopenharmony_ciWe call this an inline file, and it allows a directory to store many small
143119ea8026Sopenharmony_cifiles quite efficiently. Our previous 4 byte file now only takes up a
143219ea8026Sopenharmony_citheoretical 16 bytes on disk.
143319ea8026Sopenharmony_ci
143419ea8026Sopenharmony_ci```
143519ea8026Sopenharmony_ciinline files stored in metadata pair, 4 bytes costs ~16 bytes
143619ea8026Sopenharmony_ci
143719ea8026Sopenharmony_ci .----------------.
143819ea8026Sopenharmony_ci.|    revision    |
143919ea8026Sopenharmony_ci||----------------|
144019ea8026Sopenharmony_ci||    A name      |
144119ea8026Sopenharmony_ci||   A skiplist  ---.
144219ea8026Sopenharmony_ci||----------------| |  \
144319ea8026Sopenharmony_ci||    B name      | |  +- data
144419ea8026Sopenharmony_ci||    B data      | |  |  4x4 bytes
144519ea8026Sopenharmony_ci||----------------| |  /  16 bytes
144619ea8026Sopenharmony_ci||    checksum    | |
144719ea8026Sopenharmony_ci||----------------| |
144819ea8026Sopenharmony_ci||       |        | |
144919ea8026Sopenharmony_ci||       v        | |
145019ea8026Sopenharmony_ci|'----------------' |
145119ea8026Sopenharmony_ci'----------------'  |
145219ea8026Sopenharmony_ci          .---------'
145319ea8026Sopenharmony_ci         v
145419ea8026Sopenharmony_ci .----------------.
145519ea8026Sopenharmony_ci |     A data     |
145619ea8026Sopenharmony_ci |                |
145719ea8026Sopenharmony_ci |                |
145819ea8026Sopenharmony_ci |                |
145919ea8026Sopenharmony_ci |                |
146019ea8026Sopenharmony_ci |                |
146119ea8026Sopenharmony_ci |                |
146219ea8026Sopenharmony_ci |                |
146319ea8026Sopenharmony_ci |----------------|
146419ea8026Sopenharmony_ci |                |
146519ea8026Sopenharmony_ci |                |
146619ea8026Sopenharmony_ci |                |
146719ea8026Sopenharmony_ci '----------------'
146819ea8026Sopenharmony_ci```
146919ea8026Sopenharmony_ci
147019ea8026Sopenharmony_ciOnce the file exceeds 1/4 the block size, we switch to a CTZ skip-list. This
147119ea8026Sopenharmony_cimeans that our files never use more than 4x storage overhead, decreasing as
147219ea8026Sopenharmony_cithe file grows in size.
147319ea8026Sopenharmony_ci
147419ea8026Sopenharmony_ci![File storage cost graph][file-cost-graph]
147519ea8026Sopenharmony_ci
147619ea8026Sopenharmony_ci## Directories
147719ea8026Sopenharmony_ci
147819ea8026Sopenharmony_ciNow we just need directories to store our files. As mentioned above we want
147919ea8026Sopenharmony_cia strict binding of directories and metadata pairs, but there are a few
148019ea8026Sopenharmony_cicomplications we need to sort out.
148119ea8026Sopenharmony_ci
148219ea8026Sopenharmony_ciOn their own, each directory is a linked-list of metadata pairs. This lets us
148319ea8026Sopenharmony_cistore an unlimited number of files in each directory, and we don't need to
148419ea8026Sopenharmony_ciworry about the runtime complexity of unbounded logs. We can store other
148519ea8026Sopenharmony_cidirectory pointers in our metadata pairs, which gives us a directory tree, much
148619ea8026Sopenharmony_cilike what you find on other filesystems.
148719ea8026Sopenharmony_ci
148819ea8026Sopenharmony_ci```
148919ea8026Sopenharmony_ci            .--------.
149019ea8026Sopenharmony_ci           .| root   |
149119ea8026Sopenharmony_ci           ||        |
149219ea8026Sopenharmony_ci           ||        |
149319ea8026Sopenharmony_ci           |'--------'
149419ea8026Sopenharmony_ci           '---|--|-'
149519ea8026Sopenharmony_ci            .-'    '-------------------------.
149619ea8026Sopenharmony_ci           v                                  v
149719ea8026Sopenharmony_ci      .--------.        .--------.        .--------.
149819ea8026Sopenharmony_ci     .| dir A  |------->| dir A  |       .| dir B  |
149919ea8026Sopenharmony_ci     ||        |       ||        |       ||        |
150019ea8026Sopenharmony_ci     ||        |       ||        |       ||        |
150119ea8026Sopenharmony_ci     |'--------'       |'--------'       |'--------'
150219ea8026Sopenharmony_ci     '---|--|-'        '----|---'        '---|--|-'
150319ea8026Sopenharmony_ci      .-'    '-.            |             .-'    '-.
150419ea8026Sopenharmony_ci     v          v           v            v          v
150519ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.
150619ea8026Sopenharmony_ci| file C |  | file D |  | file E |  | file F |  | file G |
150719ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
150819ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
150919ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'
151019ea8026Sopenharmony_ci```
151119ea8026Sopenharmony_ci
151219ea8026Sopenharmony_ciThe main complication is, once again, traversal with a constant amount of
151319ea8026Sopenharmony_ci[RAM]. The directory tree is a tree, and the unfortunate fact is you can't
151419ea8026Sopenharmony_citraverse a tree with constant RAM.
151519ea8026Sopenharmony_ci
151619ea8026Sopenharmony_ciFortunately, the elements of our tree are metadata pairs, so unlike CTZ
151719ea8026Sopenharmony_ciskip-lists, we're not limited to strict COW operations. One thing we can do is
151819ea8026Sopenharmony_cithread a linked-list through our tree, explicitly enabling cheap traversal
151919ea8026Sopenharmony_ciover the entire filesystem.
152019ea8026Sopenharmony_ci
152119ea8026Sopenharmony_ci```
152219ea8026Sopenharmony_ci            .--------.
152319ea8026Sopenharmony_ci           .| root   |-.
152419ea8026Sopenharmony_ci           ||        | |
152519ea8026Sopenharmony_ci   .-------||        |-'
152619ea8026Sopenharmony_ci   |       |'--------'
152719ea8026Sopenharmony_ci   |       '---|--|-'
152819ea8026Sopenharmony_ci   |        .-'    '-------------------------.
152919ea8026Sopenharmony_ci   |       v                                  v
153019ea8026Sopenharmony_ci   |  .--------.        .--------.        .--------.
153119ea8026Sopenharmony_ci   '->| dir A  |------->| dir A  |------->| dir B  |
153219ea8026Sopenharmony_ci     ||        |       ||        |       ||        |
153319ea8026Sopenharmony_ci     ||        |       ||        |       ||        |
153419ea8026Sopenharmony_ci     |'--------'       |'--------'       |'--------'
153519ea8026Sopenharmony_ci     '---|--|-'        '----|---'        '---|--|-'
153619ea8026Sopenharmony_ci      .-'    '-.            |             .-'    '-.
153719ea8026Sopenharmony_ci     v          v           v            v          v
153819ea8026Sopenharmony_ci.--------.  .--------.  .--------.  .--------.  .--------.
153919ea8026Sopenharmony_ci| file C |  | file D |  | file E |  | file F |  | file G |
154019ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
154119ea8026Sopenharmony_ci|        |  |        |  |        |  |        |  |        |
154219ea8026Sopenharmony_ci'--------'  '--------'  '--------'  '--------'  '--------'
154319ea8026Sopenharmony_ci```
154419ea8026Sopenharmony_ci
154519ea8026Sopenharmony_ciUnfortunately, not sticking to pure COW operations creates some problems. Now,
154619ea8026Sopenharmony_ciwhenever we want to manipulate the directory tree, multiple pointers need to be
154719ea8026Sopenharmony_ciupdated. If you're familiar with designing atomic data structures this should
154819ea8026Sopenharmony_ciset off a bunch of red flags.
154919ea8026Sopenharmony_ci
155019ea8026Sopenharmony_ciTo work around this, our threaded linked-list has a bit of leeway. Instead of
155119ea8026Sopenharmony_cionly containing metadata pairs found in our filesystem, it is allowed to
155219ea8026Sopenharmony_cicontain metadata pairs that have no parent because of a power loss. These are
155319ea8026Sopenharmony_cicalled orphaned metadata pairs.
155419ea8026Sopenharmony_ci
155519ea8026Sopenharmony_ciWith the possibility of orphans, we can build power loss resilient operations
155619ea8026Sopenharmony_cithat maintain a filesystem tree threaded with a linked-list for traversal.
155719ea8026Sopenharmony_ci
155819ea8026Sopenharmony_ciAdding a directory to our tree:
155919ea8026Sopenharmony_ci
156019ea8026Sopenharmony_ci```
156119ea8026Sopenharmony_ci         .--------.
156219ea8026Sopenharmony_ci        .| root   |-.
156319ea8026Sopenharmony_ci        ||        | |
156419ea8026Sopenharmony_ci.-------||        |-'
156519ea8026Sopenharmony_ci|       |'--------'
156619ea8026Sopenharmony_ci|       '---|--|-'
156719ea8026Sopenharmony_ci|        .-'    '-.
156819ea8026Sopenharmony_ci|       v          v
156919ea8026Sopenharmony_ci|  .--------.  .--------.
157019ea8026Sopenharmony_ci'->| dir A  |->| dir C  |
157119ea8026Sopenharmony_ci  ||        | ||        |
157219ea8026Sopenharmony_ci  ||        | ||        |
157319ea8026Sopenharmony_ci  |'--------' |'--------'
157419ea8026Sopenharmony_ci  '--------'  '--------'
157519ea8026Sopenharmony_ci
157619ea8026Sopenharmony_ciallocate dir B
157719ea8026Sopenharmony_ci=>
157819ea8026Sopenharmony_ci         .--------.
157919ea8026Sopenharmony_ci        .| root   |-.
158019ea8026Sopenharmony_ci        ||        | |
158119ea8026Sopenharmony_ci.-------||        |-'
158219ea8026Sopenharmony_ci|       |'--------'
158319ea8026Sopenharmony_ci|       '---|--|-'
158419ea8026Sopenharmony_ci|        .-'    '-.
158519ea8026Sopenharmony_ci|       v          v
158619ea8026Sopenharmony_ci|  .--------.    .--------.
158719ea8026Sopenharmony_ci'->| dir A  |--->| dir C  |
158819ea8026Sopenharmony_ci  ||        | .->|        |
158919ea8026Sopenharmony_ci  ||        | | ||        |
159019ea8026Sopenharmony_ci  |'--------' | |'--------'
159119ea8026Sopenharmony_ci  '--------'  | '--------'
159219ea8026Sopenharmony_ci              |
159319ea8026Sopenharmony_ci   .--------. |
159419ea8026Sopenharmony_ci  .| dir B  |-'
159519ea8026Sopenharmony_ci  ||        |
159619ea8026Sopenharmony_ci  ||        |
159719ea8026Sopenharmony_ci  |'--------'
159819ea8026Sopenharmony_ci  '--------'
159919ea8026Sopenharmony_ci
160019ea8026Sopenharmony_ciinsert dir B into threaded linked-list, creating an orphan
160119ea8026Sopenharmony_ci=>
160219ea8026Sopenharmony_ci         .--------.
160319ea8026Sopenharmony_ci        .| root   |-.
160419ea8026Sopenharmony_ci        ||        | |
160519ea8026Sopenharmony_ci.-------||        |-'
160619ea8026Sopenharmony_ci|       |'--------'
160719ea8026Sopenharmony_ci|       '---|--|-'
160819ea8026Sopenharmony_ci|        .-'    '-------------.
160919ea8026Sopenharmony_ci|       v                      v
161019ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
161119ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
161219ea8026Sopenharmony_ci  ||        | || orphan!| ||        |
161319ea8026Sopenharmony_ci  ||        | ||        | ||        |
161419ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
161519ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
161619ea8026Sopenharmony_ci
161719ea8026Sopenharmony_ciadd dir B to parent directory
161819ea8026Sopenharmony_ci=>
161919ea8026Sopenharmony_ci               .--------.
162019ea8026Sopenharmony_ci              .| root   |-.
162119ea8026Sopenharmony_ci              ||        | |
162219ea8026Sopenharmony_ci.-------------||        |-'
162319ea8026Sopenharmony_ci|             |'--------'
162419ea8026Sopenharmony_ci|             '--|-|-|-'
162519ea8026Sopenharmony_ci|        .------'  |  '-------.
162619ea8026Sopenharmony_ci|       v          v           v
162719ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
162819ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
162919ea8026Sopenharmony_ci  ||        | ||        | ||        |
163019ea8026Sopenharmony_ci  ||        | ||        | ||        |
163119ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
163219ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
163319ea8026Sopenharmony_ci```
163419ea8026Sopenharmony_ci
163519ea8026Sopenharmony_ciRemoving a directory:
163619ea8026Sopenharmony_ci
163719ea8026Sopenharmony_ci```
163819ea8026Sopenharmony_ci               .--------.
163919ea8026Sopenharmony_ci              .| root   |-.
164019ea8026Sopenharmony_ci              ||        | |
164119ea8026Sopenharmony_ci.-------------||        |-'
164219ea8026Sopenharmony_ci|             |'--------'
164319ea8026Sopenharmony_ci|             '--|-|-|-'
164419ea8026Sopenharmony_ci|        .------'  |  '-------.
164519ea8026Sopenharmony_ci|       v          v           v
164619ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
164719ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
164819ea8026Sopenharmony_ci  ||        | ||        | ||        |
164919ea8026Sopenharmony_ci  ||        | ||        | ||        |
165019ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
165119ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
165219ea8026Sopenharmony_ci
165319ea8026Sopenharmony_ciremove dir B from parent directory, creating an orphan
165419ea8026Sopenharmony_ci=>
165519ea8026Sopenharmony_ci         .--------.
165619ea8026Sopenharmony_ci        .| root   |-.
165719ea8026Sopenharmony_ci        ||        | |
165819ea8026Sopenharmony_ci.-------||        |-'
165919ea8026Sopenharmony_ci|       |'--------'
166019ea8026Sopenharmony_ci|       '---|--|-'
166119ea8026Sopenharmony_ci|        .-'    '-------------.
166219ea8026Sopenharmony_ci|       v                      v
166319ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
166419ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
166519ea8026Sopenharmony_ci  ||        | || orphan!| ||        |
166619ea8026Sopenharmony_ci  ||        | ||        | ||        |
166719ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
166819ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
166919ea8026Sopenharmony_ci
167019ea8026Sopenharmony_ciremove dir B from threaded linked-list, returning dir B to free blocks
167119ea8026Sopenharmony_ci=>
167219ea8026Sopenharmony_ci         .--------.
167319ea8026Sopenharmony_ci        .| root   |-.
167419ea8026Sopenharmony_ci        ||        | |
167519ea8026Sopenharmony_ci.-------||        |-'
167619ea8026Sopenharmony_ci|       |'--------'
167719ea8026Sopenharmony_ci|       '---|--|-'
167819ea8026Sopenharmony_ci|        .-'    '-.
167919ea8026Sopenharmony_ci|       v          v
168019ea8026Sopenharmony_ci|  .--------.  .--------.
168119ea8026Sopenharmony_ci'->| dir A  |->| dir C  |
168219ea8026Sopenharmony_ci  ||        | ||        |
168319ea8026Sopenharmony_ci  ||        | ||        |
168419ea8026Sopenharmony_ci  |'--------' |'--------'
168519ea8026Sopenharmony_ci  '--------'  '--------'
168619ea8026Sopenharmony_ci```
168719ea8026Sopenharmony_ci
168819ea8026Sopenharmony_ciIn addition to normal directory tree operations, we can use orphans to evict
168919ea8026Sopenharmony_ciblocks in a metadata pair when the block goes bad or exceeds its allocated
169019ea8026Sopenharmony_cierases. If we lose power while evicting a metadata block we may end up with
169119ea8026Sopenharmony_cia situation where the filesystem references the replacement block while the
169219ea8026Sopenharmony_cithreaded linked-list still contains the evicted block. We call this a
169319ea8026Sopenharmony_cihalf-orphan.
169419ea8026Sopenharmony_ci
169519ea8026Sopenharmony_ci```
169619ea8026Sopenharmony_ci               .--------.
169719ea8026Sopenharmony_ci              .| root   |-.
169819ea8026Sopenharmony_ci              ||        | |
169919ea8026Sopenharmony_ci.-------------||        |-'
170019ea8026Sopenharmony_ci|             |'--------'
170119ea8026Sopenharmony_ci|             '--|-|-|-'
170219ea8026Sopenharmony_ci|        .------'  |  '-------.
170319ea8026Sopenharmony_ci|       v          v           v
170419ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
170519ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
170619ea8026Sopenharmony_ci  ||        | ||        | ||        |
170719ea8026Sopenharmony_ci  ||        | ||        | ||        |
170819ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
170919ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
171019ea8026Sopenharmony_ci
171119ea8026Sopenharmony_citry to write to dir B
171219ea8026Sopenharmony_ci=>
171319ea8026Sopenharmony_ci                  .--------.
171419ea8026Sopenharmony_ci                 .| root   |-.
171519ea8026Sopenharmony_ci                 ||        | |
171619ea8026Sopenharmony_ci.----------------||        |-'
171719ea8026Sopenharmony_ci|                |'--------'
171819ea8026Sopenharmony_ci|                '-|-||-|-'
171919ea8026Sopenharmony_ci|        .--------'  ||  '-----.
172019ea8026Sopenharmony_ci|       v            |v         v
172119ea8026Sopenharmony_ci|  .--------.     .--------.  .--------.
172219ea8026Sopenharmony_ci'->| dir A  |---->| dir B  |->| dir C  |
172319ea8026Sopenharmony_ci  ||        |-.   |        | ||        |
172419ea8026Sopenharmony_ci  ||        | |   |        | ||        |
172519ea8026Sopenharmony_ci  |'--------' |   '--------' |'--------'
172619ea8026Sopenharmony_ci  '--------'  |      v       '--------'
172719ea8026Sopenharmony_ci              |  .--------.
172819ea8026Sopenharmony_ci              '->| dir B  |
172919ea8026Sopenharmony_ci                 | bad    |
173019ea8026Sopenharmony_ci                 | block! |
173119ea8026Sopenharmony_ci                 '--------'
173219ea8026Sopenharmony_ci
173319ea8026Sopenharmony_cioh no! bad block detected, allocate replacement
173419ea8026Sopenharmony_ci=>
173519ea8026Sopenharmony_ci                  .--------.
173619ea8026Sopenharmony_ci                 .| root   |-.
173719ea8026Sopenharmony_ci                 ||        | |
173819ea8026Sopenharmony_ci.----------------||        |-'
173919ea8026Sopenharmony_ci|                |'--------'
174019ea8026Sopenharmony_ci|                '-|-||-|-'
174119ea8026Sopenharmony_ci|        .--------'  ||  '-------.
174219ea8026Sopenharmony_ci|       v            |v           v
174319ea8026Sopenharmony_ci|  .--------.     .--------.    .--------.
174419ea8026Sopenharmony_ci'->| dir A  |---->| dir B  |--->| dir C  |
174519ea8026Sopenharmony_ci  ||        |-.   |        | .->|        |
174619ea8026Sopenharmony_ci  ||        | |   |        | | ||        |
174719ea8026Sopenharmony_ci  |'--------' |   '--------' | |'--------'
174819ea8026Sopenharmony_ci  '--------'  |      v       | '--------'
174919ea8026Sopenharmony_ci              |  .--------.  |
175019ea8026Sopenharmony_ci              '->| dir B  |  |
175119ea8026Sopenharmony_ci                 | bad    |  |
175219ea8026Sopenharmony_ci                 | block! |  |
175319ea8026Sopenharmony_ci                 '--------'  |
175419ea8026Sopenharmony_ci                             |
175519ea8026Sopenharmony_ci                 .--------.  |
175619ea8026Sopenharmony_ci                 | dir B  |--'
175719ea8026Sopenharmony_ci                 |        |
175819ea8026Sopenharmony_ci                 |        |
175919ea8026Sopenharmony_ci                 '--------'
176019ea8026Sopenharmony_ci
176119ea8026Sopenharmony_ciinsert replacement in threaded linked-list, creating a half-orphan
176219ea8026Sopenharmony_ci=>
176319ea8026Sopenharmony_ci                  .--------.
176419ea8026Sopenharmony_ci                 .| root   |-.
176519ea8026Sopenharmony_ci                 ||        | |
176619ea8026Sopenharmony_ci.----------------||        |-'
176719ea8026Sopenharmony_ci|                |'--------'
176819ea8026Sopenharmony_ci|                '-|-||-|-'
176919ea8026Sopenharmony_ci|        .--------'  ||  '-------.
177019ea8026Sopenharmony_ci|       v            |v           v
177119ea8026Sopenharmony_ci|  .--------.     .--------.    .--------.
177219ea8026Sopenharmony_ci'->| dir A  |---->| dir B  |--->| dir C  |
177319ea8026Sopenharmony_ci  ||        |-.   |        | .->|        |
177419ea8026Sopenharmony_ci  ||        | |   |        | | ||        |
177519ea8026Sopenharmony_ci  |'--------' |   '--------' | |'--------'
177619ea8026Sopenharmony_ci  '--------'  |      v       | '--------'
177719ea8026Sopenharmony_ci              |  .--------.  |
177819ea8026Sopenharmony_ci              |  | dir B  |  |
177919ea8026Sopenharmony_ci              |  | bad    |  |
178019ea8026Sopenharmony_ci              |  | block! |  |
178119ea8026Sopenharmony_ci              |  '--------'  |
178219ea8026Sopenharmony_ci              |              |
178319ea8026Sopenharmony_ci              |  .--------.  |
178419ea8026Sopenharmony_ci              '->| dir B  |--'
178519ea8026Sopenharmony_ci                 | half   |
178619ea8026Sopenharmony_ci                 | orphan!|
178719ea8026Sopenharmony_ci                 '--------'
178819ea8026Sopenharmony_ci
178919ea8026Sopenharmony_cifix reference in parent directory
179019ea8026Sopenharmony_ci=>
179119ea8026Sopenharmony_ci               .--------.
179219ea8026Sopenharmony_ci              .| root   |-.
179319ea8026Sopenharmony_ci              ||        | |
179419ea8026Sopenharmony_ci.-------------||        |-'
179519ea8026Sopenharmony_ci|             |'--------'
179619ea8026Sopenharmony_ci|             '--|-|-|-'
179719ea8026Sopenharmony_ci|        .------'  |  '-------.
179819ea8026Sopenharmony_ci|       v          v           v
179919ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
180019ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
180119ea8026Sopenharmony_ci  ||        | ||        | ||        |
180219ea8026Sopenharmony_ci  ||        | ||        | ||        |
180319ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
180419ea8026Sopenharmony_ci  '--------'  '--------'  '--------'
180519ea8026Sopenharmony_ci```
180619ea8026Sopenharmony_ci
180719ea8026Sopenharmony_ciFinding orphans and half-orphans is expensive, requiring a _O(n&sup2;)_
180819ea8026Sopenharmony_cicomparison of every metadata pair with every directory entry. But the tradeoff
180919ea8026Sopenharmony_ciis a power resilient filesystem that works with only a bounded amount of RAM.
181019ea8026Sopenharmony_ciFortunately, we only need to check for orphans on the first allocation after
181119ea8026Sopenharmony_ciboot, and a read-only littlefs can ignore the threaded linked-list entirely.
181219ea8026Sopenharmony_ci
181319ea8026Sopenharmony_ciIf we only had some sort of global state, then we could also store a flag and
181419ea8026Sopenharmony_ciavoid searching for orphans unless we knew we were specifically interrupted
181519ea8026Sopenharmony_ciwhile manipulating the directory tree (foreshadowing!).
181619ea8026Sopenharmony_ci
181719ea8026Sopenharmony_ci## The move problem
181819ea8026Sopenharmony_ci
181919ea8026Sopenharmony_ciWe have one last challenge: the move problem. Phrasing the problem is simple:
182019ea8026Sopenharmony_ci
182119ea8026Sopenharmony_ciHow do you atomically move a file between two directories?
182219ea8026Sopenharmony_ci
182319ea8026Sopenharmony_ciIn littlefs we can atomically commit to directories, but we can't create
182419ea8026Sopenharmony_cian atomic commit that spans multiple directories. The filesystem must go
182519ea8026Sopenharmony_cithrough a minimum of two distinct states to complete a move.
182619ea8026Sopenharmony_ci
182719ea8026Sopenharmony_ciTo make matters worse, file moves are a common form of synchronization for
182819ea8026Sopenharmony_cifilesystems. As a filesystem designed for power-loss, it's important we get
182919ea8026Sopenharmony_ciatomic moves right.
183019ea8026Sopenharmony_ci
183119ea8026Sopenharmony_ciSo what can we do?
183219ea8026Sopenharmony_ci
183319ea8026Sopenharmony_ci- We definitely can't just let power-loss result in duplicated or lost files.
183419ea8026Sopenharmony_ci  This could easily break users' code and would only reveal itself in extreme
183519ea8026Sopenharmony_ci  cases. We were only able to be lazy about the threaded linked-list because
183619ea8026Sopenharmony_ci  it isn't user facing and we can handle the corner cases internally.
183719ea8026Sopenharmony_ci
183819ea8026Sopenharmony_ci- Some filesystems propagate COW operations up the tree until a common parent
183919ea8026Sopenharmony_ci  is found. Unfortunately this interacts poorly with our threaded tree and
184019ea8026Sopenharmony_ci  brings back the issue of upward propagation of wear.
184119ea8026Sopenharmony_ci
184219ea8026Sopenharmony_ci- In a previous version of littlefs we tried to solve this problem by going
184319ea8026Sopenharmony_ci  back and forth between the source and destination, marking and unmarking the
184419ea8026Sopenharmony_ci  file as moving in order to make the move atomic from the user perspective.
184519ea8026Sopenharmony_ci  This worked, but not well. Finding failed moves was expensive and required
184619ea8026Sopenharmony_ci  a unique identifier for each file.
184719ea8026Sopenharmony_ci
184819ea8026Sopenharmony_ciIn the end, solving the move problem required creating a new mechanism for
184919ea8026Sopenharmony_cisharing knowledge between multiple metadata pairs. In littlefs this led to the
185019ea8026Sopenharmony_ciintroduction of a mechanism called "global state".
185119ea8026Sopenharmony_ci
185219ea8026Sopenharmony_ci---
185319ea8026Sopenharmony_ci
185419ea8026Sopenharmony_ciGlobal state is a small set of state that can be updated from _any_ metadata
185519ea8026Sopenharmony_cipair. Combining global state with metadata pairs' ability to update multiple
185619ea8026Sopenharmony_cientries in one commit gives us a powerful tool for crafting complex atomic
185719ea8026Sopenharmony_cioperations.
185819ea8026Sopenharmony_ci
185919ea8026Sopenharmony_ciHow does global state work?
186019ea8026Sopenharmony_ci
186119ea8026Sopenharmony_ciGlobal state exists as a set of deltas that are distributed across the metadata
186219ea8026Sopenharmony_cipairs in the filesystem. The actual global state can be built out of these
186319ea8026Sopenharmony_cideltas by xoring together all of the deltas in the filesystem.
186419ea8026Sopenharmony_ci
186519ea8026Sopenharmony_ci```
186619ea8026Sopenharmony_ci .--------.  .--------.  .--------.  .--------.  .--------.
186719ea8026Sopenharmony_ci.|        |->| gdelta |->|        |->| gdelta |->| gdelta |
186819ea8026Sopenharmony_ci||        | || 0x23   | ||        | || 0xff   | || 0xce   |
186919ea8026Sopenharmony_ci||        | ||        | ||        | ||        | ||        |
187019ea8026Sopenharmony_ci|'--------' |'--------' |'--------' |'--------' |'--------'
187119ea8026Sopenharmony_ci'--------'  '----|---'  '--------'  '----|---'  '----|---'
187219ea8026Sopenharmony_ci                 v                       v           v
187319ea8026Sopenharmony_ci       0x00 --> xor ------------------> xor ------> xor --> gstate 0x12
187419ea8026Sopenharmony_ci```
187519ea8026Sopenharmony_ci
187619ea8026Sopenharmony_ciTo update the global state from a metadata pair, we take the global state we
187719ea8026Sopenharmony_ciknow and xor it with both our changes and any existing delta in the metadata
187819ea8026Sopenharmony_cipair. Committing this new delta to the metadata pair commits the changes to
187919ea8026Sopenharmony_cithe filesystem's global state.
188019ea8026Sopenharmony_ci
188119ea8026Sopenharmony_ci```
188219ea8026Sopenharmony_ci .--------.  .--------.  .--------.  .--------.  .--------.
188319ea8026Sopenharmony_ci.|        |->| gdelta |->|        |->| gdelta |->| gdelta |
188419ea8026Sopenharmony_ci||        | || 0x23   | ||        | || 0xff   | || 0xce   |
188519ea8026Sopenharmony_ci||        | ||        | ||        | ||        | ||        |
188619ea8026Sopenharmony_ci|'--------' |'--------' |'--------' |'--------' |'--------'
188719ea8026Sopenharmony_ci'--------'  '----|---'  '--------'  '--|---|-'  '----|---'
188819ea8026Sopenharmony_ci                 v                     v   |         v
188919ea8026Sopenharmony_ci       0x00 --> xor ----------------> xor -|------> xor --> gstate = 0x12
189019ea8026Sopenharmony_ci                                           |                          |
189119ea8026Sopenharmony_ci                                           |                          |
189219ea8026Sopenharmony_cichange gstate to 0xab --> xor <------------|--------------------------'
189319ea8026Sopenharmony_ci=>                         |               v
189419ea8026Sopenharmony_ci                           '------------> xor
189519ea8026Sopenharmony_ci                                           |
189619ea8026Sopenharmony_ci                                           v
189719ea8026Sopenharmony_ci .--------.  .--------.  .--------.  .--------.  .--------.
189819ea8026Sopenharmony_ci.|        |->| gdelta |->|        |->| gdelta |->| gdelta |
189919ea8026Sopenharmony_ci||        | || 0x23   | ||        | || 0x46   | || 0xce   |
190019ea8026Sopenharmony_ci||        | ||        | ||        | ||        | ||        |
190119ea8026Sopenharmony_ci|'--------' |'--------' |'--------' |'--------' |'--------'
190219ea8026Sopenharmony_ci'--------'  '----|---'  '--------'  '----|---'  '----|---'
190319ea8026Sopenharmony_ci                 v                       v           v
190419ea8026Sopenharmony_ci       0x00 --> xor ------------------> xor ------> xor --> gstate = 0xab
190519ea8026Sopenharmony_ci```
190619ea8026Sopenharmony_ci
190719ea8026Sopenharmony_ciTo make this efficient, we always keep a copy of the global state in RAM. We
190819ea8026Sopenharmony_cionly need to iterate over our metadata pairs and build the global state when
190919ea8026Sopenharmony_cithe filesystem is mounted.
191019ea8026Sopenharmony_ci
191119ea8026Sopenharmony_ciYou may have noticed that global state is very expensive. We keep a copy in
191219ea8026Sopenharmony_ciRAM and a delta in an unbounded number of metadata pairs. Even if we reset
191319ea8026Sopenharmony_cithe global state to its initial value, we can't easily clean up the deltas on
191419ea8026Sopenharmony_cidisk. For this reason, it's very important that we keep the size of global
191519ea8026Sopenharmony_cistate bounded and extremely small. But, even with a strict budget, global
191619ea8026Sopenharmony_cistate is incredibly valuable.
191719ea8026Sopenharmony_ci
191819ea8026Sopenharmony_ci---
191919ea8026Sopenharmony_ci
192019ea8026Sopenharmony_ciNow we can solve the move problem. We can create global state describing our
192119ea8026Sopenharmony_cimove atomically with the creation of the new file, and we can clear this move
192219ea8026Sopenharmony_cistate atomically with the removal of the old file.
192319ea8026Sopenharmony_ci
192419ea8026Sopenharmony_ci```
192519ea8026Sopenharmony_ci               .--------.    gstate = no move
192619ea8026Sopenharmony_ci              .| root   |-.
192719ea8026Sopenharmony_ci              ||        | |
192819ea8026Sopenharmony_ci.-------------||        |-'
192919ea8026Sopenharmony_ci|             |'--------'
193019ea8026Sopenharmony_ci|             '--|-|-|-'
193119ea8026Sopenharmony_ci|        .------'  |  '-------.
193219ea8026Sopenharmony_ci|       v          v           v
193319ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
193419ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
193519ea8026Sopenharmony_ci  ||        | ||        | ||        |
193619ea8026Sopenharmony_ci  ||        | ||        | ||        |
193719ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
193819ea8026Sopenharmony_ci  '----|---'  '--------'  '--------'
193919ea8026Sopenharmony_ci       v
194019ea8026Sopenharmony_ci   .--------.
194119ea8026Sopenharmony_ci   | file D |
194219ea8026Sopenharmony_ci   |        |
194319ea8026Sopenharmony_ci   |        |
194419ea8026Sopenharmony_ci   '--------'
194519ea8026Sopenharmony_ci
194619ea8026Sopenharmony_cibegin move, add reference in dir C, change gstate to have move
194719ea8026Sopenharmony_ci=>
194819ea8026Sopenharmony_ci               .--------.    gstate = moving file D in dir A (m1)
194919ea8026Sopenharmony_ci              .| root   |-.
195019ea8026Sopenharmony_ci              ||        | |
195119ea8026Sopenharmony_ci.-------------||        |-'
195219ea8026Sopenharmony_ci|             |'--------'
195319ea8026Sopenharmony_ci|             '--|-|-|-'
195419ea8026Sopenharmony_ci|        .------'  |  '-------.
195519ea8026Sopenharmony_ci|       v          v           v
195619ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
195719ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
195819ea8026Sopenharmony_ci  ||        | ||        | || gdelta |
195919ea8026Sopenharmony_ci  ||        | ||        | || =m1    |
196019ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
196119ea8026Sopenharmony_ci  '----|---'  '--------'  '----|---'
196219ea8026Sopenharmony_ci       |     .----------------'
196319ea8026Sopenharmony_ci       v    v
196419ea8026Sopenharmony_ci     .--------.
196519ea8026Sopenharmony_ci     | file D |
196619ea8026Sopenharmony_ci     |        |
196719ea8026Sopenharmony_ci     |        |
196819ea8026Sopenharmony_ci     '--------'
196919ea8026Sopenharmony_ci
197019ea8026Sopenharmony_cicomplete move, remove reference in dir A, change gstate to no move
197119ea8026Sopenharmony_ci=>
197219ea8026Sopenharmony_ci               .--------.    gstate = no move (m1^~m1)
197319ea8026Sopenharmony_ci              .| root   |-.
197419ea8026Sopenharmony_ci              ||        | |
197519ea8026Sopenharmony_ci.-------------||        |-'
197619ea8026Sopenharmony_ci|             |'--------'
197719ea8026Sopenharmony_ci|             '--|-|-|-'
197819ea8026Sopenharmony_ci|        .------'  |  '-------.
197919ea8026Sopenharmony_ci|       v          v           v
198019ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
198119ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
198219ea8026Sopenharmony_ci  || gdelta | ||        | || gdelta |
198319ea8026Sopenharmony_ci  || =~m1   | ||        | || =m1    |
198419ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
198519ea8026Sopenharmony_ci  '--------'  '--------'  '----|---'
198619ea8026Sopenharmony_ci                               v
198719ea8026Sopenharmony_ci                           .--------.
198819ea8026Sopenharmony_ci                           | file D |
198919ea8026Sopenharmony_ci                           |        |
199019ea8026Sopenharmony_ci                           |        |
199119ea8026Sopenharmony_ci                           '--------'
199219ea8026Sopenharmony_ci```
199319ea8026Sopenharmony_ci
199419ea8026Sopenharmony_ci
199519ea8026Sopenharmony_ciIf, after building our global state during mount, we find information
199619ea8026Sopenharmony_cidescribing an ongoing move, we know we lost power during a move and the file
199719ea8026Sopenharmony_ciis duplicated in both the source and destination directories. If this happens,
199819ea8026Sopenharmony_ciwe can resolve the move using the information in the global state to remove
199919ea8026Sopenharmony_cione of the files.
200019ea8026Sopenharmony_ci
200119ea8026Sopenharmony_ci```
200219ea8026Sopenharmony_ci                 .--------.    gstate = moving file D in dir A (m1)
200319ea8026Sopenharmony_ci                .| root   |-.             ^
200419ea8026Sopenharmony_ci                ||        |------------> xor
200519ea8026Sopenharmony_ci.---------------||        |-'             ^
200619ea8026Sopenharmony_ci|               |'--------'               |
200719ea8026Sopenharmony_ci|               '--|-|-|-'                |
200819ea8026Sopenharmony_ci|        .--------'  |  '---------.       |
200919ea8026Sopenharmony_ci|       |            |             |      |
201019ea8026Sopenharmony_ci|       |     .----------> xor --------> xor
201119ea8026Sopenharmony_ci|       v     |      v      ^      v      ^
201219ea8026Sopenharmony_ci|  .--------. |  .--------. |  .--------. |
201319ea8026Sopenharmony_ci'->| dir A  |-|->| dir B  |-|->| dir C  | |
201419ea8026Sopenharmony_ci  ||        |-' ||        |-' || gdelta |-'
201519ea8026Sopenharmony_ci  ||        |   ||        |   || =m1    |
201619ea8026Sopenharmony_ci  |'--------'   |'--------'   |'--------'
201719ea8026Sopenharmony_ci  '----|---'    '--------'    '----|---'
201819ea8026Sopenharmony_ci       |     .---------------------'
201919ea8026Sopenharmony_ci       v    v
202019ea8026Sopenharmony_ci     .--------.
202119ea8026Sopenharmony_ci     | file D |
202219ea8026Sopenharmony_ci     |        |
202319ea8026Sopenharmony_ci     |        |
202419ea8026Sopenharmony_ci     '--------'
202519ea8026Sopenharmony_ci```
202619ea8026Sopenharmony_ci
202719ea8026Sopenharmony_ciWe can also move directories the same way we move files. There is the threaded
202819ea8026Sopenharmony_cilinked-list to consider, but leaving the threaded linked-list unchanged works
202919ea8026Sopenharmony_cifine as the order doesn't really matter.
203019ea8026Sopenharmony_ci
203119ea8026Sopenharmony_ci```
203219ea8026Sopenharmony_ci               .--------.    gstate = no move (m1^~m1)
203319ea8026Sopenharmony_ci              .| root   |-.
203419ea8026Sopenharmony_ci              ||        | |
203519ea8026Sopenharmony_ci.-------------||        |-'
203619ea8026Sopenharmony_ci|             |'--------'
203719ea8026Sopenharmony_ci|             '--|-|-|-'
203819ea8026Sopenharmony_ci|        .------'  |  '-------.
203919ea8026Sopenharmony_ci|       v          v           v
204019ea8026Sopenharmony_ci|  .--------.  .--------.  .--------.
204119ea8026Sopenharmony_ci'->| dir A  |->| dir B  |->| dir C  |
204219ea8026Sopenharmony_ci  || gdelta | ||        | || gdelta |
204319ea8026Sopenharmony_ci  || =~m1   | ||        | || =m1    |
204419ea8026Sopenharmony_ci  |'--------' |'--------' |'--------'
204519ea8026Sopenharmony_ci  '--------'  '--------'  '----|---'
204619ea8026Sopenharmony_ci                               v
204719ea8026Sopenharmony_ci                           .--------.
204819ea8026Sopenharmony_ci                           | file D |
204919ea8026Sopenharmony_ci                           |        |
205019ea8026Sopenharmony_ci                           |        |
205119ea8026Sopenharmony_ci                           '--------'
205219ea8026Sopenharmony_ci
205319ea8026Sopenharmony_cibegin move, add reference in dir C, change gstate to have move
205419ea8026Sopenharmony_ci=>
205519ea8026Sopenharmony_ci                .--------.    gstate = moving dir B in root (m1^~m1^m2)
205619ea8026Sopenharmony_ci               .| root   |-.
205719ea8026Sopenharmony_ci               ||        | |
205819ea8026Sopenharmony_ci.--------------||        |-'
205919ea8026Sopenharmony_ci|              |'--------'
206019ea8026Sopenharmony_ci|              '--|-|-|-'
206119ea8026Sopenharmony_ci|        .-------'  |  '----------.
206219ea8026Sopenharmony_ci|       v           |              v
206319ea8026Sopenharmony_ci|  .--------.       |          .--------.
206419ea8026Sopenharmony_ci'->| dir A  |-.     |       .->| dir C  |
206519ea8026Sopenharmony_ci  || gdelta | |     |       | || gdelta |
206619ea8026Sopenharmony_ci  || =~m1   | |     |       | || =m1^m2 |
206719ea8026Sopenharmony_ci  |'--------' |     |       | |'--------'
206819ea8026Sopenharmony_ci  '--------'  |     |       | '---|--|-'
206919ea8026Sopenharmony_ci              |     |    .-------'   |
207019ea8026Sopenharmony_ci              |     v   v   |        v
207119ea8026Sopenharmony_ci              |  .--------. |    .--------.
207219ea8026Sopenharmony_ci              '->| dir B  |-'    | file D |
207319ea8026Sopenharmony_ci                ||        |      |        |
207419ea8026Sopenharmony_ci                ||        |      |        |
207519ea8026Sopenharmony_ci                |'--------'      '--------'
207619ea8026Sopenharmony_ci                '--------'
207719ea8026Sopenharmony_ci
207819ea8026Sopenharmony_cicomplete move, remove reference in root, change gstate to no move
207919ea8026Sopenharmony_ci=>
208019ea8026Sopenharmony_ci             .--------.    gstate = no move (m1^~m1^m2^~m2)
208119ea8026Sopenharmony_ci            .| root   |-.
208219ea8026Sopenharmony_ci            || gdelta | |
208319ea8026Sopenharmony_ci.-----------|| =~m2   |-'
208419ea8026Sopenharmony_ci|           |'--------'
208519ea8026Sopenharmony_ci|           '---|--|-'
208619ea8026Sopenharmony_ci|        .-----'    '-----.
208719ea8026Sopenharmony_ci|       v                  v
208819ea8026Sopenharmony_ci|  .--------.          .--------.
208919ea8026Sopenharmony_ci'->| dir A  |-.     .->| dir C  |
209019ea8026Sopenharmony_ci  || gdelta | |     | || gdelta |
209119ea8026Sopenharmony_ci  || =~m1   | |     '-|| =m1^m2 |-------.
209219ea8026Sopenharmony_ci  |'--------' |       |'--------'       |
209319ea8026Sopenharmony_ci  '--------'  |       '---|--|-'        |
209419ea8026Sopenharmony_ci              |        .-'    '-.       |
209519ea8026Sopenharmony_ci              |       v          v      |
209619ea8026Sopenharmony_ci              |  .--------.  .--------. |
209719ea8026Sopenharmony_ci              '->| dir B  |--| file D |-'
209819ea8026Sopenharmony_ci                ||        |  |        |
209919ea8026Sopenharmony_ci                ||        |  |        |
210019ea8026Sopenharmony_ci                |'--------'  '--------'
210119ea8026Sopenharmony_ci                '--------'
210219ea8026Sopenharmony_ci```
210319ea8026Sopenharmony_ci
210419ea8026Sopenharmony_ciGlobal state gives us a powerful tool we can use to solve the move problem.
210519ea8026Sopenharmony_ciAnd the result is surprisingly performant, only needing the minimum number
210619ea8026Sopenharmony_ciof states and using the same number of commits as a naive move. Additionally,
210719ea8026Sopenharmony_ciglobal state gives us a bit of persistent state we can use for some other
210819ea8026Sopenharmony_cismall improvements.
210919ea8026Sopenharmony_ci
211019ea8026Sopenharmony_ci## Conclusion
211119ea8026Sopenharmony_ci
211219ea8026Sopenharmony_ciAnd that's littlefs, thanks for reading!
211319ea8026Sopenharmony_ci
211419ea8026Sopenharmony_ci
211519ea8026Sopenharmony_ci[wikipedia-flash]: https://en.wikipedia.org/wiki/Flash_memory
211619ea8026Sopenharmony_ci[wikipedia-sna]: https://en.wikipedia.org/wiki/Serial_number_arithmetic
211719ea8026Sopenharmony_ci[wikipedia-crc]: https://en.wikipedia.org/wiki/Cyclic_redundancy_check
211819ea8026Sopenharmony_ci[wikipedia-cow]: https://en.wikipedia.org/wiki/Copy-on-write
211919ea8026Sopenharmony_ci[wikipedia-B-tree]: https://en.wikipedia.org/wiki/B-tree
212019ea8026Sopenharmony_ci[wikipedia-B+-tree]: https://en.wikipedia.org/wiki/B%2B_tree
212119ea8026Sopenharmony_ci[wikipedia-skip-list]: https://en.wikipedia.org/wiki/Skip_list
212219ea8026Sopenharmony_ci[wikipedia-ctz]: https://en.wikipedia.org/wiki/Count_trailing_zeros
212319ea8026Sopenharmony_ci[wikipedia-ecc]: https://en.wikipedia.org/wiki/Error_correction_code
212419ea8026Sopenharmony_ci[wikipedia-hamming-bound]: https://en.wikipedia.org/wiki/Hamming_bound
212519ea8026Sopenharmony_ci[wikipedia-dynamic-wear-leveling]: https://en.wikipedia.org/wiki/Wear_leveling#Dynamic_wear_leveling
212619ea8026Sopenharmony_ci[wikipedia-static-wear-leveling]: https://en.wikipedia.org/wiki/Wear_leveling#Static_wear_leveling
212719ea8026Sopenharmony_ci[wikipedia-ftl]: https://en.wikipedia.org/wiki/Flash_translation_layer
212819ea8026Sopenharmony_ci
212919ea8026Sopenharmony_ci[oeis]: https://oeis.org
213019ea8026Sopenharmony_ci[A001511]: https://oeis.org/A001511
213119ea8026Sopenharmony_ci[A005187]: https://oeis.org/A005187
213219ea8026Sopenharmony_ci
213319ea8026Sopenharmony_ci[fat]: https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system
213419ea8026Sopenharmony_ci[ext2]: http://e2fsprogs.sourceforge.net/ext2intro.html
213519ea8026Sopenharmony_ci[jffs]: https://www.sourceware.org/jffs2/jffs2-html
213619ea8026Sopenharmony_ci[yaffs]: https://yaffs.net/documents/how-yaffs-works
213719ea8026Sopenharmony_ci[spiffs]: https://github.com/pellepl/spiffs/blob/master/docs/TECH_SPEC
213819ea8026Sopenharmony_ci[ext4]: https://ext4.wiki.kernel.org/index.php/Ext4_Design
213919ea8026Sopenharmony_ci[ntfs]: https://en.wikipedia.org/wiki/NTFS
214019ea8026Sopenharmony_ci[btrfs]: https://btrfs.wiki.kernel.org/index.php/Btrfs_design
214119ea8026Sopenharmony_ci[zfs]: https://en.wikipedia.org/wiki/ZFS
214219ea8026Sopenharmony_ci
214319ea8026Sopenharmony_ci[cow]: https://upload.wikimedia.org/wikipedia/commons/0/0c/Cow_female_black_white.jpg
214419ea8026Sopenharmony_ci[elephant]: https://upload.wikimedia.org/wikipedia/commons/3/37/African_Bush_Elephant.jpg
214519ea8026Sopenharmony_ci[ram]: https://upload.wikimedia.org/wikipedia/commons/9/97/New_Mexico_Bighorn_Sheep.JPG
214619ea8026Sopenharmony_ci
214719ea8026Sopenharmony_ci[metadata-formula1]: https://latex.codecogs.com/svg.latex?cost%20%3D%20n%20&plus;%20n%20%5Cfrac%7Bs%7D%7Bd&plus;1%7D
214819ea8026Sopenharmony_ci[metadata-formula2]: https://latex.codecogs.com/svg.latex?s%20%3D%20r%20%5Cfrac%7Bsize%7D%7Bn%7D
214919ea8026Sopenharmony_ci[metadata-formula3]: https://latex.codecogs.com/svg.latex?d%20%3D%20%281-r%29%20%5Cfrac%7Bsize%7D%7Bn%7D
215019ea8026Sopenharmony_ci[metadata-formula4]: https://latex.codecogs.com/svg.latex?cost%20%3D%20n%20&plus;%20n%20%5Cfrac%7Br%5Cfrac%7Bsize%7D%7Bn%7D%7D%7B%281-r%29%5Cfrac%7Bsize%7D%7Bn%7D&plus;1%7D
215119ea8026Sopenharmony_ci
215219ea8026Sopenharmony_ci[ctz-formula1]: https://latex.codecogs.com/svg.latex?%5Clim_%7Bn%5Cto%5Cinfty%7D%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D0%7D%5E%7Bn%7D%5Cleft%28%5Ctext%7Bctz%7D%28i%29&plus;1%5Cright%29%20%3D%20%5Csum_%7Bi%3D0%7D%5Cfrac%7B1%7D%7B2%5Ei%7D%20%3D%202
215319ea8026Sopenharmony_ci[ctz-formula2]: https://latex.codecogs.com/svg.latex?B%20%3D%20%5Cfrac%7Bw%7D%7B8%7D%5Cleft%5Clceil%5Clog_2%5Cleft%28%5Cfrac%7B2%5Ew%7D%7BB-2%5Cfrac%7Bw%7D%7B8%7D%7D%5Cright%29%5Cright%5Crceil
215419ea8026Sopenharmony_ci[ctz-formula3]: https://latex.codecogs.com/svg.latex?N%20%3D%20%5Csum_i%5En%5Cleft%5BB-%5Cfrac%7Bw%7D%7B8%7D%5Cleft%28%5Ctext%7Bctz%7D%28i%29&plus;1%5Cright%29%5Cright%5D
215519ea8026Sopenharmony_ci[ctz-formula4]: https://latex.codecogs.com/svg.latex?%5Csum_i%5En%5Cleft%28%5Ctext%7Bctz%7D%28i%29&plus;1%5Cright%29%20%3D%202n-%5Ctext%7Bpopcount%7D%28n%29
215619ea8026Sopenharmony_ci[ctz-formula5]: https://latex.codecogs.com/svg.latex?N%20%3D%20Bn%20-%20%5Cfrac%7Bw%7D%7B8%7D%5Cleft%282n-%5Ctext%7Bpopcount%7D%28n%29%5Cright%29
215719ea8026Sopenharmony_ci[ctz-formula6]: https://latex.codecogs.com/svg.latex?n%20%3D%20%5Cleft%5Clfloor%5Cfrac%7BN-%5Cfrac%7Bw%7D%7B8%7D%5Cleft%28%5Ctext%7Bpopcount%7D%5Cleft%28%5Cfrac%7BN%7D%7BB-2%5Cfrac%7Bw%7D%7B8%7D%7D-1%5Cright%29&plus;2%5Cright%29%7D%7BB-2%5Cfrac%7Bw%7D%7B8%7D%7D%5Cright%5Crfloor
215819ea8026Sopenharmony_ci[ctz-formula7]: https://latex.codecogs.com/svg.latex?%5Cmathit%7Boff%7D%20%3D%20N%20-%20%5Cleft%28B-2%5Cfrac%7Bw%7D%7B8%7D%5Cright%29n%20-%20%5Cfrac%7Bw%7D%7B8%7D%5Ctext%7Bpopcount%7D%28n%29
215919ea8026Sopenharmony_ci
216019ea8026Sopenharmony_ci[bigB]: https://latex.codecogs.com/svg.latex?B
216119ea8026Sopenharmony_ci[d]: https://latex.codecogs.com/svg.latex?d
216219ea8026Sopenharmony_ci[m]: https://latex.codecogs.com/svg.latex?m
216319ea8026Sopenharmony_ci[bigN]: https://latex.codecogs.com/svg.latex?N
216419ea8026Sopenharmony_ci[n]: https://latex.codecogs.com/svg.latex?n
216519ea8026Sopenharmony_ci[n']: https://latex.codecogs.com/svg.latex?n%27
216619ea8026Sopenharmony_ci[r]: https://latex.codecogs.com/svg.latex?r
216719ea8026Sopenharmony_ci[s]: https://latex.codecogs.com/svg.latex?s
216819ea8026Sopenharmony_ci[w]: https://latex.codecogs.com/svg.latex?w
216919ea8026Sopenharmony_ci[x]: https://latex.codecogs.com/svg.latex?x
217019ea8026Sopenharmony_ci
217119ea8026Sopenharmony_ci[metadata-cost-graph]: https://raw.githubusercontent.com/geky/littlefs/gh-images/metadata-cost.svg?sanitize=true
217219ea8026Sopenharmony_ci[wear-distribution-graph]: https://raw.githubusercontent.com/geky/littlefs/gh-images/wear-distribution.svg?sanitize=true
217319ea8026Sopenharmony_ci[file-cost-graph]: https://raw.githubusercontent.com/geky/littlefs/gh-images/file-cost.svg?sanitize=true
2174