=============
dm-log-writes
=============

This target takes two devices: one that all IO is passed to normally, and one
that all write operations are logged to.  It is intended for file system
developers who want to verify the integrity of metadata or data as the file
system is written to.  A log_write_entry is written for every WRITE request,
and the target can also take arbitrary data from userspace to insert into the
log.  The data in each WRITE request is copied into the log so that the replay
happens exactly as the original writes did.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_PREFLUSH request.  This makes it easier for userspace to replay the
log in a way that correlates to what is on disk rather than what is in cache,
which in turn makes it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_PREFLUSH request we splice this list onto the request, and
once the FLUSH request completes we log all of the WRITEs and then the FLUSH.
Only WRITEs that have completed at the time the REQ_PREFLUSH is issued are
added, in order to simulate the worst case scenario with regard to power
failures.  Consider the following example (W means write, C means complete):

        W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

        W3,W2,flush,W1....

Again, this is to simulate what is actually on disk.  It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.
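The ordering rules above can be modelled with a short shell sketch.  This is a
toy model for illustration, not the kernel implementation: ``simulate_log`` and
its event names (``Wn``, ``Cn``, ``Wflush``, ``Cflush``) are hypothetical and
were chosen only to mirror the example.

```shell
# Toy model of the log ordering rules; not the kernel code.
# Events: Wn = write n submitted, Cn = write n completed,
#         Wflush / Cflush = flush submitted / flush completed.
simulate_log() (
    pending=""    # completed writes waiting for the next flush
    attached=""   # writes spliced onto the in-flight flush
    log=""        # what has actually been written to the log
    for ev in "$@"; do
        case "$ev" in
            Wflush) attached="$pending"; pending="" ;;
            Cflush) log="$log$attached flush"; attached="" ;;
            C*)     pending="$pending W${ev#C}" ;;
            W*)     : ;;   # submission alone logs nothing
        esac
    done
    # Print what reached the log, then what is still waiting for a flush.
    echo "logged:${log:- none} pending:${pending:- none}"
)
```

Running ``simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush`` prints
``logged: W3 W2 flush pending: W1``.  W1 completed after the flush was issued,
so it only reaches the log once a later flush completes.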

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, since those requests bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise the log
would contain all of the DISCARD requests first, then the WRITE requests, and
then the FLUSH request.  Consider the following example:

        WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

        DISCARD 1, WRITE 1, FLUSH

which isn't what happened, so a problem in the original ordering wouldn't be
caught during the log replay.

Target interface
================

i) Constructor

   log-writes <dev_path> <log_dev_path>

   ============= ==============================================
   dev_path      Device that all of the IO will go to normally.
   log_dev_path  Device where the log entries are written to.
   ============= ==============================================

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ==========================================
    #logged entries             Number of entries written to the log
    highest allocated sector    Highest sector allocated on the log device
    =========================== ==========================================

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example, say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        you're fsck'ing something reasonable.  You would do something
        like this::

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on, doing the fsck check at the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".
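The status and message interfaces can be wrapped in small shell helpers.  The
following is a sketch under stated assumptions: ``parse_log_writes_status`` and
``run_and_mark`` are hypothetical names, and the parser assumes the status line
has the form ``<start> <length> log-writes <#logged entries> <highest
allocated sector>``, which is how ``dmsetup status <name>`` is expected to
prefix the target's own status fields.

```shell
# Sketch only: both helper names are hypothetical.

# Print "<#logged entries> <highest allocated sector>" from one status line.
# Assumes dmsetup prefixes the target status with
# "<start> <length> <target type>".
parse_log_writes_status() (
    set -- $1                       # split the status line on whitespace
    [ "$3" = "log-writes" ] || exit 1
    echo "$4 $5"
)

# Run a command, then record a mark in the log if the command succeeded.
# usage: run_and_mark <dm name> <mark> <command> [args...]
run_and_mark() (
    name=$1
    mark=$2
    shift 2
    "$@" && dmsetup message "$name" 0 mark "$mark"
)

# Example (needs root and an existing "log" device, so it is left commented):
#   parse_log_writes_status "$(dmsetup status log)"
#   run_and_mark log mkfs mkfs.btrfs -f /dev/mapper/log
```

Marking only after the command succeeds keeps a mark from pointing at a step
that never finished.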

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.  You would do something like
this::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <some test that does fsync at the end>
  dmsetup message log 0 mark fsync
  md5sum /mnt/btrfs-test/foo
  umount /mnt/btrfs-test

  dmsetup remove log
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
  mount /dev/sdb /mnt/btrfs-test
  md5sum /mnt/btrfs-test/foo
  <verify that the md5sums match>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation.  You could do this with::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <fsstress to dirty the fs>
  btrfs filesystem balance /mnt/btrfs-test
  umount /mnt/btrfs-test
  dmsetup remove log

  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
  btrfsck /dev/sdb
  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
        --fsck "btrfsck /dev/sdb" --check fua

This will replay the log until it sees a FUA request, then run the fsck command.
If the fsck passes, it will replay to the next FUA, and so on until the replay
is complete or the fsck command exits abnormally.
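The md5sum verification step in the fsync example is left to the reader.  One
possible way to script it is sketched below.  ``file_md5`` and
``checksums_match`` are hypothetical helper names, and the paths in the
comments are the ones used in the example.

```shell
# Sketch only: both helper names are hypothetical.

# Print just the hex digest of a file (md5sum prints "<digest>  <path>").
file_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Succeed if the digest recorded before the replay equals the file's
# digest now.
# usage: checksums_match <digest before> <file after replay>
checksums_match() {
    [ "$1" = "$(file_md5 "$2")" ]
}

# Intended use within the fsync example (left commented, needs the devices):
#   before=$(file_md5 /mnt/btrfs-test/foo)    # after "mark fsync"
#   ... umount, dmsetup remove, replay-log, mount /dev/sdb ...
#   checksums_match "$before" /mnt/btrfs-test/foo || echo "fsync data mismatch"
```

Recording only the digest, rather than the full md5sum output line, keeps the
comparison independent of the path the file is read from.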