=============
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to.  This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request and the target is
able to take arbitrary data from userspace to insert into the log.  The data
that is in the WRITE requests is copied into the log to make the replay happen
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
the log in a way that correlates to what is on disk and not what is in cache,
to make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_PREFLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
simulate the worst case scenario with regard to power failures.
Consider the following example (W means write, C means complete):

    W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

    W3,W2,flush,W1....

Again, this is to simulate what is actually on disk.  It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, as those requests bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request.  Consider the following example:

    WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

    DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.
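
The ordering rules above can be illustrated with a small userspace model.  This
is only a sketch for reasoning about the expected log order, not the kernel
implementation: the class ``LogWritesModel``, the helper ``replay_events`` and
the ``(kind, name)`` event tuples are hypothetical names invented for this
example.

```python
from dataclasses import dataclass, field


@dataclass
class LogWritesModel:
    """Toy model of the ordering rules; not the kernel implementation."""
    log: list = field(default_factory=list)            # entries in logged order
    unflushed: list = field(default_factory=list)      # completed, not yet logged
    flush_batches: dict = field(default_factory=dict)  # flush name -> spliced writes

    def complete_write(self, name, fua=False):
        # FUA writes bypass the device cache, so they are logged on completion.
        if fua:
            self.log.append(name)
        else:
            self.unflushed.append(name)

    def issue_flush(self, name="flush"):
        # Only writes that completed before the flush was issued are spliced.
        self.flush_batches[name] = self.unflushed
        self.unflushed = []

    def complete_flush(self, name="flush"):
        # Log the spliced writes first, then the flush itself.
        self.log.extend(self.flush_batches.pop(name))
        self.log.append(name)


def replay_events(events):
    """Feed (kind, name) events into a fresh model and return it."""
    model = LogWritesModel()
    for kind, name in events:
        if kind == "C":
            model.complete_write(name)
        elif kind == "Wflush":
            model.issue_flush(name)
        elif kind == "Cflush":
            model.complete_flush(name)
        # "W" (write issued) needs no action: nothing is tracked until completion.
    return model


# The sequence from the text: W1,W2,W3,C3,C2,Wflush,C1,Cflush
events = [("W", "W1"), ("W", "W2"), ("W", "W3"), ("C", "W3"), ("C", "W2"),
          ("Wflush", "flush"), ("C", "W1"), ("Cflush", "flush")]
model = replay_events(events)
print(model.log)        # ['W3', 'W2', 'flush']
print(model.unflushed)  # ['W1']
```

W1 completed after the flush was issued, so in this model it stays on the
pending list and would only be logged when a later flush completes.  A DISCARD
would go through ``complete_write`` like any other write, which keeps it in
completion order relative to the WRITEs around it.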

Target interface
================

i) Constructor

    log-writes <dev_path> <log_dev_path>

    ============= ==============================================
    dev_path      Device that all of the IO will go to normally.
    log_dev_path  Device where the log entries are written to.
    ============= ==============================================

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ========================
    #logged entries             Number of logged entries
    highest allocated sector    Highest allocated sector
    =========================== ========================

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example, say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        we're fsck'ing something reasonable.  You would do something like
        this::

            mkfs.btrfs -f /dev/mapper/log
            dmsetup message log 0 mark mkfs
            <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on doing the fsck check in the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.
You would do something like
this::

    TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
    dmsetup create log --table "$TABLE"
    mkfs.btrfs -f /dev/mapper/log
    dmsetup message log 0 mark mkfs

    mount /dev/mapper/log /mnt/btrfs-test
    <some test that does fsync at the end>
    dmsetup message log 0 mark fsync
    md5sum /mnt/btrfs-test/foo
    umount /mnt/btrfs-test

    dmsetup remove log
    replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
    mount /dev/sdb /mnt/btrfs-test
    md5sum /mnt/btrfs-test/foo
    <verify md5sum's are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation.
You could do this with::

    TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
    dmsetup create log --table "$TABLE"
    mkfs.btrfs -f /dev/mapper/log
    dmsetup message log 0 mark mkfs

    mount /dev/mapper/log /mnt/btrfs-test
    <fsstress to dirty the fs>
    btrfs filesystem balance /mnt/btrfs-test
    umount /mnt/btrfs-test
    dmsetup remove log

    replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
    btrfsck /dev/sdb
    replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
        --fsck "btrfsck /dev/sdb" --check fua

This replays the log until it sees a FUA request and runs the fsck command.
If the fsck passes, it replays to the next FUA, and so on until the log is
exhausted or the fsck command exits abnormally.
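
The check-at-every-FUA loop described above can be pictured with the following
sketch.  It is a model of the described behaviour only, not the replay-log
source: the function ``replay_with_checks``, the dictionary-shaped entries and
both callbacks are hypothetical, and it assumes the check runs after the FUA
entry itself has been applied.

```python
def replay_with_checks(entries, apply_entry, run_check):
    """Replay entries in order and call run_check() after every FUA entry.

    Returns the index of the entry at which run_check() failed, or None if
    the whole log replayed with every check passing.
    """
    for index, entry in enumerate(entries):
        apply_entry(entry)
        # Assumption: the check is run once the FUA entry has been applied.
        if entry["fua"] and not run_check():
            return index
    return None


# Hypothetical log: four entries, two of them FUA.
entries = [
    {"name": "W1", "fua": False},
    {"name": "W2", "fua": True},
    {"name": "W3", "fua": False},
    {"name": "W4", "fua": True},
]
applied = []
# Pretend the consistency check starts failing once W3 has been applied.
failed_at = replay_with_checks(
    entries,
    apply_entry=lambda e: applied.append(e["name"]),
    run_check=lambda: "W3" not in applied,
)
print(failed_at)  # 3
```

The returned index narrows a failure down to the writes between two
consecutive FUA requests, which is the interval worth inspecting when the
fsck fails.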