===============
Persistent data
===============

Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in the kernel.  In late 2010 we were seeing that various
targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.
There is a read/write locking interface to prevent concurrent accesses,
and to keep data that is being used in the cache.

Clients of persistent-data are unlikely to use this directly.

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing is elided within
the same transaction so performance is reasonable.  The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.

The Space Maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

On-disk data structures that keep track of reference counts of blocks.
They also act as the allocator of new blocks.  Currently there are two
implementations: a simpler one for managing blocks on a different
device (e.g. thinly-provisioned data blocks); and one for managing
the metadata space.  The latter is complicated by the need to store
its own data within the space it's managing.
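
As a rough illustration of the two roles described above (reference
counting and allocation), here is a minimal userspace sketch.  It is not
the kernel implementation: it keeps its counts in a plain in-memory array
rather than on disk, and every name in it (``toy_space_map``,
``toy_sm_new_block`` and so on) is hypothetical.  The real interface is
declared in dm-space-map.h.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_BLOCKS 8	/* hypothetical size, tiny on purpose */

/* Hypothetical in-memory stand-in for a space map: one count per block. */
struct toy_space_map {
	uint32_t ref_count[TOY_NR_BLOCKS];
};

/* Mark every block as free (reference count zero). */
static void toy_sm_init(struct toy_space_map *sm)
{
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++)
		sm->ref_count[i] = 0;
}

/* Read the current reference count; out-of-range blocks report zero. */
static uint32_t toy_sm_get_count(const struct toy_space_map *sm, uint64_t b)
{
	return b < TOY_NR_BLOCKS ? sm->ref_count[b] : 0;
}

/*
 * Allocator role: hand out the first block whose count is zero and give
 * it a count of one.  Returns 0 on success, -1 if no block is free.
 */
static int toy_sm_new_block(struct toy_space_map *sm, uint64_t *b)
{
	for (uint64_t i = 0; i < TOY_NR_BLOCKS; i++) {
		if (sm->ref_count[i] == 0) {
			sm->ref_count[i] = 1;
			*b = i;
			return 0;
		}
	}
	return -1;
}

/* Add a reference to an already-allocated block. */
static int toy_sm_inc_block(struct toy_space_map *sm, uint64_t b)
{
	if (b >= TOY_NR_BLOCKS || sm->ref_count[b] == 0)
		return -1;
	sm->ref_count[b]++;
	return 0;
}

/* Drop a reference; a block whose count reaches zero is free again. */
static int toy_sm_dec_block(struct toy_space_map *sm, uint64_t b)
{
	if (b >= TOY_NR_BLOCKS || sm->ref_count[b] == 0)
		return -1;
	sm->ref_count[b]--;
	return 0;
}
```

In this sketch a block shared by two owners has a count of two, and it
only becomes available to ``toy_sm_new_block()`` again once both owners
have dropped their reference.  The metadata space map faces the extra
difficulty noted above, which this toy sidesteps by not storing its
counts inside the blocks it manages.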

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys.  For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.

Values stored in the btrees can have arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.
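
The two-level lookup described for the thin-provisioning target can be
sketched in userspace as follows.  This is only an illustration under
simplifying assumptions: each "tree" is a sorted in-memory array searched
by binary search rather than an on-disk btree, the top-level value is an
index into an array of mapping trees rather than a root block, and all
names (``toy_tree``, ``toy_lookup2`` and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One key/value pair; keys are always 64 bits, as in the real btree. */
struct toy_entry {
	uint64_t key;
	uint64_t value;
};

/* Hypothetical stand-in for one btree level: entries sorted by key. */
struct toy_tree {
	const struct toy_entry *entries;
	size_t nr_entries;
};

/* Binary search in one level.  Returns 0 and fills *value on a hit. */
static int toy_tree_lookup(const struct toy_tree *t, uint64_t key,
			   uint64_t *value)
{
	size_t lo = 0, hi = t->nr_entries;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->entries[mid].key == key) {
			*value = t->entries[mid].value;
			return 0;
		}
		if (t->entries[mid].key < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * Two-level lookup with one key per level: keys[0] (device id) selects
 * a mapping tree via the top level, keys[1] (virtual block) is then
 * looked up in that mapping tree to find the physical block.
 */
static int toy_lookup2(const struct toy_tree *top,
		       const struct toy_tree *mappings, size_t nr_mappings,
		       const uint64_t keys[2], uint64_t *phys)
{
	uint64_t idx;

	if (toy_tree_lookup(top, keys[0], &idx) || idx >= nr_mappings)
		return -1;
	return toy_tree_lookup(&mappings[idx], keys[1], phys);
}
```

A caller passes an array of two 64-bit keys, for example
``{ device_id, virtual_block }``, and gets the physical block back
through ``phys``.  A miss at either level is reported as -1.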