======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.
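To make the bookkeeping concrete, here is a minimal userspace sketch of the
idea. It is not the kernel code: the names model_bvec, model_iter,
model_iter_bvec() and model_iter_advance() are hypothetical stand-ins, and a
plain char pointer replaces the page and offset pair. It shows a view of the
current vector being built from the raw array plus the done count and the
remaining size, while only the iterator is ever written:

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct model_bvec {
	const char *data;	/* stands in for the page + offset pair */
	unsigned int len;	/* bytes described by this vector */
};

struct model_iter {
	unsigned int size;	/* bytes still to process (role of bi_size) */
	unsigned int idx;	/* index of the current vector (role of bi_idx) */
	unsigned int bvec_done;	/* bytes consumed in the current vector */
};

/*
 * Build the "current" vector on the fly. The array is const: the
 * partially completed view exists only in the returned copy.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur;
	unsigned int left;

	assert(it.size > 0);
	left = bv[it.idx].len - it.bvec_done;
	cur.data = bv[it.idx].data + it.bvec_done;
	cur.len = left < it.size ? left : it.size;	/* clamp to the range end */
	return cur;
}

/* Advance the iterator by `bytes`; the vectors themselves are untouched. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->bvec_done;
	while (bytes && bytes >= bv[it->idx].len) {
		bytes -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = bytes;
}
```

Because the iterator is a small value type in this model, a caller can copy
it before advancing and later restart from the copy, which is the property
the text relies on when it says the biovec no longer needs to be saved.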

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another.
   Because the biovecs wouldn't necessarily be the same size, the old code was
   tricky and convoluted - it had to walk two different bios at the same time,
   keeping both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec
   array - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec!
   Now, we can efficiently split arbitrary size bios - because the new bio can
   share the old bio's biovec.

   However, care must be taken to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e.
   instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
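
The points above about sharing a biovec between bios and not relying on
bi_vcnt can be illustrated with a small userspace sketch. Everything in it is
an assumption made for illustration: toy_bvec, toy_iter, toy_split() and
toy_count_segments() are hypothetical names, and vectors are reduced to their
lengths. Splitting here only produces a second iterator over the same
unmodified array, and each half finds its end from its own size rather than
from a vector count:

```c
#include <assert.h>

/* Hypothetical model types; vectors are reduced to their lengths. */
struct toy_bvec {
	unsigned int len;
};

struct toy_iter {
	unsigned int size;	/* bytes remaining in this range */
	unsigned int idx;	/* current vector index */
	unsigned int bvec_done;	/* bytes consumed in the current vector */
};

/*
 * Split the range at `bytes`. The returned iterator covers the front part;
 * *it is advanced to cover the remainder. The array is shared, not copied.
 */
static struct toy_iter toy_split(const struct toy_bvec *bv,
				 struct toy_iter *it, unsigned int bytes)
{
	struct toy_iter front = *it;
	unsigned int skip;

	assert(bytes > 0 && bytes < it->size);
	front.size = bytes;	/* the front part ends where its size runs out */

	it->size -= bytes;
	skip = bytes + it->bvec_done;
	while (skip && skip >= bv[it->idx].len) {
		skip -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = skip;
	return front;
}

/* Walk a range and count the pieces it touches; only size ends the walk. */
static unsigned int toy_count_segments(const struct toy_bvec *bv,
				       struct toy_iter it)
{
	unsigned int n = 0;

	while (it.size) {
		unsigned int left = bv[it.idx].len - it.bvec_done;
		unsigned int step = left < it.size ? left : it.size;

		it.size -= step;
		it.bvec_done += step;
		if (it.bvec_done == bv[it.idx].len) {
			it.idx++;
			it.bvec_done = 0;
		}
		n++;
	}
	return n;
}
```

With three vectors of 4, 6 and 8 bytes split at byte 7, both halves in this
model touch two vectors each, so the count of the underlying array (three)
describes neither of them. Getting the right number requires walking the
range, which is the cost the text describes as not worth paying.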

Usage of helpers:
=================

* The following helpers, whose names have the suffix ``_all``, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

        bio_for_each_segment_all()
        bio_for_each_bvec_all()
        bio_first_bvec_all()
        bio_first_page_all()
        bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

        bio_for_each_segment()
        bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

        bio_for_each_bvec()
        bio_for_each_bvec_all()
        rq_for_each_bvec()
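
The difference between the single-page and multi-page views can be sketched
in userspace as follows. This is only a model under stated assumptions: the
page size is fixed at 4096 bytes, a vector is reduced to an offset and a
length, and the names MODEL_PAGE_SIZE, model_mp_bvec, model_single_page() and
model_count_single_page() are hypothetical. It shows one multi-page vector
being presented as a sequence of pieces that never cross a page boundary:

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical model of a vector that may span several contiguous pages. */
struct model_mp_bvec {
	unsigned int offset;	/* byte offset from the start of the first page */
	unsigned int len;	/* total bytes covered */
};

/*
 * Return the single-page piece that starts `done` bytes into `mp`.
 * In the result, offset is relative to the page containing the piece.
 */
static struct model_mp_bvec model_single_page(struct model_mp_bvec mp,
					      unsigned int done)
{
	struct model_mp_bvec seg;
	unsigned int pos, room, left;

	assert(done < mp.len);
	pos = mp.offset + done;
	room = MODEL_PAGE_SIZE - pos % MODEL_PAGE_SIZE;	/* bytes to page end */
	left = mp.len - done;

	seg.offset = pos % MODEL_PAGE_SIZE;
	seg.len = left < room ? left : room;
	return seg;
}

/* Count how many single-page pieces one multi-page vector yields. */
static unsigned int model_count_single_page(struct model_mp_bvec mp)
{
	unsigned int done = 0, n = 0;

	while (done < mp.len) {
		done += model_single_page(mp, done).len;
		n++;
	}
	return n;
}
```

In this model a vector starting 512 bytes into a page and covering 8192 bytes
is seen once by a multi-page walk and three times by a single-page walk, as
pieces of 3584, 4096 and 512 bytes.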