======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.
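
The split between an immutable array and a mutable iterator can be sketched in
plain userspace C. This is only a model under stated assumptions, not the
kernel implementation: struct model_bvec, struct model_iter, model_iter_bvec()
and model_iter_advance() are hypothetical names invented for illustration, and
a character pointer stands in for the page. Only the field names (bv_len,
bv_offset, bi_size, bi_idx, bi_bvec_done) are taken from the text above.

```c
#include <assert.h>

/* Userspace model only; these are NOT the kernel definitions. */
struct model_bvec {
	const char *bv_base;       /* hypothetical stand-in for the page */
	unsigned int bv_len;       /* length of this entry in bytes */
	unsigned int bv_offset;    /* start offset within the buffer */
};

struct model_iter {
	unsigned int bi_size;      /* bytes remaining in the range */
	unsigned int bi_idx;       /* index of the current array entry */
	unsigned int bi_bvec_done; /* bytes completed in the current entry */
};

/*
 * Build the "current" entry by value: the raw entry adjusted by
 * bi_bvec_done and clamped to bi_size. The array itself is only read.
 */
struct model_bvec model_iter_bvec(const struct model_bvec *bv,
				  struct model_iter it)
{
	const struct model_bvec *cur = &bv[it.bi_idx];
	unsigned int left = cur->bv_len - it.bi_bvec_done;
	struct model_bvec out;

	assert(it.bi_bvec_done < cur->bv_len);
	out.bv_base = cur->bv_base;
	out.bv_offset = cur->bv_offset + it.bi_bvec_done;
	out.bv_len = left < it.bi_size ? left : it.bi_size;
	return out;
}

/* Complete `bytes` bytes: only the iterator changes, never the array. */
void model_iter_advance(const struct model_bvec *bv, struct model_iter *it,
			unsigned int bytes)
{
	if (bytes > it->bi_size)
		bytes = it->bi_size;
	it->bi_size -= bytes;

	/* Count from the start of the current entry, then skip full entries. */
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}
```

In this model, completing 3 bytes of an 8-byte entry leaves the array as it
was; the iterator records bi_bvec_done == 3, and model_iter_bvec() then
returns an entry with bv_offset advanced by 3 and bv_len reduced to 5.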

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

As of 5.12 bvec segments with zero bv_len are not supported.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another.
   Because the biovecs wouldn't necessarily be the same size, the old code was
   tricky and convoluted - it had to walk two different bios at the same time,
   keeping both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec!
   Now, we can efficiently split arbitrary size bios - because the new bio can
   share the old bio's biovec.

   Care must be taken to ensure the biovec isn't freed while the split bio is
   still using it, in case the original bio completes first, though. Using
   bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they are - submitting partially completed bios is
   perfectly fine.

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e.
   instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
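
The points about sharing a biovec and not trusting bi_vcnt can be illustrated
with a small userspace model. It is a sketch under stated assumptions rather
than kernel code: struct model_bvec, struct model_iter, model_split() and
model_count_segments() are hypothetical names, and the split here only copies
and advances iterators over one shared array.

```c
#include <assert.h>

/* Userspace model only; these are NOT the kernel definitions. */
struct model_bvec { unsigned int bv_len; unsigned int bv_offset; };
struct model_iter { unsigned int bi_size, bi_idx, bi_bvec_done; };

/*
 * Count the segments visible through an iterator. The walk terminates on
 * bi_size alone; no entry count for the array is consulted.
 */
unsigned int model_count_segments(const struct model_bvec *bv,
				  struct model_iter it)
{
	unsigned int n = 0;

	while (it.bi_size) {
		unsigned int left = bv[it.bi_idx].bv_len - it.bi_bvec_done;
		unsigned int len = left < it.bi_size ? left : it.bi_size;

		assert(len > 0); /* zero-length entries are not supported */
		it.bi_size -= len;
		it.bi_bvec_done += len;
		if (it.bi_bvec_done == bv[it.bi_idx].bv_len) {
			it.bi_idx++;
			it.bi_bvec_done = 0;
		}
		n++;
	}
	return n;
}

/*
 * Split off the first `bytes` bytes. The returned iterator covers the
 * front part; *it is advanced to cover the remainder. Both keep using
 * the same, unmodified array.
 */
struct model_iter model_split(const struct model_bvec *bv,
			      struct model_iter *it, unsigned int bytes)
{
	struct model_iter front = *it;

	assert(bytes > 0 && bytes < it->bi_size);
	front.bi_size = bytes;

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
	return front;
}
```

Splitting a range of three 4096-byte entries at byte 6144 gives two iterators
that each see two segments, although the shared array has three entries. In
this model, that is why an entry count says nothing useful about what a given
iterator covers.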

Usage of helpers:
=================

* The following helpers whose names have the suffix of `_all` can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

        bio_for_each_segment_all()
        bio_for_each_bvec_all()
        bio_first_bvec_all()
        bio_first_page_all()
        bio_first_folio_all()
        bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

        bio_for_each_segment()
        bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

        bio_for_each_bvec()
        bio_for_each_bvec_all()
        rq_for_each_bvec()
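
The difference between the single-page and multi-page helpers can be modelled
in userspace as follows. This is a rough sketch, not the kernel macros:
MODEL_PAGE_SIZE, struct model_bvec, model_single_page() and
model_count_single_pages() are hypothetical, a 4096-byte page is assumed, and
bv_page is a plain page index rather than a struct page pointer.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* Userspace model only; bv_page is an index, not a struct page pointer. */
struct model_bvec {
	unsigned int bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/*
 * Given a multi-page vector and a byte position `done` inside it, return
 * the single-page vector that starts at that position. The result never
 * crosses a page boundary.
 */
struct model_bvec model_single_page(struct model_bvec mp, unsigned int done)
{
	unsigned int off = mp.bv_offset + done;
	unsigned int left = mp.bv_len - done;
	unsigned int room;
	struct model_bvec sp;

	assert(done < mp.bv_len);
	sp.bv_page = mp.bv_page + off / MODEL_PAGE_SIZE;
	sp.bv_offset = off % MODEL_PAGE_SIZE;
	room = MODEL_PAGE_SIZE - sp.bv_offset;
	sp.bv_len = left < room ? left : room;
	return sp;
}

/* Number of single-page segments a per-segment walk would visit. */
unsigned int model_count_single_pages(struct model_bvec mp)
{
	unsigned int done = 0, n = 0;

	while (done < mp.bv_len) {
		done += model_single_page(mp, done).bv_len;
		n++;
	}
	return n;
}
```

Under these assumptions, one multi-page vector of 10000 bytes starting at
offset 100 is seen once by a per-bvec walk, while a per-segment walk sees
three pieces of 3996, 4096 and 1908 bytes.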