======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct
   bio_vecs, constructed from the raw biovecs but taking into account
   bi_bvec_done and bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

As of 5.12 bvec segments with zero bv_len are not supported.
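The scheme described above can be sketched as a small userspace model. This is
not the kernel implementation: the model_* names and the simplified types are
assumptions made for illustration, and only the field names (bi_sector,
bi_size, bi_idx, bi_bvec_done, bv_len, bv_offset) come from the text. It shows
the two ideas that matter: advancing touches only the iterator, and the
"current" vector is built by value from the raw biovec plus the iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the real kernel types differ. */
struct model_bvec {
	const char *base;        /* stands in for the page a biovec points at */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned long long bi_sector; /* current device sector (512-byte units) */
	unsigned int bi_size;         /* bytes remaining in the range */
	unsigned int bi_idx;          /* index of the current biovec */
	unsigned int bi_bvec_done;    /* bytes completed in the current biovec */
};

/*
 * Build the current vector by value, clamped by bi_bvec_done and bi_size.
 * Only meaningful while it.bi_size is nonzero.
 */
struct model_bvec model_iter_bvec(const struct model_bvec *bv,
				  struct model_iter it)
{
	const struct model_bvec *cur = &bv[it.bi_idx];
	unsigned int left = cur->bv_len - it.bi_bvec_done;
	struct model_bvec out;

	out.base = cur->base;
	out.bv_offset = cur->bv_offset + it.bi_bvec_done;
	out.bv_len = it.bi_size < left ? it.bi_size : left;
	return out;
}

/* Complete 'bytes' of the range. The biovec array is const: never written. */
void model_iter_advance(const struct model_bvec *bv, struct model_iter *it,
			unsigned int bytes)
{
	assert(bv != NULL && bytes <= it->bi_size);

	it->bi_sector += bytes >> 9;
	it->bi_size -= bytes;

	/* Count from the start of the current biovec, then skip whole ones. */
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}
```

In this model, advancing an iterator over three 4096-byte biovecs by 6000
bytes leaves bi_idx at 1 and bi_bvec_done at 1904, and model_iter_bvec() then
returns a vector with offset 1904 and length 2192, while the array itself
still holds its original offsets and lengths.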

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was convoluted - it
   had to walk two different bios at the same time, keeping both bi_idx and
   an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they are - submitting partially completed bios is
   perfectly fine.
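The copy pattern mentioned in the first item can be illustrated with a
userspace sketch. It is not the code of bio_copy_data(); the model_* names and
types are hypothetical, and the sketch assumes every vector has a nonzero
bv_len. Each side carries its own iterator, so the loop only has to take the
smaller of the two current chunks and advance both iterators by that amount.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-ins; the real kernel types differ. */
struct model_bvec {
	char *base;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;      /* bytes remaining in the range */
	unsigned int bi_idx;       /* index of the current vector */
	unsigned int bi_bvec_done; /* bytes completed in the current vector */
};

/* Bytes available in the current vector, clamped by the range size. */
unsigned int model_cur_len(const struct model_bvec *bv,
			   const struct model_iter *it)
{
	unsigned int left = bv[it->bi_idx].bv_len - it->bi_bvec_done;

	return it->bi_size < left ? it->bi_size : left;
}

/* Address of the first byte not yet completed. */
char *model_cur_ptr(const struct model_bvec *bv, const struct model_iter *it)
{
	return bv[it->bi_idx].base + bv[it->bi_idx].bv_offset + it->bi_bvec_done;
}

/* Advance the iterator only; the vector array is never modified. */
void model_advance(const struct model_bvec *bv, struct model_iter *it,
		   unsigned int bytes)
{
	assert(bytes <= it->bi_size);
	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/* Copy until either range is exhausted; returns the bytes copied. */
unsigned int model_copy(const struct model_bvec *dst, struct model_iter *di,
			const struct model_bvec *src, struct model_iter *si)
{
	unsigned int total = 0;

	while (si->bi_size && di->bi_size) {
		unsigned int s = model_cur_len(src, si);
		unsigned int d = model_cur_len(dst, di);
		unsigned int n = s < d ? s : d;

		memcpy(model_cur_ptr(dst, di), model_cur_ptr(src, si), n);
		model_advance(src, si, n);
		model_advance(dst, di, n);
		total += n;
	}
	return total;
}
```

Neither side needs to know how the other is laid out. A source made of a
5-byte and a 3-byte vector copies correctly into a destination made of a
2-byte and a 6-byte vector, with no per-side index and offset bookkeeping in
the loop itself.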

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
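Why splitting gets cheap once the biovec is immutable can be seen in a
userspace sketch. The model_* names are hypothetical, the split point is given
in bytes for simplicity, and lifetime handling (what bio_chain() helps with)
is left out entirely. The front piece copies only the iterator and a pointer
to the shared array, and nothing counts vectors: the end of each piece is
determined by bi_size alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the real kernel types differ. */
struct model_bvec {
	const char *base;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;      /* bytes remaining in the range */
	unsigned int bi_idx;       /* index of the current vector */
	unsigned int bi_bvec_done; /* bytes completed in the current vector */
};

struct model_bio {
	const struct model_bvec *bi_io_vec; /* shared, never modified */
	struct model_iter bi_iter;          /* private to each bio */
};

/* Advance the iterator only; the vector array is never modified. */
void model_advance(const struct model_bvec *bv, struct model_iter *it,
		   unsigned int bytes)
{
	assert(bytes <= it->bi_size);
	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/* Current vector, built by value and clamped by bi_bvec_done and bi_size. */
struct model_bvec model_cur(const struct model_bio *bio)
{
	const struct model_bvec *cur = &bio->bi_io_vec[bio->bi_iter.bi_idx];
	unsigned int left = cur->bv_len - bio->bi_iter.bi_bvec_done;
	struct model_bvec out;

	out.base = cur->base;
	out.bv_offset = cur->bv_offset + bio->bi_iter.bi_bvec_done;
	out.bv_len = bio->bi_iter.bi_size < left ? bio->bi_iter.bi_size : left;
	return out;
}

/*
 * Carve the first 'bytes' of 'orig' into 'front'. Afterwards both refer to
 * the same array; 'front' may end, and 'orig' may start, midway through a
 * vector.
 */
void model_split(struct model_bio *orig, struct model_bio *front,
		 unsigned int bytes)
{
	assert(orig != NULL && front != NULL);
	assert(bytes > 0 && bytes < orig->bi_iter.bi_size);

	*front = *orig;                 /* shares bi_io_vec, copies the iter */
	front->bi_iter.bi_size = bytes; /* the front piece ends here */
	model_advance(orig->bi_io_vec, &orig->bi_iter, bytes);
}
```

Splitting a bio built from two 4096-byte vectors at byte 5000 leaves the front
piece covering all of the first vector plus 904 bytes of the second, and the
remainder starting 904 bytes into the second vector, with no array copied or
edited.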

Usage of helpers:
=================

* The following helpers, whose names end in ``_all``, can only be used on
  bios that are not BIO_CLONED. They are usually used by filesystem code.
  Drivers shouldn't use them because the bio may have been split before it
  reached the driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_first_folio_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()
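The difference between the two iteration granularities can be shown with a
userspace sketch. The names, the fixed 4096-byte page size and the page
arithmetic are assumptions made for illustration, not the kernel macros. A
multi-page iteration would hand back the vector below as a single item, while
a single-page iteration would hand back pieces that never cross a page
boundary.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* A multi-page vector: starts in first_page and may span several pages. */
struct model_bvec {
	unsigned int first_page; /* index of the first page */
	unsigned int bv_len;     /* total length in bytes */
	unsigned int bv_offset;  /* offset into the first page */
};

/* A single-page piece of a multi-page vector. */
struct model_segment {
	unsigned int page;
	unsigned int len;
	unsigned int offset;
};

/*
 * Break one multi-page vector into single-page segments. Stores at most
 * 'max' segments in 'out' and returns how many were stored.
 */
unsigned int model_segments(struct model_bvec bv, struct model_segment *out,
			    unsigned int max)
{
	unsigned int done = 0;
	unsigned int n = 0;

	assert(out != NULL);

	while (done < bv.bv_len && n < max) {
		unsigned int pos = bv.bv_offset + done;
		unsigned int off = pos % MODEL_PAGE_SIZE;
		unsigned int len = MODEL_PAGE_SIZE - off; /* up to end of page */

		if (len > bv.bv_len - done)
			len = bv.bv_len - done;           /* last, short piece */

		out[n].page = bv.first_page + pos / MODEL_PAGE_SIZE;
		out[n].offset = off;
		out[n].len = len;
		done += len;
		n++;
	}
	return n;
}
```

For a 10000-byte vector that starts 512 bytes into page 10, this model yields
three segments of 3584, 4096 and 2320 bytes on pages 10, 11 and 12.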