======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.
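
The behaviour of these helpers can be illustrated with a small userspace
model. This is only a sketch: the model_* names are invented for the example,
plain memory stands in for pages, and the logic is a simplified reading of
what bio_iter_iovec() and bio_advance_iter() are described as doing above,
not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio_vec; plain memory replaces the page. */
struct model_bvec {
	unsigned char *base;	/* stands in for bv_page */
	unsigned int len;	/* like bv_len */
	unsigned int offset;	/* like bv_offset */
};

/* The mutable state: the fields the text says moved into struct bvec_iter. */
struct model_iter {
	unsigned long long sector;	/* like bi_sector, in 512-byte units */
	unsigned int size;		/* like bi_size: bytes still to do */
	unsigned int idx;		/* like bi_idx */
	unsigned int bvec_done;		/* like bi_bvec_done */
};

/*
 * In the spirit of bio_iter_iovec(): return the current segment by value,
 * adjusted for bvec_done and clamped to size. The array is not modified.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec view;
	unsigned int left;

	assert(it.size > 0);	/* only meaningful while data remains */
	left = bv[it.idx].len - it.bvec_done;
	view.base = bv[it.idx].base;
	view.offset = bv[it.idx].offset + it.bvec_done;
	view.len = left < it.size ? left : it.size;
	return view;
}

/*
 * In the spirit of bio_advance_iter(): consume bytes by updating only the
 * iterator. Returns false, with the iterator emptied, on an overrun.
 */
static bool model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	if (bytes > it->size) {
		it->size = 0;
		return false;
	}

	it->size -= bytes;
	it->sector += bytes >> 9;

	/* Count from the start of the current bvec, then skip whole bvecs. */
	bytes += it->bvec_done;
	while (bytes && bytes >= bv[it->idx].len) {
		bytes -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = bytes;
	return true;
}
```

With two 8-byte bvecs and an iterator covering all 16 bytes, advancing by 3
leaves the array untouched while model_iter_bvec() reports offset 3 and
length 5 for the first bvec; that is the "illusion of partially completed
biovecs" mentioned above.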

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec
   array - again, this was being done in a fair number of places. Now that
   the biovec is never modified, saving a copy of the bvec_iter is enough.

 * Biovecs can be shared between multiple bios - a bvec_iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split bios of
   arbitrary size - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.
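
The bio_copy_data() case from the first point can be modelled in userspace to
show why a real iterator helps. The following is a sketch under stated
assumptions, not the code in block/bio.c: the model_* types and functions are
invented for the example, plain memory replaces pages, and zero-length bvecs
are assumed not to occur.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct bio_vec and struct bvec_iter. */
struct model_bvec {
	unsigned char *base;	/* plain memory instead of page + offset */
	unsigned int len;
};

struct model_iter {
	unsigned int size;	/* like bi_size */
	unsigned int idx;	/* like bi_idx */
	unsigned int bvec_done;	/* like bi_bvec_done */
};

/* Bytes available in the current bvec, clamped to what the iterator has left. */
static unsigned int model_cur_len(const struct model_bvec *bv,
				  const struct model_iter *it)
{
	unsigned int left = bv[it->idx].len - it->bvec_done;

	return left < it->size ? left : it->size;
}

/* Consume bytes by moving the iterator; the bvec array is never written. */
static void model_advance(const struct model_bvec *bv, struct model_iter *it,
			  unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->bvec_done;
	while (bytes && bytes >= bv[it->idx].len) {
		bytes -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = bytes;
}

/*
 * Copy until either side is exhausted and return the number of bytes copied.
 * The iterators are taken by value, so the caller's copies stay unchanged.
 * Assumes neither array contains a zero-length bvec.
 */
static unsigned int model_copy_data(const struct model_bvec *dst,
				    struct model_iter dst_it,
				    const struct model_bvec *src,
				    struct model_iter src_it)
{
	unsigned int copied = 0;

	while (src_it.size && dst_it.size) {
		unsigned int s = model_cur_len(src, &src_it);
		unsigned int d = model_cur_len(dst, &dst_it);
		unsigned int bytes = s < d ? s : d;

		memcpy(dst[dst_it.idx].base + dst_it.bvec_done,
		       src[src_it.idx].base + src_it.bvec_done, bytes);
		model_advance(src, &src_it, bytes);
		model_advance(dst, &dst_it, bytes);
		copied += bytes;
	}
	return copied;
}
```

The loop never has to track an index and an offset per side by hand; each
step copies the smaller of the two current pieces and advances both
iterators by that amount, even when the source and destination bvecs have
different sizes.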

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly, stacked drivers don't have to deal with both their own
   bio size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
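
A userspace sketch of why bi_size, rather than bi_vcnt, bounds iteration once
a biovec is shared between bios. The model_* names are invented for the
example, pages, offsets and sectors are left out, and the split shown is only
a model of the idea, not the kernel's bio_split().

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct model_bvec {
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter, without bi_sector. */
struct model_iter {
	unsigned int size;	/* like bi_size */
	unsigned int idx;	/* like bi_idx */
	unsigned int bvec_done;	/* like bi_bvec_done */
};

/* Consume bytes by moving the iterator; the bvec array is never written. */
static void model_advance(const struct model_bvec *bv, struct model_iter *it,
			  unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->bvec_done;
	while (bytes && bytes >= bv[it->idx].len) {
		bytes -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = bytes;
}

/*
 * Split off the first `bytes` of *it. The returned iterator covers the front
 * part and *it becomes the remainder; both refer to the same array.
 */
static struct model_iter model_split(const struct model_bvec *bv,
				     struct model_iter *it, unsigned int bytes)
{
	struct model_iter front = *it;

	assert(bytes > 0 && bytes < it->size);
	front.size = bytes;
	model_advance(bv, it, bytes);
	return front;
}

/* Count the segments an iterator visits; only size reaching zero ends it. */
static unsigned int model_count_segments(const struct model_bvec *bv,
					 struct model_iter it)
{
	unsigned int n = 0;

	while (it.size) {
		unsigned int left = bv[it.idx].len - it.bvec_done;

		model_advance(bv, &it, left < it.size ? left : it.size);
		n++;
	}
	return n;
}
```

With three 4096-byte bvecs and a split at byte 6000, both halves visit two
segments although the shared array has three entries, so a count of array
entries says nothing useful about either half.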

Usage of helpers:
=================

* The following helpers whose names have the suffix ``_all`` can only be used
  on a non-BIO_CLONED bio. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()
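
The difference between the two families can be sketched in userspace: a
multi-page bvec is one (page, offset, length) triple that may cross page
boundaries, and the single-page helpers present it as pieces that each stay
within one page. Everything named model_* or MODEL_* below is invented for
the example, a page is represented by an index, and a 4 KiB page size is
assumed. In kernel code the two loops are written bio_for_each_bvec(bvec,
bio, iter) and bio_for_each_segment(bvec, bio, iter), with a struct bio_vec
and a struct bvec_iter.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct bio_vec, with a page index for bv_page. */
struct model_bvec {
	unsigned int page;	/* index of the first page */
	unsigned int offset;	/* byte offset from the start of that page */
	unsigned int len;	/* length in bytes, may span several pages */
};

/*
 * Break one multi-page bvec into single-page pieces. Writes at most max
 * entries to out[] and returns how many were written.
 */
static unsigned int model_split_to_pages(struct model_bvec mp,
					 struct model_bvec *out,
					 unsigned int max)
{
	unsigned int done = 0;
	unsigned int n = 0;

	assert(out);
	while (done < mp.len && n < max) {
		unsigned int off = mp.offset + done;
		unsigned int in_page = off % MODEL_PAGE_SIZE;
		unsigned int room = MODEL_PAGE_SIZE - in_page;
		unsigned int left = mp.len - done;

		out[n].page = mp.page + off / MODEL_PAGE_SIZE;
		out[n].offset = in_page;
		out[n].len = left < room ? left : room;
		done += out[n].len;
		n++;
	}
	return n;
}
```

For a bvec starting 512 bytes into page 10 and covering 8192 bytes, this
yields three single-page pieces of 3584, 4096 and 512 bytes on pages 10, 11
and 12, where a multi-page walk would see the single original entry.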