=====================
The errseq_t datatype
=====================

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether it has changed since a previous
point where it was sampled.

The initial use case for this is tracking errors for file
synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
but it may be usable in other situations.

It's implemented as an unsigned 32-bit value.  The low order bits are
designated to hold an error code (between 1 and MAX_ERRNO).  The upper bits
are used as a counter.  This is done with atomics instead of locking so that
these functions can be called from any context.

Note that there is a risk of collisions if new errors are being recorded
frequently, since we have so few bits to use as a counter.

To mitigate this, the bit between the error value and counter (SF in the
diagram below) is used as a flag to tell whether the value has been
sampled since a new value was recorded.  That allows us to avoid bumping
the counter if no one has sampled it since the last time an error was
recorded.

Thus we end up with a value that looks something like this:

+--------------------------------------+----+------------------------+
| 31..13                               | 12 | 11..0                  |
+--------------------------------------+----+------------------------+
| counter                              | SF | errno                  |
+--------------------------------------+----+------------------------+

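The layout above can be sketched in plain userspace C.  This is only an
illustration: the SK_* macros and sk_* helpers are names made up for this
example, and the field widths are taken directly from the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; widths follow the table above. */
#define SK_ERRNO_MASK  0xfffu       /* bits 11..0: errno */
#define SK_SEEN        (1u << 12)   /* bit 12: "seen" flag (SF) */
#define SK_CTR_SHIFT   13           /* bits 31..13: counter */

/* Extract the error code held in the low-order bits. */
static inline unsigned int sk_errno(uint32_t v)
{
	return v & SK_ERRNO_MASK;
}

/* Has anyone sampled the value since the last error was recorded? */
static inline bool sk_seen(uint32_t v)
{
	return (v & SK_SEEN) != 0;
}

/* Extract the counter held in the upper bits. */
static inline uint32_t sk_counter(uint32_t v)
{
	return v >> SK_CTR_SHIFT;
}

/* Pack the three fields back into a single 32-bit value. */
static inline uint32_t sk_pack(uint32_t counter, bool seen, unsigned int err)
{
	return (counter << SK_CTR_SHIFT) | (seen ? SK_SEEN : 0u) |
	       (err & SK_ERRNO_MASK);
}
```

With these helpers, sk_pack(3, true, 5) yields a value whose counter is 3,
whose flag is set and whose errno field is 5.
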
The general idea is for "watchers" to sample an errseq_t value and keep
it as a running cursor.  That value can later be used to tell whether
any new errors have occurred since that sampling was done, and atomically
record the state at the time that it was checked.  This allows us to
record errors in one place, and then have a number of "watchers" that
can tell whether the value has changed since they last checked it.

A new errseq_t should always be zeroed out.  An errseq_t value of all zeroes
is the special (but common) case where there has never been an error.  An
all-zero value thus serves as the "epoch" if one wishes to know whether
there has ever been an error set since it was first initialized.

API usage
=========

Let me tell you a story about a worker drone.  Now, he's a good worker
overall, but the company is a little...management heavy.  He has to
report to 77 supervisors today, and tomorrow the "big boss" is coming in
from out of town and he's sure to test the poor fellow too.

They're all handing him work to do -- so much he can't keep track of who
handed him what, but that's not really a big problem.  The supervisors
just want to know when he's finished all of the work they've handed him so
far and whether he made any mistakes since they last asked.

He might have made the mistake on work they didn't actually hand him,
but he can't keep track of things at that level of detail; all he can
remember is the most recent mistake that he made.

Here's our worker_drone representation::

        struct worker_drone {
                errseq_t        wd_err; /* for recording errors */
        };

Every day, the worker_drone starts out with a blank slate::

        struct worker_drone wd;

        wd.wd_err = (errseq_t)0;

The supervisors come in and get an initial read for the day.  They
don't care about anything that happened before their watch begins::

        struct supervisor {
                errseq_t        s_wd_err; /* private "cursor" for wd_err */
                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
        };

        struct supervisor       su;

        su.s_wd_err = errseq_sample(&wd.wd_err);
        spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do.  Every few minutes they ask him to
finish up all of the work they've handed him so far.  Then they ask him
whether he made any mistakes on any of it::

        spin_lock(&su.s_wd_err_lock);
        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
        spin_unlock(&su.s_wd_err_lock);

Up to this point, that just keeps returning 0.

Now, the owners of this company are quite miserly and have given him
substandard equipment with which to do his job. Occasionally it
glitches and he makes a mistake.  He sighs a heavy sigh, and marks it
down::

        errseq_set(&wd.wd_err, -EIO);

...and then gets back to work.  The supervisors eventually poll again
and they each get the error when they next check.  Subsequent calls will
return 0, until another error is recorded, at which point it's reported
to each of them once.

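The "reported once to each supervisor" behaviour can be modelled in a few
lines of single-threaded userspace C.  This is a simplified sketch, not the
kernel code: the model_* and M_* names are hypothetical, plain loads and
stores stand in for the atomics that the real functions use, and the bit
positions are the ones from the layout table.

```c
#include <errno.h>
#include <stdint.h>

#define M_ERRNO_MASK 0xfffu       /* bits 11..0: errno */
#define M_SEEN       (1u << 12)   /* bit 12: seen flag */
#define M_CTR_INC    (1u << 13)   /* lowest bit of the counter */

/* Record an error (a negative errno).  The counter is bumped only if
 * the previous value had already been seen by someone. */
static void model_set(uint32_t *eseq, int err)
{
	uint32_t old = *eseq;
	uint32_t new_val = (old & ~(M_ERRNO_MASK | M_SEEN)) | (uint32_t)-err;

	if (old & M_SEEN)
		new_val += M_CTR_INC;
	*eseq = new_val;
}

/* If the value differs from the cursor, mark it seen, move the cursor
 * forward and return the latest error; otherwise return 0. */
static int model_check_and_advance(uint32_t *eseq, uint32_t *since)
{
	uint32_t cur = *eseq;

	if (cur == *since)
		return 0;
	cur |= M_SEEN;
	*eseq = cur;
	*since = cur;
	return -(int)(cur & M_ERRNO_MASK);
}
```

In this model, two supervisors with their own cursors each get -EIO exactly
once after a single model_set(&wd_err, -EIO), and 0 on every later check
until another error is recorded.  Two back-to-back model_set calls with no
check in between leave the counter untouched, which is the mitigation that
the seen flag provides.
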
Note that the supervisors can't tell how many mistakes he made, only
whether one was made since they last checked, and the latest value
recorded.

Occasionally the big boss comes in for a spot check and asks the worker
to do a one-off job for him. He's not really watching the worker
full-time like the supervisors, but he does need to know whether a
mistake occurred while his job was processing.

He can just sample the current errseq_t in the worker, and then use that
to tell whether an error has occurred later::

        errseq_t since = errseq_sample(&wd.wd_err);
        /* submit some work and wait for it to complete */
        err = errseq_check(&wd.wd_err, since);

Since he's just going to discard "since" after that point, he doesn't
need to advance it here. He also doesn't need any locking since it's
not usable by anyone else.

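The difference from the supervisors' check is that the cursor is passed by
value and nothing is written back.  A minimal userspace model of that
comparison might look like the following; model_check and MC_ERRNO_MASK are
hypothetical names, and a plain load stands in for the kernel's atomic read.

```c
#include <stdint.h>

#define MC_ERRNO_MASK 0xfffu   /* low 12 bits hold the errno */

/* Return 0 if the value is unchanged since the sample was taken,
 * otherwise the most recently recorded error as a negative errno.
 * Neither the value nor the caller's sample is modified. */
static int model_check(const uint32_t *eseq, uint32_t since)
{
	uint32_t cur = *eseq;

	if (cur == since)
		return 0;
	return -(int)(cur & MC_ERRNO_MASK);
}
```

Because the caller's copy of "since" is never updated, calling model_check
again with the same sample keeps reporting the error, which is fine for a
one-off check but is why long-lived watchers use check_and_advance instead.
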
Serializing errseq_t cursor updates
===================================

Note that the errseq_t API does not protect the errseq_t cursor during a
check_and_advance operation. Only the canonical error code is handled
atomically.  In a situation where more than one task might be using the
same errseq_t cursor at the same time, it's important to serialize
updates to that cursor.

If that's not done, then it's possible for the cursor to go backward,
in which case the same error could be reported more than once.

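To see how the cursor can go backward, the check-and-advance step can be
split by hand into two halves and interleaved.  The sketch below is a
deterministic, single-threaded simulation with hypothetical sim_* helpers;
it is not how the kernel function is written, only a model of two tasks (A
and B) sharing one cursor without a lock.

```c
#include <stdint.h>

#define SIM_ERRNO_MASK 0xfffu       /* bits 11..0: errno */
#define SIM_SEEN       (1u << 12)   /* bit 12: seen flag */
#define SIM_CTR_INC    (1u << 13)   /* lowest bit of the counter */

/* First half: read the shared value and mark it seen.  Returns the
 * value this task intends to store in the cursor. */
static uint32_t sim_observe(uint32_t *eseq)
{
	*eseq |= SIM_SEEN;
	return *eseq;
}

/* Second half: publish the observed value to the shared cursor and
 * return the error it carries as a negative errno. */
static int sim_publish(uint32_t *since, uint32_t observed)
{
	*since = observed;
	return -(int)(observed & SIM_ERRNO_MASK);
}

/* Replay the interleaving and return what the next check would report;
 * a nonzero result means an already-reported error is reported again. */
static int sim_unlocked_race(void)
{
	uint32_t eseq = 5;      /* error 5 recorded, not yet seen */
	uint32_t since = 0;     /* cursor shared by tasks A and B */
	uint32_t a, b;

	a = sim_observe(&eseq);     /* A observes error 5, then stalls */
	eseq = SIM_CTR_INC | 28;    /* worker records error 28 */
	b = sim_observe(&eseq);     /* B observes error 28 ... */
	sim_publish(&since, b);     /* ... and reports it */
	sim_publish(&since, a);     /* A resumes: cursor moves backward */

	/* The cursor now lags the value, so error 28 shows up again. */
	return (eseq == since) ? 0 : -(int)(eseq & SIM_ERRNO_MASK);
}
```

Here sim_unlocked_race() returns -28 even though task B has already reported
that error.  Holding a lock around both halves prevents the two tasks from
interleaving, so the cursor only ever moves forward.
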
Because of this, it's often advantageous to first do an errseq_check to
see if anything has changed, and only later do an
errseq_check_and_advance after taking the lock. For example::

        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
                /* su.s_wd_err is protected by s_wd_err_lock */
                spin_lock(&su.s_wd_err_lock);
                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
                spin_unlock(&su.s_wd_err_lock);
        }

That avoids the spinlock in the common case where nothing has changed
since the last time it was checked.

Functions
=========

.. kernel-doc:: lib/errseq.c