=====================
The errseq_t datatype
=====================

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether it has changed since a previous
point where it was sampled.

The initial use case for this is tracking errors for file
synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
but it may be usable in other situations.

It's implemented as an unsigned 32-bit value.  The low order bits are
designated to hold an error code (between 1 and MAX_ERRNO).  The upper bits
are used as a counter.  This is done with atomics instead of locking so that
these functions can be called from any context.

Note that there is a risk of collisions if new errors are being recorded
frequently, since we have so few bits to use as a counter.

To mitigate this, the bit between the error value and counter is used as
a flag to tell whether the value has been sampled since a new value was
recorded.  That allows us to avoid bumping the counter if no one has
sampled it since the last time an error was recorded.
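
As a rough illustration of the layout just described, the following
userspace sketch splits a 32-bit value into its error code, flag and
counter.  It assumes MAX_ERRNO is 4095, which puts the flag at bit 12 and
leaves the remaining upper bits for the counter.  The MODEL_* macros and
model_* helpers are hypothetical names used only for this example; they are
not part of the errseq_t API.

```c
#include <stdint.h>

/* Assumption: MAX_ERRNO is 4095, so the error code occupies the low 12 bits. */
#define MODEL_MAX_ERRNO  4095u
#define MODEL_SHIFT      12                         /* log2(MAX_ERRNO + 1) */
#define MODEL_SEEN       (1u << MODEL_SHIFT)        /* "sampled since last error" flag */
#define MODEL_CTR_INC    (1u << (MODEL_SHIFT + 1))  /* lowest bit of the counter */

/* Positive error code held in the low bits (0 if none was ever recorded). */
static inline unsigned int model_errno(uint32_t v)
{
        return v & MODEL_MAX_ERRNO;
}

/* Nonzero if the value has been sampled since the last error was recorded. */
static inline int model_seen(uint32_t v)
{
        return (v & MODEL_SEEN) != 0;
}

/* Counter held in the bits above the flag. */
static inline uint32_t model_counter(uint32_t v)
{
        return v >> (MODEL_SHIFT + 1);
}
```

With 12 bits for the error code and one for the flag, only 19 bits remain
for the counter, which is why avoiding unnecessary increments matters.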

Thus we end up with a value that looks something like this:

+--------------------------------------+----+------------------------+
| 31..13                               | 12 | 11..0                  |
+--------------------------------------+----+------------------------+
| counter                              | SF | errno                  |
+--------------------------------------+----+------------------------+

The general idea is for "watchers" to sample an errseq_t value and keep
it as a running cursor.  That value can later be used to tell whether
any new errors have occurred since that sampling was done, and atomically
record the state at the time that it was checked.  This allows us to
record errors in one place, and then have a number of "watchers" that
can tell whether the value has changed since they last checked it.

A new errseq_t should always be zeroed out.  An errseq_t value of all zeroes
is the special (but common) case where there has never been an error.  An all
zero value thus serves as the "epoch" if one wishes to know whether there
has ever been an error set since it was first initialized.

API usage
=========

Let me tell you a story about a worker drone.  Now, he's a good worker
overall, but the company is a little...management heavy.
He has to report to 77 supervisors today, and tomorrow the "big boss" is
coming in from out of town and he's sure to test the poor fellow too.

They're all handing him work to do -- so much he can't keep track of who
handed him what, but that's not really a big problem.  The supervisors
just want to know when he's finished all of the work they've handed him so
far and whether he made any mistakes since they last asked.

He might have made the mistake on work they didn't actually hand him,
but he can't keep track of things at that level of detail; all he can
remember is the most recent mistake that he made.

Here's our worker_drone representation::

        struct worker_drone {
                errseq_t        wd_err; /* for recording errors */
        };

Every day, the worker_drone starts out with a blank slate::

        struct worker_drone wd;

        wd.wd_err = (errseq_t)0;

The supervisors come in and get an initial read for the day.
They don't care about anything that happened before their watch begins::

        struct supervisor {
                errseq_t        s_wd_err; /* private "cursor" for wd_err */
                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
        };

        struct supervisor su;

        su.s_wd_err = errseq_sample(&wd.wd_err);
        spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do.  Every few minutes they ask him to
finish up all of the work they've handed him so far.  Then they ask him
whether he made any mistakes on any of it::

        spin_lock(&su.s_wd_err_lock);
        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
        spin_unlock(&su.s_wd_err_lock);

Up to this point, that just keeps returning 0.

Now, the owners of this company are quite miserly and have given him
substandard equipment with which to do his job.  Occasionally it
glitches and he makes a mistake.  He sighs a heavy sigh, and marks it
down::

        errseq_set(&wd.wd_err, -EIO);

...and then gets back to work.  The supervisors eventually poll again
and they each get the error when they next check.  Subsequent calls will
return 0, until another error is recorded, at which point it's reported
to each of them once.
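
To see why each supervisor hears about a mistake exactly once, here is a
simplified, single-threaded userspace model of the behaviour described
above.  It is a sketch rather than the kernel's implementation: the model_*
names are hypothetical, MAX_ERRNO is assumed to be 4095, and plain loads and
stores stand in for the atomic operations the real functions rely on.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t model_errseq_t;        /* hypothetical stand-in for errseq_t */

#define MODEL_MAX_ERRNO 4095u           /* assumed value of MAX_ERRNO */
#define MODEL_SEEN      (1u << 12)      /* flag bit between errno and counter */
#define MODEL_CTR_INC   (1u << 13)      /* lowest bit of the counter */

/* Record a negative errno; bump the counter only if the old value was seen. */
static void model_set(model_errseq_t *eseq, int err)
{
        model_errseq_t old = *eseq;
        model_errseq_t next = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) |
                              (uint32_t)(-err);

        if (old & MODEL_SEEN)
                next += MODEL_CTR_INC;
        *eseq = next;
}

/*
 * Take a starting cursor.  Assumption: if nobody has seen the current
 * error yet, start from 0 so that this watcher can be the one to report it.
 */
static model_errseq_t model_sample(const model_errseq_t *eseq)
{
        model_errseq_t cur = *eseq;

        return (cur & MODEL_SEEN) ? cur : 0;
}

/* Report the latest error once per cursor, then move the cursor forward. */
static int model_check_and_advance(model_errseq_t *eseq, model_errseq_t *since)
{
        model_errseq_t cur = *eseq;

        if (cur == *since)
                return 0;
        cur |= MODEL_SEEN;      /* seen now, so the next error bumps the counter */
        *eseq = cur;
        *since = cur;
        return -(int)(cur & MODEL_MAX_ERRNO);
}

/* One worker, two supervisors, one mistake.  Returns 1 if all checks hold. */
static int model_demo(void)
{
        model_errseq_t wd_err = 0;
        model_errseq_t a = model_sample(&wd_err);
        model_errseq_t b = model_sample(&wd_err);
        int ok = 1;

        ok &= model_check_and_advance(&wd_err, &a) == 0;    /* nothing yet */
        model_set(&wd_err, -EIO);
        ok &= model_check_and_advance(&wd_err, &a) == -EIO; /* reported to a */
        ok &= model_check_and_advance(&wd_err, &a) == 0;    /* but only once */
        ok &= model_check_and_advance(&wd_err, &b) == -EIO; /* b still gets it */
        ok &= model_check_and_advance(&wd_err, &b) == 0;
        return ok;
}
```

Each cursor is compared against the shared value independently, so one
supervisor advancing his cursor does not hide the error from the others.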

Note that the supervisors can't tell how many mistakes he made, only
whether one was made since they last checked, and the latest value
recorded.

Occasionally the big boss comes in for a spot check and asks the worker
to do a one-off job for him.  He's not really watching the worker
full-time like the supervisors, but he does need to know whether a
mistake occurred while his job was processing.

He can just sample the current errseq_t in the worker, and then use that
to tell whether an error has occurred later::

        errseq_t since = errseq_sample(&wd.wd_err);
        /* submit some work and wait for it to complete */
        err = errseq_check(&wd.wd_err, since);

Since he's just going to discard "since" after that point, he doesn't
need to advance it here.  He also doesn't need any locking since it's
not usable by anyone else.

Serializing errseq_t cursor updates
===================================

Note that the errseq_t API does not protect the errseq_t cursor during a
check_and_advance operation.  Only the canonical error code is handled
atomically.  In a situation where more than one task might be using the
same errseq_t cursor at the same time, it's important to serialize
updates to that cursor.

If that's not done, then it's possible for the cursor to go backward,
in which case the same error could be reported more than once.

Because of this, it's often advantageous to first do an errseq_check to
see if anything has changed, and only later do an
errseq_check_and_advance after taking the lock.  e.g.::

        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
                /* su.s_wd_err is protected by s_wd_err_lock */
                spin_lock(&su.s_wd_err_lock);
                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
                spin_unlock(&su.s_wd_err_lock);
        }

That avoids the spinlock in the common case where nothing has changed
since the last time it was checked.

Functions
=========

.. kernel-doc:: lib/errseq.c