162306a36Sopenharmony_ci.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
262306a36Sopenharmony_ci
362306a36Sopenharmony_ci===========================
462306a36Sopenharmony_ciLockless Ring Buffer Design
562306a36Sopenharmony_ci===========================
662306a36Sopenharmony_ci
762306a36Sopenharmony_ciCopyright 2009 Red Hat Inc.
862306a36Sopenharmony_ci
962306a36Sopenharmony_ci:Author:   Steven Rostedt <srostedt@redhat.com>
1062306a36Sopenharmony_ci:License:  The GNU Free Documentation License, Version 1.2
1162306a36Sopenharmony_ci           (dual licensed under the GPL v2)
1262306a36Sopenharmony_ci:Reviewers:  Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
1362306a36Sopenharmony_ci	     and Frederic Weisbecker.
1462306a36Sopenharmony_ci
1562306a36Sopenharmony_ci
1662306a36Sopenharmony_ciWritten for: 2.6.31
1762306a36Sopenharmony_ci
1862306a36Sopenharmony_ciTerminology used in this Document
1962306a36Sopenharmony_ci---------------------------------
2062306a36Sopenharmony_ci
2162306a36Sopenharmony_citail
2262306a36Sopenharmony_ci	- where new writes happen in the ring buffer.
2362306a36Sopenharmony_ci
2462306a36Sopenharmony_cihead
2562306a36Sopenharmony_ci	- where new reads happen in the ring buffer.
2662306a36Sopenharmony_ci
2762306a36Sopenharmony_ciproducer
2862306a36Sopenharmony_ci	- the task that writes into the ring buffer (same as writer)
2962306a36Sopenharmony_ci
3062306a36Sopenharmony_ciwriter
3162306a36Sopenharmony_ci	- same as producer
3262306a36Sopenharmony_ci
3362306a36Sopenharmony_ciconsumer
3462306a36Sopenharmony_ci	- the task that reads from the buffer (same as reader)
3562306a36Sopenharmony_ci
3662306a36Sopenharmony_cireader
3762306a36Sopenharmony_ci	- same as consumer.
3862306a36Sopenharmony_ci
3962306a36Sopenharmony_cireader_page
4062306a36Sopenharmony_ci	- A page outside the ring buffer used solely (for the most part)
4162306a36Sopenharmony_ci	  by the reader.
4262306a36Sopenharmony_ci
4362306a36Sopenharmony_cihead_page
4462306a36Sopenharmony_ci	- a pointer to the page that the reader will use next
4562306a36Sopenharmony_ci
4662306a36Sopenharmony_citail_page
4762306a36Sopenharmony_ci	- a pointer to the page that will be written to next
4862306a36Sopenharmony_ci
4962306a36Sopenharmony_cicommit_page
5062306a36Sopenharmony_ci	- a pointer to the page with the last finished non-nested write.
5162306a36Sopenharmony_ci
5262306a36Sopenharmony_cicmpxchg
5362306a36Sopenharmony_ci	- hardware-assisted atomic transaction that performs the following::
5462306a36Sopenharmony_ci
5562306a36Sopenharmony_ci	    A = B if previous A == C
5662306a36Sopenharmony_ci
5762306a36Sopenharmony_ci	    R = cmpxchg(A, C, B) is saying that we replace A with B if and only
5862306a36Sopenharmony_ci		if current A is equal to C, and we put the old (current)
5962306a36Sopenharmony_ci		A into R
6062306a36Sopenharmony_ci
6162306a36Sopenharmony_ci	    R gets the previous A regardless if A is updated with B or not.
6262306a36Sopenharmony_ci
6362306a36Sopenharmony_ci	  To see if the update was successful a compare of ``R == C``
6462306a36Sopenharmony_ci	  may be used.
6562306a36Sopenharmony_ci
6662306a36Sopenharmony_ciThe Generic Ring Buffer
6762306a36Sopenharmony_ci-----------------------
6862306a36Sopenharmony_ci
6962306a36Sopenharmony_ciThe ring buffer can be used in either an overwrite mode or in
7062306a36Sopenharmony_ciproducer/consumer mode.
7162306a36Sopenharmony_ci
7262306a36Sopenharmony_ciProducer/consumer mode is where if the producer were to fill up the
7362306a36Sopenharmony_cibuffer before the consumer could free up anything, the producer
7462306a36Sopenharmony_ciwill stop writing to the buffer. This will lose most recent events.
7562306a36Sopenharmony_ci
7662306a36Sopenharmony_ciOverwrite mode is where if the producer were to fill up the buffer
7762306a36Sopenharmony_cibefore the consumer could free up anything, the producer will
7862306a36Sopenharmony_cioverwrite the older data. This will lose the oldest events.
7962306a36Sopenharmony_ci
8062306a36Sopenharmony_ciNo two writers can write at the same time (on the same per-cpu buffer),
8162306a36Sopenharmony_cibut a writer may interrupt another writer, but it must finish writing
8262306a36Sopenharmony_cibefore the previous writer may continue. This is very important to the
8362306a36Sopenharmony_cialgorithm. The writers act like a "stack". The way interrupts works
8462306a36Sopenharmony_cienforces this behavior::
8562306a36Sopenharmony_ci
8662306a36Sopenharmony_ci
8762306a36Sopenharmony_ci  writer1 start
8862306a36Sopenharmony_ci     <preempted> writer2 start
8962306a36Sopenharmony_ci         <preempted> writer3 start
9062306a36Sopenharmony_ci                     writer3 finishes
9162306a36Sopenharmony_ci                 writer2 finishes
9262306a36Sopenharmony_ci  writer1 finishes
9362306a36Sopenharmony_ci
9462306a36Sopenharmony_ciThis is very much like a writer being preempted by an interrupt and
9562306a36Sopenharmony_cithe interrupt doing a write as well.
9662306a36Sopenharmony_ci
9762306a36Sopenharmony_ciReaders can happen at any time. But no two readers may run at the
9862306a36Sopenharmony_cisame time, nor can a reader preempt/interrupt another reader. A reader
9962306a36Sopenharmony_cicannot preempt/interrupt a writer, but it may read/consume from the
10062306a36Sopenharmony_cibuffer at the same time as a writer is writing, but the reader must be
10162306a36Sopenharmony_cion another processor to do so. A reader may read on its own processor
10262306a36Sopenharmony_ciand can be preempted by a writer.
10362306a36Sopenharmony_ci
10462306a36Sopenharmony_ciA writer can preempt a reader, but a reader cannot preempt a writer.
10562306a36Sopenharmony_ciBut a reader can read the buffer at the same time (on another processor)
10662306a36Sopenharmony_cias a writer.
10762306a36Sopenharmony_ci
10862306a36Sopenharmony_ciThe ring buffer is made up of a list of pages held together by a linked list.
10962306a36Sopenharmony_ci
11062306a36Sopenharmony_ciAt initialization a reader page is allocated for the reader that is not
11162306a36Sopenharmony_cipart of the ring buffer.
11262306a36Sopenharmony_ci
11362306a36Sopenharmony_ciThe head_page, tail_page and commit_page are all initialized to point
11462306a36Sopenharmony_cito the same page.
11562306a36Sopenharmony_ci
11662306a36Sopenharmony_ciThe reader page is initialized to have its next pointer pointing to
11762306a36Sopenharmony_cithe head page, and its previous pointer pointing to a page before
11862306a36Sopenharmony_cithe head page.
11962306a36Sopenharmony_ci
12062306a36Sopenharmony_ciThe reader has its own page to use. At start up time, this page is
12162306a36Sopenharmony_ciallocated but is not attached to the list. When the reader wants
12262306a36Sopenharmony_cito read from the buffer, if its page is empty (like it is on start-up),
12362306a36Sopenharmony_ciit will swap its page with the head_page. The old reader page will
12462306a36Sopenharmony_cibecome part of the ring buffer and the head_page will be removed.
12562306a36Sopenharmony_ciThe page after the inserted page (old reader_page) will become the
12662306a36Sopenharmony_cinew head page.
12762306a36Sopenharmony_ci
12862306a36Sopenharmony_ciOnce the new page is given to the reader, the reader could do what
12962306a36Sopenharmony_ciit wants with it, as long as a writer has left that page.
13062306a36Sopenharmony_ci
13162306a36Sopenharmony_ciA sample of how the reader page is swapped: Note this does not
13262306a36Sopenharmony_cishow the head page in the buffer, it is for demonstrating a swap
13362306a36Sopenharmony_cionly.
13462306a36Sopenharmony_ci
13562306a36Sopenharmony_ci::
13662306a36Sopenharmony_ci
13762306a36Sopenharmony_ci  +------+
13862306a36Sopenharmony_ci  |reader|          RING BUFFER
13962306a36Sopenharmony_ci  |page  |
14062306a36Sopenharmony_ci  +------+
14162306a36Sopenharmony_ci                  +---+   +---+   +---+
14262306a36Sopenharmony_ci                  |   |-->|   |-->|   |
14362306a36Sopenharmony_ci                  |   |<--|   |<--|   |
14462306a36Sopenharmony_ci                  +---+   +---+   +---+
14562306a36Sopenharmony_ci                   ^ |             ^ |
14662306a36Sopenharmony_ci                   | +-------------+ |
14762306a36Sopenharmony_ci                   +-----------------+
14862306a36Sopenharmony_ci
14962306a36Sopenharmony_ci
15062306a36Sopenharmony_ci  +------+
15162306a36Sopenharmony_ci  |reader|          RING BUFFER
15262306a36Sopenharmony_ci  |page  |-------------------+
15362306a36Sopenharmony_ci  +------+                   v
15462306a36Sopenharmony_ci    |             +---+   +---+   +---+
15562306a36Sopenharmony_ci    |             |   |-->|   |-->|   |
15662306a36Sopenharmony_ci    |             |   |<--|   |<--|   |<-+
15762306a36Sopenharmony_ci    |             +---+   +---+   +---+  |
15862306a36Sopenharmony_ci    |              ^ |             ^ |   |
15962306a36Sopenharmony_ci    |              | +-------------+ |   |
16062306a36Sopenharmony_ci    |              +-----------------+   |
16162306a36Sopenharmony_ci    +------------------------------------+
16262306a36Sopenharmony_ci
16362306a36Sopenharmony_ci  +------+
16462306a36Sopenharmony_ci  |reader|          RING BUFFER
16562306a36Sopenharmony_ci  |page  |-------------------+
16662306a36Sopenharmony_ci  +------+ <---------------+ v
16762306a36Sopenharmony_ci    |  ^          +---+   +---+   +---+
16862306a36Sopenharmony_ci    |  |          |   |-->|   |-->|   |
16962306a36Sopenharmony_ci    |  |          |   |   |   |<--|   |<-+
17062306a36Sopenharmony_ci    |  |          +---+   +---+   +---+  |
17162306a36Sopenharmony_ci    |  |             |             ^ |   |
17262306a36Sopenharmony_ci    |  |             +-------------+ |   |
17362306a36Sopenharmony_ci    |  +-----------------------------+   |
17462306a36Sopenharmony_ci    +------------------------------------+
17562306a36Sopenharmony_ci
17662306a36Sopenharmony_ci  +------+
17762306a36Sopenharmony_ci  |buffer|          RING BUFFER
17862306a36Sopenharmony_ci  |page  |-------------------+
17962306a36Sopenharmony_ci  +------+ <---------------+ v
18062306a36Sopenharmony_ci    |  ^          +---+   +---+   +---+
18162306a36Sopenharmony_ci    |  |          |   |   |   |-->|   |
18262306a36Sopenharmony_ci    |  |  New     |   |   |   |<--|   |<-+
18362306a36Sopenharmony_ci    |  | Reader   +---+   +---+   +---+  |
18462306a36Sopenharmony_ci    |  |  page ----^                 |   |
18562306a36Sopenharmony_ci    |  |                             |   |
18662306a36Sopenharmony_ci    |  +-----------------------------+   |
18762306a36Sopenharmony_ci    +------------------------------------+
18862306a36Sopenharmony_ci
18962306a36Sopenharmony_ci
19062306a36Sopenharmony_ci
19162306a36Sopenharmony_ciIt is possible that the page swapped is the commit page and the tail page,
19262306a36Sopenharmony_ciif what is in the ring buffer is less than what is held in a buffer page.
19362306a36Sopenharmony_ci
19462306a36Sopenharmony_ci::
19562306a36Sopenharmony_ci
19662306a36Sopenharmony_ci            reader page    commit page   tail page
19762306a36Sopenharmony_ci                |              |             |
19862306a36Sopenharmony_ci                v              |             |
19962306a36Sopenharmony_ci               +---+           |             |
20062306a36Sopenharmony_ci               |   |<----------+             |
20162306a36Sopenharmony_ci               |   |<------------------------+
20262306a36Sopenharmony_ci               |   |------+
20362306a36Sopenharmony_ci               +---+      |
20462306a36Sopenharmony_ci                          |
20562306a36Sopenharmony_ci                          v
20662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
20762306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
20862306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
20962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
21062306a36Sopenharmony_ci
21162306a36Sopenharmony_ciThis case is still valid for this algorithm.
21262306a36Sopenharmony_ciWhen the writer leaves the page, it simply goes into the ring buffer
21362306a36Sopenharmony_cisince the reader page still points to the next location in the ring
21462306a36Sopenharmony_cibuffer.
21562306a36Sopenharmony_ci
21662306a36Sopenharmony_ci
21762306a36Sopenharmony_ciThe main pointers:
21862306a36Sopenharmony_ci
21962306a36Sopenharmony_ci  reader page
22062306a36Sopenharmony_ci	    - The page used solely by the reader and is not part
22162306a36Sopenharmony_ci              of the ring buffer (may be swapped in)
22262306a36Sopenharmony_ci
22362306a36Sopenharmony_ci  head page
22462306a36Sopenharmony_ci	    - the next page in the ring buffer that will be swapped
22562306a36Sopenharmony_ci              with the reader page.
22662306a36Sopenharmony_ci
22762306a36Sopenharmony_ci  tail page
22862306a36Sopenharmony_ci	    - the page where the next write will take place.
22962306a36Sopenharmony_ci
23062306a36Sopenharmony_ci  commit page
23162306a36Sopenharmony_ci	    - the page that last finished a write.
23262306a36Sopenharmony_ci
23362306a36Sopenharmony_ciThe commit page only is updated by the outermost writer in the
23462306a36Sopenharmony_ciwriter stack. A writer that preempts another writer will not move the
23562306a36Sopenharmony_cicommit page.
23662306a36Sopenharmony_ci
23762306a36Sopenharmony_ciWhen data is written into the ring buffer, a position is reserved
23862306a36Sopenharmony_ciin the ring buffer and passed back to the writer. When the writer
23962306a36Sopenharmony_ciis finished writing data into that position, it commits the write.
24062306a36Sopenharmony_ci
24162306a36Sopenharmony_ciAnother write (or a read) may take place at anytime during this
24262306a36Sopenharmony_citransaction. If another write happens it must finish before continuing
24362306a36Sopenharmony_ciwith the previous write.
24462306a36Sopenharmony_ci
24562306a36Sopenharmony_ci
24662306a36Sopenharmony_ci   Write reserve::
24762306a36Sopenharmony_ci
24862306a36Sopenharmony_ci       Buffer page
24962306a36Sopenharmony_ci      +---------+
25062306a36Sopenharmony_ci      |written  |
25162306a36Sopenharmony_ci      +---------+  <--- given back to writer (current commit)
25262306a36Sopenharmony_ci      |reserved |
25362306a36Sopenharmony_ci      +---------+ <--- tail pointer
25462306a36Sopenharmony_ci      | empty   |
25562306a36Sopenharmony_ci      +---------+
25662306a36Sopenharmony_ci
25762306a36Sopenharmony_ci   Write commit::
25862306a36Sopenharmony_ci
25962306a36Sopenharmony_ci       Buffer page
26062306a36Sopenharmony_ci      +---------+
26162306a36Sopenharmony_ci      |written  |
26262306a36Sopenharmony_ci      +---------+
26362306a36Sopenharmony_ci      |written  |
26462306a36Sopenharmony_ci      +---------+  <--- next position for write (current commit)
26562306a36Sopenharmony_ci      | empty   |
26662306a36Sopenharmony_ci      +---------+
26762306a36Sopenharmony_ci
26862306a36Sopenharmony_ci
26962306a36Sopenharmony_ci If a write happens after the first reserve::
27062306a36Sopenharmony_ci
27162306a36Sopenharmony_ci       Buffer page
27262306a36Sopenharmony_ci      +---------+
27362306a36Sopenharmony_ci      |written  |
27462306a36Sopenharmony_ci      +---------+  <-- current commit
27562306a36Sopenharmony_ci      |reserved |
27662306a36Sopenharmony_ci      +---------+  <--- given back to second writer
27762306a36Sopenharmony_ci      |reserved |
27862306a36Sopenharmony_ci      +---------+ <--- tail pointer
27962306a36Sopenharmony_ci
28062306a36Sopenharmony_ci  After second writer commits::
28162306a36Sopenharmony_ci
28262306a36Sopenharmony_ci
28362306a36Sopenharmony_ci       Buffer page
28462306a36Sopenharmony_ci      +---------+
28562306a36Sopenharmony_ci      |written  |
28662306a36Sopenharmony_ci      +---------+  <--(last full commit)
28762306a36Sopenharmony_ci      |reserved |
28862306a36Sopenharmony_ci      +---------+
28962306a36Sopenharmony_ci      |pending  |
29062306a36Sopenharmony_ci      |commit   |
29162306a36Sopenharmony_ci      +---------+ <--- tail pointer
29262306a36Sopenharmony_ci
29362306a36Sopenharmony_ci  When the first writer commits::
29462306a36Sopenharmony_ci
29562306a36Sopenharmony_ci       Buffer page
29662306a36Sopenharmony_ci      +---------+
29762306a36Sopenharmony_ci      |written  |
29862306a36Sopenharmony_ci      +---------+
29962306a36Sopenharmony_ci      |written  |
30062306a36Sopenharmony_ci      +---------+
30162306a36Sopenharmony_ci      |written  |
30262306a36Sopenharmony_ci      +---------+  <--(last full commit and tail pointer)
30362306a36Sopenharmony_ci
30462306a36Sopenharmony_ci
30562306a36Sopenharmony_ciThe commit pointer points to the last write location that was
30662306a36Sopenharmony_cicommitted without preempting another write. When a write that
30762306a36Sopenharmony_cipreempted another write is committed, it only becomes a pending commit
30862306a36Sopenharmony_ciand will not be a full commit until all writes have been committed.
30962306a36Sopenharmony_ci
31062306a36Sopenharmony_ciThe commit page points to the page that has the last full commit.
31162306a36Sopenharmony_ciThe tail page points to the page with the last write (before
31262306a36Sopenharmony_cicommitting).
31362306a36Sopenharmony_ci
31462306a36Sopenharmony_ciThe tail page is always equal to or after the commit page. It may
31562306a36Sopenharmony_cibe several pages ahead. If the tail page catches up to the commit
31662306a36Sopenharmony_cipage then no more writes may take place (regardless of the mode
31762306a36Sopenharmony_ciof the ring buffer: overwrite and produce/consumer).
31862306a36Sopenharmony_ci
31962306a36Sopenharmony_ciThe order of pages is::
32062306a36Sopenharmony_ci
32162306a36Sopenharmony_ci head page
32262306a36Sopenharmony_ci commit page
32362306a36Sopenharmony_ci tail page
32462306a36Sopenharmony_ci
32562306a36Sopenharmony_ciPossible scenario::
32662306a36Sopenharmony_ci
32762306a36Sopenharmony_ci                               tail page
32862306a36Sopenharmony_ci    head page         commit page  |
32962306a36Sopenharmony_ci        |                 |        |
33062306a36Sopenharmony_ci        v                 v        v
33162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
33262306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
33362306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
33462306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
33562306a36Sopenharmony_ci
33662306a36Sopenharmony_ciThere is a special case that the head page is after either the commit page
33762306a36Sopenharmony_ciand possibly the tail page. That is when the commit (and tail) page has been
33862306a36Sopenharmony_ciswapped with the reader page. This is because the head page is always
33962306a36Sopenharmony_cipart of the ring buffer, but the reader page is not. Whenever there
34062306a36Sopenharmony_cihas been less than a full page that has been committed inside the ring buffer,
34162306a36Sopenharmony_ciand a reader swaps out a page, it will be swapping out the commit page.
34262306a36Sopenharmony_ci
34362306a36Sopenharmony_ci::
34462306a36Sopenharmony_ci
34562306a36Sopenharmony_ci            reader page    commit page   tail page
34662306a36Sopenharmony_ci                |              |             |
34762306a36Sopenharmony_ci                v              |             |
34862306a36Sopenharmony_ci               +---+           |             |
34962306a36Sopenharmony_ci               |   |<----------+             |
35062306a36Sopenharmony_ci               |   |<------------------------+
35162306a36Sopenharmony_ci               |   |------+
35262306a36Sopenharmony_ci               +---+      |
35362306a36Sopenharmony_ci                          |
35462306a36Sopenharmony_ci                          v
35562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
35662306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
35762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
35862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
35962306a36Sopenharmony_ci                          ^
36062306a36Sopenharmony_ci                          |
36162306a36Sopenharmony_ci                      head page
36262306a36Sopenharmony_ci
36362306a36Sopenharmony_ci
36462306a36Sopenharmony_ciIn this case, the head page will not move when the tail and commit
36562306a36Sopenharmony_cimove back into the ring buffer.
36662306a36Sopenharmony_ci
36762306a36Sopenharmony_ciThe reader cannot swap a page into the ring buffer if the commit page
36862306a36Sopenharmony_ciis still on that page. If the read meets the last commit (real commit
36962306a36Sopenharmony_cinot pending or reserved), then there is nothing more to read.
37062306a36Sopenharmony_ciThe buffer is considered empty until another full commit finishes.
37162306a36Sopenharmony_ci
37262306a36Sopenharmony_ciWhen the tail meets the head page, if the buffer is in overwrite mode,
37362306a36Sopenharmony_cithe head page will be pushed ahead one. If the buffer is in producer/consumer
37462306a36Sopenharmony_cimode, the write will fail.
37562306a36Sopenharmony_ci
37662306a36Sopenharmony_ciOverwrite mode::
37762306a36Sopenharmony_ci
37862306a36Sopenharmony_ci              tail page
37962306a36Sopenharmony_ci                 |
38062306a36Sopenharmony_ci                 v
38162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
38262306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
38362306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
38462306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
38562306a36Sopenharmony_ci                          ^
38662306a36Sopenharmony_ci                          |
38762306a36Sopenharmony_ci                      head page
38862306a36Sopenharmony_ci
38962306a36Sopenharmony_ci
39062306a36Sopenharmony_ci              tail page
39162306a36Sopenharmony_ci                 |
39262306a36Sopenharmony_ci                 v
39362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
39462306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
39562306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
39662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
39762306a36Sopenharmony_ci                                   ^
39862306a36Sopenharmony_ci                                   |
39962306a36Sopenharmony_ci                               head page
40062306a36Sopenharmony_ci
40162306a36Sopenharmony_ci
40262306a36Sopenharmony_ci                      tail page
40362306a36Sopenharmony_ci                          |
40462306a36Sopenharmony_ci                          v
40562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
40662306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
40762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
40862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
40962306a36Sopenharmony_ci                                   ^
41062306a36Sopenharmony_ci                                   |
41162306a36Sopenharmony_ci                               head page
41262306a36Sopenharmony_ci
41362306a36Sopenharmony_ciNote, the reader page will still point to the previous head page.
41462306a36Sopenharmony_ciBut when a swap takes place, it will use the most recent head page.
41562306a36Sopenharmony_ci
41662306a36Sopenharmony_ci
41762306a36Sopenharmony_ciMaking the Ring Buffer Lockless:
41862306a36Sopenharmony_ci--------------------------------
41962306a36Sopenharmony_ci
42062306a36Sopenharmony_ciThe main idea behind the lockless algorithm is to combine the moving
42162306a36Sopenharmony_ciof the head_page pointer with the swapping of pages with the reader.
42262306a36Sopenharmony_ciState flags are placed inside the pointer to the page. To do this,
42362306a36Sopenharmony_cieach page must be aligned in memory by 4 bytes. This will allow the 2
42462306a36Sopenharmony_cileast significant bits of the address to be used as flags, since
42562306a36Sopenharmony_cithey will always be zero for the address. To get the address,
42662306a36Sopenharmony_cisimply mask out the flags::
42762306a36Sopenharmony_ci
42862306a36Sopenharmony_ci  MASK = ~3
42962306a36Sopenharmony_ci
43062306a36Sopenharmony_ci  address & MASK
43162306a36Sopenharmony_ci
43262306a36Sopenharmony_ciTwo flags will be kept by these two bits:
43362306a36Sopenharmony_ci
43462306a36Sopenharmony_ci   HEADER
43562306a36Sopenharmony_ci	- the page being pointed to is a head page
43662306a36Sopenharmony_ci
43762306a36Sopenharmony_ci   UPDATE
43862306a36Sopenharmony_ci	- the page being pointed to is being updated by a writer
43962306a36Sopenharmony_ci          and was or is about to be a head page.
44062306a36Sopenharmony_ci
44162306a36Sopenharmony_ci::
44262306a36Sopenharmony_ci
44362306a36Sopenharmony_ci	      reader page
44462306a36Sopenharmony_ci		  |
44562306a36Sopenharmony_ci		  v
44662306a36Sopenharmony_ci		+---+
44762306a36Sopenharmony_ci		|   |------+
44862306a36Sopenharmony_ci		+---+      |
44962306a36Sopenharmony_ci			    |
45062306a36Sopenharmony_ci			    v
45162306a36Sopenharmony_ci	+---+    +---+    +---+    +---+
45262306a36Sopenharmony_ci    <---|   |--->|   |-H->|   |--->|   |--->
45362306a36Sopenharmony_ci    --->|   |<---|   |<---|   |<---|   |<---
45462306a36Sopenharmony_ci	+---+    +---+    +---+    +---+
45562306a36Sopenharmony_ci
45662306a36Sopenharmony_ci
45762306a36Sopenharmony_ciThe above pointer "-H->" would have the HEADER flag set. That is
45862306a36Sopenharmony_cithe next page is the next page to be swapped out by the reader.
45962306a36Sopenharmony_ciThis pointer means the next page is the head page.
46062306a36Sopenharmony_ci
46162306a36Sopenharmony_ciWhen the tail page meets the head pointer, it will use cmpxchg to
46262306a36Sopenharmony_cichange the pointer to the UPDATE state::
46362306a36Sopenharmony_ci
46462306a36Sopenharmony_ci
46562306a36Sopenharmony_ci              tail page
46662306a36Sopenharmony_ci                 |
46762306a36Sopenharmony_ci                 v
46862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
46962306a36Sopenharmony_ci  <---|   |--->|   |-H->|   |--->|   |--->
47062306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
47162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
47262306a36Sopenharmony_ci
47362306a36Sopenharmony_ci              tail page
47462306a36Sopenharmony_ci                 |
47562306a36Sopenharmony_ci                 v
47662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
47762306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |--->
47862306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
47962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
48062306a36Sopenharmony_ci
48162306a36Sopenharmony_ci"-U->" represents a pointer in the UPDATE state.
48262306a36Sopenharmony_ci
48362306a36Sopenharmony_ciAny access to the reader will need to take some sort of lock to serialize
48462306a36Sopenharmony_cithe readers. But the writers will never take a lock to write to the
48562306a36Sopenharmony_ciring buffer. This means we only need to worry about a single reader,
48662306a36Sopenharmony_ciand writes only preempt in "stack" formation.
48762306a36Sopenharmony_ci
48862306a36Sopenharmony_ciWhen the reader tries to swap the page with the ring buffer, it
48962306a36Sopenharmony_ciwill also use cmpxchg. If the flag bit in the pointer to the
49062306a36Sopenharmony_cihead page does not have the HEADER flag set, the compare will fail
49162306a36Sopenharmony_ciand the reader will need to look for the new head page and try again.
49262306a36Sopenharmony_ciNote, the flags UPDATE and HEADER are never set at the same time.
49362306a36Sopenharmony_ci
49462306a36Sopenharmony_ciThe reader swaps the reader page as follows::
49562306a36Sopenharmony_ci
49662306a36Sopenharmony_ci  +------+
49762306a36Sopenharmony_ci  |reader|          RING BUFFER
49862306a36Sopenharmony_ci  |page  |
49962306a36Sopenharmony_ci  +------+
50062306a36Sopenharmony_ci                  +---+    +---+    +---+
50162306a36Sopenharmony_ci                  |   |--->|   |--->|   |
50262306a36Sopenharmony_ci                  |   |<---|   |<---|   |
50362306a36Sopenharmony_ci                  +---+    +---+    +---+
50462306a36Sopenharmony_ci                   ^ |               ^ |
50562306a36Sopenharmony_ci                   | +---------------+ |
50662306a36Sopenharmony_ci                   +-----H-------------+
50762306a36Sopenharmony_ci
50862306a36Sopenharmony_ciThe reader sets the reader page next pointer as HEADER to the page after
50962306a36Sopenharmony_cithe head page::
51062306a36Sopenharmony_ci
51162306a36Sopenharmony_ci
51262306a36Sopenharmony_ci  +------+
51362306a36Sopenharmony_ci  |reader|          RING BUFFER
51462306a36Sopenharmony_ci  |page  |-------H-----------+
51562306a36Sopenharmony_ci  +------+                   v
51662306a36Sopenharmony_ci    |             +---+    +---+    +---+
51762306a36Sopenharmony_ci    |             |   |--->|   |--->|   |
51862306a36Sopenharmony_ci    |             |   |<---|   |<---|   |<-+
51962306a36Sopenharmony_ci    |             +---+    +---+    +---+  |
52062306a36Sopenharmony_ci    |              ^ |               ^ |   |
52162306a36Sopenharmony_ci    |              | +---------------+ |   |
52262306a36Sopenharmony_ci    |              +-----H-------------+   |
52362306a36Sopenharmony_ci    +--------------------------------------+
52462306a36Sopenharmony_ci
52562306a36Sopenharmony_ciIt does a cmpxchg with the pointer to the previous head page to make it
52662306a36Sopenharmony_cipoint to the reader page. Note that the new pointer does not have the HEADER
52762306a36Sopenharmony_ciflag set.  This action atomically moves the head page forward::
52862306a36Sopenharmony_ci
52962306a36Sopenharmony_ci  +------+
53062306a36Sopenharmony_ci  |reader|          RING BUFFER
53162306a36Sopenharmony_ci  |page  |-------H-----------+
53262306a36Sopenharmony_ci  +------+                   v
53362306a36Sopenharmony_ci    |  ^          +---+   +---+   +---+
53462306a36Sopenharmony_ci    |  |          |   |-->|   |-->|   |
53562306a36Sopenharmony_ci    |  |          |   |<--|   |<--|   |<-+
53662306a36Sopenharmony_ci    |  |          +---+   +---+   +---+  |
53762306a36Sopenharmony_ci    |  |             |             ^ |   |
53862306a36Sopenharmony_ci    |  |             +-------------+ |   |
53962306a36Sopenharmony_ci    |  +-----------------------------+   |
54062306a36Sopenharmony_ci    +------------------------------------+
54162306a36Sopenharmony_ci
54262306a36Sopenharmony_ciAfter the new head page is set, the previous pointer of the head page is
54362306a36Sopenharmony_ciupdated to the reader page::
54462306a36Sopenharmony_ci
54562306a36Sopenharmony_ci  +------+
54662306a36Sopenharmony_ci  |reader|          RING BUFFER
54762306a36Sopenharmony_ci  |page  |-------H-----------+
54862306a36Sopenharmony_ci  +------+ <---------------+ v
54962306a36Sopenharmony_ci    |  ^          +---+   +---+   +---+
55062306a36Sopenharmony_ci    |  |          |   |-->|   |-->|   |
55162306a36Sopenharmony_ci    |  |          |   |   |   |<--|   |<-+
55262306a36Sopenharmony_ci    |  |          +---+   +---+   +---+  |
55362306a36Sopenharmony_ci    |  |             |             ^ |   |
55462306a36Sopenharmony_ci    |  |             +-------------+ |   |
55562306a36Sopenharmony_ci    |  +-----------------------------+   |
55662306a36Sopenharmony_ci    +------------------------------------+
55762306a36Sopenharmony_ci
55862306a36Sopenharmony_ci  +------+
55962306a36Sopenharmony_ci  |buffer|          RING BUFFER
56062306a36Sopenharmony_ci  |page  |-------H-----------+  <--- New head page
56162306a36Sopenharmony_ci  +------+ <---------------+ v
56262306a36Sopenharmony_ci    |  ^          +---+   +---+   +---+
56362306a36Sopenharmony_ci    |  |          |   |   |   |-->|   |
56462306a36Sopenharmony_ci    |  |  New     |   |   |   |<--|   |<-+
56562306a36Sopenharmony_ci    |  | Reader   +---+   +---+   +---+  |
56662306a36Sopenharmony_ci    |  |  page ----^                 |   |
56762306a36Sopenharmony_ci    |  |                             |   |
56862306a36Sopenharmony_ci    |  +-----------------------------+   |
56962306a36Sopenharmony_ci    +------------------------------------+
57062306a36Sopenharmony_ci
57162306a36Sopenharmony_ciAnother important point: The page that the reader page points back to
57262306a36Sopenharmony_ciby its previous pointer (the one that now points to the new head page)
57362306a36Sopenharmony_cinever points back to the reader page. That is because the reader page is
57462306a36Sopenharmony_cinot part of the ring buffer. Traversing the ring buffer via the next pointers
57562306a36Sopenharmony_ciwill always stay in the ring buffer. Traversing the ring buffer via the
57662306a36Sopenharmony_ciprev pointers may not.
57762306a36Sopenharmony_ci
57862306a36Sopenharmony_ciNote, the way to determine a reader page is simply by examining the previous
57962306a36Sopenharmony_cipointer of the page. If the next pointer of the previous page does not
58062306a36Sopenharmony_cipoint back to the original page, then the original page is a reader page::
58162306a36Sopenharmony_ci
58262306a36Sopenharmony_ci
58362306a36Sopenharmony_ci             +--------+
58462306a36Sopenharmony_ci             | reader |  next   +----+
58562306a36Sopenharmony_ci             |  page  |-------->|    |<====== (buffer page)
58662306a36Sopenharmony_ci             +--------+         +----+
58762306a36Sopenharmony_ci                 |                | ^
58862306a36Sopenharmony_ci                 |                v | next
58962306a36Sopenharmony_ci            prev |              +----+
59062306a36Sopenharmony_ci                 +------------->|    |
59162306a36Sopenharmony_ci                                +----+
59262306a36Sopenharmony_ci
59362306a36Sopenharmony_ciThe way the head page moves forward:
59462306a36Sopenharmony_ci
59562306a36Sopenharmony_ciWhen the tail page meets the head page and the buffer is in overwrite mode
59662306a36Sopenharmony_ciand more writes take place, the head page must be moved forward before the
59762306a36Sopenharmony_ciwriter may move the tail page. The way this is done is that the writer
59862306a36Sopenharmony_ciperforms a cmpxchg to convert the pointer to the head page from the HEADER
59962306a36Sopenharmony_ciflag to have the UPDATE flag set. Once this is done, the reader will
60062306a36Sopenharmony_cinot be able to swap the head page from the buffer, nor will it be able to
60162306a36Sopenharmony_cimove the head page, until the writer is finished with the move.
60262306a36Sopenharmony_ci
60362306a36Sopenharmony_ciThis eliminates any races that the reader can have on the writer. The reader
60462306a36Sopenharmony_cimust spin, and this is why the reader cannot preempt the writer::
60562306a36Sopenharmony_ci
60662306a36Sopenharmony_ci              tail page
60762306a36Sopenharmony_ci                 |
60862306a36Sopenharmony_ci                 v
60962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
61062306a36Sopenharmony_ci  <---|   |--->|   |-H->|   |--->|   |--->
61162306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
61262306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
61362306a36Sopenharmony_ci
61462306a36Sopenharmony_ci              tail page
61562306a36Sopenharmony_ci                 |
61662306a36Sopenharmony_ci                 v
61762306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
61862306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |--->
61962306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
62062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
62162306a36Sopenharmony_ci
62262306a36Sopenharmony_ciThe following page will be made into the new head page::
62362306a36Sopenharmony_ci
62462306a36Sopenharmony_ci             tail page
62562306a36Sopenharmony_ci                 |
62662306a36Sopenharmony_ci                 v
62762306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
62862306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |--->
62962306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
63062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
63162306a36Sopenharmony_ci
63262306a36Sopenharmony_ciAfter the new head page has been set, we can set the old head page
63362306a36Sopenharmony_cipointer back to NORMAL::
63462306a36Sopenharmony_ci
63562306a36Sopenharmony_ci             tail page
63662306a36Sopenharmony_ci                 |
63762306a36Sopenharmony_ci                 v
63862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
63962306a36Sopenharmony_ci  <---|   |--->|   |--->|   |-H->|   |--->
64062306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
64162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
64262306a36Sopenharmony_ci
64362306a36Sopenharmony_ciAfter the head page has been moved, the tail page may now move forward::
64462306a36Sopenharmony_ci
64562306a36Sopenharmony_ci                      tail page
64662306a36Sopenharmony_ci                          |
64762306a36Sopenharmony_ci                          v
64862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
64962306a36Sopenharmony_ci  <---|   |--->|   |--->|   |-H->|   |--->
65062306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
65162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
65262306a36Sopenharmony_ci
65362306a36Sopenharmony_ci
65462306a36Sopenharmony_ciThe above are the trivial updates. Now for the more complex scenarios.
65562306a36Sopenharmony_ci
65662306a36Sopenharmony_ci
65762306a36Sopenharmony_ciAs stated before, if enough writes preempt the first write, the
65862306a36Sopenharmony_citail page may make it all the way around the buffer and meet the commit
65962306a36Sopenharmony_cipage. At this time, we must start dropping writes (usually with some kind
66062306a36Sopenharmony_ciof warning to the user). But what happens if the commit was still on the
66162306a36Sopenharmony_cireader page? The commit page is not part of the ring buffer. The tail page
66262306a36Sopenharmony_cimust account for this::
66362306a36Sopenharmony_ci
66462306a36Sopenharmony_ci
66562306a36Sopenharmony_ci            reader page    commit page
66662306a36Sopenharmony_ci                |              |
66762306a36Sopenharmony_ci                v              |
66862306a36Sopenharmony_ci               +---+           |
66962306a36Sopenharmony_ci               |   |<----------+
67062306a36Sopenharmony_ci               |   |
67162306a36Sopenharmony_ci               |   |------+
67262306a36Sopenharmony_ci               +---+      |
67362306a36Sopenharmony_ci                          |
67462306a36Sopenharmony_ci                          v
67562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
67662306a36Sopenharmony_ci  <---|   |--->|   |-H->|   |--->|   |--->
67762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
67862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
67962306a36Sopenharmony_ci                 ^
68062306a36Sopenharmony_ci                 |
68162306a36Sopenharmony_ci             tail page
68262306a36Sopenharmony_ci
68362306a36Sopenharmony_ciIf the tail page were to simply push the head page forward, the commit when
68462306a36Sopenharmony_cileaving the reader page would not be pointing to the correct page.
68562306a36Sopenharmony_ci
68662306a36Sopenharmony_ciThe solution to this is to test if the commit page is on the reader page
68762306a36Sopenharmony_cibefore pushing the head page. If it is, then it can be assumed that the
68862306a36Sopenharmony_citail page wrapped the buffer, and we must drop new writes.
68962306a36Sopenharmony_ci
69062306a36Sopenharmony_ciThis is not a race condition, because the commit page can only be moved
69162306a36Sopenharmony_ciby the outermost writer (the writer that was preempted).
69262306a36Sopenharmony_ciThis means that the commit will not move while a writer is moving the
69362306a36Sopenharmony_citail page. The reader cannot swap the reader page if it is also being
69462306a36Sopenharmony_ciused as the commit page. The reader can simply check that the commit
69562306a36Sopenharmony_ciis off the reader page. Once the commit page leaves the reader page
69662306a36Sopenharmony_ciit will never go back on it unless a reader does another swap with the
69762306a36Sopenharmony_cibuffer page that is also the commit page.
69862306a36Sopenharmony_ci
69962306a36Sopenharmony_ci
70062306a36Sopenharmony_ciNested writes
70162306a36Sopenharmony_ci-------------
70262306a36Sopenharmony_ci
70362306a36Sopenharmony_ciIn the pushing forward of the tail page we must first push forward
70462306a36Sopenharmony_cithe head page if the head page is the next page. If the head page
70562306a36Sopenharmony_ciis not the next page, the tail page is simply updated with a cmpxchg.
70662306a36Sopenharmony_ci
70762306a36Sopenharmony_ciOnly writers move the tail page. This must be done atomically to protect
70862306a36Sopenharmony_ciagainst nested writers::
70962306a36Sopenharmony_ci
71062306a36Sopenharmony_ci  temp_page = tail_page
71162306a36Sopenharmony_ci  next_page = temp_page->next
71262306a36Sopenharmony_ci  cmpxchg(tail_page, temp_page, next_page)
71362306a36Sopenharmony_ci
71462306a36Sopenharmony_ciThe above will update the tail page if it is still pointing to the expected
71562306a36Sopenharmony_cipage. If this fails, a nested write pushed it forward, the current write
71662306a36Sopenharmony_cidoes not need to push it::
71762306a36Sopenharmony_ci
71862306a36Sopenharmony_ci
71962306a36Sopenharmony_ci             temp page
72062306a36Sopenharmony_ci                 |
72162306a36Sopenharmony_ci                 v
72262306a36Sopenharmony_ci              tail page
72362306a36Sopenharmony_ci                 |
72462306a36Sopenharmony_ci                 v
72562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
72662306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
72762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
72862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
72962306a36Sopenharmony_ci
73062306a36Sopenharmony_ciNested write comes in and moves the tail page forward::
73162306a36Sopenharmony_ci
73262306a36Sopenharmony_ci                      tail page (moved by nested writer)
73362306a36Sopenharmony_ci              temp page   |
73462306a36Sopenharmony_ci                 |        |
73562306a36Sopenharmony_ci                 v        v
73662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
73762306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |--->
73862306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
73962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
74062306a36Sopenharmony_ci
74162306a36Sopenharmony_ciThe above would fail the cmpxchg, but since the tail page has already
74262306a36Sopenharmony_cibeen moved forward, the writer will just try again to reserve storage
74362306a36Sopenharmony_cion the new tail page.
74462306a36Sopenharmony_ci
74562306a36Sopenharmony_ciBut the moving of the head page is a bit more complex::
74662306a36Sopenharmony_ci
74762306a36Sopenharmony_ci              tail page
74862306a36Sopenharmony_ci                 |
74962306a36Sopenharmony_ci                 v
75062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
75162306a36Sopenharmony_ci  <---|   |--->|   |-H->|   |--->|   |--->
75262306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
75362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
75462306a36Sopenharmony_ci
75562306a36Sopenharmony_ciThe write converts the head page pointer to UPDATE::
75662306a36Sopenharmony_ci
75762306a36Sopenharmony_ci              tail page
75862306a36Sopenharmony_ci                 |
75962306a36Sopenharmony_ci                 v
76062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
76162306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |--->
76262306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
76362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
76462306a36Sopenharmony_ci
76562306a36Sopenharmony_ciBut if a nested writer preempts here, it will see that the next
76662306a36Sopenharmony_cipage is a head page, but it is also nested. It will detect that
76762306a36Sopenharmony_ciit is nested and will save that information. The detection is the
76862306a36Sopenharmony_cifact that it sees the UPDATE flag instead of a HEADER or NORMAL
76962306a36Sopenharmony_cipointer.
77062306a36Sopenharmony_ci
77162306a36Sopenharmony_ciThe nested writer will set the new head page pointer::
77262306a36Sopenharmony_ci
77362306a36Sopenharmony_ci             tail page
77462306a36Sopenharmony_ci                 |
77562306a36Sopenharmony_ci                 v
77662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
77762306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |--->
77862306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
77962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
78062306a36Sopenharmony_ci
78162306a36Sopenharmony_ciBut it will not reset the update back to normal. Only the writer
78262306a36Sopenharmony_cithat converted a pointer from HEAD to UPDATE will convert it back
78362306a36Sopenharmony_cito NORMAL::
78462306a36Sopenharmony_ci
78562306a36Sopenharmony_ci                      tail page
78662306a36Sopenharmony_ci                          |
78762306a36Sopenharmony_ci                          v
78862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
78962306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |--->
79062306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
79162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
79262306a36Sopenharmony_ci
79362306a36Sopenharmony_ciAfter the nested writer finishes, the outermost writer will convert
79462306a36Sopenharmony_cithe UPDATE pointer to NORMAL::
79562306a36Sopenharmony_ci
79662306a36Sopenharmony_ci
79762306a36Sopenharmony_ci                      tail page
79862306a36Sopenharmony_ci                          |
79962306a36Sopenharmony_ci                          v
80062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
80162306a36Sopenharmony_ci  <---|   |--->|   |--->|   |-H->|   |--->
80262306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
80362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
80462306a36Sopenharmony_ci
80562306a36Sopenharmony_ci
80662306a36Sopenharmony_ciIt can be even more complex if several nested writes came in and moved
80762306a36Sopenharmony_cithe tail page ahead several pages::
80862306a36Sopenharmony_ci
80962306a36Sopenharmony_ci
81062306a36Sopenharmony_ci  (first writer)
81162306a36Sopenharmony_ci
81262306a36Sopenharmony_ci              tail page
81362306a36Sopenharmony_ci                 |
81462306a36Sopenharmony_ci                 v
81562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
81662306a36Sopenharmony_ci  <---|   |--->|   |-H->|   |--->|   |--->
81762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
81862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
81962306a36Sopenharmony_ci
82062306a36Sopenharmony_ciThe write converts the head page pointer to UPDATE::
82162306a36Sopenharmony_ci
82262306a36Sopenharmony_ci              tail page
82362306a36Sopenharmony_ci                 |
82462306a36Sopenharmony_ci                 v
82562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
82662306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |--->
82762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
82862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
82962306a36Sopenharmony_ci
83062306a36Sopenharmony_ciNext writer comes in, and sees the update and sets up the new
83162306a36Sopenharmony_cihead page::
83262306a36Sopenharmony_ci
83362306a36Sopenharmony_ci  (second writer)
83462306a36Sopenharmony_ci
83562306a36Sopenharmony_ci             tail page
83662306a36Sopenharmony_ci                 |
83762306a36Sopenharmony_ci                 v
83862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
83962306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |--->
84062306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
84162306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
84262306a36Sopenharmony_ci
84362306a36Sopenharmony_ciThe nested writer moves the tail page forward. But does not set the old
84462306a36Sopenharmony_ciupdate page to NORMAL because it is not the outermost writer::
84562306a36Sopenharmony_ci
84662306a36Sopenharmony_ci                      tail page
84762306a36Sopenharmony_ci                          |
84862306a36Sopenharmony_ci                          v
84962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
85062306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |--->
85162306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
85262306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
85362306a36Sopenharmony_ci
85462306a36Sopenharmony_ciAnother writer preempts and sees the page after the tail page is a head page.
85562306a36Sopenharmony_ciIt changes it from HEAD to UPDATE::
85662306a36Sopenharmony_ci
85762306a36Sopenharmony_ci  (third writer)
85862306a36Sopenharmony_ci
85962306a36Sopenharmony_ci                      tail page
86062306a36Sopenharmony_ci                          |
86162306a36Sopenharmony_ci                          v
86262306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
86362306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-U->|   |--->
86462306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
86562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
86662306a36Sopenharmony_ci
86762306a36Sopenharmony_ciThe writer will move the head page forward::
86862306a36Sopenharmony_ci
86962306a36Sopenharmony_ci
87062306a36Sopenharmony_ci  (third writer)
87162306a36Sopenharmony_ci
87262306a36Sopenharmony_ci                      tail page
87362306a36Sopenharmony_ci                          |
87462306a36Sopenharmony_ci                          v
87562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
87662306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-U->|   |-H->
87762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
87862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
87962306a36Sopenharmony_ci
88062306a36Sopenharmony_ciBut now that the third writer did change the HEAD flag to UPDATE it
88162306a36Sopenharmony_ciwill convert it to normal::
88262306a36Sopenharmony_ci
88362306a36Sopenharmony_ci
88462306a36Sopenharmony_ci  (third writer)
88562306a36Sopenharmony_ci
88662306a36Sopenharmony_ci                      tail page
88762306a36Sopenharmony_ci                          |
88862306a36Sopenharmony_ci                          v
88962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
89062306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |-H->
89162306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
89262306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
89362306a36Sopenharmony_ci
89462306a36Sopenharmony_ci
89562306a36Sopenharmony_ciThen it will move the tail page, and return back to the second writer::
89662306a36Sopenharmony_ci
89762306a36Sopenharmony_ci
89862306a36Sopenharmony_ci  (second writer)
89962306a36Sopenharmony_ci
90062306a36Sopenharmony_ci                               tail page
90162306a36Sopenharmony_ci                                   |
90262306a36Sopenharmony_ci                                   v
90362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
90462306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |-H->
90562306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
90662306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
90762306a36Sopenharmony_ci
90862306a36Sopenharmony_ci
90962306a36Sopenharmony_ciThe second writer will fail to move the tail page because it was already
91062306a36Sopenharmony_cimoved, so it will try again and add its data to the new tail page.
91162306a36Sopenharmony_ciIt will return to the first writer::
91262306a36Sopenharmony_ci
91362306a36Sopenharmony_ci
91462306a36Sopenharmony_ci  (first writer)
91562306a36Sopenharmony_ci
91662306a36Sopenharmony_ci                               tail page
91762306a36Sopenharmony_ci                                   |
91862306a36Sopenharmony_ci                                   v
91962306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
92062306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |-H->
92162306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
92262306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
92362306a36Sopenharmony_ci
92462306a36Sopenharmony_ciThe first writer cannot know atomically if the tail page moved
92562306a36Sopenharmony_ciwhile it updates the HEAD page. It will then update the head page to
92662306a36Sopenharmony_ciwhat it thinks is the new head page::
92762306a36Sopenharmony_ci
92862306a36Sopenharmony_ci
92962306a36Sopenharmony_ci  (first writer)
93062306a36Sopenharmony_ci
93162306a36Sopenharmony_ci                               tail page
93262306a36Sopenharmony_ci                                   |
93362306a36Sopenharmony_ci                                   v
93462306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
93562306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |-H->
93662306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
93762306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
93862306a36Sopenharmony_ci
93962306a36Sopenharmony_ciSince the cmpxchg returns the old value of the pointer the first writer
94062306a36Sopenharmony_ciwill see it succeeded in updating the pointer from NORMAL to HEAD.
94162306a36Sopenharmony_ciBut as we can see, this is not good enough. It must also check to see
94262306a36Sopenharmony_ciif the tail page is either where it use to be or on the next page::
94362306a36Sopenharmony_ci
94462306a36Sopenharmony_ci
94562306a36Sopenharmony_ci  (first writer)
94662306a36Sopenharmony_ci
94762306a36Sopenharmony_ci                 A        B    tail page
94862306a36Sopenharmony_ci                 |        |        |
94962306a36Sopenharmony_ci                 v        v        v
95062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
95162306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |-H->|   |-H->
95262306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
95362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
95462306a36Sopenharmony_ci
95562306a36Sopenharmony_ciIf tail page != A and tail page != B, then it must reset the pointer
95662306a36Sopenharmony_ciback to NORMAL. The fact that it only needs to worry about nested
95762306a36Sopenharmony_ciwriters means that it only needs to check this after setting the HEAD page::
95862306a36Sopenharmony_ci
95962306a36Sopenharmony_ci
96062306a36Sopenharmony_ci  (first writer)
96162306a36Sopenharmony_ci
96262306a36Sopenharmony_ci                 A        B    tail page
96362306a36Sopenharmony_ci                 |        |        |
96462306a36Sopenharmony_ci                 v        v        v
96562306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
96662306a36Sopenharmony_ci  <---|   |--->|   |-U->|   |--->|   |-H->
96762306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
96862306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
96962306a36Sopenharmony_ci
97062306a36Sopenharmony_ciNow the writer can update the head page. This is also why the head page must
97162306a36Sopenharmony_ciremain in UPDATE and only reset by the outermost writer. This prevents
97262306a36Sopenharmony_cithe reader from seeing the incorrect head page::
97362306a36Sopenharmony_ci
97462306a36Sopenharmony_ci
97562306a36Sopenharmony_ci  (first writer)
97662306a36Sopenharmony_ci
97762306a36Sopenharmony_ci                 A        B    tail page
97862306a36Sopenharmony_ci                 |        |        |
97962306a36Sopenharmony_ci                 v        v        v
98062306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
98162306a36Sopenharmony_ci  <---|   |--->|   |--->|   |--->|   |-H->
98262306a36Sopenharmony_ci  --->|   |<---|   |<---|   |<---|   |<---
98362306a36Sopenharmony_ci      +---+    +---+    +---+    +---+
984