162306a36Sopenharmony_ci.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only 262306a36Sopenharmony_ci 362306a36Sopenharmony_ci=========================== 462306a36Sopenharmony_ciLockless Ring Buffer Design 562306a36Sopenharmony_ci=========================== 662306a36Sopenharmony_ci 762306a36Sopenharmony_ciCopyright 2009 Red Hat Inc. 862306a36Sopenharmony_ci 962306a36Sopenharmony_ci:Author: Steven Rostedt <srostedt@redhat.com> 1062306a36Sopenharmony_ci:License: The GNU Free Documentation License, Version 1.2 1162306a36Sopenharmony_ci (dual licensed under the GPL v2) 1262306a36Sopenharmony_ci:Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto, 1362306a36Sopenharmony_ci and Frederic Weisbecker. 1462306a36Sopenharmony_ci 1562306a36Sopenharmony_ci 1662306a36Sopenharmony_ciWritten for: 2.6.31 1762306a36Sopenharmony_ci 1862306a36Sopenharmony_ciTerminology used in this Document 1962306a36Sopenharmony_ci--------------------------------- 2062306a36Sopenharmony_ci 2162306a36Sopenharmony_citail 2262306a36Sopenharmony_ci - where new writes happen in the ring buffer. 2362306a36Sopenharmony_ci 2462306a36Sopenharmony_cihead 2562306a36Sopenharmony_ci - where new reads happen in the ring buffer. 2662306a36Sopenharmony_ci 2762306a36Sopenharmony_ciproducer 2862306a36Sopenharmony_ci - the task that writes into the ring buffer (same as writer) 2962306a36Sopenharmony_ci 3062306a36Sopenharmony_ciwriter 3162306a36Sopenharmony_ci - same as producer 3262306a36Sopenharmony_ci 3362306a36Sopenharmony_ciconsumer 3462306a36Sopenharmony_ci - the task that reads from the buffer (same as reader) 3562306a36Sopenharmony_ci 3662306a36Sopenharmony_cireader 3762306a36Sopenharmony_ci - same as consumer. 3862306a36Sopenharmony_ci 3962306a36Sopenharmony_cireader_page 4062306a36Sopenharmony_ci - A page outside the ring buffer used solely (for the most part) 4162306a36Sopenharmony_ci by the reader. 4262306a36Sopenharmony_ci 4362306a36Sopenharmony_cihead_page 4462306a36Sopenharmony_ci - a pointer to the page that the reader will use next 4562306a36Sopenharmony_ci 4662306a36Sopenharmony_citail_page 4762306a36Sopenharmony_ci - a pointer to the page that will be written to next 4862306a36Sopenharmony_ci 4962306a36Sopenharmony_cicommit_page 5062306a36Sopenharmony_ci - a pointer to the page with the last finished non-nested write. 5162306a36Sopenharmony_ci 5262306a36Sopenharmony_cicmpxchg 5362306a36Sopenharmony_ci - hardware-assisted atomic transaction that performs the following:: 5462306a36Sopenharmony_ci 5562306a36Sopenharmony_ci A = B if previous A == C 5662306a36Sopenharmony_ci 5762306a36Sopenharmony_ci R = cmpxchg(A, C, B) is saying that we replace A with B if and only 5862306a36Sopenharmony_ci if current A is equal to C, and we put the old (current) 5962306a36Sopenharmony_ci A into R 6062306a36Sopenharmony_ci 6162306a36Sopenharmony_ci R gets the previous A regardless if A is updated with B or not. 6262306a36Sopenharmony_ci 6362306a36Sopenharmony_ci To see if the update was successful a compare of ``R == C`` 6462306a36Sopenharmony_ci may be used. 6562306a36Sopenharmony_ci 6662306a36Sopenharmony_ciThe Generic Ring Buffer 6762306a36Sopenharmony_ci----------------------- 6862306a36Sopenharmony_ci 6962306a36Sopenharmony_ciThe ring buffer can be used in either an overwrite mode or in 7062306a36Sopenharmony_ciproducer/consumer mode. 7162306a36Sopenharmony_ci 7262306a36Sopenharmony_ciProducer/consumer mode is where if the producer were to fill up the 7362306a36Sopenharmony_cibuffer before the consumer could free up anything, the producer 7462306a36Sopenharmony_ciwill stop writing to the buffer. This will lose most recent events. 7562306a36Sopenharmony_ci 7662306a36Sopenharmony_ciOverwrite mode is where if the producer were to fill up the buffer 7762306a36Sopenharmony_cibefore the consumer could free up anything, the producer will 7862306a36Sopenharmony_cioverwrite the older data. This will lose the oldest events. 7962306a36Sopenharmony_ci 8062306a36Sopenharmony_ciNo two writers can write at the same time (on the same per-cpu buffer), 8162306a36Sopenharmony_cibut a writer may interrupt another writer, but it must finish writing 8262306a36Sopenharmony_cibefore the previous writer may continue. This is very important to the 8362306a36Sopenharmony_cialgorithm. The writers act like a "stack". The way interrupts works 8462306a36Sopenharmony_cienforces this behavior:: 8562306a36Sopenharmony_ci 8662306a36Sopenharmony_ci 8762306a36Sopenharmony_ci writer1 start 8862306a36Sopenharmony_ci <preempted> writer2 start 8962306a36Sopenharmony_ci <preempted> writer3 start 9062306a36Sopenharmony_ci writer3 finishes 9162306a36Sopenharmony_ci writer2 finishes 9262306a36Sopenharmony_ci writer1 finishes 9362306a36Sopenharmony_ci 9462306a36Sopenharmony_ciThis is very much like a writer being preempted by an interrupt and 9562306a36Sopenharmony_cithe interrupt doing a write as well. 9662306a36Sopenharmony_ci 9762306a36Sopenharmony_ciReaders can happen at any time. But no two readers may run at the 9862306a36Sopenharmony_cisame time, nor can a reader preempt/interrupt another reader. A reader 9962306a36Sopenharmony_cicannot preempt/interrupt a writer, but it may read/consume from the 10062306a36Sopenharmony_cibuffer at the same time as a writer is writing, but the reader must be 10162306a36Sopenharmony_cion another processor to do so. A reader may read on its own processor 10262306a36Sopenharmony_ciand can be preempted by a writer. 10362306a36Sopenharmony_ci 10462306a36Sopenharmony_ciA writer can preempt a reader, but a reader cannot preempt a writer. 10562306a36Sopenharmony_ciBut a reader can read the buffer at the same time (on another processor) 10662306a36Sopenharmony_cias a writer. 10762306a36Sopenharmony_ci 10862306a36Sopenharmony_ciThe ring buffer is made up of a list of pages held together by a linked list. 10962306a36Sopenharmony_ci 11062306a36Sopenharmony_ciAt initialization a reader page is allocated for the reader that is not 11162306a36Sopenharmony_cipart of the ring buffer. 11262306a36Sopenharmony_ci 11362306a36Sopenharmony_ciThe head_page, tail_page and commit_page are all initialized to point 11462306a36Sopenharmony_cito the same page. 11562306a36Sopenharmony_ci 11662306a36Sopenharmony_ciThe reader page is initialized to have its next pointer pointing to 11762306a36Sopenharmony_cithe head page, and its previous pointer pointing to a page before 11862306a36Sopenharmony_cithe head page. 11962306a36Sopenharmony_ci 12062306a36Sopenharmony_ciThe reader has its own page to use. At start up time, this page is 12162306a36Sopenharmony_ciallocated but is not attached to the list. When the reader wants 12262306a36Sopenharmony_cito read from the buffer, if its page is empty (like it is on start-up), 12362306a36Sopenharmony_ciit will swap its page with the head_page. The old reader page will 12462306a36Sopenharmony_cibecome part of the ring buffer and the head_page will be removed. 12562306a36Sopenharmony_ciThe page after the inserted page (old reader_page) will become the 12662306a36Sopenharmony_cinew head page. 12762306a36Sopenharmony_ci 12862306a36Sopenharmony_ciOnce the new page is given to the reader, the reader could do what 12962306a36Sopenharmony_ciit wants with it, as long as a writer has left that page. 13062306a36Sopenharmony_ci 13162306a36Sopenharmony_ciA sample of how the reader page is swapped: Note this does not 13262306a36Sopenharmony_cishow the head page in the buffer, it is for demonstrating a swap 13362306a36Sopenharmony_cionly. 13462306a36Sopenharmony_ci 13562306a36Sopenharmony_ci:: 13662306a36Sopenharmony_ci 13762306a36Sopenharmony_ci +------+ 13862306a36Sopenharmony_ci |reader| RING BUFFER 13962306a36Sopenharmony_ci |page | 14062306a36Sopenharmony_ci +------+ 14162306a36Sopenharmony_ci +---+ +---+ +---+ 14262306a36Sopenharmony_ci | |-->| |-->| | 14362306a36Sopenharmony_ci | |<--| |<--| | 14462306a36Sopenharmony_ci +---+ +---+ +---+ 14562306a36Sopenharmony_ci ^ | ^ | 14662306a36Sopenharmony_ci | +-------------+ | 14762306a36Sopenharmony_ci +-----------------+ 14862306a36Sopenharmony_ci 14962306a36Sopenharmony_ci 15062306a36Sopenharmony_ci +------+ 15162306a36Sopenharmony_ci |reader| RING BUFFER 15262306a36Sopenharmony_ci |page |-------------------+ 15362306a36Sopenharmony_ci +------+ v 15462306a36Sopenharmony_ci | +---+ +---+ +---+ 15562306a36Sopenharmony_ci | | |-->| |-->| | 15662306a36Sopenharmony_ci | | |<--| |<--| |<-+ 15762306a36Sopenharmony_ci | +---+ +---+ +---+ | 15862306a36Sopenharmony_ci | ^ | ^ | | 15962306a36Sopenharmony_ci | | +-------------+ | | 16062306a36Sopenharmony_ci | +-----------------+ | 16162306a36Sopenharmony_ci +------------------------------------+ 16262306a36Sopenharmony_ci 16362306a36Sopenharmony_ci +------+ 16462306a36Sopenharmony_ci |reader| RING BUFFER 16562306a36Sopenharmony_ci |page |-------------------+ 16662306a36Sopenharmony_ci +------+ <---------------+ v 16762306a36Sopenharmony_ci | ^ +---+ +---+ +---+ 16862306a36Sopenharmony_ci | | | |-->| |-->| | 16962306a36Sopenharmony_ci | | | | | |<--| |<-+ 17062306a36Sopenharmony_ci | | +---+ +---+ +---+ | 17162306a36Sopenharmony_ci | | | ^ | | 17262306a36Sopenharmony_ci | | +-------------+ | | 17362306a36Sopenharmony_ci | +-----------------------------+ | 17462306a36Sopenharmony_ci +------------------------------------+ 17562306a36Sopenharmony_ci 17662306a36Sopenharmony_ci +------+ 17762306a36Sopenharmony_ci |buffer| RING BUFFER 17862306a36Sopenharmony_ci |page |-------------------+ 17962306a36Sopenharmony_ci +------+ <---------------+ v 18062306a36Sopenharmony_ci | ^ +---+ +---+ +---+ 18162306a36Sopenharmony_ci | | | | | |-->| | 18262306a36Sopenharmony_ci | | New | | | |<--| |<-+ 18362306a36Sopenharmony_ci | | Reader +---+ +---+ +---+ | 18462306a36Sopenharmony_ci | | page ----^ | | 18562306a36Sopenharmony_ci | | | | 18662306a36Sopenharmony_ci | +-----------------------------+ | 18762306a36Sopenharmony_ci +------------------------------------+ 18862306a36Sopenharmony_ci 18962306a36Sopenharmony_ci 19062306a36Sopenharmony_ci 19162306a36Sopenharmony_ciIt is possible that the page swapped is the commit page and the tail page, 19262306a36Sopenharmony_ciif what is in the ring buffer is less than what is held in a buffer page. 19362306a36Sopenharmony_ci 19462306a36Sopenharmony_ci:: 19562306a36Sopenharmony_ci 19662306a36Sopenharmony_ci reader page commit page tail page 19762306a36Sopenharmony_ci | | | 19862306a36Sopenharmony_ci v | | 19962306a36Sopenharmony_ci +---+ | | 20062306a36Sopenharmony_ci | |<----------+ | 20162306a36Sopenharmony_ci | |<------------------------+ 20262306a36Sopenharmony_ci | |------+ 20362306a36Sopenharmony_ci +---+ | 20462306a36Sopenharmony_ci | 20562306a36Sopenharmony_ci v 20662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 20762306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 20862306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 20962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 21062306a36Sopenharmony_ci 21162306a36Sopenharmony_ciThis case is still valid for this algorithm. 21262306a36Sopenharmony_ciWhen the writer leaves the page, it simply goes into the ring buffer 21362306a36Sopenharmony_cisince the reader page still points to the next location in the ring 21462306a36Sopenharmony_cibuffer. 21562306a36Sopenharmony_ci 21662306a36Sopenharmony_ci 21762306a36Sopenharmony_ciThe main pointers: 21862306a36Sopenharmony_ci 21962306a36Sopenharmony_ci reader page 22062306a36Sopenharmony_ci - The page used solely by the reader and is not part 22162306a36Sopenharmony_ci of the ring buffer (may be swapped in) 22262306a36Sopenharmony_ci 22362306a36Sopenharmony_ci head page 22462306a36Sopenharmony_ci - the next page in the ring buffer that will be swapped 22562306a36Sopenharmony_ci with the reader page. 22662306a36Sopenharmony_ci 22762306a36Sopenharmony_ci tail page 22862306a36Sopenharmony_ci - the page where the next write will take place. 22962306a36Sopenharmony_ci 23062306a36Sopenharmony_ci commit page 23162306a36Sopenharmony_ci - the page that last finished a write. 23262306a36Sopenharmony_ci 23362306a36Sopenharmony_ciThe commit page only is updated by the outermost writer in the 23462306a36Sopenharmony_ciwriter stack. A writer that preempts another writer will not move the 23562306a36Sopenharmony_cicommit page. 23662306a36Sopenharmony_ci 23762306a36Sopenharmony_ciWhen data is written into the ring buffer, a position is reserved 23862306a36Sopenharmony_ciin the ring buffer and passed back to the writer. When the writer 23962306a36Sopenharmony_ciis finished writing data into that position, it commits the write. 24062306a36Sopenharmony_ci 24162306a36Sopenharmony_ciAnother write (or a read) may take place at anytime during this 24262306a36Sopenharmony_citransaction. If another write happens it must finish before continuing 24362306a36Sopenharmony_ciwith the previous write. 24462306a36Sopenharmony_ci 24562306a36Sopenharmony_ci 24662306a36Sopenharmony_ci Write reserve:: 24762306a36Sopenharmony_ci 24862306a36Sopenharmony_ci Buffer page 24962306a36Sopenharmony_ci +---------+ 25062306a36Sopenharmony_ci |written | 25162306a36Sopenharmony_ci +---------+ <--- given back to writer (current commit) 25262306a36Sopenharmony_ci |reserved | 25362306a36Sopenharmony_ci +---------+ <--- tail pointer 25462306a36Sopenharmony_ci | empty | 25562306a36Sopenharmony_ci +---------+ 25662306a36Sopenharmony_ci 25762306a36Sopenharmony_ci Write commit:: 25862306a36Sopenharmony_ci 25962306a36Sopenharmony_ci Buffer page 26062306a36Sopenharmony_ci +---------+ 26162306a36Sopenharmony_ci |written | 26262306a36Sopenharmony_ci +---------+ 26362306a36Sopenharmony_ci |written | 26462306a36Sopenharmony_ci +---------+ <--- next position for write (current commit) 26562306a36Sopenharmony_ci | empty | 26662306a36Sopenharmony_ci +---------+ 26762306a36Sopenharmony_ci 26862306a36Sopenharmony_ci 26962306a36Sopenharmony_ci If a write happens after the first reserve:: 27062306a36Sopenharmony_ci 27162306a36Sopenharmony_ci Buffer page 27262306a36Sopenharmony_ci +---------+ 27362306a36Sopenharmony_ci |written | 27462306a36Sopenharmony_ci +---------+ <-- current commit 27562306a36Sopenharmony_ci |reserved | 27662306a36Sopenharmony_ci +---------+ <--- given back to second writer 27762306a36Sopenharmony_ci |reserved | 27862306a36Sopenharmony_ci +---------+ <--- tail pointer 27962306a36Sopenharmony_ci 28062306a36Sopenharmony_ci After second writer commits:: 28162306a36Sopenharmony_ci 28262306a36Sopenharmony_ci 28362306a36Sopenharmony_ci Buffer page 28462306a36Sopenharmony_ci +---------+ 28562306a36Sopenharmony_ci |written | 28662306a36Sopenharmony_ci +---------+ <--(last full commit) 28762306a36Sopenharmony_ci |reserved | 28862306a36Sopenharmony_ci +---------+ 28962306a36Sopenharmony_ci |pending | 29062306a36Sopenharmony_ci |commit | 29162306a36Sopenharmony_ci +---------+ <--- tail pointer 29262306a36Sopenharmony_ci 29362306a36Sopenharmony_ci When the first writer commits:: 29462306a36Sopenharmony_ci 29562306a36Sopenharmony_ci Buffer page 29662306a36Sopenharmony_ci +---------+ 29762306a36Sopenharmony_ci |written | 29862306a36Sopenharmony_ci +---------+ 29962306a36Sopenharmony_ci |written | 30062306a36Sopenharmony_ci +---------+ 30162306a36Sopenharmony_ci |written | 30262306a36Sopenharmony_ci +---------+ <--(last full commit and tail pointer) 30362306a36Sopenharmony_ci 30462306a36Sopenharmony_ci 30562306a36Sopenharmony_ciThe commit pointer points to the last write location that was 30662306a36Sopenharmony_cicommitted without preempting another write. When a write that 30762306a36Sopenharmony_cipreempted another write is committed, it only becomes a pending commit 30862306a36Sopenharmony_ciand will not be a full commit until all writes have been committed. 30962306a36Sopenharmony_ci 31062306a36Sopenharmony_ciThe commit page points to the page that has the last full commit. 31162306a36Sopenharmony_ciThe tail page points to the page with the last write (before 31262306a36Sopenharmony_cicommitting). 31362306a36Sopenharmony_ci 31462306a36Sopenharmony_ciThe tail page is always equal to or after the commit page. It may 31562306a36Sopenharmony_cibe several pages ahead. If the tail page catches up to the commit 31662306a36Sopenharmony_cipage then no more writes may take place (regardless of the mode 31762306a36Sopenharmony_ciof the ring buffer: overwrite and produce/consumer). 31862306a36Sopenharmony_ci 31962306a36Sopenharmony_ciThe order of pages is:: 32062306a36Sopenharmony_ci 32162306a36Sopenharmony_ci head page 32262306a36Sopenharmony_ci commit page 32362306a36Sopenharmony_ci tail page 32462306a36Sopenharmony_ci 32562306a36Sopenharmony_ciPossible scenario:: 32662306a36Sopenharmony_ci 32762306a36Sopenharmony_ci tail page 32862306a36Sopenharmony_ci head page commit page | 32962306a36Sopenharmony_ci | | | 33062306a36Sopenharmony_ci v v v 33162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 33262306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 33362306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 33462306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 33562306a36Sopenharmony_ci 33662306a36Sopenharmony_ciThere is a special case that the head page is after either the commit page 33762306a36Sopenharmony_ciand possibly the tail page. That is when the commit (and tail) page has been 33862306a36Sopenharmony_ciswapped with the reader page. This is because the head page is always 33962306a36Sopenharmony_cipart of the ring buffer, but the reader page is not. Whenever there 34062306a36Sopenharmony_cihas been less than a full page that has been committed inside the ring buffer, 34162306a36Sopenharmony_ciand a reader swaps out a page, it will be swapping out the commit page. 34262306a36Sopenharmony_ci 34362306a36Sopenharmony_ci:: 34462306a36Sopenharmony_ci 34562306a36Sopenharmony_ci reader page commit page tail page 34662306a36Sopenharmony_ci | | | 34762306a36Sopenharmony_ci v | | 34862306a36Sopenharmony_ci +---+ | | 34962306a36Sopenharmony_ci | |<----------+ | 35062306a36Sopenharmony_ci | |<------------------------+ 35162306a36Sopenharmony_ci | |------+ 35262306a36Sopenharmony_ci +---+ | 35362306a36Sopenharmony_ci | 35462306a36Sopenharmony_ci v 35562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 35662306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 35762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 35862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 35962306a36Sopenharmony_ci ^ 36062306a36Sopenharmony_ci | 36162306a36Sopenharmony_ci head page 36262306a36Sopenharmony_ci 36362306a36Sopenharmony_ci 36462306a36Sopenharmony_ciIn this case, the head page will not move when the tail and commit 36562306a36Sopenharmony_cimove back into the ring buffer. 36662306a36Sopenharmony_ci 36762306a36Sopenharmony_ciThe reader cannot swap a page into the ring buffer if the commit page 36862306a36Sopenharmony_ciis still on that page. If the read meets the last commit (real commit 36962306a36Sopenharmony_cinot pending or reserved), then there is nothing more to read. 37062306a36Sopenharmony_ciThe buffer is considered empty until another full commit finishes. 37162306a36Sopenharmony_ci 37262306a36Sopenharmony_ciWhen the tail meets the head page, if the buffer is in overwrite mode, 37362306a36Sopenharmony_cithe head page will be pushed ahead one. If the buffer is in producer/consumer 37462306a36Sopenharmony_cimode, the write will fail. 37562306a36Sopenharmony_ci 37662306a36Sopenharmony_ciOverwrite mode:: 37762306a36Sopenharmony_ci 37862306a36Sopenharmony_ci tail page 37962306a36Sopenharmony_ci | 38062306a36Sopenharmony_ci v 38162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 38262306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 38362306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 38462306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 38562306a36Sopenharmony_ci ^ 38662306a36Sopenharmony_ci | 38762306a36Sopenharmony_ci head page 38862306a36Sopenharmony_ci 38962306a36Sopenharmony_ci 39062306a36Sopenharmony_ci tail page 39162306a36Sopenharmony_ci | 39262306a36Sopenharmony_ci v 39362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 39462306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 39562306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 39662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 39762306a36Sopenharmony_ci ^ 39862306a36Sopenharmony_ci | 39962306a36Sopenharmony_ci head page 40062306a36Sopenharmony_ci 40162306a36Sopenharmony_ci 40262306a36Sopenharmony_ci tail page 40362306a36Sopenharmony_ci | 40462306a36Sopenharmony_ci v 40562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 40662306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 40762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 40862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 40962306a36Sopenharmony_ci ^ 41062306a36Sopenharmony_ci | 41162306a36Sopenharmony_ci head page 41262306a36Sopenharmony_ci 41362306a36Sopenharmony_ciNote, the reader page will still point to the previous head page. 41462306a36Sopenharmony_ciBut when a swap takes place, it will use the most recent head page. 41562306a36Sopenharmony_ci 41662306a36Sopenharmony_ci 41762306a36Sopenharmony_ciMaking the Ring Buffer Lockless: 41862306a36Sopenharmony_ci-------------------------------- 41962306a36Sopenharmony_ci 42062306a36Sopenharmony_ciThe main idea behind the lockless algorithm is to combine the moving 42162306a36Sopenharmony_ciof the head_page pointer with the swapping of pages with the reader. 42262306a36Sopenharmony_ciState flags are placed inside the pointer to the page. To do this, 42362306a36Sopenharmony_cieach page must be aligned in memory by 4 bytes. This will allow the 2 42462306a36Sopenharmony_cileast significant bits of the address to be used as flags, since 42562306a36Sopenharmony_cithey will always be zero for the address. To get the address, 42662306a36Sopenharmony_cisimply mask out the flags:: 42762306a36Sopenharmony_ci 42862306a36Sopenharmony_ci MASK = ~3 42962306a36Sopenharmony_ci 43062306a36Sopenharmony_ci address & MASK 43162306a36Sopenharmony_ci 43262306a36Sopenharmony_ciTwo flags will be kept by these two bits: 43362306a36Sopenharmony_ci 43462306a36Sopenharmony_ci HEADER 43562306a36Sopenharmony_ci - the page being pointed to is a head page 43662306a36Sopenharmony_ci 43762306a36Sopenharmony_ci UPDATE 43862306a36Sopenharmony_ci - the page being pointed to is being updated by a writer 43962306a36Sopenharmony_ci and was or is about to be a head page. 44062306a36Sopenharmony_ci 44162306a36Sopenharmony_ci:: 44262306a36Sopenharmony_ci 44362306a36Sopenharmony_ci reader page 44462306a36Sopenharmony_ci | 44562306a36Sopenharmony_ci v 44662306a36Sopenharmony_ci +---+ 44762306a36Sopenharmony_ci | |------+ 44862306a36Sopenharmony_ci +---+ | 44962306a36Sopenharmony_ci | 45062306a36Sopenharmony_ci v 45162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 45262306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 45362306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 45462306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 45562306a36Sopenharmony_ci 45662306a36Sopenharmony_ci 45762306a36Sopenharmony_ciThe above pointer "-H->" would have the HEADER flag set. That is 45862306a36Sopenharmony_cithe next page is the next page to be swapped out by the reader. 45962306a36Sopenharmony_ciThis pointer means the next page is the head page. 46062306a36Sopenharmony_ci 46162306a36Sopenharmony_ciWhen the tail page meets the head pointer, it will use cmpxchg to 46262306a36Sopenharmony_cichange the pointer to the UPDATE state:: 46362306a36Sopenharmony_ci 46462306a36Sopenharmony_ci 46562306a36Sopenharmony_ci tail page 46662306a36Sopenharmony_ci | 46762306a36Sopenharmony_ci v 46862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 46962306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 47062306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 47162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 47262306a36Sopenharmony_ci 47362306a36Sopenharmony_ci tail page 47462306a36Sopenharmony_ci | 47562306a36Sopenharmony_ci v 47662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 47762306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |---> 47862306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 47962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 48062306a36Sopenharmony_ci 48162306a36Sopenharmony_ci"-U->" represents a pointer in the UPDATE state. 48262306a36Sopenharmony_ci 48362306a36Sopenharmony_ciAny access to the reader will need to take some sort of lock to serialize 48462306a36Sopenharmony_cithe readers. But the writers will never take a lock to write to the 48562306a36Sopenharmony_ciring buffer. This means we only need to worry about a single reader, 48662306a36Sopenharmony_ciand writes only preempt in "stack" formation. 48762306a36Sopenharmony_ci 48862306a36Sopenharmony_ciWhen the reader tries to swap the page with the ring buffer, it 48962306a36Sopenharmony_ciwill also use cmpxchg. If the flag bit in the pointer to the 49062306a36Sopenharmony_cihead page does not have the HEADER flag set, the compare will fail 49162306a36Sopenharmony_ciand the reader will need to look for the new head page and try again. 49262306a36Sopenharmony_ciNote, the flags UPDATE and HEADER are never set at the same time. 49362306a36Sopenharmony_ci 49462306a36Sopenharmony_ciThe reader swaps the reader page as follows:: 49562306a36Sopenharmony_ci 49662306a36Sopenharmony_ci +------+ 49762306a36Sopenharmony_ci |reader| RING BUFFER 49862306a36Sopenharmony_ci |page | 49962306a36Sopenharmony_ci +------+ 50062306a36Sopenharmony_ci +---+ +---+ +---+ 50162306a36Sopenharmony_ci | |--->| |--->| | 50262306a36Sopenharmony_ci | |<---| |<---| | 50362306a36Sopenharmony_ci +---+ +---+ +---+ 50462306a36Sopenharmony_ci ^ | ^ | 50562306a36Sopenharmony_ci | +---------------+ | 50662306a36Sopenharmony_ci +-----H-------------+ 50762306a36Sopenharmony_ci 50862306a36Sopenharmony_ciThe reader sets the reader page next pointer as HEADER to the page after 50962306a36Sopenharmony_cithe head page:: 51062306a36Sopenharmony_ci 51162306a36Sopenharmony_ci 51262306a36Sopenharmony_ci +------+ 51362306a36Sopenharmony_ci |reader| RING BUFFER 51462306a36Sopenharmony_ci |page |-------H-----------+ 51562306a36Sopenharmony_ci +------+ v 51662306a36Sopenharmony_ci | +---+ +---+ +---+ 51762306a36Sopenharmony_ci | | |--->| |--->| | 51862306a36Sopenharmony_ci | | |<---| |<---| |<-+ 51962306a36Sopenharmony_ci | +---+ +---+ +---+ | 52062306a36Sopenharmony_ci | ^ | ^ | | 52162306a36Sopenharmony_ci | | +---------------+ | | 52262306a36Sopenharmony_ci | +-----H-------------+ | 52362306a36Sopenharmony_ci +--------------------------------------+ 52462306a36Sopenharmony_ci 52562306a36Sopenharmony_ciIt does a cmpxchg with the pointer to the previous head page to make it 52662306a36Sopenharmony_cipoint to the reader page. Note that the new pointer does not have the HEADER 52762306a36Sopenharmony_ciflag set. This action atomically moves the head page forward:: 52862306a36Sopenharmony_ci 52962306a36Sopenharmony_ci +------+ 53062306a36Sopenharmony_ci |reader| RING BUFFER 53162306a36Sopenharmony_ci |page |-------H-----------+ 53262306a36Sopenharmony_ci +------+ v 53362306a36Sopenharmony_ci | ^ +---+ +---+ +---+ 53462306a36Sopenharmony_ci | | | |-->| |-->| | 53562306a36Sopenharmony_ci | | | |<--| |<--| |<-+ 53662306a36Sopenharmony_ci | | +---+ +---+ +---+ | 53762306a36Sopenharmony_ci | | | ^ | | 53862306a36Sopenharmony_ci | | +-------------+ | | 53962306a36Sopenharmony_ci | +-----------------------------+ | 54062306a36Sopenharmony_ci +------------------------------------+ 54162306a36Sopenharmony_ci 54262306a36Sopenharmony_ciAfter the new head page is set, the previous pointer of the head page is 54362306a36Sopenharmony_ciupdated to the reader page:: 54462306a36Sopenharmony_ci 54562306a36Sopenharmony_ci +------+ 54662306a36Sopenharmony_ci |reader| RING BUFFER 54762306a36Sopenharmony_ci |page |-------H-----------+ 54862306a36Sopenharmony_ci +------+ <---------------+ v 54962306a36Sopenharmony_ci | ^ +---+ +---+ +---+ 55062306a36Sopenharmony_ci | | | |-->| |-->| | 55162306a36Sopenharmony_ci | | | | | |<--| |<-+ 55262306a36Sopenharmony_ci | | +---+ +---+ +---+ | 55362306a36Sopenharmony_ci | | | ^ | | 55462306a36Sopenharmony_ci | | +-------------+ | | 55562306a36Sopenharmony_ci | +-----------------------------+ | 55662306a36Sopenharmony_ci +------------------------------------+ 55762306a36Sopenharmony_ci 55862306a36Sopenharmony_ci +------+ 55962306a36Sopenharmony_ci |buffer| RING BUFFER 56062306a36Sopenharmony_ci |page |-------H-----------+ <--- New head page 56162306a36Sopenharmony_ci +------+ <---------------+ v 56262306a36Sopenharmony_ci | ^ +---+ +---+ +---+ 56362306a36Sopenharmony_ci | | | | | |-->| | 56462306a36Sopenharmony_ci | | New | | | |<--| |<-+ 56562306a36Sopenharmony_ci | | Reader +---+ +---+ +---+ | 56662306a36Sopenharmony_ci | | page ----^ | | 56762306a36Sopenharmony_ci | | | | 56862306a36Sopenharmony_ci | +-----------------------------+ | 56962306a36Sopenharmony_ci +------------------------------------+ 57062306a36Sopenharmony_ci 57162306a36Sopenharmony_ciAnother important point: The page that the reader page points back to 57262306a36Sopenharmony_ciby its previous pointer (the one that now points to the new head page) 57362306a36Sopenharmony_cinever points back to the reader page. That is because the reader page is 57462306a36Sopenharmony_cinot part of the ring buffer. Traversing the ring buffer via the next pointers 57562306a36Sopenharmony_ciwill always stay in the ring buffer. Traversing the ring buffer via the 57662306a36Sopenharmony_ciprev pointers may not. 57762306a36Sopenharmony_ci 57862306a36Sopenharmony_ciNote, the way to determine a reader page is simply by examining the previous 57962306a36Sopenharmony_cipointer of the page. If the next pointer of the previous page does not 58062306a36Sopenharmony_cipoint back to the original page, then the original page is a reader page:: 58162306a36Sopenharmony_ci 58262306a36Sopenharmony_ci 58362306a36Sopenharmony_ci +--------+ 58462306a36Sopenharmony_ci | reader | next +----+ 58562306a36Sopenharmony_ci | page |-------->| |<====== (buffer page) 58662306a36Sopenharmony_ci +--------+ +----+ 58762306a36Sopenharmony_ci | | ^ 58862306a36Sopenharmony_ci | v | next 58962306a36Sopenharmony_ci prev | +----+ 59062306a36Sopenharmony_ci +------------->| | 59162306a36Sopenharmony_ci +----+ 59262306a36Sopenharmony_ci 59362306a36Sopenharmony_ciThe way the head page moves forward: 59462306a36Sopenharmony_ci 59562306a36Sopenharmony_ciWhen the tail page meets the head page and the buffer is in overwrite mode 59662306a36Sopenharmony_ciand more writes take place, the head page must be moved forward before the 59762306a36Sopenharmony_ciwriter may move the tail page. The way this is done is that the writer 59862306a36Sopenharmony_ciperforms a cmpxchg to convert the pointer to the head page from the HEADER 59962306a36Sopenharmony_ciflag to have the UPDATE flag set. Once this is done, the reader will 60062306a36Sopenharmony_cinot be able to swap the head page from the buffer, nor will it be able to 60162306a36Sopenharmony_cimove the head page, until the writer is finished with the move. 60262306a36Sopenharmony_ci 60362306a36Sopenharmony_ciThis eliminates any races that the reader can have on the writer. The reader 60462306a36Sopenharmony_cimust spin, and this is why the reader cannot preempt the writer:: 60562306a36Sopenharmony_ci 60662306a36Sopenharmony_ci tail page 60762306a36Sopenharmony_ci | 60862306a36Sopenharmony_ci v 60962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 61062306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 61162306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 61262306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 61362306a36Sopenharmony_ci 61462306a36Sopenharmony_ci tail page 61562306a36Sopenharmony_ci | 61662306a36Sopenharmony_ci v 61762306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 61862306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |---> 61962306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 62062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 62162306a36Sopenharmony_ci 62262306a36Sopenharmony_ciThe following page will be made into the new head page:: 62362306a36Sopenharmony_ci 62462306a36Sopenharmony_ci tail page 62562306a36Sopenharmony_ci | 62662306a36Sopenharmony_ci v 62762306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 62862306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |---> 62962306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 63062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 63162306a36Sopenharmony_ci 63262306a36Sopenharmony_ciAfter the new head page has been set, we can set the old head page 63362306a36Sopenharmony_cipointer back to NORMAL:: 63462306a36Sopenharmony_ci 63562306a36Sopenharmony_ci tail page 63662306a36Sopenharmony_ci | 63762306a36Sopenharmony_ci v 63862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 63962306a36Sopenharmony_ci <---| |--->| |--->| |-H->| |---> 64062306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 64162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 64262306a36Sopenharmony_ci 64362306a36Sopenharmony_ciAfter the head page has been moved, the tail page may now move forward:: 64462306a36Sopenharmony_ci 64562306a36Sopenharmony_ci tail page 64662306a36Sopenharmony_ci | 64762306a36Sopenharmony_ci v 64862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 64962306a36Sopenharmony_ci <---| |--->| |--->| |-H->| |---> 65062306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 65162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 65262306a36Sopenharmony_ci 65362306a36Sopenharmony_ci 65462306a36Sopenharmony_ciThe above are the trivial updates. Now for the more complex scenarios. 65562306a36Sopenharmony_ci 65662306a36Sopenharmony_ci 65762306a36Sopenharmony_ciAs stated before, if enough writes preempt the first write, the 65862306a36Sopenharmony_citail page may make it all the way around the buffer and meet the commit 65962306a36Sopenharmony_cipage. At this time, we must start dropping writes (usually with some kind 66062306a36Sopenharmony_ciof warning to the user). But what happens if the commit was still on the 66162306a36Sopenharmony_cireader page? The commit page is not part of the ring buffer. The tail page 66262306a36Sopenharmony_cimust account for this:: 66362306a36Sopenharmony_ci 66462306a36Sopenharmony_ci 66562306a36Sopenharmony_ci reader page commit page 66662306a36Sopenharmony_ci | | 66762306a36Sopenharmony_ci v | 66862306a36Sopenharmony_ci +---+ | 66962306a36Sopenharmony_ci | |<----------+ 67062306a36Sopenharmony_ci | | 67162306a36Sopenharmony_ci | |------+ 67262306a36Sopenharmony_ci +---+ | 67362306a36Sopenharmony_ci | 67462306a36Sopenharmony_ci v 67562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 67662306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 67762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 67862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 67962306a36Sopenharmony_ci ^ 68062306a36Sopenharmony_ci | 68162306a36Sopenharmony_ci tail page 68262306a36Sopenharmony_ci 68362306a36Sopenharmony_ciIf the tail page were to simply push the head page forward, the commit when 68462306a36Sopenharmony_cileaving the reader page would not be pointing to the correct page. 68562306a36Sopenharmony_ci 68662306a36Sopenharmony_ciThe solution to this is to test if the commit page is on the reader page 68762306a36Sopenharmony_cibefore pushing the head page. If it is, then it can be assumed that the 68862306a36Sopenharmony_citail page wrapped the buffer, and we must drop new writes. 68962306a36Sopenharmony_ci 69062306a36Sopenharmony_ciThis is not a race condition, because the commit page can only be moved 69162306a36Sopenharmony_ciby the outermost writer (the writer that was preempted). 69262306a36Sopenharmony_ciThis means that the commit will not move while a writer is moving the 69362306a36Sopenharmony_citail page. The reader cannot swap the reader page if it is also being 69462306a36Sopenharmony_ciused as the commit page. The reader can simply check that the commit 69562306a36Sopenharmony_ciis off the reader page. Once the commit page leaves the reader page 69662306a36Sopenharmony_ciit will never go back on it unless a reader does another swap with the 69762306a36Sopenharmony_cibuffer page that is also the commit page. 69862306a36Sopenharmony_ci 69962306a36Sopenharmony_ci 70062306a36Sopenharmony_ciNested writes 70162306a36Sopenharmony_ci------------- 70262306a36Sopenharmony_ci 70362306a36Sopenharmony_ciIn the pushing forward of the tail page we must first push forward 70462306a36Sopenharmony_cithe head page if the head page is the next page. If the head page 70562306a36Sopenharmony_ciis not the next page, the tail page is simply updated with a cmpxchg. 70662306a36Sopenharmony_ci 70762306a36Sopenharmony_ciOnly writers move the tail page. This must be done atomically to protect 70862306a36Sopenharmony_ciagainst nested writers:: 70962306a36Sopenharmony_ci 71062306a36Sopenharmony_ci temp_page = tail_page 71162306a36Sopenharmony_ci next_page = temp_page->next 71262306a36Sopenharmony_ci cmpxchg(tail_page, temp_page, next_page) 71362306a36Sopenharmony_ci 71462306a36Sopenharmony_ciThe above will update the tail page if it is still pointing to the expected 71562306a36Sopenharmony_cipage. If this fails, a nested write pushed it forward, the current write 71662306a36Sopenharmony_cidoes not need to push it:: 71762306a36Sopenharmony_ci 71862306a36Sopenharmony_ci 71962306a36Sopenharmony_ci temp page 72062306a36Sopenharmony_ci | 72162306a36Sopenharmony_ci v 72262306a36Sopenharmony_ci tail page 72362306a36Sopenharmony_ci | 72462306a36Sopenharmony_ci v 72562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 72662306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 72762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 72862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 72962306a36Sopenharmony_ci 73062306a36Sopenharmony_ciNested write comes in and moves the tail page forward:: 73162306a36Sopenharmony_ci 73262306a36Sopenharmony_ci tail page (moved by nested writer) 73362306a36Sopenharmony_ci temp page | 73462306a36Sopenharmony_ci | | 73562306a36Sopenharmony_ci v v 73662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 73762306a36Sopenharmony_ci <---| |--->| |--->| |--->| |---> 73862306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 73962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 74062306a36Sopenharmony_ci 74162306a36Sopenharmony_ciThe above would fail the cmpxchg, but since the tail page has already 74262306a36Sopenharmony_cibeen moved forward, the writer will just try again to reserve storage 74362306a36Sopenharmony_cion the new tail page. 74462306a36Sopenharmony_ci 74562306a36Sopenharmony_ciBut the moving of the head page is a bit more complex:: 74662306a36Sopenharmony_ci 74762306a36Sopenharmony_ci tail page 74862306a36Sopenharmony_ci | 74962306a36Sopenharmony_ci v 75062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 75162306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 75262306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 75362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 75462306a36Sopenharmony_ci 75562306a36Sopenharmony_ciThe write converts the head page pointer to UPDATE:: 75662306a36Sopenharmony_ci 75762306a36Sopenharmony_ci tail page 75862306a36Sopenharmony_ci | 75962306a36Sopenharmony_ci v 76062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 76162306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |---> 76262306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 76362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 76462306a36Sopenharmony_ci 76562306a36Sopenharmony_ciBut if a nested writer preempts here, it will see that the next 76662306a36Sopenharmony_cipage is a head page, but it is also nested. It will detect that 76762306a36Sopenharmony_ciit is nested and will save that information. The detection is the 76862306a36Sopenharmony_cifact that it sees the UPDATE flag instead of a HEADER or NORMAL 76962306a36Sopenharmony_cipointer. 77062306a36Sopenharmony_ci 77162306a36Sopenharmony_ciThe nested writer will set the new head page pointer:: 77262306a36Sopenharmony_ci 77362306a36Sopenharmony_ci tail page 77462306a36Sopenharmony_ci | 77562306a36Sopenharmony_ci v 77662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 77762306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |---> 77862306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 77962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 78062306a36Sopenharmony_ci 78162306a36Sopenharmony_ciBut it will not reset the update back to normal. Only the writer 78262306a36Sopenharmony_cithat converted a pointer from HEAD to UPDATE will convert it back 78362306a36Sopenharmony_cito NORMAL:: 78462306a36Sopenharmony_ci 78562306a36Sopenharmony_ci tail page 78662306a36Sopenharmony_ci | 78762306a36Sopenharmony_ci v 78862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 78962306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |---> 79062306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 79162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 79262306a36Sopenharmony_ci 79362306a36Sopenharmony_ciAfter the nested writer finishes, the outermost writer will convert 79462306a36Sopenharmony_cithe UPDATE pointer to NORMAL:: 79562306a36Sopenharmony_ci 79662306a36Sopenharmony_ci 79762306a36Sopenharmony_ci tail page 79862306a36Sopenharmony_ci | 79962306a36Sopenharmony_ci v 80062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 80162306a36Sopenharmony_ci <---| |--->| |--->| |-H->| |---> 80262306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 80362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 80462306a36Sopenharmony_ci 80562306a36Sopenharmony_ci 80662306a36Sopenharmony_ciIt can be even more complex if several nested writes came in and moved 80762306a36Sopenharmony_cithe tail page ahead several pages:: 80862306a36Sopenharmony_ci 80962306a36Sopenharmony_ci 81062306a36Sopenharmony_ci (first writer) 81162306a36Sopenharmony_ci 81262306a36Sopenharmony_ci tail page 81362306a36Sopenharmony_ci | 81462306a36Sopenharmony_ci v 81562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 81662306a36Sopenharmony_ci <---| |--->| |-H->| |--->| |---> 81762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 81862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 81962306a36Sopenharmony_ci 82062306a36Sopenharmony_ciThe write converts the head page pointer to UPDATE:: 82162306a36Sopenharmony_ci 82262306a36Sopenharmony_ci tail page 82362306a36Sopenharmony_ci | 82462306a36Sopenharmony_ci v 82562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 82662306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |---> 82762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 82862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 82962306a36Sopenharmony_ci 83062306a36Sopenharmony_ciNext writer comes in, and sees the update and sets up the new 83162306a36Sopenharmony_cihead page:: 83262306a36Sopenharmony_ci 83362306a36Sopenharmony_ci (second writer) 83462306a36Sopenharmony_ci 83562306a36Sopenharmony_ci tail page 83662306a36Sopenharmony_ci | 83762306a36Sopenharmony_ci v 83862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 83962306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |---> 84062306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 84162306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 84262306a36Sopenharmony_ci 84362306a36Sopenharmony_ciThe nested writer moves the tail page forward. But does not set the old 84462306a36Sopenharmony_ciupdate page to NORMAL because it is not the outermost writer:: 84562306a36Sopenharmony_ci 84662306a36Sopenharmony_ci tail page 84762306a36Sopenharmony_ci | 84862306a36Sopenharmony_ci v 84962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 85062306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |---> 85162306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 85262306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 85362306a36Sopenharmony_ci 85462306a36Sopenharmony_ciAnother writer preempts and sees the page after the tail page is a head page. 85562306a36Sopenharmony_ciIt changes it from HEAD to UPDATE:: 85662306a36Sopenharmony_ci 85762306a36Sopenharmony_ci (third writer) 85862306a36Sopenharmony_ci 85962306a36Sopenharmony_ci tail page 86062306a36Sopenharmony_ci | 86162306a36Sopenharmony_ci v 86262306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 86362306a36Sopenharmony_ci <---| |--->| |-U->| |-U->| |---> 86462306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 86562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 86662306a36Sopenharmony_ci 86762306a36Sopenharmony_ciThe writer will move the head page forward:: 86862306a36Sopenharmony_ci 86962306a36Sopenharmony_ci 87062306a36Sopenharmony_ci (third writer) 87162306a36Sopenharmony_ci 87262306a36Sopenharmony_ci tail page 87362306a36Sopenharmony_ci | 87462306a36Sopenharmony_ci v 87562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 87662306a36Sopenharmony_ci <---| |--->| |-U->| |-U->| |-H-> 87762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 87862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 87962306a36Sopenharmony_ci 88062306a36Sopenharmony_ciBut now that the third writer did change the HEAD flag to UPDATE it 88162306a36Sopenharmony_ciwill convert it to normal:: 88262306a36Sopenharmony_ci 88362306a36Sopenharmony_ci 88462306a36Sopenharmony_ci (third writer) 88562306a36Sopenharmony_ci 88662306a36Sopenharmony_ci tail page 88762306a36Sopenharmony_ci | 88862306a36Sopenharmony_ci v 88962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 89062306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |-H-> 89162306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 89262306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 89362306a36Sopenharmony_ci 89462306a36Sopenharmony_ci 89562306a36Sopenharmony_ciThen it will move the tail page, and return back to the second writer:: 89662306a36Sopenharmony_ci 89762306a36Sopenharmony_ci 89862306a36Sopenharmony_ci (second writer) 89962306a36Sopenharmony_ci 90062306a36Sopenharmony_ci tail page 90162306a36Sopenharmony_ci | 90262306a36Sopenharmony_ci v 90362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 90462306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |-H-> 90562306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 90662306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 90762306a36Sopenharmony_ci 90862306a36Sopenharmony_ci 90962306a36Sopenharmony_ciThe second writer will fail to move the tail page because it was already 91062306a36Sopenharmony_cimoved, so it will try again and add its data to the new tail page. 91162306a36Sopenharmony_ciIt will return to the first writer:: 91262306a36Sopenharmony_ci 91362306a36Sopenharmony_ci 91462306a36Sopenharmony_ci (first writer) 91562306a36Sopenharmony_ci 91662306a36Sopenharmony_ci tail page 91762306a36Sopenharmony_ci | 91862306a36Sopenharmony_ci v 91962306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 92062306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |-H-> 92162306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 92262306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 92362306a36Sopenharmony_ci 92462306a36Sopenharmony_ciThe first writer cannot know atomically if the tail page moved 92562306a36Sopenharmony_ciwhile it updates the HEAD page. It will then update the head page to 92662306a36Sopenharmony_ciwhat it thinks is the new head page:: 92762306a36Sopenharmony_ci 92862306a36Sopenharmony_ci 92962306a36Sopenharmony_ci (first writer) 93062306a36Sopenharmony_ci 93162306a36Sopenharmony_ci tail page 93262306a36Sopenharmony_ci | 93362306a36Sopenharmony_ci v 93462306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 93562306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |-H-> 93662306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 93762306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 93862306a36Sopenharmony_ci 93962306a36Sopenharmony_ciSince the cmpxchg returns the old value of the pointer the first writer 94062306a36Sopenharmony_ciwill see it succeeded in updating the pointer from NORMAL to HEAD. 94162306a36Sopenharmony_ciBut as we can see, this is not good enough. It must also check to see 94262306a36Sopenharmony_ciif the tail page is either where it use to be or on the next page:: 94362306a36Sopenharmony_ci 94462306a36Sopenharmony_ci 94562306a36Sopenharmony_ci (first writer) 94662306a36Sopenharmony_ci 94762306a36Sopenharmony_ci A B tail page 94862306a36Sopenharmony_ci | | | 94962306a36Sopenharmony_ci v v v 95062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 95162306a36Sopenharmony_ci <---| |--->| |-U->| |-H->| |-H-> 95262306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 95362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 95462306a36Sopenharmony_ci 95562306a36Sopenharmony_ciIf tail page != A and tail page != B, then it must reset the pointer 95662306a36Sopenharmony_ciback to NORMAL. The fact that it only needs to worry about nested 95762306a36Sopenharmony_ciwriters means that it only needs to check this after setting the HEAD page:: 95862306a36Sopenharmony_ci 95962306a36Sopenharmony_ci 96062306a36Sopenharmony_ci (first writer) 96162306a36Sopenharmony_ci 96262306a36Sopenharmony_ci A B tail page 96362306a36Sopenharmony_ci | | | 96462306a36Sopenharmony_ci v v v 96562306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 96662306a36Sopenharmony_ci <---| |--->| |-U->| |--->| |-H-> 96762306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 96862306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 96962306a36Sopenharmony_ci 97062306a36Sopenharmony_ciNow the writer can update the head page. This is also why the head page must 97162306a36Sopenharmony_ciremain in UPDATE and only reset by the outermost writer. This prevents 97262306a36Sopenharmony_cithe reader from seeing the incorrect head page:: 97362306a36Sopenharmony_ci 97462306a36Sopenharmony_ci 97562306a36Sopenharmony_ci (first writer) 97662306a36Sopenharmony_ci 97762306a36Sopenharmony_ci A B tail page 97862306a36Sopenharmony_ci | | | 97962306a36Sopenharmony_ci v v v 98062306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 98162306a36Sopenharmony_ci <---| |--->| |--->| |--->| |-H-> 98262306a36Sopenharmony_ci --->| |<---| |<---| |<---| |<--- 98362306a36Sopenharmony_ci +---+ +---+ +---+ +---+ 984