162306a36Sopenharmony_ci========================================
262306a36Sopenharmony_ciGeneric Associative Array Implementation
362306a36Sopenharmony_ci========================================
462306a36Sopenharmony_ci
562306a36Sopenharmony_ciOverview
662306a36Sopenharmony_ci========
762306a36Sopenharmony_ci
862306a36Sopenharmony_ciThis associative array implementation is an object container with the following
962306a36Sopenharmony_ciproperties:
1062306a36Sopenharmony_ci
1162306a36Sopenharmony_ci1. Objects are opaque pointers.  The implementation does not care where they
1262306a36Sopenharmony_ci   point (if anywhere) or what they point to (if anything).
1362306a36Sopenharmony_ci
1462306a36Sopenharmony_ci   .. note::
1562306a36Sopenharmony_ci
1662306a36Sopenharmony_ci      Pointers to objects _must_ be zero in the least significant bit.
1762306a36Sopenharmony_ci
1862306a36Sopenharmony_ci2. Objects do not need to contain linkage blocks for use by the array.  This
1962306a36Sopenharmony_ci   permits an object to be located in multiple arrays simultaneously.
2062306a36Sopenharmony_ci   Rather, the array is made up of metadata blocks that point to objects.
2162306a36Sopenharmony_ci
2262306a36Sopenharmony_ci3. Objects require index keys to locate them within the array.
2362306a36Sopenharmony_ci
2462306a36Sopenharmony_ci4. Index keys must be unique.  Inserting an object with the same key as one
2562306a36Sopenharmony_ci   already in the array will replace the old object.
2662306a36Sopenharmony_ci
2762306a36Sopenharmony_ci5. Index keys can be of any length and can be of different lengths.
2862306a36Sopenharmony_ci
2962306a36Sopenharmony_ci6. Index keys should encode the length early on, before any variation due to
3062306a36Sopenharmony_ci   length is seen.
3162306a36Sopenharmony_ci
3262306a36Sopenharmony_ci7. Index keys can include a hash to scatter objects throughout the array.
3362306a36Sopenharmony_ci
3462306a36Sopenharmony_ci8. The array can iterated over.  The objects will not necessarily come out in
3562306a36Sopenharmony_ci   key order.
3662306a36Sopenharmony_ci
3762306a36Sopenharmony_ci9. The array can be iterated over while it is being modified, provided the
3862306a36Sopenharmony_ci   RCU readlock is being held by the iterator.  Note, however, under these
3962306a36Sopenharmony_ci   circumstances, some objects may be seen more than once.  If this is a
4062306a36Sopenharmony_ci   problem, the iterator should lock against modification.  Objects will not
4162306a36Sopenharmony_ci   be missed, however, unless deleted.
4262306a36Sopenharmony_ci
4362306a36Sopenharmony_ci10. Objects in the array can be looked up by means of their index key.
4462306a36Sopenharmony_ci
4562306a36Sopenharmony_ci11. Objects can be looked up while the array is being modified, provided the
4662306a36Sopenharmony_ci    RCU readlock is being held by the thread doing the look up.
4762306a36Sopenharmony_ci
4862306a36Sopenharmony_ciThe implementation uses a tree of 16-pointer nodes internally that are indexed
4962306a36Sopenharmony_cion each level by nibbles from the index key in the same manner as in a radix
5062306a36Sopenharmony_citree.  To improve memory efficiency, shortcuts can be emplaced to skip over
5162306a36Sopenharmony_ciwhat would otherwise be a series of single-occupancy nodes.  Further, nodes
5262306a36Sopenharmony_cipack leaf object pointers into spare space in the node rather than making an
5362306a36Sopenharmony_ciextra branch until as such time an object needs to be added to a full node.
5462306a36Sopenharmony_ci
5562306a36Sopenharmony_ci
5662306a36Sopenharmony_ciThe Public API
5762306a36Sopenharmony_ci==============
5862306a36Sopenharmony_ci
5962306a36Sopenharmony_ciThe public API can be found in ``<linux/assoc_array.h>``.  The associative
6062306a36Sopenharmony_ciarray is rooted on the following structure::
6162306a36Sopenharmony_ci
6262306a36Sopenharmony_ci    struct assoc_array {
6362306a36Sopenharmony_ci            ...
6462306a36Sopenharmony_ci    };
6562306a36Sopenharmony_ci
6662306a36Sopenharmony_ciThe code is selected by enabling ``CONFIG_ASSOCIATIVE_ARRAY`` with::
6762306a36Sopenharmony_ci
6862306a36Sopenharmony_ci    ./script/config -e ASSOCIATIVE_ARRAY
6962306a36Sopenharmony_ci
7062306a36Sopenharmony_ci
7162306a36Sopenharmony_ciEdit Script
7262306a36Sopenharmony_ci-----------
7362306a36Sopenharmony_ci
7462306a36Sopenharmony_ciThe insertion and deletion functions produce an 'edit script' that can later be
7562306a36Sopenharmony_ciapplied to effect the changes without risking ``ENOMEM``. This retains the
7662306a36Sopenharmony_cipreallocated metadata blocks that will be installed in the internal tree and
7762306a36Sopenharmony_cikeeps track of the metadata blocks that will be removed from the tree when the
7862306a36Sopenharmony_ciscript is applied.
7962306a36Sopenharmony_ci
8062306a36Sopenharmony_ciThis is also used to keep track of dead blocks and dead objects after the
8162306a36Sopenharmony_ciscript has been applied so that they can be freed later.  The freeing is done
8262306a36Sopenharmony_ciafter an RCU grace period has passed - thus allowing access functions to
8362306a36Sopenharmony_ciproceed under the RCU read lock.
8462306a36Sopenharmony_ci
8562306a36Sopenharmony_ciThe script appears as outside of the API as a pointer of the type::
8662306a36Sopenharmony_ci
8762306a36Sopenharmony_ci    struct assoc_array_edit;
8862306a36Sopenharmony_ci
8962306a36Sopenharmony_ciThere are two functions for dealing with the script:
9062306a36Sopenharmony_ci
9162306a36Sopenharmony_ci1. Apply an edit script::
9262306a36Sopenharmony_ci
9362306a36Sopenharmony_ci    void assoc_array_apply_edit(struct assoc_array_edit *edit);
9462306a36Sopenharmony_ci
9562306a36Sopenharmony_ciThis will perform the edit functions, interpolating various write barriers
9662306a36Sopenharmony_cito permit accesses under the RCU read lock to continue.  The edit script
9762306a36Sopenharmony_ciwill then be passed to ``call_rcu()`` to free it and any dead stuff it points
9862306a36Sopenharmony_cito.
9962306a36Sopenharmony_ci
10062306a36Sopenharmony_ci2. Cancel an edit script::
10162306a36Sopenharmony_ci
10262306a36Sopenharmony_ci    void assoc_array_cancel_edit(struct assoc_array_edit *edit);
10362306a36Sopenharmony_ci
10462306a36Sopenharmony_ciThis frees the edit script and all preallocated memory immediately. If
10562306a36Sopenharmony_cithis was for insertion, the new object is _not_ released by this function,
10662306a36Sopenharmony_cibut must rather be released by the caller.
10762306a36Sopenharmony_ci
10862306a36Sopenharmony_ciThese functions are guaranteed not to fail.
10962306a36Sopenharmony_ci
11062306a36Sopenharmony_ci
11162306a36Sopenharmony_ciOperations Table
11262306a36Sopenharmony_ci----------------
11362306a36Sopenharmony_ci
11462306a36Sopenharmony_ciVarious functions take a table of operations::
11562306a36Sopenharmony_ci
11662306a36Sopenharmony_ci    struct assoc_array_ops {
11762306a36Sopenharmony_ci            ...
11862306a36Sopenharmony_ci    };
11962306a36Sopenharmony_ci
12062306a36Sopenharmony_ciThis points to a number of methods, all of which need to be provided:
12162306a36Sopenharmony_ci
12262306a36Sopenharmony_ci1. Get a chunk of index key from caller data::
12362306a36Sopenharmony_ci
12462306a36Sopenharmony_ci    unsigned long (*get_key_chunk)(const void *index_key, int level);
12562306a36Sopenharmony_ci
12662306a36Sopenharmony_ciThis should return a chunk of caller-supplied index key starting at the
12762306a36Sopenharmony_ci*bit* position given by the level argument.  The level argument will be a
12862306a36Sopenharmony_cimultiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
12962306a36Sopenharmony_ci``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``.  No error is possible.
13062306a36Sopenharmony_ci
13162306a36Sopenharmony_ci
13262306a36Sopenharmony_ci2. Get a chunk of an object's index key::
13362306a36Sopenharmony_ci
13462306a36Sopenharmony_ci    unsigned long (*get_object_key_chunk)(const void *object, int level);
13562306a36Sopenharmony_ci
13662306a36Sopenharmony_ciAs the previous function, but gets its data from an object in the array
13762306a36Sopenharmony_cirather than from a caller-supplied index key.
13862306a36Sopenharmony_ci
13962306a36Sopenharmony_ci
14062306a36Sopenharmony_ci3. See if this is the object we're looking for::
14162306a36Sopenharmony_ci
14262306a36Sopenharmony_ci    bool (*compare_object)(const void *object, const void *index_key);
14362306a36Sopenharmony_ci
14462306a36Sopenharmony_ciCompare the object against an index key and return ``true`` if it matches and
14562306a36Sopenharmony_ci``false`` if it doesn't.
14662306a36Sopenharmony_ci
14762306a36Sopenharmony_ci
14862306a36Sopenharmony_ci4. Diff the index keys of two objects::
14962306a36Sopenharmony_ci
15062306a36Sopenharmony_ci    int (*diff_objects)(const void *object, const void *index_key);
15162306a36Sopenharmony_ci
15262306a36Sopenharmony_ciReturn the bit position at which the index key of the specified object
15362306a36Sopenharmony_cidiffers from the given index key or -1 if they are the same.
15462306a36Sopenharmony_ci
15562306a36Sopenharmony_ci
15662306a36Sopenharmony_ci5. Free an object::
15762306a36Sopenharmony_ci
15862306a36Sopenharmony_ci    void (*free_object)(void *object);
15962306a36Sopenharmony_ci
16062306a36Sopenharmony_ciFree the specified object.  Note that this may be called an RCU grace period
16162306a36Sopenharmony_ciafter ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be
16262306a36Sopenharmony_cinecessary on module unloading.
16362306a36Sopenharmony_ci
16462306a36Sopenharmony_ci
16562306a36Sopenharmony_ciManipulation Functions
16662306a36Sopenharmony_ci----------------------
16762306a36Sopenharmony_ci
16862306a36Sopenharmony_ciThere are a number of functions for manipulating an associative array:
16962306a36Sopenharmony_ci
17062306a36Sopenharmony_ci1. Initialise an associative array::
17162306a36Sopenharmony_ci
17262306a36Sopenharmony_ci    void assoc_array_init(struct assoc_array *array);
17362306a36Sopenharmony_ci
17462306a36Sopenharmony_ciThis initialises the base structure for an associative array.  It can't fail.
17562306a36Sopenharmony_ci
17662306a36Sopenharmony_ci
17762306a36Sopenharmony_ci2. Insert/replace an object in an associative array::
17862306a36Sopenharmony_ci
17962306a36Sopenharmony_ci    struct assoc_array_edit *
18062306a36Sopenharmony_ci    assoc_array_insert(struct assoc_array *array,
18162306a36Sopenharmony_ci                       const struct assoc_array_ops *ops,
18262306a36Sopenharmony_ci                       const void *index_key,
18362306a36Sopenharmony_ci                       void *object);
18462306a36Sopenharmony_ci
18562306a36Sopenharmony_ciThis inserts the given object into the array.  Note that the least
18662306a36Sopenharmony_cisignificant bit of the pointer must be zero as it's used to type-mark
18762306a36Sopenharmony_cipointers internally.
18862306a36Sopenharmony_ci
18962306a36Sopenharmony_ciIf an object already exists for that key then it will be replaced with the
19062306a36Sopenharmony_cinew object and the old one will be freed automatically.
19162306a36Sopenharmony_ci
19262306a36Sopenharmony_ciThe ``index_key`` argument should hold index key information and is
19362306a36Sopenharmony_cipassed to the methods in the ops table when they are called.
19462306a36Sopenharmony_ci
19562306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns
19662306a36Sopenharmony_cian edit script that must be applied.  ``-ENOMEM`` is returned in the case of
19762306a36Sopenharmony_cian out-of-memory error.
19862306a36Sopenharmony_ci
19962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array.
20062306a36Sopenharmony_ci
20162306a36Sopenharmony_ci
20262306a36Sopenharmony_ci3. Delete an object from an associative array::
20362306a36Sopenharmony_ci
20462306a36Sopenharmony_ci    struct assoc_array_edit *
20562306a36Sopenharmony_ci    assoc_array_delete(struct assoc_array *array,
20662306a36Sopenharmony_ci                       const struct assoc_array_ops *ops,
20762306a36Sopenharmony_ci                       const void *index_key);
20862306a36Sopenharmony_ci
20962306a36Sopenharmony_ciThis deletes an object that matches the specified data from the array.
21062306a36Sopenharmony_ci
21162306a36Sopenharmony_ciThe ``index_key`` argument should hold index key information and is
21262306a36Sopenharmony_cipassed to the methods in the ops table when they are called.
21362306a36Sopenharmony_ci
21462306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns
21562306a36Sopenharmony_cian edit script that must be applied.  ``-ENOMEM`` is returned in the case of
21662306a36Sopenharmony_cian out-of-memory error.  ``NULL`` will be returned if the specified object is
21762306a36Sopenharmony_cinot found within the array.
21862306a36Sopenharmony_ci
21962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array.
22062306a36Sopenharmony_ci
22162306a36Sopenharmony_ci
22262306a36Sopenharmony_ci4. Delete all objects from an associative array::
22362306a36Sopenharmony_ci
22462306a36Sopenharmony_ci    struct assoc_array_edit *
22562306a36Sopenharmony_ci    assoc_array_clear(struct assoc_array *array,
22662306a36Sopenharmony_ci                      const struct assoc_array_ops *ops);
22762306a36Sopenharmony_ci
22862306a36Sopenharmony_ciThis deletes all the objects from an associative array and leaves it
22962306a36Sopenharmony_cicompletely empty.
23062306a36Sopenharmony_ci
23162306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns
23262306a36Sopenharmony_cian edit script that must be applied.  ``-ENOMEM`` is returned in the case of
23362306a36Sopenharmony_cian out-of-memory error.
23462306a36Sopenharmony_ci
23562306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array.
23662306a36Sopenharmony_ci
23762306a36Sopenharmony_ci
23862306a36Sopenharmony_ci5. Destroy an associative array, deleting all objects::
23962306a36Sopenharmony_ci
24062306a36Sopenharmony_ci    void assoc_array_destroy(struct assoc_array *array,
24162306a36Sopenharmony_ci                             const struct assoc_array_ops *ops);
24262306a36Sopenharmony_ci
24362306a36Sopenharmony_ciThis destroys the contents of the associative array and leaves it
24462306a36Sopenharmony_cicompletely empty.  It is not permitted for another thread to be traversing
24562306a36Sopenharmony_cithe array under the RCU read lock at the same time as this function is
24662306a36Sopenharmony_cidestroying it as no RCU deferral is performed on memory release -
24762306a36Sopenharmony_cisomething that would require memory to be allocated.
24862306a36Sopenharmony_ci
24962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers and accessors
25062306a36Sopenharmony_ciof the array.
25162306a36Sopenharmony_ci
25262306a36Sopenharmony_ci
25362306a36Sopenharmony_ci6. Garbage collect an associative array::
25462306a36Sopenharmony_ci
25562306a36Sopenharmony_ci    int assoc_array_gc(struct assoc_array *array,
25662306a36Sopenharmony_ci                       const struct assoc_array_ops *ops,
25762306a36Sopenharmony_ci                       bool (*iterator)(void *object, void *iterator_data),
25862306a36Sopenharmony_ci                       void *iterator_data);
25962306a36Sopenharmony_ci
26062306a36Sopenharmony_ciThis iterates over the objects in an associative array and passes each one to
26162306a36Sopenharmony_ci``iterator()``.  If ``iterator()`` returns ``true``, the object is kept.  If it
26262306a36Sopenharmony_cireturns ``false``, the object will be freed.  If the ``iterator()`` function
26362306a36Sopenharmony_cireturns ``true``, it must perform any appropriate refcount incrementing on the
26462306a36Sopenharmony_ciobject before returning.
26562306a36Sopenharmony_ci
26662306a36Sopenharmony_ciThe internal tree will be packed down if possible as part of the iteration
26762306a36Sopenharmony_cito reduce the number of nodes in it.
26862306a36Sopenharmony_ci
26962306a36Sopenharmony_ciThe ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
27062306a36Sopenharmony_ciignored by the function.
27162306a36Sopenharmony_ci
27262306a36Sopenharmony_ciThe function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
27362306a36Sopenharmony_cienough memory.
27462306a36Sopenharmony_ci
27562306a36Sopenharmony_ciIt is possible for other threads to iterate over or search the array under
27662306a36Sopenharmony_cithe RCU read lock while this function is in progress.  The caller should
27762306a36Sopenharmony_cilock exclusively against other modifiers of the array.
27862306a36Sopenharmony_ci
27962306a36Sopenharmony_ci
28062306a36Sopenharmony_ciAccess Functions
28162306a36Sopenharmony_ci----------------
28262306a36Sopenharmony_ci
28362306a36Sopenharmony_ciThere are two functions for accessing an associative array:
28462306a36Sopenharmony_ci
28562306a36Sopenharmony_ci1. Iterate over all the objects in an associative array::
28662306a36Sopenharmony_ci
28762306a36Sopenharmony_ci    int assoc_array_iterate(const struct assoc_array *array,
28862306a36Sopenharmony_ci                            int (*iterator)(const void *object,
28962306a36Sopenharmony_ci                                            void *iterator_data),
29062306a36Sopenharmony_ci                            void *iterator_data);
29162306a36Sopenharmony_ci
29262306a36Sopenharmony_ciThis passes each object in the array to the iterator callback function.
29362306a36Sopenharmony_ci``iterator_data`` is private data for that function.
29462306a36Sopenharmony_ci
29562306a36Sopenharmony_ciThis may be used on an array at the same time as the array is being
29662306a36Sopenharmony_cimodified, provided the RCU read lock is held.  Under such circumstances,
29762306a36Sopenharmony_ciit is possible for the iteration function to see some objects twice.  If
29862306a36Sopenharmony_cithis is a problem, then modification should be locked against.  The
29962306a36Sopenharmony_ciiteration algorithm should not, however, miss any objects.
30062306a36Sopenharmony_ci
30162306a36Sopenharmony_ciThe function will return ``0`` if no objects were in the array or else it will
30262306a36Sopenharmony_cireturn the result of the last iterator function called.  Iteration stops
30362306a36Sopenharmony_ciimmediately if any call to the iteration function results in a non-zero
30462306a36Sopenharmony_cireturn.
30562306a36Sopenharmony_ci
30662306a36Sopenharmony_ci
30762306a36Sopenharmony_ci2. Find an object in an associative array::
30862306a36Sopenharmony_ci
30962306a36Sopenharmony_ci    void *assoc_array_find(const struct assoc_array *array,
31062306a36Sopenharmony_ci                           const struct assoc_array_ops *ops,
31162306a36Sopenharmony_ci                           const void *index_key);
31262306a36Sopenharmony_ci
31362306a36Sopenharmony_ciThis walks through the array's internal tree directly to the object
31462306a36Sopenharmony_cispecified by the index key..
31562306a36Sopenharmony_ci
31662306a36Sopenharmony_ciThis may be used on an array at the same time as the array is being
31762306a36Sopenharmony_cimodified, provided the RCU read lock is held.
31862306a36Sopenharmony_ci
31962306a36Sopenharmony_ciThe function will return the object if found (and set ``*_type`` to the object
32062306a36Sopenharmony_citype) or will return ``NULL`` if the object was not found.
32162306a36Sopenharmony_ci
32262306a36Sopenharmony_ci
32362306a36Sopenharmony_ciIndex Key Form
32462306a36Sopenharmony_ci--------------
32562306a36Sopenharmony_ci
32662306a36Sopenharmony_ciThe index key can be of any form, but since the algorithms aren't told how long
32762306a36Sopenharmony_cithe key is, it is strongly recommended that the index key includes its length
32862306a36Sopenharmony_civery early on before any variation due to the length would have an effect on
32962306a36Sopenharmony_cicomparisons.
33062306a36Sopenharmony_ci
33162306a36Sopenharmony_ciThis will cause leaves with different length keys to scatter away from each
33262306a36Sopenharmony_ciother - and those with the same length keys to cluster together.
33362306a36Sopenharmony_ci
33462306a36Sopenharmony_ciIt is also recommended that the index key begin with a hash of the rest of the
33562306a36Sopenharmony_cikey to maximise scattering throughout keyspace.
33662306a36Sopenharmony_ci
33762306a36Sopenharmony_ciThe better the scattering, the wider and lower the internal tree will be.
33862306a36Sopenharmony_ci
33962306a36Sopenharmony_ciPoor scattering isn't too much of a problem as there are shortcuts and nodes
34062306a36Sopenharmony_cican contain mixtures of leaves and metadata pointers.
34162306a36Sopenharmony_ci
34262306a36Sopenharmony_ciThe index key is read in chunks of machine word.  Each chunk is subdivided into
34362306a36Sopenharmony_cione nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
34462306a36Sopenharmony_cion a 64-bit CPU, 16 levels.  Unless the scattering is really poor, it is
34562306a36Sopenharmony_ciunlikely that more than one word of any particular index key will have to be
34662306a36Sopenharmony_ciused.
34762306a36Sopenharmony_ci
34862306a36Sopenharmony_ci
34962306a36Sopenharmony_ciInternal Workings
35062306a36Sopenharmony_ci=================
35162306a36Sopenharmony_ci
35262306a36Sopenharmony_ciThe associative array data structure has an internal tree.  This tree is
35362306a36Sopenharmony_ciconstructed of two types of metadata blocks: nodes and shortcuts.
35462306a36Sopenharmony_ci
35562306a36Sopenharmony_ciA node is an array of slots.  Each slot can contain one of four things:
35662306a36Sopenharmony_ci
35762306a36Sopenharmony_ci* A NULL pointer, indicating that the slot is empty.
35862306a36Sopenharmony_ci* A pointer to an object (a leaf).
35962306a36Sopenharmony_ci* A pointer to a node at the next level.
36062306a36Sopenharmony_ci* A pointer to a shortcut.
36162306a36Sopenharmony_ci
36262306a36Sopenharmony_ci
36362306a36Sopenharmony_ciBasic Internal Tree Layout
36462306a36Sopenharmony_ci--------------------------
36562306a36Sopenharmony_ci
36662306a36Sopenharmony_ciIgnoring shortcuts for the moment, the nodes form a multilevel tree.  The index
36762306a36Sopenharmony_cikey space is strictly subdivided by the nodes in the tree and nodes occur on
36862306a36Sopenharmony_cifixed levels.  For example::
36962306a36Sopenharmony_ci
37062306a36Sopenharmony_ci Level: 0               1               2               3
37162306a36Sopenharmony_ci        =============== =============== =============== ===============
37262306a36Sopenharmony_ci                                                        NODE D
37362306a36Sopenharmony_ci                        NODE B          NODE C  +------>+---+
37462306a36Sopenharmony_ci                +------>+---+   +------>+---+   |       | 0 |
37562306a36Sopenharmony_ci        NODE A  |       | 0 |   |       | 0 |   |       +---+
37662306a36Sopenharmony_ci        +---+   |       +---+   |       +---+   |       :   :
37762306a36Sopenharmony_ci        | 0 |   |       :   :   |       :   :   |       +---+
37862306a36Sopenharmony_ci        +---+   |       +---+   |       +---+   |       | f |
37962306a36Sopenharmony_ci        | 1 |---+       | 3 |---+       | 7 |---+       +---+
38062306a36Sopenharmony_ci        +---+           +---+           +---+
38162306a36Sopenharmony_ci        :   :           :   :           | 8 |---+
38262306a36Sopenharmony_ci        +---+           +---+           +---+   |       NODE E
38362306a36Sopenharmony_ci        | e |---+       | f |           :   :   +------>+---+
38462306a36Sopenharmony_ci        +---+   |       +---+           +---+           | 0 |
38562306a36Sopenharmony_ci        | f |   |                       | f |           +---+
38662306a36Sopenharmony_ci        +---+   |                       +---+           :   :
38762306a36Sopenharmony_ci                |       NODE F                          +---+
38862306a36Sopenharmony_ci                +------>+---+                           | f |
38962306a36Sopenharmony_ci                        | 0 |           NODE G          +---+
39062306a36Sopenharmony_ci                        +---+   +------>+---+
39162306a36Sopenharmony_ci                        :   :   |       | 0 |
39262306a36Sopenharmony_ci                        +---+   |       +---+
39362306a36Sopenharmony_ci                        | 6 |---+       :   :
39462306a36Sopenharmony_ci                        +---+           +---+
39562306a36Sopenharmony_ci                        :   :           | f |
39662306a36Sopenharmony_ci                        +---+           +---+
39762306a36Sopenharmony_ci                        | f |
39862306a36Sopenharmony_ci                        +---+
39962306a36Sopenharmony_ci
40062306a36Sopenharmony_ciIn the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
40162306a36Sopenharmony_ciAssuming no other meta data nodes in the tree, the key space is divided
40262306a36Sopenharmony_cithusly::
40362306a36Sopenharmony_ci
40462306a36Sopenharmony_ci    KEY PREFIX      NODE
40562306a36Sopenharmony_ci    ==========      ====
40662306a36Sopenharmony_ci    137*            D
40762306a36Sopenharmony_ci    138*            E
40862306a36Sopenharmony_ci    13[0-69-f]*     C
40962306a36Sopenharmony_ci    1[0-24-f]*      B
41062306a36Sopenharmony_ci    e6*             G
41162306a36Sopenharmony_ci    e[0-57-f]*      F
41262306a36Sopenharmony_ci    [02-df]*        A
41362306a36Sopenharmony_ci
41462306a36Sopenharmony_ciSo, for instance, keys with the following example index keys will be found in
41562306a36Sopenharmony_cithe appropriate nodes::
41662306a36Sopenharmony_ci
41762306a36Sopenharmony_ci    INDEX KEY       PREFIX  NODE
41862306a36Sopenharmony_ci    =============== ======= ====
41962306a36Sopenharmony_ci    13694892892489  13      C
42062306a36Sopenharmony_ci    13795289025897  137     D
42162306a36Sopenharmony_ci    13889dde88793   138     E
42262306a36Sopenharmony_ci    138bbb89003093  138     E
42362306a36Sopenharmony_ci    1394879524789   12      C
42462306a36Sopenharmony_ci    1458952489      1       B
42562306a36Sopenharmony_ci    9431809de993ba  -       A
42662306a36Sopenharmony_ci    b4542910809cd   -       A
42762306a36Sopenharmony_ci    e5284310def98   e       F
42862306a36Sopenharmony_ci    e68428974237    e6      G
42962306a36Sopenharmony_ci    e7fffcbd443     e       F
43062306a36Sopenharmony_ci    f3842239082     -       A
43162306a36Sopenharmony_ci
43262306a36Sopenharmony_ciTo save memory, if a node can hold all the leaves in its portion of keyspace,
43362306a36Sopenharmony_cithen the node will have all those leaves in it and will not have any metadata
43462306a36Sopenharmony_cipointers - even if some of those leaves would like to be in the same slot.
43562306a36Sopenharmony_ci
43662306a36Sopenharmony_ciA node can contain a heterogeneous mix of leaves and metadata pointers.
43762306a36Sopenharmony_ciMetadata pointers must be in the slots that match their subdivisions of key
43862306a36Sopenharmony_cispace.  The leaves can be in any slot not occupied by a metadata pointer.  It
43962306a36Sopenharmony_ciis guaranteed that none of the leaves in a node will match a slot occupied by a
44062306a36Sopenharmony_cimetadata pointer.  If the metadata pointer is there, any leaf whose key matches
44162306a36Sopenharmony_cithe metadata key prefix must be in the subtree that the metadata pointer points
44262306a36Sopenharmony_cito.
44362306a36Sopenharmony_ci
44462306a36Sopenharmony_ciIn the above example list of index keys, node A will contain::
44562306a36Sopenharmony_ci
44662306a36Sopenharmony_ci    SLOT    CONTENT         INDEX KEY (PREFIX)
44762306a36Sopenharmony_ci    ====    =============== ==================
44862306a36Sopenharmony_ci    1       PTR TO NODE B   1*
44962306a36Sopenharmony_ci    any     LEAF            9431809de993ba
45062306a36Sopenharmony_ci    any     LEAF            b4542910809cd
45162306a36Sopenharmony_ci    e       PTR TO NODE F   e*
45262306a36Sopenharmony_ci    any     LEAF            f3842239082
45362306a36Sopenharmony_ci
45462306a36Sopenharmony_ciand node B::
45562306a36Sopenharmony_ci
45662306a36Sopenharmony_ci    3	PTR TO NODE C	13*
45762306a36Sopenharmony_ci    any	LEAF		1458952489
45862306a36Sopenharmony_ci
45962306a36Sopenharmony_ci
46062306a36Sopenharmony_ciShortcuts
46162306a36Sopenharmony_ci---------
46262306a36Sopenharmony_ci
46362306a36Sopenharmony_ciShortcuts are metadata records that jump over a piece of keyspace.  A shortcut
46462306a36Sopenharmony_ciis a replacement for a series of single-occupancy nodes ascending through the
46562306a36Sopenharmony_cilevels.  Shortcuts exist to save memory and to speed up traversal.
46662306a36Sopenharmony_ci
46762306a36Sopenharmony_ciIt is possible for the root of the tree to be a shortcut - say, for example,
46862306a36Sopenharmony_cithe tree contains at least 17 nodes all with key prefix ``1111``.  The
46962306a36Sopenharmony_ciinsertion algorithm will insert a shortcut to skip over the ``1111`` keyspace
47062306a36Sopenharmony_ciin a single bound and get to the fourth level where these actually become
47162306a36Sopenharmony_cidifferent.
47262306a36Sopenharmony_ci
47362306a36Sopenharmony_ci
47462306a36Sopenharmony_ciSplitting And Collapsing Nodes
47562306a36Sopenharmony_ci------------------------------
47662306a36Sopenharmony_ci
47762306a36Sopenharmony_ciEach node has a maximum capacity of 16 leaves and metadata pointers.  If the
47862306a36Sopenharmony_ciinsertion algorithm finds that it is trying to insert a 17th object into a
47962306a36Sopenharmony_cinode, that node will be split such that at least two leaves that have a common
48062306a36Sopenharmony_cikey segment at that level end up in a separate node rooted on that slot for
48162306a36Sopenharmony_cithat common key segment.
48262306a36Sopenharmony_ci
48362306a36Sopenharmony_ciIf the leaves in a full node and the leaf that is being inserted are
48462306a36Sopenharmony_cisufficiently similar, then a shortcut will be inserted into the tree.
48562306a36Sopenharmony_ci
48662306a36Sopenharmony_ciWhen the number of objects in the subtree rooted at a node falls to 16 or
48762306a36Sopenharmony_cifewer, then the subtree will be collapsed down to a single node - and this will
48862306a36Sopenharmony_ciripple towards the root if possible.
48962306a36Sopenharmony_ci
49062306a36Sopenharmony_ci
49162306a36Sopenharmony_ciNon-Recursive Iteration
49262306a36Sopenharmony_ci-----------------------
49362306a36Sopenharmony_ci
49462306a36Sopenharmony_ciEach node and shortcut contains a back pointer to its parent and the number of
49562306a36Sopenharmony_cislot in that parent that points to it.  None-recursive iteration uses these to
49662306a36Sopenharmony_ciproceed rootwards through the tree, going to the parent node, slot N + 1 to
49762306a36Sopenharmony_cimake sure progress is made without the need for a stack.
49862306a36Sopenharmony_ci
49962306a36Sopenharmony_ciThe backpointers, however, make simultaneous alteration and iteration tricky.
50062306a36Sopenharmony_ci
50162306a36Sopenharmony_ci
50262306a36Sopenharmony_ciSimultaneous Alteration And Iteration
50362306a36Sopenharmony_ci-------------------------------------
50462306a36Sopenharmony_ci
50562306a36Sopenharmony_ciThere are a number of cases to consider:
50662306a36Sopenharmony_ci
50762306a36Sopenharmony_ci1. Simple insert/replace.  This involves simply replacing a NULL or old
50862306a36Sopenharmony_ci   matching leaf pointer with the pointer to the new leaf after a barrier.
50962306a36Sopenharmony_ci   The metadata blocks don't change otherwise.  An old leaf won't be freed
51062306a36Sopenharmony_ci   until after the RCU grace period.
51162306a36Sopenharmony_ci
51262306a36Sopenharmony_ci2. Simple delete.  This involves just clearing an old matching leaf.  The
51362306a36Sopenharmony_ci   metadata blocks don't change otherwise.  The old leaf won't be freed until
51462306a36Sopenharmony_ci   after the RCU grace period.
51562306a36Sopenharmony_ci
51662306a36Sopenharmony_ci3. Insertion replacing part of a subtree that we haven't yet entered.  This
51762306a36Sopenharmony_ci   may involve replacement of part of that subtree - but that won't affect
51862306a36Sopenharmony_ci   the iteration as we won't have reached the pointer to it yet and the
51962306a36Sopenharmony_ci   ancestry blocks are not replaced (the layout of those does not change).
52062306a36Sopenharmony_ci
52162306a36Sopenharmony_ci4. Insertion replacing nodes that we're actively processing.  This isn't a
52262306a36Sopenharmony_ci   problem as we've passed the anchoring pointer and won't switch onto the
52362306a36Sopenharmony_ci   new layout until we follow the back pointers - at which point we've
52462306a36Sopenharmony_ci   already examined the leaves in the replaced node (we iterate over all the
52562306a36Sopenharmony_ci   leaves in a node before following any of its metadata pointers).
52662306a36Sopenharmony_ci
52762306a36Sopenharmony_ci   We might, however, re-see some leaves that have been split out into a new
52862306a36Sopenharmony_ci   branch that's in a slot further along than we were at.
52962306a36Sopenharmony_ci
53062306a36Sopenharmony_ci5. Insertion replacing nodes that we're processing a dependent branch of.
53162306a36Sopenharmony_ci   This won't affect us until we follow the back pointers.  Similar to (4).
53262306a36Sopenharmony_ci
53362306a36Sopenharmony_ci6. Deletion collapsing a branch under us.  This doesn't affect us because the
53462306a36Sopenharmony_ci   back pointers will get us back to the parent of the new node before we
53562306a36Sopenharmony_ci   could see the new node.  The entire collapsed subtree is thrown away
53662306a36Sopenharmony_ci   unchanged - and will still be rooted on the same slot, so we shouldn't
53762306a36Sopenharmony_ci   process it a second time as we'll go back to slot + 1.
53862306a36Sopenharmony_ci
53962306a36Sopenharmony_ci.. note::
54062306a36Sopenharmony_ci
54162306a36Sopenharmony_ci   Under some circumstances, we need to simultaneously change the parent
54262306a36Sopenharmony_ci   pointer and the parent slot pointer on a node (say, for example, we
54362306a36Sopenharmony_ci   inserted another node before it and moved it up a level).  We cannot do
54462306a36Sopenharmony_ci   this without locking against a read - so we have to replace that node too.
54562306a36Sopenharmony_ci
54662306a36Sopenharmony_ci   However, when we're changing a shortcut into a node this isn't a problem
54762306a36Sopenharmony_ci   as shortcuts only have one slot and so the parent slot number isn't used
54862306a36Sopenharmony_ci   when traversing backwards over one.  This means that it's okay to change
54962306a36Sopenharmony_ci   the slot number first - provided suitable barriers are used to make sure
55062306a36Sopenharmony_ci   the parent slot number is read after the back pointer.
55162306a36Sopenharmony_ci
55262306a36Sopenharmony_ciObsolete blocks and leaves are freed up after an RCU grace period has passed,
55362306a36Sopenharmony_ciso as long as anyone doing walking or iteration holds the RCU read lock, the
55462306a36Sopenharmony_ciold superstructure should not go away on them.
555