162306a36Sopenharmony_ci======================================== 262306a36Sopenharmony_ciGeneric Associative Array Implementation 362306a36Sopenharmony_ci======================================== 462306a36Sopenharmony_ci 562306a36Sopenharmony_ciOverview 662306a36Sopenharmony_ci======== 762306a36Sopenharmony_ci 862306a36Sopenharmony_ciThis associative array implementation is an object container with the following 962306a36Sopenharmony_ciproperties: 1062306a36Sopenharmony_ci 1162306a36Sopenharmony_ci1. Objects are opaque pointers. The implementation does not care where they 1262306a36Sopenharmony_ci point (if anywhere) or what they point to (if anything). 1362306a36Sopenharmony_ci 1462306a36Sopenharmony_ci .. note:: 1562306a36Sopenharmony_ci 1662306a36Sopenharmony_ci Pointers to objects _must_ be zero in the least significant bit. 1762306a36Sopenharmony_ci 1862306a36Sopenharmony_ci2. Objects do not need to contain linkage blocks for use by the array. This 1962306a36Sopenharmony_ci permits an object to be located in multiple arrays simultaneously. 2062306a36Sopenharmony_ci Rather, the array is made up of metadata blocks that point to objects. 2162306a36Sopenharmony_ci 2262306a36Sopenharmony_ci3. Objects require index keys to locate them within the array. 2362306a36Sopenharmony_ci 2462306a36Sopenharmony_ci4. Index keys must be unique. Inserting an object with the same key as one 2562306a36Sopenharmony_ci already in the array will replace the old object. 2662306a36Sopenharmony_ci 2762306a36Sopenharmony_ci5. Index keys can be of any length and can be of different lengths. 2862306a36Sopenharmony_ci 2962306a36Sopenharmony_ci6. Index keys should encode the length early on, before any variation due to 3062306a36Sopenharmony_ci length is seen. 3162306a36Sopenharmony_ci 3262306a36Sopenharmony_ci7. Index keys can include a hash to scatter objects throughout the array. 3362306a36Sopenharmony_ci 3462306a36Sopenharmony_ci8. The array can iterated over. The objects will not necessarily come out in 3562306a36Sopenharmony_ci key order. 3662306a36Sopenharmony_ci 3762306a36Sopenharmony_ci9. The array can be iterated over while it is being modified, provided the 3862306a36Sopenharmony_ci RCU readlock is being held by the iterator. Note, however, under these 3962306a36Sopenharmony_ci circumstances, some objects may be seen more than once. If this is a 4062306a36Sopenharmony_ci problem, the iterator should lock against modification. Objects will not 4162306a36Sopenharmony_ci be missed, however, unless deleted. 4262306a36Sopenharmony_ci 4362306a36Sopenharmony_ci10. Objects in the array can be looked up by means of their index key. 4462306a36Sopenharmony_ci 4562306a36Sopenharmony_ci11. Objects can be looked up while the array is being modified, provided the 4662306a36Sopenharmony_ci RCU readlock is being held by the thread doing the look up. 4762306a36Sopenharmony_ci 4862306a36Sopenharmony_ciThe implementation uses a tree of 16-pointer nodes internally that are indexed 4962306a36Sopenharmony_cion each level by nibbles from the index key in the same manner as in a radix 5062306a36Sopenharmony_citree. To improve memory efficiency, shortcuts can be emplaced to skip over 5162306a36Sopenharmony_ciwhat would otherwise be a series of single-occupancy nodes. Further, nodes 5262306a36Sopenharmony_cipack leaf object pointers into spare space in the node rather than making an 5362306a36Sopenharmony_ciextra branch until as such time an object needs to be added to a full node. 5462306a36Sopenharmony_ci 5562306a36Sopenharmony_ci 5662306a36Sopenharmony_ciThe Public API 5762306a36Sopenharmony_ci============== 5862306a36Sopenharmony_ci 5962306a36Sopenharmony_ciThe public API can be found in ``<linux/assoc_array.h>``. The associative 6062306a36Sopenharmony_ciarray is rooted on the following structure:: 6162306a36Sopenharmony_ci 6262306a36Sopenharmony_ci struct assoc_array { 6362306a36Sopenharmony_ci ... 6462306a36Sopenharmony_ci }; 6562306a36Sopenharmony_ci 6662306a36Sopenharmony_ciThe code is selected by enabling ``CONFIG_ASSOCIATIVE_ARRAY`` with:: 6762306a36Sopenharmony_ci 6862306a36Sopenharmony_ci ./script/config -e ASSOCIATIVE_ARRAY 6962306a36Sopenharmony_ci 7062306a36Sopenharmony_ci 7162306a36Sopenharmony_ciEdit Script 7262306a36Sopenharmony_ci----------- 7362306a36Sopenharmony_ci 7462306a36Sopenharmony_ciThe insertion and deletion functions produce an 'edit script' that can later be 7562306a36Sopenharmony_ciapplied to effect the changes without risking ``ENOMEM``. This retains the 7662306a36Sopenharmony_cipreallocated metadata blocks that will be installed in the internal tree and 7762306a36Sopenharmony_cikeeps track of the metadata blocks that will be removed from the tree when the 7862306a36Sopenharmony_ciscript is applied. 7962306a36Sopenharmony_ci 8062306a36Sopenharmony_ciThis is also used to keep track of dead blocks and dead objects after the 8162306a36Sopenharmony_ciscript has been applied so that they can be freed later. The freeing is done 8262306a36Sopenharmony_ciafter an RCU grace period has passed - thus allowing access functions to 8362306a36Sopenharmony_ciproceed under the RCU read lock. 8462306a36Sopenharmony_ci 8562306a36Sopenharmony_ciThe script appears as outside of the API as a pointer of the type:: 8662306a36Sopenharmony_ci 8762306a36Sopenharmony_ci struct assoc_array_edit; 8862306a36Sopenharmony_ci 8962306a36Sopenharmony_ciThere are two functions for dealing with the script: 9062306a36Sopenharmony_ci 9162306a36Sopenharmony_ci1. Apply an edit script:: 9262306a36Sopenharmony_ci 9362306a36Sopenharmony_ci void assoc_array_apply_edit(struct assoc_array_edit *edit); 9462306a36Sopenharmony_ci 9562306a36Sopenharmony_ciThis will perform the edit functions, interpolating various write barriers 9662306a36Sopenharmony_cito permit accesses under the RCU read lock to continue. The edit script 9762306a36Sopenharmony_ciwill then be passed to ``call_rcu()`` to free it and any dead stuff it points 9862306a36Sopenharmony_cito. 9962306a36Sopenharmony_ci 10062306a36Sopenharmony_ci2. Cancel an edit script:: 10162306a36Sopenharmony_ci 10262306a36Sopenharmony_ci void assoc_array_cancel_edit(struct assoc_array_edit *edit); 10362306a36Sopenharmony_ci 10462306a36Sopenharmony_ciThis frees the edit script and all preallocated memory immediately. If 10562306a36Sopenharmony_cithis was for insertion, the new object is _not_ released by this function, 10662306a36Sopenharmony_cibut must rather be released by the caller. 10762306a36Sopenharmony_ci 10862306a36Sopenharmony_ciThese functions are guaranteed not to fail. 10962306a36Sopenharmony_ci 11062306a36Sopenharmony_ci 11162306a36Sopenharmony_ciOperations Table 11262306a36Sopenharmony_ci---------------- 11362306a36Sopenharmony_ci 11462306a36Sopenharmony_ciVarious functions take a table of operations:: 11562306a36Sopenharmony_ci 11662306a36Sopenharmony_ci struct assoc_array_ops { 11762306a36Sopenharmony_ci ... 11862306a36Sopenharmony_ci }; 11962306a36Sopenharmony_ci 12062306a36Sopenharmony_ciThis points to a number of methods, all of which need to be provided: 12162306a36Sopenharmony_ci 12262306a36Sopenharmony_ci1. Get a chunk of index key from caller data:: 12362306a36Sopenharmony_ci 12462306a36Sopenharmony_ci unsigned long (*get_key_chunk)(const void *index_key, int level); 12562306a36Sopenharmony_ci 12662306a36Sopenharmony_ciThis should return a chunk of caller-supplied index key starting at the 12762306a36Sopenharmony_ci*bit* position given by the level argument. The level argument will be a 12862306a36Sopenharmony_cimultiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return 12962306a36Sopenharmony_ci``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible. 13062306a36Sopenharmony_ci 13162306a36Sopenharmony_ci 13262306a36Sopenharmony_ci2. Get a chunk of an object's index key:: 13362306a36Sopenharmony_ci 13462306a36Sopenharmony_ci unsigned long (*get_object_key_chunk)(const void *object, int level); 13562306a36Sopenharmony_ci 13662306a36Sopenharmony_ciAs the previous function, but gets its data from an object in the array 13762306a36Sopenharmony_cirather than from a caller-supplied index key. 13862306a36Sopenharmony_ci 13962306a36Sopenharmony_ci 14062306a36Sopenharmony_ci3. See if this is the object we're looking for:: 14162306a36Sopenharmony_ci 14262306a36Sopenharmony_ci bool (*compare_object)(const void *object, const void *index_key); 14362306a36Sopenharmony_ci 14462306a36Sopenharmony_ciCompare the object against an index key and return ``true`` if it matches and 14562306a36Sopenharmony_ci``false`` if it doesn't. 14662306a36Sopenharmony_ci 14762306a36Sopenharmony_ci 14862306a36Sopenharmony_ci4. Diff the index keys of two objects:: 14962306a36Sopenharmony_ci 15062306a36Sopenharmony_ci int (*diff_objects)(const void *object, const void *index_key); 15162306a36Sopenharmony_ci 15262306a36Sopenharmony_ciReturn the bit position at which the index key of the specified object 15362306a36Sopenharmony_cidiffers from the given index key or -1 if they are the same. 15462306a36Sopenharmony_ci 15562306a36Sopenharmony_ci 15662306a36Sopenharmony_ci5. Free an object:: 15762306a36Sopenharmony_ci 15862306a36Sopenharmony_ci void (*free_object)(void *object); 15962306a36Sopenharmony_ci 16062306a36Sopenharmony_ciFree the specified object. Note that this may be called an RCU grace period 16162306a36Sopenharmony_ciafter ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be 16262306a36Sopenharmony_cinecessary on module unloading. 16362306a36Sopenharmony_ci 16462306a36Sopenharmony_ci 16562306a36Sopenharmony_ciManipulation Functions 16662306a36Sopenharmony_ci---------------------- 16762306a36Sopenharmony_ci 16862306a36Sopenharmony_ciThere are a number of functions for manipulating an associative array: 16962306a36Sopenharmony_ci 17062306a36Sopenharmony_ci1. Initialise an associative array:: 17162306a36Sopenharmony_ci 17262306a36Sopenharmony_ci void assoc_array_init(struct assoc_array *array); 17362306a36Sopenharmony_ci 17462306a36Sopenharmony_ciThis initialises the base structure for an associative array. It can't fail. 17562306a36Sopenharmony_ci 17662306a36Sopenharmony_ci 17762306a36Sopenharmony_ci2. Insert/replace an object in an associative array:: 17862306a36Sopenharmony_ci 17962306a36Sopenharmony_ci struct assoc_array_edit * 18062306a36Sopenharmony_ci assoc_array_insert(struct assoc_array *array, 18162306a36Sopenharmony_ci const struct assoc_array_ops *ops, 18262306a36Sopenharmony_ci const void *index_key, 18362306a36Sopenharmony_ci void *object); 18462306a36Sopenharmony_ci 18562306a36Sopenharmony_ciThis inserts the given object into the array. Note that the least 18662306a36Sopenharmony_cisignificant bit of the pointer must be zero as it's used to type-mark 18762306a36Sopenharmony_cipointers internally. 18862306a36Sopenharmony_ci 18962306a36Sopenharmony_ciIf an object already exists for that key then it will be replaced with the 19062306a36Sopenharmony_cinew object and the old one will be freed automatically. 19162306a36Sopenharmony_ci 19262306a36Sopenharmony_ciThe ``index_key`` argument should hold index key information and is 19362306a36Sopenharmony_cipassed to the methods in the ops table when they are called. 19462306a36Sopenharmony_ci 19562306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns 19662306a36Sopenharmony_cian edit script that must be applied. ``-ENOMEM`` is returned in the case of 19762306a36Sopenharmony_cian out-of-memory error. 19862306a36Sopenharmony_ci 19962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array. 20062306a36Sopenharmony_ci 20162306a36Sopenharmony_ci 20262306a36Sopenharmony_ci3. Delete an object from an associative array:: 20362306a36Sopenharmony_ci 20462306a36Sopenharmony_ci struct assoc_array_edit * 20562306a36Sopenharmony_ci assoc_array_delete(struct assoc_array *array, 20662306a36Sopenharmony_ci const struct assoc_array_ops *ops, 20762306a36Sopenharmony_ci const void *index_key); 20862306a36Sopenharmony_ci 20962306a36Sopenharmony_ciThis deletes an object that matches the specified data from the array. 21062306a36Sopenharmony_ci 21162306a36Sopenharmony_ciThe ``index_key`` argument should hold index key information and is 21262306a36Sopenharmony_cipassed to the methods in the ops table when they are called. 21362306a36Sopenharmony_ci 21462306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns 21562306a36Sopenharmony_cian edit script that must be applied. ``-ENOMEM`` is returned in the case of 21662306a36Sopenharmony_cian out-of-memory error. ``NULL`` will be returned if the specified object is 21762306a36Sopenharmony_cinot found within the array. 21862306a36Sopenharmony_ci 21962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array. 22062306a36Sopenharmony_ci 22162306a36Sopenharmony_ci 22262306a36Sopenharmony_ci4. Delete all objects from an associative array:: 22362306a36Sopenharmony_ci 22462306a36Sopenharmony_ci struct assoc_array_edit * 22562306a36Sopenharmony_ci assoc_array_clear(struct assoc_array *array, 22662306a36Sopenharmony_ci const struct assoc_array_ops *ops); 22762306a36Sopenharmony_ci 22862306a36Sopenharmony_ciThis deletes all the objects from an associative array and leaves it 22962306a36Sopenharmony_cicompletely empty. 23062306a36Sopenharmony_ci 23162306a36Sopenharmony_ciThis function makes no alteration to the array itself, but rather returns 23262306a36Sopenharmony_cian edit script that must be applied. ``-ENOMEM`` is returned in the case of 23362306a36Sopenharmony_cian out-of-memory error. 23462306a36Sopenharmony_ci 23562306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers of the array. 23662306a36Sopenharmony_ci 23762306a36Sopenharmony_ci 23862306a36Sopenharmony_ci5. Destroy an associative array, deleting all objects:: 23962306a36Sopenharmony_ci 24062306a36Sopenharmony_ci void assoc_array_destroy(struct assoc_array *array, 24162306a36Sopenharmony_ci const struct assoc_array_ops *ops); 24262306a36Sopenharmony_ci 24362306a36Sopenharmony_ciThis destroys the contents of the associative array and leaves it 24462306a36Sopenharmony_cicompletely empty. It is not permitted for another thread to be traversing 24562306a36Sopenharmony_cithe array under the RCU read lock at the same time as this function is 24662306a36Sopenharmony_cidestroying it as no RCU deferral is performed on memory release - 24762306a36Sopenharmony_cisomething that would require memory to be allocated. 24862306a36Sopenharmony_ci 24962306a36Sopenharmony_ciThe caller should lock exclusively against other modifiers and accessors 25062306a36Sopenharmony_ciof the array. 25162306a36Sopenharmony_ci 25262306a36Sopenharmony_ci 25362306a36Sopenharmony_ci6. Garbage collect an associative array:: 25462306a36Sopenharmony_ci 25562306a36Sopenharmony_ci int assoc_array_gc(struct assoc_array *array, 25662306a36Sopenharmony_ci const struct assoc_array_ops *ops, 25762306a36Sopenharmony_ci bool (*iterator)(void *object, void *iterator_data), 25862306a36Sopenharmony_ci void *iterator_data); 25962306a36Sopenharmony_ci 26062306a36Sopenharmony_ciThis iterates over the objects in an associative array and passes each one to 26162306a36Sopenharmony_ci``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it 26262306a36Sopenharmony_cireturns ``false``, the object will be freed. If the ``iterator()`` function 26362306a36Sopenharmony_cireturns ``true``, it must perform any appropriate refcount incrementing on the 26462306a36Sopenharmony_ciobject before returning. 26562306a36Sopenharmony_ci 26662306a36Sopenharmony_ciThe internal tree will be packed down if possible as part of the iteration 26762306a36Sopenharmony_cito reduce the number of nodes in it. 26862306a36Sopenharmony_ci 26962306a36Sopenharmony_ciThe ``iterator_data`` is passed directly to ``iterator()`` and is otherwise 27062306a36Sopenharmony_ciignored by the function. 27162306a36Sopenharmony_ci 27262306a36Sopenharmony_ciThe function will return ``0`` if successful and ``-ENOMEM`` if there wasn't 27362306a36Sopenharmony_cienough memory. 27462306a36Sopenharmony_ci 27562306a36Sopenharmony_ciIt is possible for other threads to iterate over or search the array under 27662306a36Sopenharmony_cithe RCU read lock while this function is in progress. The caller should 27762306a36Sopenharmony_cilock exclusively against other modifiers of the array. 27862306a36Sopenharmony_ci 27962306a36Sopenharmony_ci 28062306a36Sopenharmony_ciAccess Functions 28162306a36Sopenharmony_ci---------------- 28262306a36Sopenharmony_ci 28362306a36Sopenharmony_ciThere are two functions for accessing an associative array: 28462306a36Sopenharmony_ci 28562306a36Sopenharmony_ci1. Iterate over all the objects in an associative array:: 28662306a36Sopenharmony_ci 28762306a36Sopenharmony_ci int assoc_array_iterate(const struct assoc_array *array, 28862306a36Sopenharmony_ci int (*iterator)(const void *object, 28962306a36Sopenharmony_ci void *iterator_data), 29062306a36Sopenharmony_ci void *iterator_data); 29162306a36Sopenharmony_ci 29262306a36Sopenharmony_ciThis passes each object in the array to the iterator callback function. 29362306a36Sopenharmony_ci``iterator_data`` is private data for that function. 29462306a36Sopenharmony_ci 29562306a36Sopenharmony_ciThis may be used on an array at the same time as the array is being 29662306a36Sopenharmony_cimodified, provided the RCU read lock is held. Under such circumstances, 29762306a36Sopenharmony_ciit is possible for the iteration function to see some objects twice. If 29862306a36Sopenharmony_cithis is a problem, then modification should be locked against. The 29962306a36Sopenharmony_ciiteration algorithm should not, however, miss any objects. 30062306a36Sopenharmony_ci 30162306a36Sopenharmony_ciThe function will return ``0`` if no objects were in the array or else it will 30262306a36Sopenharmony_cireturn the result of the last iterator function called. Iteration stops 30362306a36Sopenharmony_ciimmediately if any call to the iteration function results in a non-zero 30462306a36Sopenharmony_cireturn. 30562306a36Sopenharmony_ci 30662306a36Sopenharmony_ci 30762306a36Sopenharmony_ci2. Find an object in an associative array:: 30862306a36Sopenharmony_ci 30962306a36Sopenharmony_ci void *assoc_array_find(const struct assoc_array *array, 31062306a36Sopenharmony_ci const struct assoc_array_ops *ops, 31162306a36Sopenharmony_ci const void *index_key); 31262306a36Sopenharmony_ci 31362306a36Sopenharmony_ciThis walks through the array's internal tree directly to the object 31462306a36Sopenharmony_cispecified by the index key.. 31562306a36Sopenharmony_ci 31662306a36Sopenharmony_ciThis may be used on an array at the same time as the array is being 31762306a36Sopenharmony_cimodified, provided the RCU read lock is held. 31862306a36Sopenharmony_ci 31962306a36Sopenharmony_ciThe function will return the object if found (and set ``*_type`` to the object 32062306a36Sopenharmony_citype) or will return ``NULL`` if the object was not found. 32162306a36Sopenharmony_ci 32262306a36Sopenharmony_ci 32362306a36Sopenharmony_ciIndex Key Form 32462306a36Sopenharmony_ci-------------- 32562306a36Sopenharmony_ci 32662306a36Sopenharmony_ciThe index key can be of any form, but since the algorithms aren't told how long 32762306a36Sopenharmony_cithe key is, it is strongly recommended that the index key includes its length 32862306a36Sopenharmony_civery early on before any variation due to the length would have an effect on 32962306a36Sopenharmony_cicomparisons. 33062306a36Sopenharmony_ci 33162306a36Sopenharmony_ciThis will cause leaves with different length keys to scatter away from each 33262306a36Sopenharmony_ciother - and those with the same length keys to cluster together. 33362306a36Sopenharmony_ci 33462306a36Sopenharmony_ciIt is also recommended that the index key begin with a hash of the rest of the 33562306a36Sopenharmony_cikey to maximise scattering throughout keyspace. 33662306a36Sopenharmony_ci 33762306a36Sopenharmony_ciThe better the scattering, the wider and lower the internal tree will be. 33862306a36Sopenharmony_ci 33962306a36Sopenharmony_ciPoor scattering isn't too much of a problem as there are shortcuts and nodes 34062306a36Sopenharmony_cican contain mixtures of leaves and metadata pointers. 34162306a36Sopenharmony_ci 34262306a36Sopenharmony_ciThe index key is read in chunks of machine word. Each chunk is subdivided into 34362306a36Sopenharmony_cione nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and 34462306a36Sopenharmony_cion a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is 34562306a36Sopenharmony_ciunlikely that more than one word of any particular index key will have to be 34662306a36Sopenharmony_ciused. 34762306a36Sopenharmony_ci 34862306a36Sopenharmony_ci 34962306a36Sopenharmony_ciInternal Workings 35062306a36Sopenharmony_ci================= 35162306a36Sopenharmony_ci 35262306a36Sopenharmony_ciThe associative array data structure has an internal tree. This tree is 35362306a36Sopenharmony_ciconstructed of two types of metadata blocks: nodes and shortcuts. 35462306a36Sopenharmony_ci 35562306a36Sopenharmony_ciA node is an array of slots. Each slot can contain one of four things: 35662306a36Sopenharmony_ci 35762306a36Sopenharmony_ci* A NULL pointer, indicating that the slot is empty. 35862306a36Sopenharmony_ci* A pointer to an object (a leaf). 35962306a36Sopenharmony_ci* A pointer to a node at the next level. 36062306a36Sopenharmony_ci* A pointer to a shortcut. 36162306a36Sopenharmony_ci 36262306a36Sopenharmony_ci 36362306a36Sopenharmony_ciBasic Internal Tree Layout 36462306a36Sopenharmony_ci-------------------------- 36562306a36Sopenharmony_ci 36662306a36Sopenharmony_ciIgnoring shortcuts for the moment, the nodes form a multilevel tree. The index 36762306a36Sopenharmony_cikey space is strictly subdivided by the nodes in the tree and nodes occur on 36862306a36Sopenharmony_cifixed levels. For example:: 36962306a36Sopenharmony_ci 37062306a36Sopenharmony_ci Level: 0 1 2 3 37162306a36Sopenharmony_ci =============== =============== =============== =============== 37262306a36Sopenharmony_ci NODE D 37362306a36Sopenharmony_ci NODE B NODE C +------>+---+ 37462306a36Sopenharmony_ci +------>+---+ +------>+---+ | | 0 | 37562306a36Sopenharmony_ci NODE A | | 0 | | | 0 | | +---+ 37662306a36Sopenharmony_ci +---+ | +---+ | +---+ | : : 37762306a36Sopenharmony_ci | 0 | | : : | : : | +---+ 37862306a36Sopenharmony_ci +---+ | +---+ | +---+ | | f | 37962306a36Sopenharmony_ci | 1 |---+ | 3 |---+ | 7 |---+ +---+ 38062306a36Sopenharmony_ci +---+ +---+ +---+ 38162306a36Sopenharmony_ci : : : : | 8 |---+ 38262306a36Sopenharmony_ci +---+ +---+ +---+ | NODE E 38362306a36Sopenharmony_ci | e |---+ | f | : : +------>+---+ 38462306a36Sopenharmony_ci +---+ | +---+ +---+ | 0 | 38562306a36Sopenharmony_ci | f | | | f | +---+ 38662306a36Sopenharmony_ci +---+ | +---+ : : 38762306a36Sopenharmony_ci | NODE F +---+ 38862306a36Sopenharmony_ci +------>+---+ | f | 38962306a36Sopenharmony_ci | 0 | NODE G +---+ 39062306a36Sopenharmony_ci +---+ +------>+---+ 39162306a36Sopenharmony_ci : : | | 0 | 39262306a36Sopenharmony_ci +---+ | +---+ 39362306a36Sopenharmony_ci | 6 |---+ : : 39462306a36Sopenharmony_ci +---+ +---+ 39562306a36Sopenharmony_ci : : | f | 39662306a36Sopenharmony_ci +---+ +---+ 39762306a36Sopenharmony_ci | f | 39862306a36Sopenharmony_ci +---+ 39962306a36Sopenharmony_ci 40062306a36Sopenharmony_ciIn the above example, there are 7 nodes (A-G), each with 16 slots (0-f). 40162306a36Sopenharmony_ciAssuming no other meta data nodes in the tree, the key space is divided 40262306a36Sopenharmony_cithusly:: 40362306a36Sopenharmony_ci 40462306a36Sopenharmony_ci KEY PREFIX NODE 40562306a36Sopenharmony_ci ========== ==== 40662306a36Sopenharmony_ci 137* D 40762306a36Sopenharmony_ci 138* E 40862306a36Sopenharmony_ci 13[0-69-f]* C 40962306a36Sopenharmony_ci 1[0-24-f]* B 41062306a36Sopenharmony_ci e6* G 41162306a36Sopenharmony_ci e[0-57-f]* F 41262306a36Sopenharmony_ci [02-df]* A 41362306a36Sopenharmony_ci 41462306a36Sopenharmony_ciSo, for instance, keys with the following example index keys will be found in 41562306a36Sopenharmony_cithe appropriate nodes:: 41662306a36Sopenharmony_ci 41762306a36Sopenharmony_ci INDEX KEY PREFIX NODE 41862306a36Sopenharmony_ci =============== ======= ==== 41962306a36Sopenharmony_ci 13694892892489 13 C 42062306a36Sopenharmony_ci 13795289025897 137 D 42162306a36Sopenharmony_ci 13889dde88793 138 E 42262306a36Sopenharmony_ci 138bbb89003093 138 E 42362306a36Sopenharmony_ci 1394879524789 12 C 42462306a36Sopenharmony_ci 1458952489 1 B 42562306a36Sopenharmony_ci 9431809de993ba - A 42662306a36Sopenharmony_ci b4542910809cd - A 42762306a36Sopenharmony_ci e5284310def98 e F 42862306a36Sopenharmony_ci e68428974237 e6 G 42962306a36Sopenharmony_ci e7fffcbd443 e F 43062306a36Sopenharmony_ci f3842239082 - A 43162306a36Sopenharmony_ci 43262306a36Sopenharmony_ciTo save memory, if a node can hold all the leaves in its portion of keyspace, 43362306a36Sopenharmony_cithen the node will have all those leaves in it and will not have any metadata 43462306a36Sopenharmony_cipointers - even if some of those leaves would like to be in the same slot. 43562306a36Sopenharmony_ci 43662306a36Sopenharmony_ciA node can contain a heterogeneous mix of leaves and metadata pointers. 43762306a36Sopenharmony_ciMetadata pointers must be in the slots that match their subdivisions of key 43862306a36Sopenharmony_cispace. The leaves can be in any slot not occupied by a metadata pointer. It 43962306a36Sopenharmony_ciis guaranteed that none of the leaves in a node will match a slot occupied by a 44062306a36Sopenharmony_cimetadata pointer. If the metadata pointer is there, any leaf whose key matches 44162306a36Sopenharmony_cithe metadata key prefix must be in the subtree that the metadata pointer points 44262306a36Sopenharmony_cito. 44362306a36Sopenharmony_ci 44462306a36Sopenharmony_ciIn the above example list of index keys, node A will contain:: 44562306a36Sopenharmony_ci 44662306a36Sopenharmony_ci SLOT CONTENT INDEX KEY (PREFIX) 44762306a36Sopenharmony_ci ==== =============== ================== 44862306a36Sopenharmony_ci 1 PTR TO NODE B 1* 44962306a36Sopenharmony_ci any LEAF 9431809de993ba 45062306a36Sopenharmony_ci any LEAF b4542910809cd 45162306a36Sopenharmony_ci e PTR TO NODE F e* 45262306a36Sopenharmony_ci any LEAF f3842239082 45362306a36Sopenharmony_ci 45462306a36Sopenharmony_ciand node B:: 45562306a36Sopenharmony_ci 45662306a36Sopenharmony_ci 3 PTR TO NODE C 13* 45762306a36Sopenharmony_ci any LEAF 1458952489 45862306a36Sopenharmony_ci 45962306a36Sopenharmony_ci 46062306a36Sopenharmony_ciShortcuts 46162306a36Sopenharmony_ci--------- 46262306a36Sopenharmony_ci 46362306a36Sopenharmony_ciShortcuts are metadata records that jump over a piece of keyspace. A shortcut 46462306a36Sopenharmony_ciis a replacement for a series of single-occupancy nodes ascending through the 46562306a36Sopenharmony_cilevels. Shortcuts exist to save memory and to speed up traversal. 46662306a36Sopenharmony_ci 46762306a36Sopenharmony_ciIt is possible for the root of the tree to be a shortcut - say, for example, 46862306a36Sopenharmony_cithe tree contains at least 17 nodes all with key prefix ``1111``. The 46962306a36Sopenharmony_ciinsertion algorithm will insert a shortcut to skip over the ``1111`` keyspace 47062306a36Sopenharmony_ciin a single bound and get to the fourth level where these actually become 47162306a36Sopenharmony_cidifferent. 47262306a36Sopenharmony_ci 47362306a36Sopenharmony_ci 47462306a36Sopenharmony_ciSplitting And Collapsing Nodes 47562306a36Sopenharmony_ci------------------------------ 47662306a36Sopenharmony_ci 47762306a36Sopenharmony_ciEach node has a maximum capacity of 16 leaves and metadata pointers. If the 47862306a36Sopenharmony_ciinsertion algorithm finds that it is trying to insert a 17th object into a 47962306a36Sopenharmony_cinode, that node will be split such that at least two leaves that have a common 48062306a36Sopenharmony_cikey segment at that level end up in a separate node rooted on that slot for 48162306a36Sopenharmony_cithat common key segment. 48262306a36Sopenharmony_ci 48362306a36Sopenharmony_ciIf the leaves in a full node and the leaf that is being inserted are 48462306a36Sopenharmony_cisufficiently similar, then a shortcut will be inserted into the tree. 48562306a36Sopenharmony_ci 48662306a36Sopenharmony_ciWhen the number of objects in the subtree rooted at a node falls to 16 or 48762306a36Sopenharmony_cifewer, then the subtree will be collapsed down to a single node - and this will 48862306a36Sopenharmony_ciripple towards the root if possible. 48962306a36Sopenharmony_ci 49062306a36Sopenharmony_ci 49162306a36Sopenharmony_ciNon-Recursive Iteration 49262306a36Sopenharmony_ci----------------------- 49362306a36Sopenharmony_ci 49462306a36Sopenharmony_ciEach node and shortcut contains a back pointer to its parent and the number of 49562306a36Sopenharmony_cislot in that parent that points to it. None-recursive iteration uses these to 49662306a36Sopenharmony_ciproceed rootwards through the tree, going to the parent node, slot N + 1 to 49762306a36Sopenharmony_cimake sure progress is made without the need for a stack. 49862306a36Sopenharmony_ci 49962306a36Sopenharmony_ciThe backpointers, however, make simultaneous alteration and iteration tricky. 50062306a36Sopenharmony_ci 50162306a36Sopenharmony_ci 50262306a36Sopenharmony_ciSimultaneous Alteration And Iteration 50362306a36Sopenharmony_ci------------------------------------- 50462306a36Sopenharmony_ci 50562306a36Sopenharmony_ciThere are a number of cases to consider: 50662306a36Sopenharmony_ci 50762306a36Sopenharmony_ci1. Simple insert/replace. This involves simply replacing a NULL or old 50862306a36Sopenharmony_ci matching leaf pointer with the pointer to the new leaf after a barrier. 50962306a36Sopenharmony_ci The metadata blocks don't change otherwise. An old leaf won't be freed 51062306a36Sopenharmony_ci until after the RCU grace period. 51162306a36Sopenharmony_ci 51262306a36Sopenharmony_ci2. Simple delete. This involves just clearing an old matching leaf. The 51362306a36Sopenharmony_ci metadata blocks don't change otherwise. The old leaf won't be freed until 51462306a36Sopenharmony_ci after the RCU grace period. 51562306a36Sopenharmony_ci 51662306a36Sopenharmony_ci3. Insertion replacing part of a subtree that we haven't yet entered. This 51762306a36Sopenharmony_ci may involve replacement of part of that subtree - but that won't affect 51862306a36Sopenharmony_ci the iteration as we won't have reached the pointer to it yet and the 51962306a36Sopenharmony_ci ancestry blocks are not replaced (the layout of those does not change). 52062306a36Sopenharmony_ci 52162306a36Sopenharmony_ci4. Insertion replacing nodes that we're actively processing. This isn't a 52262306a36Sopenharmony_ci problem as we've passed the anchoring pointer and won't switch onto the 52362306a36Sopenharmony_ci new layout until we follow the back pointers - at which point we've 52462306a36Sopenharmony_ci already examined the leaves in the replaced node (we iterate over all the 52562306a36Sopenharmony_ci leaves in a node before following any of its metadata pointers). 52662306a36Sopenharmony_ci 52762306a36Sopenharmony_ci We might, however, re-see some leaves that have been split out into a new 52862306a36Sopenharmony_ci branch that's in a slot further along than we were at. 52962306a36Sopenharmony_ci 53062306a36Sopenharmony_ci5. Insertion replacing nodes that we're processing a dependent branch of. 53162306a36Sopenharmony_ci This won't affect us until we follow the back pointers. Similar to (4). 53262306a36Sopenharmony_ci 53362306a36Sopenharmony_ci6. Deletion collapsing a branch under us. This doesn't affect us because the 53462306a36Sopenharmony_ci back pointers will get us back to the parent of the new node before we 53562306a36Sopenharmony_ci could see the new node. The entire collapsed subtree is thrown away 53662306a36Sopenharmony_ci unchanged - and will still be rooted on the same slot, so we shouldn't 53762306a36Sopenharmony_ci process it a second time as we'll go back to slot + 1. 53862306a36Sopenharmony_ci 53962306a36Sopenharmony_ci.. note:: 54062306a36Sopenharmony_ci 54162306a36Sopenharmony_ci Under some circumstances, we need to simultaneously change the parent 54262306a36Sopenharmony_ci pointer and the parent slot pointer on a node (say, for example, we 54362306a36Sopenharmony_ci inserted another node before it and moved it up a level). We cannot do 54462306a36Sopenharmony_ci this without locking against a read - so we have to replace that node too. 54562306a36Sopenharmony_ci 54662306a36Sopenharmony_ci However, when we're changing a shortcut into a node this isn't a problem 54762306a36Sopenharmony_ci as shortcuts only have one slot and so the parent slot number isn't used 54862306a36Sopenharmony_ci when traversing backwards over one. This means that it's okay to change 54962306a36Sopenharmony_ci the slot number first - provided suitable barriers are used to make sure 55062306a36Sopenharmony_ci the parent slot number is read after the back pointer. 55162306a36Sopenharmony_ci 55262306a36Sopenharmony_ciObsolete blocks and leaves are freed up after an RCU grace period has passed, 55362306a36Sopenharmony_ciso as long as anyone doing walking or iteration holds the RCU read lock, the 55462306a36Sopenharmony_ciold superstructure should not go away on them. 555