162306a36Sopenharmony_ci.. SPDX-License-Identifier: GPL-2.0 262306a36Sopenharmony_ci 362306a36Sopenharmony_ci============================ 462306a36Sopenharmony_ciLC-trie implementation notes 562306a36Sopenharmony_ci============================ 662306a36Sopenharmony_ci 762306a36Sopenharmony_ciNode types 862306a36Sopenharmony_ci---------- 962306a36Sopenharmony_cileaf 1062306a36Sopenharmony_ci An end node with data. This has a copy of the relevant key, along 1162306a36Sopenharmony_ci with 'hlist' with routing table entries sorted by prefix length. 1262306a36Sopenharmony_ci See struct leaf and struct leaf_info. 1362306a36Sopenharmony_ci 1462306a36Sopenharmony_citrie node or tnode 1562306a36Sopenharmony_ci An internal node, holding an array of child (leaf or tnode) pointers, 1662306a36Sopenharmony_ci indexed through a subset of the key. See Level Compression. 1762306a36Sopenharmony_ci 1862306a36Sopenharmony_ciA few concepts explained 1962306a36Sopenharmony_ci------------------------ 2062306a36Sopenharmony_ciBits (tnode) 2162306a36Sopenharmony_ci The number of bits in the key segment used for indexing into the 2262306a36Sopenharmony_ci child array - the "child index". See Level Compression. 2362306a36Sopenharmony_ci 2462306a36Sopenharmony_ciPos (tnode) 2562306a36Sopenharmony_ci The position (in the key) of the key segment used for indexing into 2662306a36Sopenharmony_ci the child array. See Path Compression. 2762306a36Sopenharmony_ci 2862306a36Sopenharmony_ciPath Compression / skipped bits 2962306a36Sopenharmony_ci Any given tnode is linked to from the child array of its parent, using 3062306a36Sopenharmony_ci a segment of the key specified by the parent's "pos" and "bits" 3162306a36Sopenharmony_ci In certain cases, this tnode's own "pos" will not be immediately 3262306a36Sopenharmony_ci adjacent to the parent (pos+bits), but there will be some bits 3362306a36Sopenharmony_ci in the key skipped over because they represent a single path with no 3462306a36Sopenharmony_ci deviations. These "skipped bits" constitute Path Compression. 3562306a36Sopenharmony_ci Note that the search algorithm will simply skip over these bits when 3662306a36Sopenharmony_ci searching, making it necessary to save the keys in the leaves to 3762306a36Sopenharmony_ci verify that they actually do match the key we are searching for. 3862306a36Sopenharmony_ci 3962306a36Sopenharmony_ciLevel Compression / child arrays 4062306a36Sopenharmony_ci the trie is kept level balanced moving, under certain conditions, the 4162306a36Sopenharmony_ci children of a full child (see "full_children") up one level, so that 4262306a36Sopenharmony_ci instead of a pure binary tree, each internal node ("tnode") may 4362306a36Sopenharmony_ci contain an arbitrarily large array of links to several children. 4462306a36Sopenharmony_ci Conversely, a tnode with a mostly empty child array (see empty_children) 4562306a36Sopenharmony_ci may be "halved", having some of its children moved downwards one level, 4662306a36Sopenharmony_ci in order to avoid ever-increasing child arrays. 4762306a36Sopenharmony_ci 4862306a36Sopenharmony_ciempty_children 4962306a36Sopenharmony_ci the number of positions in the child array of a given tnode that are 5062306a36Sopenharmony_ci NULL. 5162306a36Sopenharmony_ci 5262306a36Sopenharmony_cifull_children 5362306a36Sopenharmony_ci the number of children of a given tnode that aren't path compressed. 5462306a36Sopenharmony_ci (in other words, they aren't NULL or leaves and their "pos" is equal 5562306a36Sopenharmony_ci to this tnode's "pos"+"bits"). 5662306a36Sopenharmony_ci 5762306a36Sopenharmony_ci (The word "full" here is used more in the sense of "complete" than 5862306a36Sopenharmony_ci as the opposite of "empty", which might be a tad confusing.) 5962306a36Sopenharmony_ci 6062306a36Sopenharmony_ciComments 6162306a36Sopenharmony_ci--------- 6262306a36Sopenharmony_ci 6362306a36Sopenharmony_ciWe have tried to keep the structure of the code as close to fib_hash as 6462306a36Sopenharmony_cipossible to allow verification and help up reviewing. 6562306a36Sopenharmony_ci 6662306a36Sopenharmony_cifib_find_node() 6762306a36Sopenharmony_ci A good start for understanding this code. This function implements a 6862306a36Sopenharmony_ci straightforward trie lookup. 6962306a36Sopenharmony_ci 7062306a36Sopenharmony_cifib_insert_node() 7162306a36Sopenharmony_ci Inserts a new leaf node in the trie. This is bit more complicated than 7262306a36Sopenharmony_ci fib_find_node(). Inserting a new node means we might have to run the 7362306a36Sopenharmony_ci level compression algorithm on part of the trie. 7462306a36Sopenharmony_ci 7562306a36Sopenharmony_citrie_leaf_remove() 7662306a36Sopenharmony_ci Looks up a key, deletes it and runs the level compression algorithm. 7762306a36Sopenharmony_ci 7862306a36Sopenharmony_citrie_rebalance() 7962306a36Sopenharmony_ci The key function for the dynamic trie after any change in the trie 8062306a36Sopenharmony_ci it is run to optimize and reorganize. It will walk the trie upwards 8162306a36Sopenharmony_ci towards the root from a given tnode, doing a resize() at each step 8262306a36Sopenharmony_ci to implement level compression. 8362306a36Sopenharmony_ci 8462306a36Sopenharmony_ciresize() 8562306a36Sopenharmony_ci Analyzes a tnode and optimizes the child array size by either inflating 8662306a36Sopenharmony_ci or shrinking it repeatedly until it fulfills the criteria for optimal 8762306a36Sopenharmony_ci level compression. This part follows the original paper pretty closely 8862306a36Sopenharmony_ci and there may be some room for experimentation here. 8962306a36Sopenharmony_ci 9062306a36Sopenharmony_ciinflate() 9162306a36Sopenharmony_ci Doubles the size of the child array within a tnode. Used by resize(). 9262306a36Sopenharmony_ci 9362306a36Sopenharmony_cihalve() 9462306a36Sopenharmony_ci Halves the size of the child array within a tnode - the inverse of 9562306a36Sopenharmony_ci inflate(). Used by resize(); 9662306a36Sopenharmony_ci 9762306a36Sopenharmony_cifn_trie_insert(), fn_trie_delete(), fn_trie_select_default() 9862306a36Sopenharmony_ci The route manipulation functions. Should conform pretty closely to the 9962306a36Sopenharmony_ci corresponding functions in fib_hash. 10062306a36Sopenharmony_ci 10162306a36Sopenharmony_cifn_trie_flush() 10262306a36Sopenharmony_ci This walks the full trie (using nextleaf()) and searches for empty 10362306a36Sopenharmony_ci leaves which have to be removed. 10462306a36Sopenharmony_ci 10562306a36Sopenharmony_cifn_trie_dump() 10662306a36Sopenharmony_ci Dumps the routing table ordered by prefix length. This is somewhat 10762306a36Sopenharmony_ci slower than the corresponding fib_hash function, as we have to walk the 10862306a36Sopenharmony_ci entire trie for each prefix length. In comparison, fib_hash is organized 10962306a36Sopenharmony_ci as one "zone"/hash per prefix length. 11062306a36Sopenharmony_ci 11162306a36Sopenharmony_ciLocking 11262306a36Sopenharmony_ci------- 11362306a36Sopenharmony_ci 11462306a36Sopenharmony_cifib_lock is used for an RW-lock in the same way that this is done in fib_hash. 11562306a36Sopenharmony_ciHowever, the functions are somewhat separated for other possible locking 11662306a36Sopenharmony_ciscenarios. It might conceivably be possible to run trie_rebalance via RCU 11762306a36Sopenharmony_cito avoid read_lock in the fn_trie_lookup() function. 11862306a36Sopenharmony_ci 11962306a36Sopenharmony_ciMain lookup mechanism 12062306a36Sopenharmony_ci--------------------- 12162306a36Sopenharmony_cifn_trie_lookup() is the main lookup function. 12262306a36Sopenharmony_ci 12362306a36Sopenharmony_ciThe lookup is in its simplest form just like fib_find_node(). We descend the 12462306a36Sopenharmony_citrie, key segment by key segment, until we find a leaf. check_leaf() does 12562306a36Sopenharmony_cithe fib_semantic_match in the leaf's sorted prefix hlist. 12662306a36Sopenharmony_ci 12762306a36Sopenharmony_ciIf we find a match, we are done. 12862306a36Sopenharmony_ci 12962306a36Sopenharmony_ciIf we don't find a match, we enter prefix matching mode. The prefix length, 13062306a36Sopenharmony_cistarting out at the same as the key length, is reduced one step at a time, 13162306a36Sopenharmony_ciand we backtrack upwards through the trie trying to find a longest matching 13262306a36Sopenharmony_ciprefix. The goal is always to reach a leaf and get a positive result from the 13362306a36Sopenharmony_cifib_semantic_match mechanism. 13462306a36Sopenharmony_ci 13562306a36Sopenharmony_ciInside each tnode, the search for longest matching prefix consists of searching 13662306a36Sopenharmony_cithrough the child array, chopping off (zeroing) the least significant "1" of 13762306a36Sopenharmony_cithe child index until we find a match or the child index consists of nothing but 13862306a36Sopenharmony_cizeros. 13962306a36Sopenharmony_ci 14062306a36Sopenharmony_ciAt this point we backtrack (t->stats.backtrack++) up the trie, continuing to 14162306a36Sopenharmony_cichop off part of the key in order to find the longest matching prefix. 14262306a36Sopenharmony_ci 14362306a36Sopenharmony_ciAt this point we will repeatedly descend subtries to look for a match, and there 14462306a36Sopenharmony_ciare some optimizations available that can provide us with "shortcuts" to avoid 14562306a36Sopenharmony_cidescending into dead ends. Look for "HL_OPTIMIZE" sections in the code. 14662306a36Sopenharmony_ci 14762306a36Sopenharmony_ciTo alleviate any doubts about the correctness of the route selection process, 14862306a36Sopenharmony_cia new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which 14962306a36Sopenharmony_cigives userland access to fib_lookup(). 150