162306a36Sopenharmony_ci.. SPDX-License-Identifier: GPL-2.0
262306a36Sopenharmony_ci
362306a36Sopenharmony_ci============================
462306a36Sopenharmony_ciLC-trie implementation notes
562306a36Sopenharmony_ci============================
662306a36Sopenharmony_ci
762306a36Sopenharmony_ciNode types
862306a36Sopenharmony_ci----------
962306a36Sopenharmony_cileaf
1062306a36Sopenharmony_ci	An end node with data. This has a copy of the relevant key, along
1162306a36Sopenharmony_ci	with 'hlist' with routing table entries sorted by prefix length.
1262306a36Sopenharmony_ci	See struct leaf and struct leaf_info.
1362306a36Sopenharmony_ci
1462306a36Sopenharmony_citrie node or tnode
1562306a36Sopenharmony_ci	An internal node, holding an array of child (leaf or tnode) pointers,
1662306a36Sopenharmony_ci	indexed	through a subset of the key. See Level Compression.
1762306a36Sopenharmony_ci
1862306a36Sopenharmony_ciA few concepts explained
1962306a36Sopenharmony_ci------------------------
2062306a36Sopenharmony_ciBits (tnode)
2162306a36Sopenharmony_ci	The number of bits in the key segment used for indexing into the
2262306a36Sopenharmony_ci	child array - the "child index". See Level Compression.
2362306a36Sopenharmony_ci
2462306a36Sopenharmony_ciPos (tnode)
2562306a36Sopenharmony_ci	The position (in the key) of the key segment used for indexing into
2662306a36Sopenharmony_ci	the child array. See Path Compression.
2762306a36Sopenharmony_ci
2862306a36Sopenharmony_ciPath Compression / skipped bits
2962306a36Sopenharmony_ci	Any given tnode is linked to from the child array of its parent, using
3062306a36Sopenharmony_ci	a segment of the key specified by the parent's "pos" and "bits"
3162306a36Sopenharmony_ci	In certain cases, this tnode's own "pos" will not be immediately
3262306a36Sopenharmony_ci	adjacent to the parent (pos+bits), but there will be some bits
3362306a36Sopenharmony_ci	in the key skipped over because they represent a single path with no
3462306a36Sopenharmony_ci	deviations. These "skipped bits" constitute Path Compression.
3562306a36Sopenharmony_ci	Note that the search algorithm will simply skip over these bits when
3662306a36Sopenharmony_ci	searching, making it necessary to save the keys in the leaves to
3762306a36Sopenharmony_ci	verify that they actually do match the key we are searching for.
3862306a36Sopenharmony_ci
3962306a36Sopenharmony_ciLevel Compression / child arrays
4062306a36Sopenharmony_ci	the trie is kept level balanced moving, under certain conditions, the
4162306a36Sopenharmony_ci	children of a full child (see "full_children") up one level, so that
4262306a36Sopenharmony_ci	instead of a pure binary tree, each internal node ("tnode") may
4362306a36Sopenharmony_ci	contain an arbitrarily large array of links to several children.
4462306a36Sopenharmony_ci	Conversely, a tnode with a mostly empty	child array (see empty_children)
4562306a36Sopenharmony_ci	may be "halved", having some of its children moved downwards one level,
4662306a36Sopenharmony_ci	in order to avoid ever-increasing child arrays.
4762306a36Sopenharmony_ci
4862306a36Sopenharmony_ciempty_children
4962306a36Sopenharmony_ci	the number of positions in the child array of a given tnode that are
5062306a36Sopenharmony_ci	NULL.
5162306a36Sopenharmony_ci
5262306a36Sopenharmony_cifull_children
5362306a36Sopenharmony_ci	the number of children of a given tnode that aren't path compressed.
5462306a36Sopenharmony_ci	(in other words, they aren't NULL or leaves and their "pos" is equal
5562306a36Sopenharmony_ci	to this	tnode's "pos"+"bits").
5662306a36Sopenharmony_ci
5762306a36Sopenharmony_ci	(The word "full" here is used more in the sense of "complete" than
5862306a36Sopenharmony_ci	as the opposite of "empty", which might be a tad confusing.)
5962306a36Sopenharmony_ci
6062306a36Sopenharmony_ciComments
6162306a36Sopenharmony_ci---------
6262306a36Sopenharmony_ci
6362306a36Sopenharmony_ciWe have tried to keep the structure of the code as close to fib_hash as
6462306a36Sopenharmony_cipossible to allow verification and help up reviewing.
6562306a36Sopenharmony_ci
6662306a36Sopenharmony_cifib_find_node()
6762306a36Sopenharmony_ci	A good start for understanding this code. This function implements a
6862306a36Sopenharmony_ci	straightforward trie lookup.
6962306a36Sopenharmony_ci
7062306a36Sopenharmony_cifib_insert_node()
7162306a36Sopenharmony_ci	Inserts a new leaf node in the trie. This is bit more complicated than
7262306a36Sopenharmony_ci	fib_find_node(). Inserting a new node means we might have to run the
7362306a36Sopenharmony_ci	level compression algorithm on part of the trie.
7462306a36Sopenharmony_ci
7562306a36Sopenharmony_citrie_leaf_remove()
7662306a36Sopenharmony_ci	Looks up a key, deletes it and runs the level compression algorithm.
7762306a36Sopenharmony_ci
7862306a36Sopenharmony_citrie_rebalance()
7962306a36Sopenharmony_ci	The key function for the dynamic trie after any change in the trie
8062306a36Sopenharmony_ci	it is run to optimize and reorganize. It will walk the trie upwards
8162306a36Sopenharmony_ci	towards the root from a given tnode, doing a resize() at each step
8262306a36Sopenharmony_ci	to implement level compression.
8362306a36Sopenharmony_ci
8462306a36Sopenharmony_ciresize()
8562306a36Sopenharmony_ci	Analyzes a tnode and optimizes the child array size by either inflating
8662306a36Sopenharmony_ci	or shrinking it repeatedly until it fulfills the criteria for optimal
8762306a36Sopenharmony_ci	level compression. This part follows the original paper pretty closely
8862306a36Sopenharmony_ci	and there may be some room for experimentation here.
8962306a36Sopenharmony_ci
9062306a36Sopenharmony_ciinflate()
9162306a36Sopenharmony_ci	Doubles the size of the child array within a tnode. Used by resize().
9262306a36Sopenharmony_ci
9362306a36Sopenharmony_cihalve()
9462306a36Sopenharmony_ci	Halves the size of the child array within a tnode - the inverse of
9562306a36Sopenharmony_ci	inflate(). Used by resize();
9662306a36Sopenharmony_ci
9762306a36Sopenharmony_cifn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
9862306a36Sopenharmony_ci	The route manipulation functions. Should conform pretty closely to the
9962306a36Sopenharmony_ci	corresponding functions in fib_hash.
10062306a36Sopenharmony_ci
10162306a36Sopenharmony_cifn_trie_flush()
10262306a36Sopenharmony_ci	This walks the full trie (using nextleaf()) and searches for empty
10362306a36Sopenharmony_ci	leaves which have to be removed.
10462306a36Sopenharmony_ci
10562306a36Sopenharmony_cifn_trie_dump()
10662306a36Sopenharmony_ci	Dumps the routing table ordered by prefix length. This is somewhat
10762306a36Sopenharmony_ci	slower than the corresponding fib_hash function, as we have to walk the
10862306a36Sopenharmony_ci	entire trie for each prefix length. In comparison, fib_hash is organized
10962306a36Sopenharmony_ci	as one "zone"/hash per prefix length.
11062306a36Sopenharmony_ci
11162306a36Sopenharmony_ciLocking
11262306a36Sopenharmony_ci-------
11362306a36Sopenharmony_ci
11462306a36Sopenharmony_cifib_lock is used for an RW-lock in the same way that this is done in fib_hash.
11562306a36Sopenharmony_ciHowever, the functions are somewhat separated for other possible locking
11662306a36Sopenharmony_ciscenarios. It might conceivably be possible to run trie_rebalance via RCU
11762306a36Sopenharmony_cito avoid read_lock in the fn_trie_lookup() function.
11862306a36Sopenharmony_ci
11962306a36Sopenharmony_ciMain lookup mechanism
12062306a36Sopenharmony_ci---------------------
12162306a36Sopenharmony_cifn_trie_lookup() is the main lookup function.
12262306a36Sopenharmony_ci
12362306a36Sopenharmony_ciThe lookup is in its simplest form just like fib_find_node(). We descend the
12462306a36Sopenharmony_citrie, key segment by key segment, until we find a leaf. check_leaf() does
12562306a36Sopenharmony_cithe fib_semantic_match in the leaf's sorted prefix hlist.
12662306a36Sopenharmony_ci
12762306a36Sopenharmony_ciIf we find a match, we are done.
12862306a36Sopenharmony_ci
12962306a36Sopenharmony_ciIf we don't find a match, we enter prefix matching mode. The prefix length,
13062306a36Sopenharmony_cistarting out at the same as the key length, is reduced one step at a time,
13162306a36Sopenharmony_ciand we backtrack upwards through the trie trying to find a longest matching
13262306a36Sopenharmony_ciprefix. The goal is always to reach a leaf and get a positive result from the
13362306a36Sopenharmony_cifib_semantic_match mechanism.
13462306a36Sopenharmony_ci
13562306a36Sopenharmony_ciInside each tnode, the search for longest matching prefix consists of searching
13662306a36Sopenharmony_cithrough the child array, chopping off (zeroing) the least significant "1" of
13762306a36Sopenharmony_cithe child index until we find a match or the child index consists of nothing but
13862306a36Sopenharmony_cizeros.
13962306a36Sopenharmony_ci
14062306a36Sopenharmony_ciAt this point we backtrack (t->stats.backtrack++) up the trie, continuing to
14162306a36Sopenharmony_cichop off part of the key in order to find the longest matching prefix.
14262306a36Sopenharmony_ci
14362306a36Sopenharmony_ciAt this point we will repeatedly descend subtries to look for a match, and there
14462306a36Sopenharmony_ciare some optimizations available that can provide us with "shortcuts" to avoid
14562306a36Sopenharmony_cidescending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
14662306a36Sopenharmony_ci
14762306a36Sopenharmony_ciTo alleviate any doubts about the correctness of the route selection process,
14862306a36Sopenharmony_cia new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
14962306a36Sopenharmony_cigives userland access to fib_lookup().
150