18c2ecf20Sopenharmony_ci.. SPDX-License-Identifier: GPL-2.0+
28c2ecf20Sopenharmony_ci
38c2ecf20Sopenharmony_ci======
48c2ecf20Sopenharmony_ciXArray
58c2ecf20Sopenharmony_ci======
68c2ecf20Sopenharmony_ci
78c2ecf20Sopenharmony_ci:Author: Matthew Wilcox
88c2ecf20Sopenharmony_ci
98c2ecf20Sopenharmony_ciOverview
108c2ecf20Sopenharmony_ci========
118c2ecf20Sopenharmony_ci
128c2ecf20Sopenharmony_ciThe XArray is an abstract data type which behaves like a very large array
138c2ecf20Sopenharmony_ciof pointers.  It meets many of the same needs as a hash or a conventional
148c2ecf20Sopenharmony_ciresizable array.  Unlike a hash, it allows you to sensibly go to the
158c2ecf20Sopenharmony_cinext or previous entry in a cache-efficient manner.  In contrast to a
168c2ecf20Sopenharmony_ciresizable array, there is no need to copy data or change MMU mappings in
178c2ecf20Sopenharmony_ciorder to grow the array.  It is more memory-efficient, parallelisable
188c2ecf20Sopenharmony_ciand cache friendly than a doubly-linked list.  It takes advantage of
198c2ecf20Sopenharmony_ciRCU to perform lookups without locking.
208c2ecf20Sopenharmony_ci
218c2ecf20Sopenharmony_ciThe XArray implementation is efficient when the indices used are densely
228c2ecf20Sopenharmony_ciclustered; hashing the object and using the hash as the index will not
238c2ecf20Sopenharmony_ciperform well.  The XArray is optimised for small indices, but still has
248c2ecf20Sopenharmony_cigood performance with large indices.  If your index can be larger than
258c2ecf20Sopenharmony_ci``ULONG_MAX`` then the XArray is not the data type for you.  The most
268c2ecf20Sopenharmony_ciimportant user of the XArray is the page cache.
278c2ecf20Sopenharmony_ci
288c2ecf20Sopenharmony_ciNormal pointers may be stored in the XArray directly.  They must be 4-byte
298c2ecf20Sopenharmony_cialigned, which is true for any pointer returned from kmalloc() and
308c2ecf20Sopenharmony_cialloc_page().  It isn't true for arbitrary user-space pointers,
318c2ecf20Sopenharmony_cinor for function pointers.  You can store pointers to statically allocated
328c2ecf20Sopenharmony_ciobjects, as long as those objects have an alignment of at least 4.
338c2ecf20Sopenharmony_ci
348c2ecf20Sopenharmony_ciYou can also store integers between 0 and ``LONG_MAX`` in the XArray.
358c2ecf20Sopenharmony_ciYou must first convert it into an entry using xa_mk_value().
368c2ecf20Sopenharmony_ciWhen you retrieve an entry from the XArray, you can check whether it is
378c2ecf20Sopenharmony_cia value entry by calling xa_is_value(), and convert it back to
388c2ecf20Sopenharmony_cian integer by calling xa_to_value().
398c2ecf20Sopenharmony_ci
408c2ecf20Sopenharmony_ciSome users want to tag the pointers they store in the XArray.  You can
418c2ecf20Sopenharmony_cicall xa_tag_pointer() to create an entry with a tag, xa_untag_pointer()
428c2ecf20Sopenharmony_cito turn a tagged entry back into an untagged pointer and xa_pointer_tag()
438c2ecf20Sopenharmony_cito retrieve the tag of an entry.  Tagged pointers use the same bits that
448c2ecf20Sopenharmony_ciare used to distinguish value entries from normal pointers, so you must
458c2ecf20Sopenharmony_cidecide whether they want to store value entries or tagged pointers in
468c2ecf20Sopenharmony_ciany particular XArray.
478c2ecf20Sopenharmony_ci
488c2ecf20Sopenharmony_ciThe XArray does not support storing IS_ERR() pointers as some
498c2ecf20Sopenharmony_ciconflict with value entries or internal entries.
508c2ecf20Sopenharmony_ci
518c2ecf20Sopenharmony_ciAn unusual feature of the XArray is the ability to create entries which
528c2ecf20Sopenharmony_cioccupy a range of indices.  Once stored to, looking up any index in
538c2ecf20Sopenharmony_cithe range will return the same entry as looking up any other index in
548c2ecf20Sopenharmony_cithe range.  Storing to any index will store to all of them.  Multi-index
558c2ecf20Sopenharmony_cientries can be explicitly split into smaller entries, or storing ``NULL``
568c2ecf20Sopenharmony_ciinto any entry will cause the XArray to forget about the range.
578c2ecf20Sopenharmony_ci
588c2ecf20Sopenharmony_ciNormal API
598c2ecf20Sopenharmony_ci==========
608c2ecf20Sopenharmony_ci
618c2ecf20Sopenharmony_ciStart by initialising an XArray, either with DEFINE_XARRAY()
628c2ecf20Sopenharmony_cifor statically allocated XArrays or xa_init() for dynamically
638c2ecf20Sopenharmony_ciallocated ones.  A freshly-initialised XArray contains a ``NULL``
648c2ecf20Sopenharmony_cipointer at every index.
658c2ecf20Sopenharmony_ci
668c2ecf20Sopenharmony_ciYou can then set entries using xa_store() and get entries
678c2ecf20Sopenharmony_ciusing xa_load().  xa_store will overwrite any entry with the
688c2ecf20Sopenharmony_cinew entry and return the previous entry stored at that index.  You can
698c2ecf20Sopenharmony_ciuse xa_erase() instead of calling xa_store() with a
708c2ecf20Sopenharmony_ci``NULL`` entry.  There is no difference between an entry that has never
718c2ecf20Sopenharmony_cibeen stored to, one that has been erased and one that has most recently
728c2ecf20Sopenharmony_cihad ``NULL`` stored to it.
738c2ecf20Sopenharmony_ci
748c2ecf20Sopenharmony_ciYou can conditionally replace an entry at an index by using
758c2ecf20Sopenharmony_cixa_cmpxchg().  Like cmpxchg(), it will only succeed if
768c2ecf20Sopenharmony_cithe entry at that index has the 'old' value.  It also returns the entry
778c2ecf20Sopenharmony_ciwhich was at that index; if it returns the same entry which was passed as
788c2ecf20Sopenharmony_ci'old', then xa_cmpxchg() succeeded.
798c2ecf20Sopenharmony_ci
808c2ecf20Sopenharmony_ciIf you want to only store a new entry to an index if the current entry
818c2ecf20Sopenharmony_ciat that index is ``NULL``, you can use xa_insert() which
828c2ecf20Sopenharmony_cireturns ``-EBUSY`` if the entry is not empty.
838c2ecf20Sopenharmony_ci
848c2ecf20Sopenharmony_ciYou can copy entries out of the XArray into a plain array by calling
858c2ecf20Sopenharmony_cixa_extract().  Or you can iterate over the present entries in the XArray
868c2ecf20Sopenharmony_ciby calling xa_for_each(), xa_for_each_start() or xa_for_each_range().
878c2ecf20Sopenharmony_ciYou may prefer to use xa_find() or xa_find_after() to move to the next
888c2ecf20Sopenharmony_cipresent entry in the XArray.
898c2ecf20Sopenharmony_ci
908c2ecf20Sopenharmony_ciCalling xa_store_range() stores the same entry in a range
918c2ecf20Sopenharmony_ciof indices.  If you do this, some of the other operations will behave
928c2ecf20Sopenharmony_ciin a slightly odd way.  For example, marking the entry at one index
938c2ecf20Sopenharmony_cimay result in the entry being marked at some, but not all of the other
948c2ecf20Sopenharmony_ciindices.  Storing into one index may result in the entry retrieved by
958c2ecf20Sopenharmony_cisome, but not all of the other indices changing.
968c2ecf20Sopenharmony_ci
978c2ecf20Sopenharmony_ciSometimes you need to ensure that a subsequent call to xa_store()
988c2ecf20Sopenharmony_ciwill not need to allocate memory.  The xa_reserve() function
998c2ecf20Sopenharmony_ciwill store a reserved entry at the indicated index.  Users of the
1008c2ecf20Sopenharmony_cinormal API will see this entry as containing ``NULL``.  If you do
1018c2ecf20Sopenharmony_cinot need to use the reserved entry, you can call xa_release()
1028c2ecf20Sopenharmony_cito remove the unused entry.  If another user has stored to the entry
1038c2ecf20Sopenharmony_ciin the meantime, xa_release() will do nothing; if instead you
1048c2ecf20Sopenharmony_ciwant the entry to become ``NULL``, you should use xa_erase().
1058c2ecf20Sopenharmony_ciUsing xa_insert() on a reserved entry will fail.
1068c2ecf20Sopenharmony_ci
1078c2ecf20Sopenharmony_ciIf all entries in the array are ``NULL``, the xa_empty() function
1088c2ecf20Sopenharmony_ciwill return ``true``.
1098c2ecf20Sopenharmony_ci
1108c2ecf20Sopenharmony_ciFinally, you can remove all entries from an XArray by calling
1118c2ecf20Sopenharmony_cixa_destroy().  If the XArray entries are pointers, you may wish
1128c2ecf20Sopenharmony_cito free the entries first.  You can do this by iterating over all present
1138c2ecf20Sopenharmony_cientries in the XArray using the xa_for_each() iterator.
1148c2ecf20Sopenharmony_ci
1158c2ecf20Sopenharmony_ciSearch Marks
1168c2ecf20Sopenharmony_ci------------
1178c2ecf20Sopenharmony_ci
1188c2ecf20Sopenharmony_ciEach entry in the array has three bits associated with it called marks.
1198c2ecf20Sopenharmony_ciEach mark may be set or cleared independently of the others.  You can
1208c2ecf20Sopenharmony_ciiterate over marked entries by using the xa_for_each_marked() iterator.
1218c2ecf20Sopenharmony_ci
1228c2ecf20Sopenharmony_ciYou can enquire whether a mark is set on an entry by using
1238c2ecf20Sopenharmony_cixa_get_mark().  If the entry is not ``NULL``, you can set a mark on it
1248c2ecf20Sopenharmony_ciby using xa_set_mark() and remove the mark from an entry by calling
1258c2ecf20Sopenharmony_cixa_clear_mark().  You can ask whether any entry in the XArray has a
1268c2ecf20Sopenharmony_ciparticular mark set by calling xa_marked().  Erasing an entry from the
1278c2ecf20Sopenharmony_ciXArray causes all marks associated with that entry to be cleared.
1288c2ecf20Sopenharmony_ci
1298c2ecf20Sopenharmony_ciSetting or clearing a mark on any index of a multi-index entry will
1308c2ecf20Sopenharmony_ciaffect all indices covered by that entry.  Querying the mark on any
1318c2ecf20Sopenharmony_ciindex will return the same result.
1328c2ecf20Sopenharmony_ci
1338c2ecf20Sopenharmony_ciThere is no way to iterate over entries which are not marked; the data
1348c2ecf20Sopenharmony_cistructure does not allow this to be implemented efficiently.  There are
1358c2ecf20Sopenharmony_cinot currently iterators to search for logical combinations of bits (eg
1368c2ecf20Sopenharmony_ciiterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2``
1378c2ecf20Sopenharmony_ciset, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2``
1388c2ecf20Sopenharmony_ciset).  It would be possible to add these if a user arises.
1398c2ecf20Sopenharmony_ci
1408c2ecf20Sopenharmony_ciAllocating XArrays
1418c2ecf20Sopenharmony_ci------------------
1428c2ecf20Sopenharmony_ci
1438c2ecf20Sopenharmony_ciIf you use DEFINE_XARRAY_ALLOC() to define the XArray, or
1448c2ecf20Sopenharmony_ciinitialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
1458c2ecf20Sopenharmony_cithe XArray changes to track whether entries are in use or not.
1468c2ecf20Sopenharmony_ci
1478c2ecf20Sopenharmony_ciYou can call xa_alloc() to store the entry at an unused index
1488c2ecf20Sopenharmony_ciin the XArray.  If you need to modify the array from interrupt context,
1498c2ecf20Sopenharmony_ciyou can use xa_alloc_bh() or xa_alloc_irq() to disable
1508c2ecf20Sopenharmony_ciinterrupts while allocating the ID.
1518c2ecf20Sopenharmony_ci
1528c2ecf20Sopenharmony_ciUsing xa_store(), xa_cmpxchg() or xa_insert() will
1538c2ecf20Sopenharmony_cialso mark the entry as being allocated.  Unlike a normal XArray, storing
1548c2ecf20Sopenharmony_ci``NULL`` will mark the entry as being in use, like xa_reserve().
1558c2ecf20Sopenharmony_ciTo free an entry, use xa_erase() (or xa_release() if
1568c2ecf20Sopenharmony_ciyou only want to free the entry if it's ``NULL``).
1578c2ecf20Sopenharmony_ci
1588c2ecf20Sopenharmony_ciBy default, the lowest free entry is allocated starting from 0.  If you
1598c2ecf20Sopenharmony_ciwant to allocate entries starting at 1, it is more efficient to use
1608c2ecf20Sopenharmony_ciDEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``.  If you want to
1618c2ecf20Sopenharmony_ciallocate IDs up to a maximum, then wrap back around to the lowest free
1628c2ecf20Sopenharmony_ciID, you can use xa_alloc_cyclic().
1638c2ecf20Sopenharmony_ci
1648c2ecf20Sopenharmony_ciYou cannot use ``XA_MARK_0`` with an allocating XArray as this mark
1658c2ecf20Sopenharmony_ciis used to track whether an entry is free or not.  The other marks are
1668c2ecf20Sopenharmony_ciavailable for your use.
1678c2ecf20Sopenharmony_ci
1688c2ecf20Sopenharmony_ciMemory allocation
1698c2ecf20Sopenharmony_ci-----------------
1708c2ecf20Sopenharmony_ci
1718c2ecf20Sopenharmony_ciThe xa_store(), xa_cmpxchg(), xa_alloc(),
1728c2ecf20Sopenharmony_cixa_reserve() and xa_insert() functions take a gfp_t
1738c2ecf20Sopenharmony_ciparameter in case the XArray needs to allocate memory to store this entry.
1748c2ecf20Sopenharmony_ciIf the entry is being deleted, no memory allocation needs to be performed,
1758c2ecf20Sopenharmony_ciand the GFP flags specified will be ignored.
1768c2ecf20Sopenharmony_ci
1778c2ecf20Sopenharmony_ciIt is possible for no memory to be allocatable, particularly if you pass
1788c2ecf20Sopenharmony_cia restrictive set of GFP flags.  In that case, the functions return a
1798c2ecf20Sopenharmony_cispecial value which can be turned into an errno using xa_err().
1808c2ecf20Sopenharmony_ciIf you don't need to know exactly which error occurred, using
1818c2ecf20Sopenharmony_cixa_is_err() is slightly more efficient.
1828c2ecf20Sopenharmony_ci
1838c2ecf20Sopenharmony_ciLocking
1848c2ecf20Sopenharmony_ci-------
1858c2ecf20Sopenharmony_ci
1868c2ecf20Sopenharmony_ciWhen using the Normal API, you do not have to worry about locking.
1878c2ecf20Sopenharmony_ciThe XArray uses RCU and an internal spinlock to synchronise access:
1888c2ecf20Sopenharmony_ci
1898c2ecf20Sopenharmony_ciNo lock needed:
1908c2ecf20Sopenharmony_ci * xa_empty()
1918c2ecf20Sopenharmony_ci * xa_marked()
1928c2ecf20Sopenharmony_ci
1938c2ecf20Sopenharmony_ciTakes RCU read lock:
1948c2ecf20Sopenharmony_ci * xa_load()
1958c2ecf20Sopenharmony_ci * xa_for_each()
1968c2ecf20Sopenharmony_ci * xa_for_each_start()
1978c2ecf20Sopenharmony_ci * xa_for_each_range()
1988c2ecf20Sopenharmony_ci * xa_find()
1998c2ecf20Sopenharmony_ci * xa_find_after()
2008c2ecf20Sopenharmony_ci * xa_extract()
2018c2ecf20Sopenharmony_ci * xa_get_mark()
2028c2ecf20Sopenharmony_ci
2038c2ecf20Sopenharmony_ciTakes xa_lock internally:
2048c2ecf20Sopenharmony_ci * xa_store()
2058c2ecf20Sopenharmony_ci * xa_store_bh()
2068c2ecf20Sopenharmony_ci * xa_store_irq()
2078c2ecf20Sopenharmony_ci * xa_insert()
2088c2ecf20Sopenharmony_ci * xa_insert_bh()
2098c2ecf20Sopenharmony_ci * xa_insert_irq()
2108c2ecf20Sopenharmony_ci * xa_erase()
2118c2ecf20Sopenharmony_ci * xa_erase_bh()
2128c2ecf20Sopenharmony_ci * xa_erase_irq()
2138c2ecf20Sopenharmony_ci * xa_cmpxchg()
2148c2ecf20Sopenharmony_ci * xa_cmpxchg_bh()
2158c2ecf20Sopenharmony_ci * xa_cmpxchg_irq()
2168c2ecf20Sopenharmony_ci * xa_store_range()
2178c2ecf20Sopenharmony_ci * xa_alloc()
2188c2ecf20Sopenharmony_ci * xa_alloc_bh()
2198c2ecf20Sopenharmony_ci * xa_alloc_irq()
2208c2ecf20Sopenharmony_ci * xa_reserve()
2218c2ecf20Sopenharmony_ci * xa_reserve_bh()
2228c2ecf20Sopenharmony_ci * xa_reserve_irq()
2238c2ecf20Sopenharmony_ci * xa_destroy()
2248c2ecf20Sopenharmony_ci * xa_set_mark()
2258c2ecf20Sopenharmony_ci * xa_clear_mark()
2268c2ecf20Sopenharmony_ci
2278c2ecf20Sopenharmony_ciAssumes xa_lock held on entry:
2288c2ecf20Sopenharmony_ci * __xa_store()
2298c2ecf20Sopenharmony_ci * __xa_insert()
2308c2ecf20Sopenharmony_ci * __xa_erase()
2318c2ecf20Sopenharmony_ci * __xa_cmpxchg()
2328c2ecf20Sopenharmony_ci * __xa_alloc()
2338c2ecf20Sopenharmony_ci * __xa_set_mark()
2348c2ecf20Sopenharmony_ci * __xa_clear_mark()
2358c2ecf20Sopenharmony_ci
2368c2ecf20Sopenharmony_ciIf you want to take advantage of the lock to protect the data structures
2378c2ecf20Sopenharmony_cithat you are storing in the XArray, you can call xa_lock()
2388c2ecf20Sopenharmony_cibefore calling xa_load(), then take a reference count on the
2398c2ecf20Sopenharmony_ciobject you have found before calling xa_unlock().  This will
2408c2ecf20Sopenharmony_ciprevent stores from removing the object from the array between looking
2418c2ecf20Sopenharmony_ciup the object and incrementing the refcount.  You can also use RCU to
2428c2ecf20Sopenharmony_ciavoid dereferencing freed memory, but an explanation of that is beyond
2438c2ecf20Sopenharmony_cithe scope of this document.
2448c2ecf20Sopenharmony_ci
2458c2ecf20Sopenharmony_ciThe XArray does not disable interrupts or softirqs while modifying
2468c2ecf20Sopenharmony_cithe array.  It is safe to read the XArray from interrupt or softirq
2478c2ecf20Sopenharmony_cicontext as the RCU lock provides enough protection.
2488c2ecf20Sopenharmony_ci
2498c2ecf20Sopenharmony_ciIf, for example, you want to store entries in the XArray in process
2508c2ecf20Sopenharmony_cicontext and then erase them in softirq context, you can do that this way::
2518c2ecf20Sopenharmony_ci
2528c2ecf20Sopenharmony_ci    void foo_init(struct foo *foo)
2538c2ecf20Sopenharmony_ci    {
2548c2ecf20Sopenharmony_ci        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
2558c2ecf20Sopenharmony_ci    }
2568c2ecf20Sopenharmony_ci
2578c2ecf20Sopenharmony_ci    int foo_store(struct foo *foo, unsigned long index, void *entry)
2588c2ecf20Sopenharmony_ci    {
2598c2ecf20Sopenharmony_ci        int err;
2608c2ecf20Sopenharmony_ci
2618c2ecf20Sopenharmony_ci        xa_lock_bh(&foo->array);
2628c2ecf20Sopenharmony_ci        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
2638c2ecf20Sopenharmony_ci        if (!err)
2648c2ecf20Sopenharmony_ci            foo->count++;
2658c2ecf20Sopenharmony_ci        xa_unlock_bh(&foo->array);
2668c2ecf20Sopenharmony_ci        return err;
2678c2ecf20Sopenharmony_ci    }
2688c2ecf20Sopenharmony_ci
2698c2ecf20Sopenharmony_ci    /* foo_erase() is only called from softirq context */
2708c2ecf20Sopenharmony_ci    void foo_erase(struct foo *foo, unsigned long index)
2718c2ecf20Sopenharmony_ci    {
2728c2ecf20Sopenharmony_ci        xa_lock(&foo->array);
2738c2ecf20Sopenharmony_ci        __xa_erase(&foo->array, index);
2748c2ecf20Sopenharmony_ci        foo->count--;
2758c2ecf20Sopenharmony_ci        xa_unlock(&foo->array);
2768c2ecf20Sopenharmony_ci    }
2778c2ecf20Sopenharmony_ci
2788c2ecf20Sopenharmony_ciIf you are going to modify the XArray from interrupt or softirq context,
2798c2ecf20Sopenharmony_ciyou need to initialise the array using xa_init_flags(), passing
2808c2ecf20Sopenharmony_ci``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
2818c2ecf20Sopenharmony_ci
2828c2ecf20Sopenharmony_ciThe above example also shows a common pattern of wanting to extend the
2838c2ecf20Sopenharmony_cicoverage of the xa_lock on the store side to protect some statistics
2848c2ecf20Sopenharmony_ciassociated with the array.
2858c2ecf20Sopenharmony_ci
2868c2ecf20Sopenharmony_ciSharing the XArray with interrupt context is also possible, either
2878c2ecf20Sopenharmony_ciusing xa_lock_irqsave() in both the interrupt handler and process
2888c2ecf20Sopenharmony_cicontext, or xa_lock_irq() in process context and xa_lock()
2898c2ecf20Sopenharmony_ciin the interrupt handler.  Some of the more common patterns have helper
2908c2ecf20Sopenharmony_cifunctions such as xa_store_bh(), xa_store_irq(),
2918c2ecf20Sopenharmony_cixa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
2928c2ecf20Sopenharmony_ciand xa_cmpxchg_irq().
2938c2ecf20Sopenharmony_ci
2948c2ecf20Sopenharmony_ciSometimes you need to protect access to the XArray with a mutex because
2958c2ecf20Sopenharmony_cithat lock sits above another mutex in the locking hierarchy.  That does
2968c2ecf20Sopenharmony_cinot entitle you to use functions like __xa_erase() without taking
2978c2ecf20Sopenharmony_cithe xa_lock; the xa_lock is used for lockdep validation and will be used
2988c2ecf20Sopenharmony_cifor other purposes in the future.
2998c2ecf20Sopenharmony_ci
3008c2ecf20Sopenharmony_ciThe __xa_set_mark() and __xa_clear_mark() functions are also
3018c2ecf20Sopenharmony_ciavailable for situations where you look up an entry and want to atomically
3028c2ecf20Sopenharmony_ciset or clear a mark.  It may be more efficient to use the advanced API
3038c2ecf20Sopenharmony_ciin this case, as it will save you from walking the tree twice.
3048c2ecf20Sopenharmony_ci
3058c2ecf20Sopenharmony_ciAdvanced API
3068c2ecf20Sopenharmony_ci============
3078c2ecf20Sopenharmony_ci
3088c2ecf20Sopenharmony_ciThe advanced API offers more flexibility and better performance at the
3098c2ecf20Sopenharmony_cicost of an interface which can be harder to use and has fewer safeguards.
3108c2ecf20Sopenharmony_ciNo locking is done for you by the advanced API, and you are required
3118c2ecf20Sopenharmony_cito use the xa_lock while modifying the array.  You can choose whether
3128c2ecf20Sopenharmony_cito use the xa_lock or the RCU lock while doing read-only operations on
3138c2ecf20Sopenharmony_cithe array.  You can mix advanced and normal operations on the same array;
3148c2ecf20Sopenharmony_ciindeed the normal API is implemented in terms of the advanced API.  The
3158c2ecf20Sopenharmony_ciadvanced API is only available to modules with a GPL-compatible license.
3168c2ecf20Sopenharmony_ci
3178c2ecf20Sopenharmony_ciThe advanced API is based around the xa_state.  This is an opaque data
3188c2ecf20Sopenharmony_cistructure which you declare on the stack using the XA_STATE()
3198c2ecf20Sopenharmony_cimacro.  This macro initialises the xa_state ready to start walking
3208c2ecf20Sopenharmony_ciaround the XArray.  It is used as a cursor to maintain the position
3218c2ecf20Sopenharmony_ciin the XArray and let you compose various operations together without
3228c2ecf20Sopenharmony_cihaving to restart from the top every time.
3238c2ecf20Sopenharmony_ci
3248c2ecf20Sopenharmony_ciThe xa_state is also used to store errors.  You can call
3258c2ecf20Sopenharmony_cixas_error() to retrieve the error.  All operations check whether
3268c2ecf20Sopenharmony_cithe xa_state is in an error state before proceeding, so there's no need
3278c2ecf20Sopenharmony_cifor you to check for an error after each call; you can make multiple
3288c2ecf20Sopenharmony_cicalls in succession and only check at a convenient point.  The only
3298c2ecf20Sopenharmony_cierrors currently generated by the XArray code itself are ``ENOMEM`` and
3308c2ecf20Sopenharmony_ci``EINVAL``, but it supports arbitrary errors in case you want to call
3318c2ecf20Sopenharmony_cixas_set_err() yourself.
3328c2ecf20Sopenharmony_ci
3338c2ecf20Sopenharmony_ciIf the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
3348c2ecf20Sopenharmony_ciwill attempt to allocate more memory using the specified gfp flags and
3358c2ecf20Sopenharmony_cicache it in the xa_state for the next attempt.  The idea is that you take
3368c2ecf20Sopenharmony_cithe xa_lock, attempt the operation and drop the lock.  The operation
3378c2ecf20Sopenharmony_ciattempts to allocate memory while holding the lock, but it is more
3388c2ecf20Sopenharmony_cilikely to fail.  Once you have dropped the lock, xas_nomem()
3398c2ecf20Sopenharmony_cican try harder to allocate more memory.  It will return ``true`` if it
3408c2ecf20Sopenharmony_ciis worth retrying the operation (i.e. that there was a memory error *and*
3418c2ecf20Sopenharmony_cimore memory was allocated).  If it has previously allocated memory, and
3428c2ecf20Sopenharmony_cithat memory wasn't used, and there is no error (or some error that isn't
3438c2ecf20Sopenharmony_ci``ENOMEM``), then it will free the memory previously allocated.
3448c2ecf20Sopenharmony_ci
3458c2ecf20Sopenharmony_ciInternal Entries
3468c2ecf20Sopenharmony_ci----------------
3478c2ecf20Sopenharmony_ci
3488c2ecf20Sopenharmony_ciThe XArray reserves some entries for its own purposes.  These are never
3498c2ecf20Sopenharmony_ciexposed through the normal API, but when using the advanced API, it's
3508c2ecf20Sopenharmony_cipossible to see them.  Usually the best way to handle them is to pass them
3518c2ecf20Sopenharmony_cito xas_retry(), and retry the operation if it returns ``true``.
3528c2ecf20Sopenharmony_ci
3538c2ecf20Sopenharmony_ci.. flat-table::
3548c2ecf20Sopenharmony_ci   :widths: 1 1 6
3558c2ecf20Sopenharmony_ci
3568c2ecf20Sopenharmony_ci   * - Name
3578c2ecf20Sopenharmony_ci     - Test
3588c2ecf20Sopenharmony_ci     - Usage
3598c2ecf20Sopenharmony_ci
3608c2ecf20Sopenharmony_ci   * - Node
3618c2ecf20Sopenharmony_ci     - xa_is_node()
3628c2ecf20Sopenharmony_ci     - An XArray node.  May be visible when using a multi-index xa_state.
3638c2ecf20Sopenharmony_ci
3648c2ecf20Sopenharmony_ci   * - Sibling
3658c2ecf20Sopenharmony_ci     - xa_is_sibling()
3668c2ecf20Sopenharmony_ci     - A non-canonical entry for a multi-index entry.  The value indicates
3678c2ecf20Sopenharmony_ci       which slot in this node has the canonical entry.
3688c2ecf20Sopenharmony_ci
3698c2ecf20Sopenharmony_ci   * - Retry
3708c2ecf20Sopenharmony_ci     - xa_is_retry()
3718c2ecf20Sopenharmony_ci     - This entry is currently being modified by a thread which has the
3728c2ecf20Sopenharmony_ci       xa_lock.  The node containing this entry may be freed at the end
3738c2ecf20Sopenharmony_ci       of this RCU period.  You should restart the lookup from the head
3748c2ecf20Sopenharmony_ci       of the array.
3758c2ecf20Sopenharmony_ci
3768c2ecf20Sopenharmony_ci   * - Zero
3778c2ecf20Sopenharmony_ci     - xa_is_zero()
3788c2ecf20Sopenharmony_ci     - Zero entries appear as ``NULL`` through the Normal API, but occupy
3798c2ecf20Sopenharmony_ci       an entry in the XArray which can be used to reserve the index for
3808c2ecf20Sopenharmony_ci       future use.  This is used by allocating XArrays for allocated entries
3818c2ecf20Sopenharmony_ci       which are ``NULL``.
3828c2ecf20Sopenharmony_ci
3838c2ecf20Sopenharmony_ciOther internal entries may be added in the future.  As far as possible, they
3848c2ecf20Sopenharmony_ciwill be handled by xas_retry().
3858c2ecf20Sopenharmony_ci
3868c2ecf20Sopenharmony_ciAdditional functionality
3878c2ecf20Sopenharmony_ci------------------------
3888c2ecf20Sopenharmony_ci
3898c2ecf20Sopenharmony_ciThe xas_create_range() function allocates all the necessary memory
3908c2ecf20Sopenharmony_cito store every entry in a range.  It will set ENOMEM in the xa_state if
3918c2ecf20Sopenharmony_ciit cannot allocate memory.
3928c2ecf20Sopenharmony_ci
3938c2ecf20Sopenharmony_ciYou can use xas_init_marks() to reset the marks on an entry
3948c2ecf20Sopenharmony_cito their default state.  This is usually all marks clear, unless the
3958c2ecf20Sopenharmony_ciXArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
3968c2ecf20Sopenharmony_ciand all other marks are clear.  Replacing one entry with another using
3978c2ecf20Sopenharmony_cixas_store() will not reset the marks on that entry; if you want
3988c2ecf20Sopenharmony_cithe marks reset, you should do that explicitly.
3998c2ecf20Sopenharmony_ci
4008c2ecf20Sopenharmony_ciThe xas_load() will walk the xa_state as close to the entry
4018c2ecf20Sopenharmony_cias it can.  If you know the xa_state has already been walked to the
4028c2ecf20Sopenharmony_cientry and need to check that the entry hasn't changed, you can use
4038c2ecf20Sopenharmony_cixas_reload() to save a function call.
4048c2ecf20Sopenharmony_ci
4058c2ecf20Sopenharmony_ciIf you need to move to a different index in the XArray, call
4068c2ecf20Sopenharmony_cixas_set().  This resets the cursor to the top of the tree, which
4078c2ecf20Sopenharmony_ciwill generally make the next operation walk the cursor to the desired
4088c2ecf20Sopenharmony_cispot in the tree.  If you want to move to the next or previous index,
4098c2ecf20Sopenharmony_cicall xas_next() or xas_prev().  Setting the index does
4108c2ecf20Sopenharmony_cinot walk the cursor around the array so does not require a lock to be
4118c2ecf20Sopenharmony_ciheld, while moving to the next or previous index does.
4128c2ecf20Sopenharmony_ci
4138c2ecf20Sopenharmony_ciYou can search for the next present entry using xas_find().  This
4148c2ecf20Sopenharmony_ciis the equivalent of both xa_find() and xa_find_after();
4158c2ecf20Sopenharmony_ciif the cursor has been walked to an entry, then it will find the next
4168c2ecf20Sopenharmony_cientry after the one currently referenced.  If not, it will return the
4178c2ecf20Sopenharmony_cientry at the index of the xa_state.  Using xas_next_entry() to
4188c2ecf20Sopenharmony_cimove to the next present entry instead of xas_find() will save
4198c2ecf20Sopenharmony_cia function call in the majority of cases at the expense of emitting more
4208c2ecf20Sopenharmony_ciinline code.
4218c2ecf20Sopenharmony_ci
4228c2ecf20Sopenharmony_ciThe xas_find_marked() function is similar.  If the xa_state has
4238c2ecf20Sopenharmony_cinot been walked, it will return the entry at the index of the xa_state,
4248c2ecf20Sopenharmony_ciif it is marked.  Otherwise, it will return the first marked entry after
4258c2ecf20Sopenharmony_cithe entry referenced by the xa_state.  The xas_next_marked()
4268c2ecf20Sopenharmony_cifunction is the equivalent of xas_next_entry().
4278c2ecf20Sopenharmony_ci
4288c2ecf20Sopenharmony_ciWhen iterating over a range of the XArray using xas_for_each()
4298c2ecf20Sopenharmony_cior xas_for_each_marked(), it may be necessary to temporarily stop
4308c2ecf20Sopenharmony_cithe iteration.  The xas_pause() function exists for this purpose.
4318c2ecf20Sopenharmony_ciAfter you have done the necessary work and wish to resume, the xa_state
4328c2ecf20Sopenharmony_ciis in an appropriate state to continue the iteration after the entry
4338c2ecf20Sopenharmony_ciyou last processed.  If you have interrupts disabled while iterating,
4348c2ecf20Sopenharmony_cithen it is good manners to pause the iteration and reenable interrupts
4358c2ecf20Sopenharmony_cievery ``XA_CHECK_SCHED`` entries.
4368c2ecf20Sopenharmony_ci
4378c2ecf20Sopenharmony_ciThe xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require
4388c2ecf20Sopenharmony_cithe xa_state cursor to have been moved to the appropriate location in the
4398c2ecf20Sopenharmony_ciXArray; they will do nothing if you have called xas_pause() or xas_set()
4408c2ecf20Sopenharmony_ciimmediately before.
4418c2ecf20Sopenharmony_ci
4428c2ecf20Sopenharmony_ciYou can call xas_set_update() to have a callback function
4438c2ecf20Sopenharmony_cicalled each time the XArray updates a node.  This is used by the page
4448c2ecf20Sopenharmony_cicache workingset code to maintain its list of nodes which contain only
4458c2ecf20Sopenharmony_cishadow entries.
4468c2ecf20Sopenharmony_ci
4478c2ecf20Sopenharmony_ciMulti-Index Entries
4488c2ecf20Sopenharmony_ci-------------------
4498c2ecf20Sopenharmony_ci
4508c2ecf20Sopenharmony_ciThe XArray has the ability to tie multiple indices together so that
4518c2ecf20Sopenharmony_cioperations on one index affect all indices.  For example, storing into
4528c2ecf20Sopenharmony_ciany index will change the value of the entry retrieved from any index.
4538c2ecf20Sopenharmony_ciSetting or clearing a mark on any index will set or clear the mark
4548c2ecf20Sopenharmony_cion every index that is tied together.  The current implementation
4558c2ecf20Sopenharmony_cionly allows tying ranges which are aligned powers of two together;
4568c2ecf20Sopenharmony_cieg indices 64-127 may be tied together, but 2-6 may not be.  This may
4578c2ecf20Sopenharmony_cisave substantial quantities of memory; for example tying 512 entries
4588c2ecf20Sopenharmony_citogether will save over 4kB.
4598c2ecf20Sopenharmony_ci
4608c2ecf20Sopenharmony_ciYou can create a multi-index entry by using XA_STATE_ORDER()
4618c2ecf20Sopenharmony_cior xas_set_order() followed by a call to xas_store().
4628c2ecf20Sopenharmony_ciCalling xas_load() with a multi-index xa_state will walk the
4638c2ecf20Sopenharmony_cixa_state to the right location in the tree, but the return value is not
4648c2ecf20Sopenharmony_cimeaningful, potentially being an internal entry or ``NULL`` even when there
4658c2ecf20Sopenharmony_ciis an entry stored within the range.  Calling xas_find_conflict()
4668c2ecf20Sopenharmony_ciwill return the first entry within the range or ``NULL`` if there are no
4678c2ecf20Sopenharmony_cientries in the range.  The xas_for_each_conflict() iterator will
4688c2ecf20Sopenharmony_ciiterate over every entry which overlaps the specified range.
4698c2ecf20Sopenharmony_ci
4708c2ecf20Sopenharmony_ciIf xas_load() encounters a multi-index entry, the xa_index
4718c2ecf20Sopenharmony_ciin the xa_state will not be changed.  When iterating over an XArray
4728c2ecf20Sopenharmony_cior calling xas_find(), if the initial index is in the middle
4738c2ecf20Sopenharmony_ciof a multi-index entry, it will not be altered.  Subsequent calls
4748c2ecf20Sopenharmony_cior iterations will move the index to the first index in the range.
4758c2ecf20Sopenharmony_ciEach entry will only be returned once, no matter how many indices it
4768c2ecf20Sopenharmony_cioccupies.
4778c2ecf20Sopenharmony_ci
4788c2ecf20Sopenharmony_ciUsing xas_next() or xas_prev() with a multi-index xa_state is not
4798c2ecf20Sopenharmony_cisupported.  Using either of these functions on a multi-index entry will
4808c2ecf20Sopenharmony_cireveal sibling entries; these should be skipped over by the caller.
4818c2ecf20Sopenharmony_ci
4828c2ecf20Sopenharmony_ciStoring ``NULL`` into any index of a multi-index entry will set the
4838c2ecf20Sopenharmony_cientry at every index to ``NULL`` and dissolve the tie.  A multi-index
4848c2ecf20Sopenharmony_cientry can be split into entries occupying smaller ranges by calling
4858c2ecf20Sopenharmony_cixas_split_alloc() without the xa_lock held, followed by taking the lock
4868c2ecf20Sopenharmony_ciand calling xas_split().
4878c2ecf20Sopenharmony_ci
4888c2ecf20Sopenharmony_ciFunctions and structures
4898c2ecf20Sopenharmony_ci========================
4908c2ecf20Sopenharmony_ci
4918c2ecf20Sopenharmony_ci.. kernel-doc:: include/linux/xarray.h
4928c2ecf20Sopenharmony_ci.. kernel-doc:: lib/xarray.c
493