xref: /third_party/python/Doc/library/bisect.rst (revision 7db96d56)
17db96d56Sopenharmony_ci:mod:`bisect` --- Array bisection algorithm
27db96d56Sopenharmony_ci===========================================
37db96d56Sopenharmony_ci
47db96d56Sopenharmony_ci.. module:: bisect
57db96d56Sopenharmony_ci   :synopsis: Array bisection algorithms for binary searching.
67db96d56Sopenharmony_ci.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
77db96d56Sopenharmony_ci.. sectionauthor:: Raymond Hettinger <python at rcn.com>
87db96d56Sopenharmony_ci.. example based on the PyModules FAQ entry by Aaron Watters <arw@pythonpros.com>
97db96d56Sopenharmony_ci
107db96d56Sopenharmony_ci**Source code:** :source:`Lib/bisect.py`
117db96d56Sopenharmony_ci
127db96d56Sopenharmony_ci--------------
137db96d56Sopenharmony_ci
147db96d56Sopenharmony_ciThis module provides support for maintaining a list in sorted order without
157db96d56Sopenharmony_cihaving to sort the list after each insertion.  For long lists of items with
167db96d56Sopenharmony_ciexpensive comparison operations, this can be an improvement over the more common
177db96d56Sopenharmony_ciapproach.  The module is called :mod:`bisect` because it uses a basic bisection
187db96d56Sopenharmony_cialgorithm to do its work.  The source code may be most useful as a working
197db96d56Sopenharmony_ciexample of the algorithm (the boundary conditions are already right!).
207db96d56Sopenharmony_ci
217db96d56Sopenharmony_ci.. _bisect functions:
227db96d56Sopenharmony_ci
237db96d56Sopenharmony_ciThe following functions are provided:
247db96d56Sopenharmony_ci
257db96d56Sopenharmony_ci
267db96d56Sopenharmony_ci.. function:: bisect_left(a, x, lo=0, hi=len(a), *, key=None)
277db96d56Sopenharmony_ci
287db96d56Sopenharmony_ci   Locate the insertion point for *x* in *a* to maintain sorted order.
297db96d56Sopenharmony_ci   The parameters *lo* and *hi* may be used to specify a subset of the list
307db96d56Sopenharmony_ci   which should be considered; by default the entire list is used.  If *x* is
317db96d56Sopenharmony_ci   already present in *a*, the insertion point will be before (to the left of)
327db96d56Sopenharmony_ci   any existing entries.  The return value is suitable for use as the first
337db96d56Sopenharmony_ci   parameter to ``list.insert()`` assuming that *a* is already sorted.
347db96d56Sopenharmony_ci
357db96d56Sopenharmony_ci   The returned insertion point *i* partitions the array *a* into two halves so
367db96d56Sopenharmony_ci   that ``all(val < x for val in a[lo : i])`` for the left side and
377db96d56Sopenharmony_ci   ``all(val >= x for val in a[i : hi])`` for the right side.
387db96d56Sopenharmony_ci
397db96d56Sopenharmony_ci   *key* specifies a :term:`key function` of one argument that is used to
407db96d56Sopenharmony_ci   extract a comparison key from each element in the array.  To support
417db96d56Sopenharmony_ci   searching complex records, the key function is not applied to the *x* value.
427db96d56Sopenharmony_ci
437db96d56Sopenharmony_ci   If *key* is ``None``, the elements are compared directly with no
447db96d56Sopenharmony_ci   intervening function call.
457db96d56Sopenharmony_ci
467db96d56Sopenharmony_ci   .. versionchanged:: 3.10
477db96d56Sopenharmony_ci      Added the *key* parameter.
487db96d56Sopenharmony_ci
497db96d56Sopenharmony_ci
507db96d56Sopenharmony_ci.. function:: bisect_right(a, x, lo=0, hi=len(a), *, key=None)
517db96d56Sopenharmony_ci              bisect(a, x, lo=0, hi=len(a), *, key=None)
527db96d56Sopenharmony_ci
537db96d56Sopenharmony_ci   Similar to :py:func:`~bisect.bisect_left`, but returns an insertion point which comes
547db96d56Sopenharmony_ci   after (to the right of) any existing entries of *x* in *a*.
557db96d56Sopenharmony_ci
567db96d56Sopenharmony_ci   The returned insertion point *i* partitions the array *a* into two halves so
577db96d56Sopenharmony_ci   that ``all(val <= x for val in a[lo : i])`` for the left side and
587db96d56Sopenharmony_ci   ``all(val > x for val in a[i : hi])`` for the right side.
597db96d56Sopenharmony_ci
607db96d56Sopenharmony_ci   *key* specifies a :term:`key function` of one argument that is used to
617db96d56Sopenharmony_ci   extract a comparison key from each element in the array.  To support
627db96d56Sopenharmony_ci   searching complex records, the key function is not applied to the *x* value.
637db96d56Sopenharmony_ci
647db96d56Sopenharmony_ci   If *key* is ``None``, the elements are compared directly with no
657db96d56Sopenharmony_ci   intervening function call.
667db96d56Sopenharmony_ci
677db96d56Sopenharmony_ci   .. versionchanged:: 3.10
687db96d56Sopenharmony_ci      Added the *key* parameter.
697db96d56Sopenharmony_ci
707db96d56Sopenharmony_ci
717db96d56Sopenharmony_ci.. function:: insort_left(a, x, lo=0, hi=len(a), *, key=None)
727db96d56Sopenharmony_ci
737db96d56Sopenharmony_ci   Insert *x* in *a* in sorted order.
747db96d56Sopenharmony_ci
757db96d56Sopenharmony_ci   This function first runs :py:func:`~bisect.bisect_left` to locate an insertion point.
767db96d56Sopenharmony_ci   Next, it runs the :meth:`insert` method on *a* to insert *x* at the
777db96d56Sopenharmony_ci   appropriate position to maintain sort order.
787db96d56Sopenharmony_ci
797db96d56Sopenharmony_ci   To support inserting records in a table, the *key* function (if any) is
807db96d56Sopenharmony_ci   applied to *x* for the search step but not for the insertion step.
817db96d56Sopenharmony_ci
827db96d56Sopenharmony_ci   Keep in mind that the ``O(log n)`` search is dominated by the slow O(n)
837db96d56Sopenharmony_ci   insertion step.
847db96d56Sopenharmony_ci
857db96d56Sopenharmony_ci   .. versionchanged:: 3.10
867db96d56Sopenharmony_ci      Added the *key* parameter.
877db96d56Sopenharmony_ci
887db96d56Sopenharmony_ci
897db96d56Sopenharmony_ci.. function:: insort_right(a, x, lo=0, hi=len(a), *, key=None)
907db96d56Sopenharmony_ci              insort(a, x, lo=0, hi=len(a), *, key=None)
917db96d56Sopenharmony_ci
927db96d56Sopenharmony_ci   Similar to :py:func:`~bisect.insort_left`, but inserting *x* in *a* after any existing
937db96d56Sopenharmony_ci   entries of *x*.
947db96d56Sopenharmony_ci
957db96d56Sopenharmony_ci   This function first runs :py:func:`~bisect.bisect_right` to locate an insertion point.
967db96d56Sopenharmony_ci   Next, it runs the :meth:`insert` method on *a* to insert *x* at the
977db96d56Sopenharmony_ci   appropriate position to maintain sort order.
987db96d56Sopenharmony_ci
997db96d56Sopenharmony_ci   To support inserting records in a table, the *key* function (if any) is
1007db96d56Sopenharmony_ci   applied to *x* for the search step but not for the insertion step.
1017db96d56Sopenharmony_ci
1027db96d56Sopenharmony_ci   Keep in mind that the ``O(log n)`` search is dominated by the slow O(n)
1037db96d56Sopenharmony_ci   insertion step.
1047db96d56Sopenharmony_ci
1057db96d56Sopenharmony_ci   .. versionchanged:: 3.10
1067db96d56Sopenharmony_ci      Added the *key* parameter.
1077db96d56Sopenharmony_ci
1087db96d56Sopenharmony_ci
1097db96d56Sopenharmony_ciPerformance Notes
1107db96d56Sopenharmony_ci-----------------
1117db96d56Sopenharmony_ci
1127db96d56Sopenharmony_ciWhen writing time sensitive code using *bisect()* and *insort()*, keep these
1137db96d56Sopenharmony_cithoughts in mind:
1147db96d56Sopenharmony_ci
1157db96d56Sopenharmony_ci* Bisection is effective for searching ranges of values.
1167db96d56Sopenharmony_ci  For locating specific values, dictionaries are more performant.
1177db96d56Sopenharmony_ci
1187db96d56Sopenharmony_ci* The *insort()* functions are ``O(n)`` because the logarithmic search step
1197db96d56Sopenharmony_ci  is dominated by the linear time insertion step.
1207db96d56Sopenharmony_ci
1217db96d56Sopenharmony_ci* The search functions are stateless and discard key function results after
1227db96d56Sopenharmony_ci  they are used.  Consequently, if the search functions are used in a loop,
1237db96d56Sopenharmony_ci  the key function may be called again and again on the same array elements.
1247db96d56Sopenharmony_ci  If the key function isn't fast, consider wrapping it with
1257db96d56Sopenharmony_ci  :py:func:`functools.cache` to avoid duplicate computations.  Alternatively,
1267db96d56Sopenharmony_ci  consider searching an array of precomputed keys to locate the insertion
1277db96d56Sopenharmony_ci  point (as shown in the examples section below).
1287db96d56Sopenharmony_ci
1297db96d56Sopenharmony_ci.. seealso::
1307db96d56Sopenharmony_ci
1317db96d56Sopenharmony_ci   * `Sorted Collections
1327db96d56Sopenharmony_ci     <https://grantjenks.com/docs/sortedcollections/>`_ is a high performance
1337db96d56Sopenharmony_ci     module that uses *bisect* to managed sorted collections of data.
1347db96d56Sopenharmony_ci
1357db96d56Sopenharmony_ci   * The `SortedCollection recipe
1367db96d56Sopenharmony_ci     <https://code.activestate.com/recipes/577197-sortedcollection/>`_ uses
1377db96d56Sopenharmony_ci     bisect to build a full-featured collection class with straight-forward search
1387db96d56Sopenharmony_ci     methods and support for a key-function.  The keys are precomputed to save
1397db96d56Sopenharmony_ci     unnecessary calls to the key function during searches.
1407db96d56Sopenharmony_ci
1417db96d56Sopenharmony_ci
1427db96d56Sopenharmony_ciSearching Sorted Lists
1437db96d56Sopenharmony_ci----------------------
1447db96d56Sopenharmony_ci
1457db96d56Sopenharmony_ciThe above `bisect functions`_ are useful for finding insertion points but
1467db96d56Sopenharmony_cican be tricky or awkward to use for common searching tasks. The following five
1477db96d56Sopenharmony_cifunctions show how to transform them into the standard lookups for sorted
1487db96d56Sopenharmony_cilists::
1497db96d56Sopenharmony_ci
1507db96d56Sopenharmony_ci    def index(a, x):
1517db96d56Sopenharmony_ci        'Locate the leftmost value exactly equal to x'
1527db96d56Sopenharmony_ci        i = bisect_left(a, x)
1537db96d56Sopenharmony_ci        if i != len(a) and a[i] == x:
1547db96d56Sopenharmony_ci            return i
1557db96d56Sopenharmony_ci        raise ValueError
1567db96d56Sopenharmony_ci
1577db96d56Sopenharmony_ci    def find_lt(a, x):
1587db96d56Sopenharmony_ci        'Find rightmost value less than x'
1597db96d56Sopenharmony_ci        i = bisect_left(a, x)
1607db96d56Sopenharmony_ci        if i:
1617db96d56Sopenharmony_ci            return a[i-1]
1627db96d56Sopenharmony_ci        raise ValueError
1637db96d56Sopenharmony_ci
1647db96d56Sopenharmony_ci    def find_le(a, x):
1657db96d56Sopenharmony_ci        'Find rightmost value less than or equal to x'
1667db96d56Sopenharmony_ci        i = bisect_right(a, x)
1677db96d56Sopenharmony_ci        if i:
1687db96d56Sopenharmony_ci            return a[i-1]
1697db96d56Sopenharmony_ci        raise ValueError
1707db96d56Sopenharmony_ci
1717db96d56Sopenharmony_ci    def find_gt(a, x):
1727db96d56Sopenharmony_ci        'Find leftmost value greater than x'
1737db96d56Sopenharmony_ci        i = bisect_right(a, x)
1747db96d56Sopenharmony_ci        if i != len(a):
1757db96d56Sopenharmony_ci            return a[i]
1767db96d56Sopenharmony_ci        raise ValueError
1777db96d56Sopenharmony_ci
1787db96d56Sopenharmony_ci    def find_ge(a, x):
1797db96d56Sopenharmony_ci        'Find leftmost item greater than or equal to x'
1807db96d56Sopenharmony_ci        i = bisect_left(a, x)
1817db96d56Sopenharmony_ci        if i != len(a):
1827db96d56Sopenharmony_ci            return a[i]
1837db96d56Sopenharmony_ci        raise ValueError
1847db96d56Sopenharmony_ci
1857db96d56Sopenharmony_ci
1867db96d56Sopenharmony_ciExamples
1877db96d56Sopenharmony_ci--------
1887db96d56Sopenharmony_ci
1897db96d56Sopenharmony_ci.. _bisect-example:
1907db96d56Sopenharmony_ci
1917db96d56Sopenharmony_ciThe :py:func:`~bisect.bisect` function can be useful for numeric table lookups. This
1927db96d56Sopenharmony_ciexample uses :py:func:`~bisect.bisect` to look up a letter grade for an exam score (say)
1937db96d56Sopenharmony_cibased on a set of ordered numeric breakpoints: 90 and up is an 'A', 80 to 89 is
1947db96d56Sopenharmony_cia 'B', and so on::
1957db96d56Sopenharmony_ci
1967db96d56Sopenharmony_ci   >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
1977db96d56Sopenharmony_ci   ...     i = bisect(breakpoints, score)
1987db96d56Sopenharmony_ci   ...     return grades[i]
1997db96d56Sopenharmony_ci   ...
2007db96d56Sopenharmony_ci   >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
2017db96d56Sopenharmony_ci   ['F', 'A', 'C', 'C', 'B', 'A', 'A']
2027db96d56Sopenharmony_ci
2037db96d56Sopenharmony_ciThe :py:func:`~bisect.bisect` and :py:func:`~bisect.insort` functions also work with
2047db96d56Sopenharmony_cilists of tuples.  The *key* argument can serve to extract the field used for ordering
2057db96d56Sopenharmony_cirecords in a table::
2067db96d56Sopenharmony_ci
2077db96d56Sopenharmony_ci    >>> from collections import namedtuple
2087db96d56Sopenharmony_ci    >>> from operator import attrgetter
2097db96d56Sopenharmony_ci    >>> from bisect import bisect, insort
2107db96d56Sopenharmony_ci    >>> from pprint import pprint
2117db96d56Sopenharmony_ci
2127db96d56Sopenharmony_ci    >>> Movie = namedtuple('Movie', ('name', 'released', 'director'))
2137db96d56Sopenharmony_ci
2147db96d56Sopenharmony_ci    >>> movies = [
2157db96d56Sopenharmony_ci    ...     Movie('Jaws', 1975, 'Spielberg'),
2167db96d56Sopenharmony_ci    ...     Movie('Titanic', 1997, 'Cameron'),
2177db96d56Sopenharmony_ci    ...     Movie('The Birds', 1963, 'Hitchcock'),
2187db96d56Sopenharmony_ci    ...     Movie('Aliens', 1986, 'Cameron')
2197db96d56Sopenharmony_ci    ... ]
2207db96d56Sopenharmony_ci
2217db96d56Sopenharmony_ci    >>> # Find the first movie released after 1960
2227db96d56Sopenharmony_ci    >>> by_year = attrgetter('released')
2237db96d56Sopenharmony_ci    >>> movies.sort(key=by_year)
2247db96d56Sopenharmony_ci    >>> movies[bisect(movies, 1960, key=by_year)]
2257db96d56Sopenharmony_ci    Movie(name='The Birds', released=1963, director='Hitchcock')
2267db96d56Sopenharmony_ci
2277db96d56Sopenharmony_ci    >>> # Insert a movie while maintaining sort order
2287db96d56Sopenharmony_ci    >>> romance = Movie('Love Story', 1970, 'Hiller')
2297db96d56Sopenharmony_ci    >>> insort(movies, romance, key=by_year)
2307db96d56Sopenharmony_ci    >>> pprint(movies)
2317db96d56Sopenharmony_ci    [Movie(name='The Birds', released=1963, director='Hitchcock'),
2327db96d56Sopenharmony_ci     Movie(name='Love Story', released=1970, director='Hiller'),
2337db96d56Sopenharmony_ci     Movie(name='Jaws', released=1975, director='Spielberg'),
2347db96d56Sopenharmony_ci     Movie(name='Aliens', released=1986, director='Cameron'),
2357db96d56Sopenharmony_ci     Movie(name='Titanic', released=1997, director='Cameron')]
2367db96d56Sopenharmony_ci
2377db96d56Sopenharmony_ciIf the key function is expensive, it is possible to avoid repeated function
2387db96d56Sopenharmony_cicalls by searching a list of precomputed keys to find the index of a record::
2397db96d56Sopenharmony_ci
2407db96d56Sopenharmony_ci    >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
2417db96d56Sopenharmony_ci    >>> data.sort(key=lambda r: r[1])       # Or use operator.itemgetter(1).
2427db96d56Sopenharmony_ci    >>> keys = [r[1] for r in data]         # Precompute a list of keys.
2437db96d56Sopenharmony_ci    >>> data[bisect_left(keys, 0)]
2447db96d56Sopenharmony_ci    ('black', 0)
2457db96d56Sopenharmony_ci    >>> data[bisect_left(keys, 1)]
2467db96d56Sopenharmony_ci    ('blue', 1)
2477db96d56Sopenharmony_ci    >>> data[bisect_left(keys, 5)]
2487db96d56Sopenharmony_ci    ('red', 5)
2497db96d56Sopenharmony_ci    >>> data[bisect_left(keys, 8)]
2507db96d56Sopenharmony_ci    ('yellow', 8)
251