17db96d56Sopenharmony_ci.. _sortinghowto:
27db96d56Sopenharmony_ci
37db96d56Sopenharmony_ciSorting HOW TO
47db96d56Sopenharmony_ci**************
57db96d56Sopenharmony_ci
67db96d56Sopenharmony_ci:Author: Andrew Dalke and Raymond Hettinger
77db96d56Sopenharmony_ci:Release: 0.1
87db96d56Sopenharmony_ci
97db96d56Sopenharmony_ci
107db96d56Sopenharmony_ciPython lists have a built-in :meth:`list.sort` method that modifies the list
117db96d56Sopenharmony_ciin-place.  There is also a :func:`sorted` built-in function that builds a new
127db96d56Sopenharmony_cisorted list from an iterable.
137db96d56Sopenharmony_ci
147db96d56Sopenharmony_ciIn this document, we explore the various techniques for sorting data using Python.
157db96d56Sopenharmony_ci
167db96d56Sopenharmony_ci
177db96d56Sopenharmony_ciSorting Basics
187db96d56Sopenharmony_ci==============
197db96d56Sopenharmony_ci
207db96d56Sopenharmony_ciA simple ascending sort is very easy: just call the :func:`sorted` function. It
217db96d56Sopenharmony_cireturns a new sorted list:
227db96d56Sopenharmony_ci
237db96d56Sopenharmony_ci.. doctest::
247db96d56Sopenharmony_ci
257db96d56Sopenharmony_ci    >>> sorted([5, 2, 3, 1, 4])
267db96d56Sopenharmony_ci    [1, 2, 3, 4, 5]
277db96d56Sopenharmony_ci
287db96d56Sopenharmony_ciYou can also use the :meth:`list.sort` method. It modifies the list
297db96d56Sopenharmony_ciin-place (and returns ``None`` to avoid confusion). Usually it's less convenient
307db96d56Sopenharmony_cithan :func:`sorted` - but if you don't need the original list, it's slightly
317db96d56Sopenharmony_cimore efficient.
327db96d56Sopenharmony_ci
337db96d56Sopenharmony_ci.. doctest::
347db96d56Sopenharmony_ci
357db96d56Sopenharmony_ci    >>> a = [5, 2, 3, 1, 4]
367db96d56Sopenharmony_ci    >>> a.sort()
377db96d56Sopenharmony_ci    >>> a
387db96d56Sopenharmony_ci    [1, 2, 3, 4, 5]
397db96d56Sopenharmony_ci
407db96d56Sopenharmony_ciAnother difference is that the :meth:`list.sort` method is only defined for
417db96d56Sopenharmony_cilists. In contrast, the :func:`sorted` function accepts any iterable.
427db96d56Sopenharmony_ci
437db96d56Sopenharmony_ci.. doctest::
447db96d56Sopenharmony_ci
457db96d56Sopenharmony_ci    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
467db96d56Sopenharmony_ci    [1, 2, 3, 4, 5]
477db96d56Sopenharmony_ci
487db96d56Sopenharmony_ciKey Functions
497db96d56Sopenharmony_ci=============
507db96d56Sopenharmony_ci
517db96d56Sopenharmony_ciBoth :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
527db96d56Sopenharmony_cifunction (or other callable) to be called on each list element prior to making
537db96d56Sopenharmony_cicomparisons.
547db96d56Sopenharmony_ci
557db96d56Sopenharmony_ciFor example, here's a case-insensitive string comparison:
567db96d56Sopenharmony_ci
577db96d56Sopenharmony_ci.. doctest::
587db96d56Sopenharmony_ci
597db96d56Sopenharmony_ci    >>> sorted("This is a test string from Andrew".split(), key=str.lower)
607db96d56Sopenharmony_ci    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
617db96d56Sopenharmony_ci
627db96d56Sopenharmony_ciThe value of the *key* parameter should be a function (or other callable) that
637db96d56Sopenharmony_citakes a single argument and returns a key to use for sorting purposes. This
647db96d56Sopenharmony_citechnique is fast because the key function is called exactly once for each
657db96d56Sopenharmony_ciinput record.
667db96d56Sopenharmony_ci
677db96d56Sopenharmony_ciA common pattern is to sort complex objects using some of the object's indices
687db96d56Sopenharmony_cias keys. For example:
697db96d56Sopenharmony_ci
707db96d56Sopenharmony_ci.. doctest::
717db96d56Sopenharmony_ci
727db96d56Sopenharmony_ci    >>> student_tuples = [
737db96d56Sopenharmony_ci    ...     ('john', 'A', 15),
747db96d56Sopenharmony_ci    ...     ('jane', 'B', 12),
757db96d56Sopenharmony_ci    ...     ('dave', 'B', 10),
767db96d56Sopenharmony_ci    ... ]
777db96d56Sopenharmony_ci    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
787db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
797db96d56Sopenharmony_ci
807db96d56Sopenharmony_ciThe same technique works for objects with named attributes. For example:
817db96d56Sopenharmony_ci
827db96d56Sopenharmony_ci.. doctest::
837db96d56Sopenharmony_ci
847db96d56Sopenharmony_ci    >>> class Student:
857db96d56Sopenharmony_ci    ...     def __init__(self, name, grade, age):
867db96d56Sopenharmony_ci    ...         self.name = name
877db96d56Sopenharmony_ci    ...         self.grade = grade
887db96d56Sopenharmony_ci    ...         self.age = age
897db96d56Sopenharmony_ci    ...     def __repr__(self):
907db96d56Sopenharmony_ci    ...         return repr((self.name, self.grade, self.age))
917db96d56Sopenharmony_ci
927db96d56Sopenharmony_ci    >>> student_objects = [
937db96d56Sopenharmony_ci    ...     Student('john', 'A', 15),
947db96d56Sopenharmony_ci    ...     Student('jane', 'B', 12),
957db96d56Sopenharmony_ci    ...     Student('dave', 'B', 10),
967db96d56Sopenharmony_ci    ... ]
977db96d56Sopenharmony_ci    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
987db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
997db96d56Sopenharmony_ci
1007db96d56Sopenharmony_ciOperator Module Functions
1017db96d56Sopenharmony_ci=========================
1027db96d56Sopenharmony_ci
1037db96d56Sopenharmony_ciThe key-function patterns shown above are very common, so Python provides
1047db96d56Sopenharmony_ciconvenience functions to make accessor functions easier and faster. The
1057db96d56Sopenharmony_ci:mod:`operator` module has :func:`~operator.itemgetter`,
1067db96d56Sopenharmony_ci:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
1077db96d56Sopenharmony_ci
1087db96d56Sopenharmony_ciUsing those functions, the above examples become simpler and faster:
1097db96d56Sopenharmony_ci
1107db96d56Sopenharmony_ci.. doctest::
1117db96d56Sopenharmony_ci
1127db96d56Sopenharmony_ci    >>> from operator import itemgetter, attrgetter
1137db96d56Sopenharmony_ci
1147db96d56Sopenharmony_ci    >>> sorted(student_tuples, key=itemgetter(2))
1157db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
1167db96d56Sopenharmony_ci
1177db96d56Sopenharmony_ci    >>> sorted(student_objects, key=attrgetter('age'))
1187db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
1197db96d56Sopenharmony_ci
1207db96d56Sopenharmony_ciThe operator module functions allow multiple levels of sorting. For example, to
1217db96d56Sopenharmony_cisort by *grade* then by *age*:
1227db96d56Sopenharmony_ci
1237db96d56Sopenharmony_ci.. doctest::
1247db96d56Sopenharmony_ci
1257db96d56Sopenharmony_ci    >>> sorted(student_tuples, key=itemgetter(1,2))
1267db96d56Sopenharmony_ci    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
1277db96d56Sopenharmony_ci
1287db96d56Sopenharmony_ci    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
1297db96d56Sopenharmony_ci    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
1307db96d56Sopenharmony_ci
1317db96d56Sopenharmony_ciAscending and Descending
1327db96d56Sopenharmony_ci========================
1337db96d56Sopenharmony_ci
1347db96d56Sopenharmony_ciBoth :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
1357db96d56Sopenharmony_ciboolean value. This is used to flag descending sorts. For example, to get the
1367db96d56Sopenharmony_cistudent data in reverse *age* order:
1377db96d56Sopenharmony_ci
1387db96d56Sopenharmony_ci.. doctest::
1397db96d56Sopenharmony_ci
1407db96d56Sopenharmony_ci    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
1417db96d56Sopenharmony_ci    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
1427db96d56Sopenharmony_ci
1437db96d56Sopenharmony_ci    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
1447db96d56Sopenharmony_ci    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
1457db96d56Sopenharmony_ci
1467db96d56Sopenharmony_ciSort Stability and Complex Sorts
1477db96d56Sopenharmony_ci================================
1487db96d56Sopenharmony_ci
1497db96d56Sopenharmony_ciSorts are guaranteed to be `stable
1507db96d56Sopenharmony_ci<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
1517db96d56Sopenharmony_ciwhen multiple records have the same key, their original order is preserved.
1527db96d56Sopenharmony_ci
1537db96d56Sopenharmony_ci.. doctest::
1547db96d56Sopenharmony_ci
1557db96d56Sopenharmony_ci    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
1567db96d56Sopenharmony_ci    >>> sorted(data, key=itemgetter(0))
1577db96d56Sopenharmony_ci    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
1587db96d56Sopenharmony_ci
1597db96d56Sopenharmony_ciNotice how the two records for *blue* retain their original order so that
1607db96d56Sopenharmony_ci``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
1617db96d56Sopenharmony_ci
1627db96d56Sopenharmony_ciThis wonderful property lets you build complex sorts in a series of sorting
1637db96d56Sopenharmony_cisteps. For example, to sort the student data by descending *grade* and then
1647db96d56Sopenharmony_ciascending *age*, do the *age* sort first and then sort again using *grade*:
1657db96d56Sopenharmony_ci
1667db96d56Sopenharmony_ci.. doctest::
1677db96d56Sopenharmony_ci
1687db96d56Sopenharmony_ci    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
1697db96d56Sopenharmony_ci    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
1707db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
1717db96d56Sopenharmony_ci
1727db96d56Sopenharmony_ciThis can be abstracted out into a wrapper function that can take a list and
1737db96d56Sopenharmony_cituples of field and order to sort them on multiple passes.
1747db96d56Sopenharmony_ci
1757db96d56Sopenharmony_ci.. doctest::
1767db96d56Sopenharmony_ci
1777db96d56Sopenharmony_ci    >>> def multisort(xs, specs):
1787db96d56Sopenharmony_ci    ...     for key, reverse in reversed(specs):
1797db96d56Sopenharmony_ci    ...         xs.sort(key=attrgetter(key), reverse=reverse)
1807db96d56Sopenharmony_ci    ...     return xs
1817db96d56Sopenharmony_ci
1827db96d56Sopenharmony_ci    >>> multisort(list(student_objects), (('grade', True), ('age', False)))
1837db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
1847db96d56Sopenharmony_ci
1857db96d56Sopenharmony_ciThe `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
1867db96d56Sopenharmony_cidoes multiple sorts efficiently because it can take advantage of any ordering
1877db96d56Sopenharmony_cialready present in a dataset.
1887db96d56Sopenharmony_ci
1897db96d56Sopenharmony_ciDecorate-Sort-Undecorate
1907db96d56Sopenharmony_ci========================
1917db96d56Sopenharmony_ci
1927db96d56Sopenharmony_ciThis idiom is called Decorate-Sort-Undecorate after its three steps:
1937db96d56Sopenharmony_ci
1947db96d56Sopenharmony_ci* First, the initial list is decorated with new values that control the sort order.
1957db96d56Sopenharmony_ci
1967db96d56Sopenharmony_ci* Second, the decorated list is sorted.
1977db96d56Sopenharmony_ci
1987db96d56Sopenharmony_ci* Finally, the decorations are removed, creating a list that contains only the
1997db96d56Sopenharmony_ci  initial values in the new order.
2007db96d56Sopenharmony_ci
2017db96d56Sopenharmony_ciFor example, to sort the student data by *grade* using the DSU approach:
2027db96d56Sopenharmony_ci
2037db96d56Sopenharmony_ci    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
2047db96d56Sopenharmony_ci    >>> decorated.sort()
2057db96d56Sopenharmony_ci    >>> [student for grade, i, student in decorated]               # undecorate
2067db96d56Sopenharmony_ci    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
2077db96d56Sopenharmony_ci
2087db96d56Sopenharmony_ciThis idiom works because tuples are compared lexicographically; the first items
2097db96d56Sopenharmony_ciare compared; if they are the same then the second items are compared, and so
2107db96d56Sopenharmony_cion.
2117db96d56Sopenharmony_ci
2127db96d56Sopenharmony_ciIt is not strictly necessary in all cases to include the index *i* in the
2137db96d56Sopenharmony_cidecorated list, but including it gives two benefits:
2147db96d56Sopenharmony_ci
2157db96d56Sopenharmony_ci* The sort is stable -- if two items have the same key, their order will be
2167db96d56Sopenharmony_ci  preserved in the sorted list.
2177db96d56Sopenharmony_ci
2187db96d56Sopenharmony_ci* The original items do not have to be comparable because the ordering of the
2197db96d56Sopenharmony_ci  decorated tuples will be determined by at most the first two items. So for
2207db96d56Sopenharmony_ci  example the original list could contain complex numbers which cannot be sorted
2217db96d56Sopenharmony_ci  directly.
2227db96d56Sopenharmony_ci
2237db96d56Sopenharmony_ciAnother name for this idiom is
2247db96d56Sopenharmony_ci`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
2257db96d56Sopenharmony_ciafter Randal L. Schwartz, who popularized it among Perl programmers.
2267db96d56Sopenharmony_ci
2277db96d56Sopenharmony_ciNow that Python sorting provides key-functions, this technique is not often needed.
2287db96d56Sopenharmony_ci
2297db96d56Sopenharmony_ciComparison Functions
2307db96d56Sopenharmony_ci====================
2317db96d56Sopenharmony_ci
2327db96d56Sopenharmony_ciUnlike key functions that return an absolute value for sorting, a comparison
2337db96d56Sopenharmony_cifunction computes the relative ordering for two inputs.
2347db96d56Sopenharmony_ci
2357db96d56Sopenharmony_ciFor example, a `balance scale
2367db96d56Sopenharmony_ci<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_tabac_1850.JPG>`_
2377db96d56Sopenharmony_cicompares two samples giving a relative ordering: lighter, equal, or heavier.
2387db96d56Sopenharmony_ciLikewise, a comparison function such as ``cmp(a, b)`` will return a negative
2397db96d56Sopenharmony_civalue for less-than, zero if the inputs are equal, or a positive value for
2407db96d56Sopenharmony_cigreater-than.
2417db96d56Sopenharmony_ci
2427db96d56Sopenharmony_ciIt is common to encounter comparison functions when translating algorithms from
2437db96d56Sopenharmony_ciother languages.  Also, some libraries provide comparison functions as part of
2447db96d56Sopenharmony_citheir API.  For example, :func:`locale.strcoll` is a comparison function.
2457db96d56Sopenharmony_ci
2467db96d56Sopenharmony_ciTo accommodate those situations, Python provides
2477db96d56Sopenharmony_ci:class:`functools.cmp_to_key` to wrap the comparison function
2487db96d56Sopenharmony_cito make it usable as a key function::
2497db96d56Sopenharmony_ci
2507db96d56Sopenharmony_ci    sorted(words, key=cmp_to_key(strcoll))  # locale-aware sort order
2517db96d56Sopenharmony_ci
2527db96d56Sopenharmony_ciOdds and Ends
2537db96d56Sopenharmony_ci=============
2547db96d56Sopenharmony_ci
2557db96d56Sopenharmony_ci* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
2567db96d56Sopenharmony_ci  :func:`locale.strcoll` for a comparison function.  This is necessary
2577db96d56Sopenharmony_ci  because "alphabetical" sort orderings can vary across cultures even
2587db96d56Sopenharmony_ci  if the underlying alphabet is the same.
2597db96d56Sopenharmony_ci
2607db96d56Sopenharmony_ci* The *reverse* parameter still maintains sort stability (so that records with
2617db96d56Sopenharmony_ci  equal keys retain the original order). Interestingly, that effect can be
2627db96d56Sopenharmony_ci  simulated without the parameter by using the builtin :func:`reversed` function
2637db96d56Sopenharmony_ci  twice:
2647db96d56Sopenharmony_ci
2657db96d56Sopenharmony_ci  .. doctest::
2667db96d56Sopenharmony_ci
2677db96d56Sopenharmony_ci    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
2687db96d56Sopenharmony_ci    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
2697db96d56Sopenharmony_ci    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
2707db96d56Sopenharmony_ci    >>> assert standard_way == double_reversed
2717db96d56Sopenharmony_ci    >>> standard_way
2727db96d56Sopenharmony_ci    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
2737db96d56Sopenharmony_ci
2747db96d56Sopenharmony_ci* The sort routines use ``<`` when making comparisons
2757db96d56Sopenharmony_ci  between two objects. So, it is easy to add a standard sort order to a class by
2767db96d56Sopenharmony_ci  defining an :meth:`__lt__` method:
2777db96d56Sopenharmony_ci
2787db96d56Sopenharmony_ci  .. doctest::
2797db96d56Sopenharmony_ci
2807db96d56Sopenharmony_ci    >>> Student.__lt__ = lambda self, other: self.age < other.age
2817db96d56Sopenharmony_ci    >>> sorted(student_objects)
2827db96d56Sopenharmony_ci    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
2837db96d56Sopenharmony_ci
2847db96d56Sopenharmony_ci  However, note that ``<`` can fall back to using :meth:`__gt__` if
2857db96d56Sopenharmony_ci  :meth:`__lt__` is not implemented (see :func:`object.__lt__`).
2867db96d56Sopenharmony_ci
2877db96d56Sopenharmony_ci* Key functions need not depend directly on the objects being sorted. A key
2887db96d56Sopenharmony_ci  function can also access external resources. For instance, if the student grades
2897db96d56Sopenharmony_ci  are stored in a dictionary, they can be used to sort a separate list of student
2907db96d56Sopenharmony_ci  names:
2917db96d56Sopenharmony_ci
2927db96d56Sopenharmony_ci  .. doctest::
2937db96d56Sopenharmony_ci
2947db96d56Sopenharmony_ci    >>> students = ['dave', 'john', 'jane']
2957db96d56Sopenharmony_ci    >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
2967db96d56Sopenharmony_ci    >>> sorted(students, key=newgrades.__getitem__)
2977db96d56Sopenharmony_ci    ['jane', 'dave', 'john']
298