17db96d56Sopenharmony_ci.. _sortinghowto: 27db96d56Sopenharmony_ci 37db96d56Sopenharmony_ciSorting HOW TO 47db96d56Sopenharmony_ci************** 57db96d56Sopenharmony_ci 67db96d56Sopenharmony_ci:Author: Andrew Dalke and Raymond Hettinger 77db96d56Sopenharmony_ci:Release: 0.1 87db96d56Sopenharmony_ci 97db96d56Sopenharmony_ci 107db96d56Sopenharmony_ciPython lists have a built-in :meth:`list.sort` method that modifies the list 117db96d56Sopenharmony_ciin-place. There is also a :func:`sorted` built-in function that builds a new 127db96d56Sopenharmony_cisorted list from an iterable. 137db96d56Sopenharmony_ci 147db96d56Sopenharmony_ciIn this document, we explore the various techniques for sorting data using Python. 157db96d56Sopenharmony_ci 167db96d56Sopenharmony_ci 177db96d56Sopenharmony_ciSorting Basics 187db96d56Sopenharmony_ci============== 197db96d56Sopenharmony_ci 207db96d56Sopenharmony_ciA simple ascending sort is very easy: just call the :func:`sorted` function. It 217db96d56Sopenharmony_cireturns a new sorted list: 227db96d56Sopenharmony_ci 237db96d56Sopenharmony_ci.. doctest:: 247db96d56Sopenharmony_ci 257db96d56Sopenharmony_ci >>> sorted([5, 2, 3, 1, 4]) 267db96d56Sopenharmony_ci [1, 2, 3, 4, 5] 277db96d56Sopenharmony_ci 287db96d56Sopenharmony_ciYou can also use the :meth:`list.sort` method. It modifies the list 297db96d56Sopenharmony_ciin-place (and returns ``None`` to avoid confusion). Usually it's less convenient 307db96d56Sopenharmony_cithan :func:`sorted` - but if you don't need the original list, it's slightly 317db96d56Sopenharmony_cimore efficient. 327db96d56Sopenharmony_ci 337db96d56Sopenharmony_ci.. doctest:: 347db96d56Sopenharmony_ci 357db96d56Sopenharmony_ci >>> a = [5, 2, 3, 1, 4] 367db96d56Sopenharmony_ci >>> a.sort() 377db96d56Sopenharmony_ci >>> a 387db96d56Sopenharmony_ci [1, 2, 3, 4, 5] 397db96d56Sopenharmony_ci 407db96d56Sopenharmony_ciAnother difference is that the :meth:`list.sort` method is only defined for 417db96d56Sopenharmony_cilists. In contrast, the :func:`sorted` function accepts any iterable. 427db96d56Sopenharmony_ci 437db96d56Sopenharmony_ci.. doctest:: 447db96d56Sopenharmony_ci 457db96d56Sopenharmony_ci >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 467db96d56Sopenharmony_ci [1, 2, 3, 4, 5] 477db96d56Sopenharmony_ci 487db96d56Sopenharmony_ciKey Functions 497db96d56Sopenharmony_ci============= 507db96d56Sopenharmony_ci 517db96d56Sopenharmony_ciBoth :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 527db96d56Sopenharmony_cifunction (or other callable) to be called on each list element prior to making 537db96d56Sopenharmony_cicomparisons. 547db96d56Sopenharmony_ci 557db96d56Sopenharmony_ciFor example, here's a case-insensitive string comparison: 567db96d56Sopenharmony_ci 577db96d56Sopenharmony_ci.. doctest:: 587db96d56Sopenharmony_ci 597db96d56Sopenharmony_ci >>> sorted("This is a test string from Andrew".split(), key=str.lower) 607db96d56Sopenharmony_ci ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 617db96d56Sopenharmony_ci 627db96d56Sopenharmony_ciThe value of the *key* parameter should be a function (or other callable) that 637db96d56Sopenharmony_citakes a single argument and returns a key to use for sorting purposes. This 647db96d56Sopenharmony_citechnique is fast because the key function is called exactly once for each 657db96d56Sopenharmony_ciinput record. 667db96d56Sopenharmony_ci 677db96d56Sopenharmony_ciA common pattern is to sort complex objects using some of the object's indices 687db96d56Sopenharmony_cias keys. For example: 697db96d56Sopenharmony_ci 707db96d56Sopenharmony_ci.. doctest:: 717db96d56Sopenharmony_ci 727db96d56Sopenharmony_ci >>> student_tuples = [ 737db96d56Sopenharmony_ci ... ('john', 'A', 15), 747db96d56Sopenharmony_ci ... ('jane', 'B', 12), 757db96d56Sopenharmony_ci ... ('dave', 'B', 10), 767db96d56Sopenharmony_ci ... ] 777db96d56Sopenharmony_ci >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 787db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 797db96d56Sopenharmony_ci 807db96d56Sopenharmony_ciThe same technique works for objects with named attributes. For example: 817db96d56Sopenharmony_ci 827db96d56Sopenharmony_ci.. doctest:: 837db96d56Sopenharmony_ci 847db96d56Sopenharmony_ci >>> class Student: 857db96d56Sopenharmony_ci ... def __init__(self, name, grade, age): 867db96d56Sopenharmony_ci ... self.name = name 877db96d56Sopenharmony_ci ... self.grade = grade 887db96d56Sopenharmony_ci ... self.age = age 897db96d56Sopenharmony_ci ... def __repr__(self): 907db96d56Sopenharmony_ci ... return repr((self.name, self.grade, self.age)) 917db96d56Sopenharmony_ci 927db96d56Sopenharmony_ci >>> student_objects = [ 937db96d56Sopenharmony_ci ... Student('john', 'A', 15), 947db96d56Sopenharmony_ci ... Student('jane', 'B', 12), 957db96d56Sopenharmony_ci ... Student('dave', 'B', 10), 967db96d56Sopenharmony_ci ... ] 977db96d56Sopenharmony_ci >>> sorted(student_objects, key=lambda student: student.age) # sort by age 987db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 997db96d56Sopenharmony_ci 1007db96d56Sopenharmony_ciOperator Module Functions 1017db96d56Sopenharmony_ci========================= 1027db96d56Sopenharmony_ci 1037db96d56Sopenharmony_ciThe key-function patterns shown above are very common, so Python provides 1047db96d56Sopenharmony_ciconvenience functions to make accessor functions easier and faster. The 1057db96d56Sopenharmony_ci:mod:`operator` module has :func:`~operator.itemgetter`, 1067db96d56Sopenharmony_ci:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 1077db96d56Sopenharmony_ci 1087db96d56Sopenharmony_ciUsing those functions, the above examples become simpler and faster: 1097db96d56Sopenharmony_ci 1107db96d56Sopenharmony_ci.. doctest:: 1117db96d56Sopenharmony_ci 1127db96d56Sopenharmony_ci >>> from operator import itemgetter, attrgetter 1137db96d56Sopenharmony_ci 1147db96d56Sopenharmony_ci >>> sorted(student_tuples, key=itemgetter(2)) 1157db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 1167db96d56Sopenharmony_ci 1177db96d56Sopenharmony_ci >>> sorted(student_objects, key=attrgetter('age')) 1187db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 1197db96d56Sopenharmony_ci 1207db96d56Sopenharmony_ciThe operator module functions allow multiple levels of sorting. For example, to 1217db96d56Sopenharmony_cisort by *grade* then by *age*: 1227db96d56Sopenharmony_ci 1237db96d56Sopenharmony_ci.. doctest:: 1247db96d56Sopenharmony_ci 1257db96d56Sopenharmony_ci >>> sorted(student_tuples, key=itemgetter(1,2)) 1267db96d56Sopenharmony_ci [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 1277db96d56Sopenharmony_ci 1287db96d56Sopenharmony_ci >>> sorted(student_objects, key=attrgetter('grade', 'age')) 1297db96d56Sopenharmony_ci [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 1307db96d56Sopenharmony_ci 1317db96d56Sopenharmony_ciAscending and Descending 1327db96d56Sopenharmony_ci======================== 1337db96d56Sopenharmony_ci 1347db96d56Sopenharmony_ciBoth :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 1357db96d56Sopenharmony_ciboolean value. This is used to flag descending sorts. For example, to get the 1367db96d56Sopenharmony_cistudent data in reverse *age* order: 1377db96d56Sopenharmony_ci 1387db96d56Sopenharmony_ci.. doctest:: 1397db96d56Sopenharmony_ci 1407db96d56Sopenharmony_ci >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 1417db96d56Sopenharmony_ci [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 1427db96d56Sopenharmony_ci 1437db96d56Sopenharmony_ci >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 1447db96d56Sopenharmony_ci [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 1457db96d56Sopenharmony_ci 1467db96d56Sopenharmony_ciSort Stability and Complex Sorts 1477db96d56Sopenharmony_ci================================ 1487db96d56Sopenharmony_ci 1497db96d56Sopenharmony_ciSorts are guaranteed to be `stable 1507db96d56Sopenharmony_ci<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 1517db96d56Sopenharmony_ciwhen multiple records have the same key, their original order is preserved. 1527db96d56Sopenharmony_ci 1537db96d56Sopenharmony_ci.. doctest:: 1547db96d56Sopenharmony_ci 1557db96d56Sopenharmony_ci >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 1567db96d56Sopenharmony_ci >>> sorted(data, key=itemgetter(0)) 1577db96d56Sopenharmony_ci [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 1587db96d56Sopenharmony_ci 1597db96d56Sopenharmony_ciNotice how the two records for *blue* retain their original order so that 1607db96d56Sopenharmony_ci``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 1617db96d56Sopenharmony_ci 1627db96d56Sopenharmony_ciThis wonderful property lets you build complex sorts in a series of sorting 1637db96d56Sopenharmony_cisteps. For example, to sort the student data by descending *grade* and then 1647db96d56Sopenharmony_ciascending *age*, do the *age* sort first and then sort again using *grade*: 1657db96d56Sopenharmony_ci 1667db96d56Sopenharmony_ci.. doctest:: 1677db96d56Sopenharmony_ci 1687db96d56Sopenharmony_ci >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 1697db96d56Sopenharmony_ci >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 1707db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 1717db96d56Sopenharmony_ci 1727db96d56Sopenharmony_ciThis can be abstracted out into a wrapper function that can take a list and 1737db96d56Sopenharmony_cituples of field and order to sort them on multiple passes. 1747db96d56Sopenharmony_ci 1757db96d56Sopenharmony_ci.. doctest:: 1767db96d56Sopenharmony_ci 1777db96d56Sopenharmony_ci >>> def multisort(xs, specs): 1787db96d56Sopenharmony_ci ... for key, reverse in reversed(specs): 1797db96d56Sopenharmony_ci ... xs.sort(key=attrgetter(key), reverse=reverse) 1807db96d56Sopenharmony_ci ... return xs 1817db96d56Sopenharmony_ci 1827db96d56Sopenharmony_ci >>> multisort(list(student_objects), (('grade', True), ('age', False))) 1837db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 1847db96d56Sopenharmony_ci 1857db96d56Sopenharmony_ciThe `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 1867db96d56Sopenharmony_cidoes multiple sorts efficiently because it can take advantage of any ordering 1877db96d56Sopenharmony_cialready present in a dataset. 1887db96d56Sopenharmony_ci 1897db96d56Sopenharmony_ciDecorate-Sort-Undecorate 1907db96d56Sopenharmony_ci======================== 1917db96d56Sopenharmony_ci 1927db96d56Sopenharmony_ciThis idiom is called Decorate-Sort-Undecorate after its three steps: 1937db96d56Sopenharmony_ci 1947db96d56Sopenharmony_ci* First, the initial list is decorated with new values that control the sort order. 1957db96d56Sopenharmony_ci 1967db96d56Sopenharmony_ci* Second, the decorated list is sorted. 1977db96d56Sopenharmony_ci 1987db96d56Sopenharmony_ci* Finally, the decorations are removed, creating a list that contains only the 1997db96d56Sopenharmony_ci initial values in the new order. 2007db96d56Sopenharmony_ci 2017db96d56Sopenharmony_ciFor example, to sort the student data by *grade* using the DSU approach: 2027db96d56Sopenharmony_ci 2037db96d56Sopenharmony_ci >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 2047db96d56Sopenharmony_ci >>> decorated.sort() 2057db96d56Sopenharmony_ci >>> [student for grade, i, student in decorated] # undecorate 2067db96d56Sopenharmony_ci [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 2077db96d56Sopenharmony_ci 2087db96d56Sopenharmony_ciThis idiom works because tuples are compared lexicographically; the first items 2097db96d56Sopenharmony_ciare compared; if they are the same then the second items are compared, and so 2107db96d56Sopenharmony_cion. 2117db96d56Sopenharmony_ci 2127db96d56Sopenharmony_ciIt is not strictly necessary in all cases to include the index *i* in the 2137db96d56Sopenharmony_cidecorated list, but including it gives two benefits: 2147db96d56Sopenharmony_ci 2157db96d56Sopenharmony_ci* The sort is stable -- if two items have the same key, their order will be 2167db96d56Sopenharmony_ci preserved in the sorted list. 2177db96d56Sopenharmony_ci 2187db96d56Sopenharmony_ci* The original items do not have to be comparable because the ordering of the 2197db96d56Sopenharmony_ci decorated tuples will be determined by at most the first two items. So for 2207db96d56Sopenharmony_ci example the original list could contain complex numbers which cannot be sorted 2217db96d56Sopenharmony_ci directly. 2227db96d56Sopenharmony_ci 2237db96d56Sopenharmony_ciAnother name for this idiom is 2247db96d56Sopenharmony_ci`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 2257db96d56Sopenharmony_ciafter Randal L. Schwartz, who popularized it among Perl programmers. 2267db96d56Sopenharmony_ci 2277db96d56Sopenharmony_ciNow that Python sorting provides key-functions, this technique is not often needed. 2287db96d56Sopenharmony_ci 2297db96d56Sopenharmony_ciComparison Functions 2307db96d56Sopenharmony_ci==================== 2317db96d56Sopenharmony_ci 2327db96d56Sopenharmony_ciUnlike key functions that return an absolute value for sorting, a comparison 2337db96d56Sopenharmony_cifunction computes the relative ordering for two inputs. 2347db96d56Sopenharmony_ci 2357db96d56Sopenharmony_ciFor example, a `balance scale 2367db96d56Sopenharmony_ci<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_à_tabac_1850.JPG>`_ 2377db96d56Sopenharmony_cicompares two samples giving a relative ordering: lighter, equal, or heavier. 2387db96d56Sopenharmony_ciLikewise, a comparison function such as ``cmp(a, b)`` will return a negative 2397db96d56Sopenharmony_civalue for less-than, zero if the inputs are equal, or a positive value for 2407db96d56Sopenharmony_cigreater-than. 2417db96d56Sopenharmony_ci 2427db96d56Sopenharmony_ciIt is common to encounter comparison functions when translating algorithms from 2437db96d56Sopenharmony_ciother languages. Also, some libraries provide comparison functions as part of 2447db96d56Sopenharmony_citheir API. For example, :func:`locale.strcoll` is a comparison function. 2457db96d56Sopenharmony_ci 2467db96d56Sopenharmony_ciTo accommodate those situations, Python provides 2477db96d56Sopenharmony_ci:class:`functools.cmp_to_key` to wrap the comparison function 2487db96d56Sopenharmony_cito make it usable as a key function:: 2497db96d56Sopenharmony_ci 2507db96d56Sopenharmony_ci sorted(words, key=cmp_to_key(strcoll)) # locale-aware sort order 2517db96d56Sopenharmony_ci 2527db96d56Sopenharmony_ciOdds and Ends 2537db96d56Sopenharmony_ci============= 2547db96d56Sopenharmony_ci 2557db96d56Sopenharmony_ci* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 2567db96d56Sopenharmony_ci :func:`locale.strcoll` for a comparison function. This is necessary 2577db96d56Sopenharmony_ci because "alphabetical" sort orderings can vary across cultures even 2587db96d56Sopenharmony_ci if the underlying alphabet is the same. 2597db96d56Sopenharmony_ci 2607db96d56Sopenharmony_ci* The *reverse* parameter still maintains sort stability (so that records with 2617db96d56Sopenharmony_ci equal keys retain the original order). Interestingly, that effect can be 2627db96d56Sopenharmony_ci simulated without the parameter by using the builtin :func:`reversed` function 2637db96d56Sopenharmony_ci twice: 2647db96d56Sopenharmony_ci 2657db96d56Sopenharmony_ci .. doctest:: 2667db96d56Sopenharmony_ci 2677db96d56Sopenharmony_ci >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 2687db96d56Sopenharmony_ci >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 2697db96d56Sopenharmony_ci >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 2707db96d56Sopenharmony_ci >>> assert standard_way == double_reversed 2717db96d56Sopenharmony_ci >>> standard_way 2727db96d56Sopenharmony_ci [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 2737db96d56Sopenharmony_ci 2747db96d56Sopenharmony_ci* The sort routines use ``<`` when making comparisons 2757db96d56Sopenharmony_ci between two objects. So, it is easy to add a standard sort order to a class by 2767db96d56Sopenharmony_ci defining an :meth:`__lt__` method: 2777db96d56Sopenharmony_ci 2787db96d56Sopenharmony_ci .. doctest:: 2797db96d56Sopenharmony_ci 2807db96d56Sopenharmony_ci >>> Student.__lt__ = lambda self, other: self.age < other.age 2817db96d56Sopenharmony_ci >>> sorted(student_objects) 2827db96d56Sopenharmony_ci [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 2837db96d56Sopenharmony_ci 2847db96d56Sopenharmony_ci However, note that ``<`` can fall back to using :meth:`__gt__` if 2857db96d56Sopenharmony_ci :meth:`__lt__` is not implemented (see :func:`object.__lt__`). 2867db96d56Sopenharmony_ci 2877db96d56Sopenharmony_ci* Key functions need not depend directly on the objects being sorted. A key 2887db96d56Sopenharmony_ci function can also access external resources. For instance, if the student grades 2897db96d56Sopenharmony_ci are stored in a dictionary, they can be used to sort a separate list of student 2907db96d56Sopenharmony_ci names: 2917db96d56Sopenharmony_ci 2927db96d56Sopenharmony_ci .. doctest:: 2937db96d56Sopenharmony_ci 2947db96d56Sopenharmony_ci >>> students = ['dave', 'john', 'jane'] 2957db96d56Sopenharmony_ci >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 2967db96d56Sopenharmony_ci >>> sorted(students, key=newgrades.__getitem__) 2977db96d56Sopenharmony_ci ['jane', 'dave', 'john'] 298