:mod:`bz2` --- Support for :program:`bzip2` compression
=======================================================

.. module:: bz2
   :synopsis: Interfaces for bzip2 compression and decompression.

.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

**Source code:** :source:`Lib/bz2.py`

--------------

This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.

The :mod:`bz2` module contains:

* The :func:`.open` function and :class:`BZ2File` class for reading and
  writing compressed files.
* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for
  incremental (de)compression.
* The :func:`compress` and :func:`decompress` functions for one-shot
  (de)compression.


(De)compression of files
------------------------

.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a bzip2-compressed file in binary or text mode, returning a :term:`file
   object`.

   As with the constructor for :class:`BZ2File`, the *filename* argument can be
   an actual filename (a :class:`str` or :class:`bytes` object), or an existing
   file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'w'``, ``'wb'``,
   ``'x'``, ``'xb'``, ``'a'`` or ``'ab'`` for binary mode, or ``'rt'``,
   ``'wt'``, ``'xt'``, or ``'at'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 1 to 9, as for the
   :class:`BZ2File` constructor.

   For binary mode, this function is equivalent to the :class:`BZ2File`
   constructor: ``BZ2File(filename, mode, compresslevel=compresslevel)``. In
   this case, the *encoding*, *errors* and *newline* arguments must not be
   provided.

   For text mode, a :class:`BZ2File` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


.. class:: BZ2File(filename, mode='r', *, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a :class:`str` or :class:`bytes` object, open the named file
   directly. Otherwise, *filename* should be a :term:`file object`, which will
   be used to read or write the compressed data.

   The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for
   overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These
   can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.

   If *mode* is ``'w'`` or ``'a'``, *compresslevel* can be an integer between
   ``1`` and ``9`` specifying the level of compression: ``1`` produces the
   least compression, and ``9`` (default) produces the most compression.

   If *mode* is ``'r'``, the input file may be the concatenation of multiple
   compressed streams.

   :class:`BZ2File` provides all of the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   :class:`BZ2File` also provides the following method:

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`BZ2File`, it may change the position of the underlying file
         object (e.g. if the :class:`BZ2File` was constructed by passing a file
         object for *filename*).

      .. versionadded:: 3.3


   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.3
      The :meth:`fileno`, :meth:`readable`, :meth:`seekable`, :meth:`writable`,
      :meth:`read1` and :meth:`readinto` methods were added.

   .. versionchanged:: 3.3
      Support was added for *filename* being a :term:`file object` instead of an
      actual filename.

   .. versionchanged:: 3.3
      The ``'a'`` (append) mode was added, along with support for reading
      multi-stream files.

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

   .. versionchanged:: 3.9
      The *buffering* parameter has been removed. It was ignored and deprecated
      since Python 3.0. Pass an open file object to control how the file is
      opened.

      The *compresslevel* parameter became keyword-only.

   .. versionchanged:: 3.10
      This class is thread unsafe in the face of multiple simultaneous
      readers or writers, just like its equivalent classes in :mod:`gzip` and
      :mod:`lzma` have always been.


Incremental (de)compression
---------------------------

.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   incrementally. For one-shot compression, use the :func:`compress` function
   instead.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   .. method:: compress(data)

      Provide data to the compressor object. Returns a chunk of compressed data
      if possible, or an empty byte string otherwise.

      When you have finished providing data to the compressor, call the
      :meth:`flush` method to finish the compression process.


   .. method:: flush()

      Finish the compression process. Returns the compressed data left in
      internal buffers.

      The compressor object may not be used after this method has been called.


.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   incrementally. For one-shot decompression, use the :func:`decompress` function
   instead.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
      you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
      you must use a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

      .. versionadded:: 3.3


   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      If this attribute is accessed before the end of the stream has been
      reached, its value will be ``b''``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5


One-shot (de)compression
------------------------

.. function:: compress(data, compresslevel=9)

   Compress *data*, a :term:`bytes-like object`.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   For incremental compression, use a :class:`BZ2Compressor` instead.


.. function:: decompress(data)

   Decompress *data*, a :term:`bytes-like object`.

   If *data* is the concatenation of multiple compressed streams, decompress
   all of the streams.

   For incremental decompression, use a :class:`BZ2Decompressor` instead.

   .. versionchanged:: 3.3
      Support for multi-stream inputs was added.

.. _bz2-usage-examples:

Examples of usage
-----------------

Below are some examples of typical usage of the :mod:`bz2` module.

Using :func:`compress` and :func:`decompress` to demonstrate round-trip compression:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> c = bz2.compress(data)
    >>> len(data) / len(c)  # Data compression ratio
    1.513595166163142
    >>> d = bz2.decompress(c)
    >>> data == d  # Check equality to original object after round-trip
    True

Using :class:`BZ2Compressor` for incremental compression:

    >>> import bz2
    >>> def gen_data(chunks=10, chunksize=1000):
    ...     """Yield incremental blocks of chunksize bytes."""
    ...     for _ in range(chunks):
    ...         yield b"z" * chunksize
    ...
    >>> comp = bz2.BZ2Compressor()
    >>> out = b""
    >>> for chunk in gen_data():
    ...     # Provide data to the compressor object
    ...     out = out + comp.compress(chunk)
    ...
    >>> # Finish the compression process.  Call this once you have
    >>> # finished providing data to the compressor.
    >>> out = out + comp.flush()

The example above uses a very "nonrandom" stream of data
(a stream of ``b"z"`` chunks). Random data tends to compress poorly,
while ordered, repetitive data usually yields a high compression ratio.

Writing and reading a bzip2-compressed file in binary mode:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> with bz2.open("myfile.bz2", "wb") as f:
    ...     # Write compressed data to file
    ...     unused = f.write(data)
    >>> with bz2.open("myfile.bz2", "rb") as f:
    ...     # Decompress data from file
    ...     content = f.read()
    >>> content == data  # Check equality to original object after round-trip
    True

.. testcleanup::

   import os
   os.remove("myfile.bz2")
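
Using :class:`BZ2Decompressor` for incremental decompression with bounded
output. The following is a minimal sketch, not part of the module: the helper
name ``decompress_all`` and the default limit of 4096 bytes are illustrative
assumptions. It combines the behaviors described for *max_length*,
:attr:`~BZ2Decompressor.needs_input`, :attr:`~BZ2Decompressor.eof` and
:attr:`~BZ2Decompressor.unused_data`, starting a new decompressor for each
stream of a multi-stream input:

.. code-block:: python

   import bz2

   def decompress_all(payload, max_length=4096):
       """Decompress possibly multi-stream bzip2 data (hypothetical helper).

       Each call to decompress() returns at most max_length bytes.
       """
       chunks = []
       while payload:
           # BZ2Decompressor handles a single stream, so use one per stream.
           decomp = bz2.BZ2Decompressor()
           data = payload
           while not decomp.eof:
               chunks.append(decomp.decompress(data, max_length=max_length))
               if decomp.needs_input and not decomp.eof:
                   # The whole payload was already supplied, so the stream
                   # must have been truncated.
                   raise EOFError("compressed data ended before end of stream")
               # Input is buffered internally; pass b"" to drain more output.
               data = b""
           # Bytes after the end-of-stream marker belong to the next stream.
           payload = decomp.unused_data
       return b"".join(chunks)

   two_streams = bz2.compress(b"a" * 10000) + bz2.compress(b"b" * 5000)
   result = decompress_all(two_streams, max_length=1024)

In this sketch, trailing bytes that are not a valid bzip2 stream are handed to
a new :class:`BZ2Decompressor`, which is expected to reject them with an
exception. For real workloads, each chunk would typically be written out or
processed as it is produced instead of being collected in a list.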