:mod:`bz2` --- Support for :program:`bzip2` compression
=======================================================

.. module:: bz2
   :synopsis: Interfaces for bzip2 compression and decompression.

.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

**Source code:** :source:`Lib/bz2.py`

--------------

This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.

The :mod:`bz2` module contains:

* The :func:`.open` function and :class:`BZ2File` class for reading and
  writing compressed files.
* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for
  incremental (de)compression.
* The :func:`compress` and :func:`decompress` functions for one-shot
  (de)compression.


(De)compression of files
------------------------

.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a bzip2-compressed file in binary or text mode, returning a :term:`file
   object`.

   As with the constructor for :class:`BZ2File`, the *filename* argument can be
   an actual filename (a :class:`str` or :class:`bytes` object), or an existing
   file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'w'``, ``'wb'``,
   ``'x'``, ``'xb'``, ``'a'`` or ``'ab'`` for binary mode, or ``'rt'``,
   ``'wt'``, ``'xt'``, or ``'at'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 1 to 9, as for the
   :class:`BZ2File` constructor.

   For binary mode, this function is equivalent to the :class:`BZ2File`
   constructor: ``BZ2File(filename, mode, compresslevel=compresslevel)``. In
   this case, the *encoding*, *errors* and *newline* arguments must not be
   provided.

   For text mode, a :class:`BZ2File` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

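The difference between text and binary mode described for :func:`.open` can
be sketched as follows. This is a minimal illustration, not part of the
module: the file name ``notes.txt.bz2`` and the temporary directory are
assumptions chosen only to keep the example self-contained.

```python
import bz2
import os
import tempfile

# Hypothetical file name inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt.bz2")

    # 'wt' wraps the BZ2File in an io.TextIOWrapper, so str is written.
    with bz2.open(path, "wt", encoding="utf-8", newline="\n") as f:
        f.write("first line\nsecond line\n")

    # 'rt' decodes on the way back; iterating yields str lines.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    # Binary mode returns the decompressed bytes without decoding.
    with bz2.open(path, "rb") as f:
        raw = f.read()
```

Passing *encoding*, *errors* or *newline* together with a binary mode raises
an error, so they appear only in the text-mode calls.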

.. class:: BZ2File(filename, mode='r', *, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a :class:`str` or :class:`bytes` object, open the named file
   directly. Otherwise, *filename* should be a :term:`file object`, which will
   be used to read or write the compressed data.

   The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for
   overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These
   can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.

   If *mode* is ``'w'`` or ``'a'``, *compresslevel* can be an integer between
   ``1`` and ``9`` specifying the level of compression: ``1`` produces the
   least compression, and ``9`` (default) produces the most compression.

   If *mode* is ``'r'``, the input file may be the concatenation of multiple
   compressed streams.

   :class:`BZ2File` provides all of the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   :class:`BZ2File` also provides the following method:

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`BZ2File`, it may change the position of the underlying file
         object (e.g. if the :class:`BZ2File` was constructed by passing a file
         object for *filename*).

      .. versionadded:: 3.3


   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.3
      The :meth:`fileno`, :meth:`readable`, :meth:`seekable`, :meth:`writable`,
      :meth:`read1` and :meth:`readinto` methods were added.

   .. versionchanged:: 3.3
      Support was added for *filename* being a :term:`file object` instead of an
      actual filename.

   .. versionchanged:: 3.3
      The ``'a'`` (append) mode was added, along with support for reading
      multi-stream files.

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

   .. versionchanged:: 3.9
      The *buffering* parameter has been removed. It was ignored and deprecated
      since Python 3.0. Pass an open file object to control how the file is
      opened.

      The *compresslevel* parameter became keyword-only.

   .. versionchanged:: 3.10
      This class is thread unsafe in the face of multiple simultaneous
      readers or writers, just like its equivalent classes in :mod:`gzip` and
      :mod:`lzma` have always been.

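The behavior of :class:`BZ2File` with an existing file object can be sketched
as below. The example assumes an in-memory ``io.BytesIO`` buffer as a stand-in
for any binary file object. It relies on two points stated above: mode ``'w'``
does not truncate a file object, and reading handles concatenated streams.

```python
import bz2
import io

buffer = io.BytesIO()  # stands in for any binary file object

# Each writer emits one complete bzip2 stream into the buffer.
with bz2.BZ2File(buffer, "w") as f:
    f.write(b"first stream\n")

# 'w' on a file object does not truncate, so this stream is appended.
with bz2.BZ2File(buffer, "w") as f:
    f.write(b"second stream\n")

# Rewind the underlying object and read both streams back as one file.
buffer.seek(0)
with bz2.BZ2File(buffer) as f:
    head = f.peek()   # some buffered bytes; the file position is unchanged
    lines = list(f)   # iteration yields bytes lines across both streams
```

Closing the :class:`BZ2File` in this sketch leaves ``buffer`` open, which is
why it can be rewound and reused for reading.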

Incremental (de)compression
---------------------------

.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   incrementally. For one-shot compression, use the :func:`compress` function
   instead.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   .. method:: compress(data)

      Provide data to the compressor object. Returns a chunk of compressed data
      if possible, or an empty byte string otherwise.

      When you have finished providing data to the compressor, call the
      :meth:`flush` method to finish the compression process.


   .. method:: flush()

      Finish the compression process. Returns the compressed data left in
      internal buffers.

      The compressor object may not be used after this method has been called.

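One way to apply the ``compress()`` and ``flush()`` sequence is to copy data
between two file objects in fixed-size chunks. The helper name
``compress_stream`` and its *chunk_size* parameter are hypothetical, and the
``io.BytesIO`` objects are assumptions used to keep the sketch runnable.

```python
import bz2
import io


def compress_stream(src, dst, chunk_size=64 * 1024, compresslevel=9):
    """Copy src to dst through a BZ2Compressor (hypothetical helper)."""
    comp = bz2.BZ2Compressor(compresslevel)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # compress() may return b"" while data accumulates internally.
        dst.write(comp.compress(chunk))
    # flush() emits whatever is still buffered and ends the stream.
    dst.write(comp.flush())


src = io.BytesIO(b"spam and eggs\n" * 5_000)
dst = io.BytesIO()
compress_stream(src, dst, chunk_size=4096)
```

Because the compressor cannot be used after ``flush()``, the helper creates a
fresh :class:`BZ2Compressor` on every call.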

.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   incrementally. For one-shot decompression, use the :func:`decompress`
   function instead.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
      you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
      you must use a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`.  Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

      .. versionadded:: 3.3


   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      If this attribute is accessed before the end of the stream has been
      reached, its value will be ``b''``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5

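The interaction between *max_length*, ``needs_input`` and ``eof`` can be
sketched as a loop that caps the size of each output piece. The 16 KiB limit
and the sample payload are arbitrary assumptions, and the whole compressed
input is supplied in the first call for simplicity.

```python
import bz2

compressed = bz2.compress(b"a" * 100_000)
limit = 16_384  # arbitrary cap on the size of each output piece

decomp = bz2.BZ2Decompressor()

# Supply all of the input once, but accept at most `limit` bytes back.
pieces = [decomp.decompress(compressed, max_length=limit)]

# While needs_input is False, more output is available without new input,
# so pass b"" to drain it piece by piece.
while not decomp.eof and not decomp.needs_input:
    pieces.append(decomp.decompress(b"", max_length=limit))

result = b"".join(pieces)
```

A caller reading from a file or socket would instead supply the next block of
compressed input whenever ``needs_input`` is ``True`` and ``eof`` is still
``False``.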

One-shot (de)compression
------------------------

.. function:: compress(data, compresslevel=9)

   Compress *data*, a :term:`bytes-like object`.

   *compresslevel*, if given, must be an integer between ``1`` and ``9``. The
   default is ``9``.

   For incremental compression, use a :class:`BZ2Compressor` instead.


.. function:: decompress(data)

   Decompress *data*, a :term:`bytes-like object`.

   If *data* is the concatenation of multiple compressed streams, decompress
   all of the streams.

   For incremental decompression, use a :class:`BZ2Decompressor` instead.

   .. versionchanged:: 3.3
      Support for multi-stream inputs was added.

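The difference in multi-stream handling between :func:`decompress` and
:class:`BZ2Decompressor` can be illustrated with two concatenated streams.
The helper ``decompress_streams`` is hypothetical. It shows one possible way
to follow the advice of using a new decompressor for each stream.

```python
import bz2

# Two independent streams, concatenated byte for byte.
multi = bz2.compress(b"hello ") + bz2.compress(b"world")

# The one-shot function decodes every stream in the input.
combined = bz2.decompress(multi)

# A single BZ2Decompressor stops at the end of the first stream and
# exposes the remaining bytes as unused_data.
first = bz2.BZ2Decompressor()
first_output = first.decompress(multi)
leftover = first.unused_data


def decompress_streams(data):
    """Decode concatenated streams one decompressor at a time (hypothetical)."""
    parts = []
    while data:
        d = bz2.BZ2Decompressor()
        parts.append(d.decompress(data))
        if not d.eof:
            raise ValueError("truncated bzip2 stream")
        data = d.unused_data
    return b"".join(parts)


manual = decompress_streams(multi)
```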

.. _bz2-usage-examples:

Examples of usage
-----------------

Below are some examples of typical usage of the :mod:`bz2` module.

Using :func:`compress` and :func:`decompress` to demonstrate round-trip compression:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> c = bz2.compress(data)
    >>> len(data) / len(c)  # Data compression ratio
    1.513595166163142
    >>> d = bz2.decompress(c)
    >>> data == d  # Check equality to original object after round-trip
    True

Using :class:`BZ2Compressor` for incremental compression:

    >>> import bz2
    >>> def gen_data(chunks=10, chunksize=1000):
    ...     """Yield incremental blocks of chunksize bytes."""
    ...     for _ in range(chunks):
    ...         yield b"z" * chunksize
    ...
    >>> comp = bz2.BZ2Compressor()
    >>> out = b""
    >>> for chunk in gen_data():
    ...     # Provide data to the compressor object
    ...     out = out + comp.compress(chunk)
    ...
    >>> # Finish the compression process.  Call this once you have
    >>> # finished providing data to the compressor.
    >>> out = out + comp.flush()

The example above uses a very "nonrandom" stream of data
(a stream of ``b"z"`` chunks).  Random data tends to compress poorly,
while ordered, repetitive data usually yields a high compression ratio.

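The contrast between repetitive and random input can be checked with a short
sketch. The 10,000-byte size and the use of ``os.urandom`` as a source of
random bytes are assumptions. Exact ratios vary, so none are shown.

```python
import bz2
import os

size = 10_000
repetitive = b"z" * size
random_bytes = os.urandom(size)  # effectively incompressible input

# Ratio of original size to compressed size; higher means better compression.
ratio_repetitive = size / len(bz2.compress(repetitive))
ratio_random = size / len(bz2.compress(random_bytes))
```

The repetitive input typically compresses by a very large factor. The random
input usually comes out slightly larger than the original because of format
overhead.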

Writing and reading a bzip2-compressed file in binary mode:

    >>> import bz2
    >>> data = b"""\
    ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue
    ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem,
    ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus
    ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat.
    ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo
    ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum
    ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum."""
    >>> with bz2.open("myfile.bz2", "wb") as f:
    ...     # Write compressed data to file
    ...     unused = f.write(data)
    >>> with bz2.open("myfile.bz2", "rb") as f:
    ...     # Decompress data from file
    ...     content = f.read()
    >>> content == data  # Check equality to original object after round-trip
    True

.. testcleanup::

   import os
   os.remove("myfile.bz2")