.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>

**Source code:** :source:`Lib/xml/`

--------------

Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or
   unauthenticated data see the :ref:`xml-vulnerabilities` and
   :ref:`defusedxml-package` sections.

Modules in the :mod:`xml` package require that at least one SAX-compliant XML
parser be available. The Expat parser is included with Python, so the
:mod:`xml.parsers.expat` module will always be available.

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding

.. _xml-vulnerabilities:

XML vulnerabilities
-------------------

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse XML features to carry out denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls.

The following table gives an overview of the known attacks and whether
the various modules are vulnerable to them.

=========================  ==================  ==================  ==================  ==================  ==================
kind                       sax                 etree               minidom             pulldom             xmlrpc
=========================  ==================  ==================  ==================  ==================  ==================
billion laughs             **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
quadratic blowup           **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)  **Vulnerable** (1)
external entity expansion  Safe (5)            Safe (2)            Safe (3)            Safe (5)            Safe (4)
`DTD`_ retrieval           Safe (5)            Safe                Safe                Safe (5)            Safe
decompression bomb         Safe                Safe                Safe                Safe                **Vulnerable**
=========================  ==================  ==================  ==================  ==================  ==================

1. Expat 2.4.1 and newer is not vulnerable to the "billion laughs" and
   "quadratic blowup" vulnerabilities. Items are still listed as vulnerable
   because Python may rely on a system-provided Expat library. Check
   :data:`pyexpat.EXPAT_VERSION` to see which version is in use.
2. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
3. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
4. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
5. Since Python 3.7.1, external general entities are no longer processed by
   default.
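
The version check suggested in note 1 can be sketched as follows. This is a
minimal illustration, not part of the ``xml`` package: the helper names
``expat_version_tuple`` and ``has_entity_expansion_protection`` are
hypothetical, and the parsing assumes that :data:`pyexpat.EXPAT_VERSION` has
the form ``expat_X.Y.Z``.

```python
import pyexpat


def expat_version_tuple(version_string=pyexpat.EXPAT_VERSION):
    """Turn a string such as "expat_2.4.1" into the tuple (2, 4, 1).

    Hypothetical helper; assumes the "expat_X.Y.Z" form.
    """
    _, _, numbers = version_string.partition("_")
    return tuple(int(part) for part in numbers.split("."))


def has_entity_expansion_protection(version_string=pyexpat.EXPAT_VERSION):
    """Return True if the given Expat version is 2.4.1 or newer (note 1)."""
    return expat_version_tuple(version_string) >= (2, 4, 1)


# Report the Expat version this interpreter is actually linked against.
print(pyexpat.EXPAT_VERSION, has_entity_expansion_protection())
```

A result of ``False`` suggests that the interpreter uses an Expat library older
than 2.4.1, in which case the "Vulnerable (1)" entries in the table apply.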


billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string.
  The exponential expansion results in several gigabytes of text and
  consumes lots of memory and CPU time.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities it repeats one large entity
  with a couple of thousand characters over and over again. The attack isn't as
  efficient as the exponential case but it avoids triggering parser
  countermeasures that forbid deeply nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources or local files. The XML
  parser accesses the resource and embeds the content into the XML document.

`DTD`_ retrieval
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
  definitions from remote or local locations. The feature has similar
  implications to the external entity expansion issue.

decompression bomb
  Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries
  that can parse compressed XML streams such as gzipped HTTP streams or
  LZMA-compressed files. For an attacker, compression can reduce the amount of
  transmitted data by three orders of magnitude or more.
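
The amplification behind the two entity expansion attacks can be observed
safely on a very small scale. The sketch below is an illustration only:
``build_nested_entity_document`` and its parameters are hypothetical names, and
the payload is deliberately tiny (three levels with three references each), so
the expanded text is only a few dozen characters long.

```python
import xml.etree.ElementTree as ET


def build_nested_entity_document(levels=3, fanout=3, seed="ha"):
    """Build a small document whose entities refer to each other in levels.

    Hypothetical helper for illustration. Entity e0 holds the seed string,
    and each entity eN refers to e(N-1) ``fanout`` times.
    """
    declarations = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, levels + 1):
        references = f"&e{level - 1};" * fanout
        declarations.append(f'<!ENTITY e{level} "{references}">')
    dtd = "\n  ".join(declarations)
    return f"<!DOCTYPE root [\n  {dtd}\n]>\n<root>&e{levels};</root>"


document = build_nested_entity_document()
# Internal entities are expanded while the document is parsed.
expanded = ET.fromstring(document).text
print(len(document), "characters of input,", len(expanded), "after expansion")
```

With ``levels`` levels and ``fanout`` references per level, the expanded text
contains ``fanout ** levels`` copies of the seed string. This is why adding
only a few more levels to a real payload can exhaust memory on a parser
without countermeasures.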

The documentation for `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

.. _defusedxml-package:

The :mod:`defusedxml` Package
------------------------------------------------------

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. Use of this
package is recommended for any server code that parses untrusted XML data. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.
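
One defensive idea in this area is to refuse any document that declares
entities at all. The sketch below illustrates that idea using only the
standard library. It is not the implementation or API of `defusedxml`_, and
the names ``EntityDeclarationForbidden`` and ``check_no_entity_declarations``
are hypothetical.

```python
import xml.parsers.expat


class EntityDeclarationForbidden(ValueError):
    """Hypothetical exception raised when a document declares an entity."""


def check_no_entity_declarations(data):
    """Parse ``data`` with Expat and raise if it declares any entity."""
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_declaration(name, is_parameter_entity, value, base,
                              system_id, public_id, notation_name):
        # Expat calls this handler for every <!ENTITY ...> declaration.
        raise EntityDeclarationForbidden(f"entity {name!r} is not allowed")

    parser.EntityDeclHandler = on_entity_declaration
    parser.Parse(data, True)  # True marks this as the final chunk of input
```

A document without entity declarations passes silently, while one containing
``<!ENTITY ...>`` raises the exception before any entity reference is expanded.
Malformed input still raises :exc:`xml.parsers.expat.ExpatError` as usual. A
check like this covers only entity declarations; it does nothing about the
other rows in the table above.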


.. _defusedxml: https://pypi.org/project/defusedxml/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition