.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>

**Source code:** :source:`Lib/xml/`

--------------

Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data. If you need to parse untrusted or
   unauthenticated data, see the :ref:`xml-vulnerabilities` and
   :ref:`defusedxml-package` sections.

It is important to note that modules in the :mod:`xml` package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the :mod:`xml.parsers.expat` module will always be
available.

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.
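
As a minimal sketch of the point above that Expat is always available, the
snippet below drives :mod:`xml.parsers.expat` directly and then asks
:mod:`xml.sax` for its default reader. The helper name ``element_names`` is
hypothetical and chosen only for illustration, and the reader returned by
``xml.sax.make_parser()`` may be a different driver if one has been configured.

```python
import xml.parsers.expat
import xml.sax


def element_names(document: str) -> list[str]:
    """Return start-tag names in document order, using Expat directly."""
    names: list[str] = []
    parser = xml.parsers.expat.ParserCreate()
    # StartElementHandler receives (name, attributes) for every start tag,
    # including empty-element tags such as <child/>.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)  # True marks this as the final chunk
    return names


print(element_names("<root><child/><child/></root>"))  # ['root', 'child', 'child']

# make_parser() returns a SAX2 XMLReader; unless another driver is configured,
# this is the Expat-backed reader from xml.sax.expatreader.
reader = xml.sax.make_parser()
print(type(reader).__module__)
```
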

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding


.. _xml-vulnerabilities:

XML vulnerabilities
-------------------

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse XML features to carry out denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls.

The following table gives an overview of the known attacks and whether
the various modules are vulnerable to them.

========================= ================== ================== ================== ================== ==================
kind                      sax                etree              minidom            pulldom            xmlrpc
========================= ================== ================== ================== ================== ==================
billion laughs            **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1)
quadratic blowup          **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1) **Vulnerable** (1)
external entity expansion Safe (5)           Safe (2)           Safe (3)           Safe (5)           Safe (4)
`DTD`_ retrieval          Safe (5)           Safe               Safe               Safe (5)           Safe
decompression bomb        Safe               Safe               Safe               Safe               **Vulnerable**
========================= ================== ================== ================== ================== ==================

1. Expat 2.4.1 and newer is not vulnerable to the "billion laughs" and
   "quadratic blowup" vulnerabilities. These items are still listed as
   vulnerable because Python may be linked against an older, system-provided
   Expat library. Check :data:`pyexpat.EXPAT_VERSION`.
2. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
3. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
4. :mod:`xmlrpc.client` doesn't expand external entities and omits them.
5. Since Python 3.7.1, external general entities are no longer processed by
   default.
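
Footnotes 1, 2 and 5 can be checked against a running interpreter. The sketch
below is illustrative rather than a security test: the helper names are
hypothetical, and the ``file:///etc/hostname`` system identifier is an
arbitrary placeholder that the parsers are expected never to open.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat
import xml.sax
from xml.sax.handler import feature_external_ges


def expat_has_amplification_protection() -> bool:
    # Footnote 1: Expat 2.4.1+ mitigates "billion laughs" and "quadratic
    # blowup". version_info is the (major, minor, micro) tuple of the Expat
    # library this interpreter is actually linked against.
    return xml.parsers.expat.version_info >= (2, 4, 1)


# Hypothetical document declaring an external general entity.
EXTERNAL_ENTITY_DOC = """<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY ext SYSTEM "file:///etc/hostname">]>
<root>&ext;</root>"""


def etree_rejects_external_entity() -> bool:
    # Footnote 2: ElementTree does not expand the external entity; the
    # reference surfaces as a ParseError ("undefined entity").
    try:
        ET.fromstring(EXTERNAL_ENTITY_DOC)
    except ET.ParseError:
        return True
    return False


def sax_external_ges_enabled() -> bool:
    # Footnote 5: the Expat-based SAX reader leaves external general
    # entities disabled by default.
    return bool(xml.sax.make_parser().getFeature(feature_external_ges))


print(xml.parsers.expat.EXPAT_VERSION)
print(expat_has_amplification_protection())
print(etree_rejects_external_entity())
print(sax_external_ges_enabled())
```
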


billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string.
  The exponential expansion results in several gigabytes of text and
  consumes lots of memory and CPU time.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities, it repeats one large entity
  of a couple of thousand characters over and over again. The attack isn't as
  efficient as the exponential case, but it avoids triggering parser
  countermeasures that forbid deeply nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources or local files. The XML
  parser accesses the resource and embeds the content into the XML document.

`DTD`_ retrieval
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
  definitions from remote or local locations. The feature has similar
  implications to the external entity expansion issue.

decompression bomb
  Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries
  that can parse compressed XML streams such as gzipped HTTP streams or
  LZMA-compressed files. For an attacker, they can reduce the amount of
  transmitted data by three orders of magnitude or more.

The documentation for `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

.. _defusedxml-package:

The :mod:`defusedxml` Package
------------------------------------------------------

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. Use of this
package is recommended for any server code that parses untrusted XML data. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.


.. _defusedxml: https://pypi.org/project/defusedxml/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition
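
The amplification attacks described in the list of vulnerabilities above
(billion laughs, quadratic blowup and decompression bombs) can be quantified
with a little arithmetic. The following sketch only *builds* a billion laughs
payload and never parses it. The helper names, the 9-level, 10-reference payload
shape and the 50,000-character quadratic example are illustrative assumptions.
``gunzip_capped`` shows one possible mitigation for decompression bombs: it
bounds the output with the ``max_length`` argument of
:meth:`zlib.Decompress.decompress`, and it handles a single gzip member only.

```python
import gzip
import zlib


def billion_laughs_doc(levels: int = 9, fanout: int = 10, base: str = "lol") -> str:
    """Build (but never parse) a classic nested-entity payload."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", f'  <!ENTITY lol0 "{base}">']
    for i in range(1, levels + 1):
        # Each entity refers to the previous one `fanout` times.
        refs = f"&lol{i - 1};" * fanout
        lines.append(f'  <!ENTITY lol{i} "{refs}">')
    lines += ["]>", f"<lolz>&lol{levels};</lolz>"]
    return "\n".join(lines)


def billion_laughs_expanded_len(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    # Every level multiplies the text by `fanout`: len(base) * fanout ** levels.
    return len(base) * fanout ** levels


def quadratic_blowup_expanded_len(entity_len: int, references: int) -> int:
    # One flat entity of `entity_len` characters referenced `references` times.
    return entity_len * references


def gzip_ratio(n_bytes: int = 10 * 1024 * 1024) -> float:
    # Runs of zero bytes are close to the best case for DEFLATE.
    raw = bytes(n_bytes)
    return n_bytes / len(gzip.compress(raw))


def gunzip_capped(data: bytes, limit: int) -> bytes:
    """Decompress one gzip member, refusing to produce more than `limit` bytes."""
    d = zlib.decompressobj(wbits=31)  # 16 + 15: expect a gzip header and trailer
    out = d.decompress(data, limit + 1)  # max_length bounds the output size
    if len(out) > limit:
        raise ValueError("decompressed size exceeds limit")
    if not d.eof:
        raise ValueError("truncated or incomplete gzip stream")
    return out


payload = billion_laughs_doc()
print(len(payload), "characters in the document")
print(billion_laughs_expanded_len(), "characters after expansion")  # 3000000000
print(quadratic_blowup_expanded_len(50_000, 50_000))  # 2500000000
print(f"gzip ratio for zeros: about {gzip_ratio():.0f}:1")
```

A document of under a kilobyte thus describes roughly three billion characters
of text, while a flat entity of 50,000 characters referenced 50,000 times
expands to 2.5 billion. The measured gzip ratio depends on the input and the
zlib build.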