:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM.
With :mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This method creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

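
As a minimal sketch of the two entry points described above, the following
parses the same document once from a string and once from a file-like object.
The sample XML, the variable names, and the use of :class:`io.BytesIO` as a
stand-in for an open file are assumptions made for illustration::

   import io
   from xml.dom.minidom import parse, parseString

   xml_text = '<myxml>Some data<empty/> some more data</myxml>'

   # Parse directly from a string.
   dom_from_string = parseString(xml_text)

   # Parse from a file-like object; a binary stream stands in for an open file.
   dom_from_file = parse(io.BytesIO(xml_text.encode('utf-8')))

   # Both calls return a Document whose root element is <myxml>.
   root_a = dom_from_string.documentElement
   root_b = dom_from_file.documentElement
   print(root_a.tagName, root_b.tagName)
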

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object by calling the
:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or
the :mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.
These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others. Here is an example program::

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.


.. method:: Node.writexml(writer, indent="", addindent="", newl="", \
                          encoding=None, standalone=None)

   Write XML to the writer object. The writer receives text rather than bytes
   as input; it should have a :meth:`write` method matching that of the file
   object interface. The *indent* parameter is the indentation of the current node.
   The *addindent* parameter is the incremental indentation to use for subnodes
   of the current one. The *newl* parameter specifies the string to use to
   terminate lines.
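
   As a minimal sketch of how these parameters interact, the following writes
   an element to an in-memory text buffer. The sample document and the choice
   of :class:`io.StringIO` as the writer are assumptions made for
   illustration; any object with a text-mode :meth:`write` method should work::

      import io
      from xml.dom.minidom import parseString

      dom = parseString('<root><child>text</child></root>')

      # Collect the output in memory instead of writing to a real file.
      buffer = io.StringIO()

      # Start at column 0, indent each nesting level by two spaces,
      # and end every line with a newline character.
      dom.documentElement.writexml(buffer, indent="", addindent="  ", newl="\n")

      output = buffer.getvalue()
      print(output)
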

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   Similarly, explicitly stating the *standalone* argument causes the
   standalone document declarations to be added to the prologue of the XML
   document.
   If the value is set to ``True``, ``standalone="yes"`` is added,
   otherwise it is set to ``"no"``.
   Not stating the argument will omit the declaration from the document.

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toxml(encoding=None, standalone=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.
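
   The difference between the two return types can be sketched as follows.
   The element name ``note`` and the variable names are illustrative
   assumptions, not part of the API::

      from xml.dom.minidom import parseString

      dom = parseString('<note>text</note>')

      # Without an encoding the result is a str, and the XML declaration
      # does not name an encoding.
      as_text = dom.toxml()

      # With an encoding the result is bytes, and the XML declaration
      # names that encoding.
      as_bytes = dom.toxml(encoding='utf-8')

      print(type(as_text).__name__, as_text)
      print(type(as_bytes).__name__, as_bytes)
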

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toprettyxml(indent="\t", newl="\n", encoding=None, \
                             standalone=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. _dom-example:

DOM Example
-----------

This example program is a fairly realistic example of a simple program.
In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.
  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
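
Several of the mapping rules above can be sketched in a few lines. The element
names ``root`` and ``item`` and the variable names are illustrative assumptions;
the example only relies on calls documented for :mod:`xml.dom`::

   from xml.dom.minidom import Node, getDOMImplementation

   doc = getDOMImplementation().createDocument(None, "root", None)
   root = doc.documentElement

   # Nodes are created through the Document's creator methods,
   # not by instantiating the classes directly.
   for label in ("a", "b"):
       child = doc.createElement("item")
       child.appendChild(doc.createTextNode(label))
       root.appendChild(child)

   # NodeList objects can be measured and iterated like Python lists.
   items = root.getElementsByTagName("item")
   labels = [item.firstChild.data for item in items]
   print(len(items), labels)

   # const declarations are available as attributes of Node.
   is_element = root.nodeType == Node.ELEMENT_NODE
   print(is_element)

   # A DOMString that is null in IDL appears as None in Python.
   print(root.namespaceURI is None)
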

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.