:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages.  It is intended
to be simpler than the full DOM and also significantly smaller.  Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM.  With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   with open('c:\\temp\\mydata.xml') as datasource:
       dom2 = parse(datasource)  # parse an open file; closed on exit

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.
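
As a minimal sketch (the XML snippet is made up for illustration), the two
entry points can be compared directly: :func:`parse` is given an in-memory
:class:`io.BytesIO` object standing in for an open file, and both calls
yield equivalent :class:`Document` objects::

   import io
   from xml.dom.minidom import Node, parse, parseString

   xml_text = '<myxml>Some data<empty/> some more data</myxml>'

   # parse() accepts any file-like object; BytesIO stands in for a real file.
   dom_from_file = parse(io.BytesIO(xml_text.encode('utf-8')))

   # parseString() takes the XML text directly.
   dom_from_string = parseString(xml_text)

   # Both results are Document nodes with the same serialized content.
   print(dom_from_file.nodeType == Node.DOCUMENT_NODE)      # True
   print(dom_from_file.toxml() == dom_from_string.toxml())  # True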

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree.  The names of the functions are perhaps misleading,
but they are easy to grasp when learning the interfaces.  The parsing of the
document will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.
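
To make the split between parser and DOM builder concrete, the sketch below
passes :func:`parse` an explicit SAX2 parser obtained from
:func:`xml.sax.make_parser`.  The XML content is made up, and the default
parser would build the same tree::

   import io
   import xml.sax
   from xml.dom.minidom import parse

   source = io.StringIO('<config><item name="a"/><item name="b"/></config>')

   # Any SAX2 parser may be supplied; parse() installs its own handler on it
   # and turns on namespace support before feeding it the input.
   sax_parser = xml.sax.make_parser()
   dom = parse(source, parser=sax_parser)

   names = [item.getAttribute('name') for item in dom.getElementsByTagName('item')]
   print(names)  # ['a', 'b']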

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object.  You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module.  Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)
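
Building on that snippet, here is a sketch that also adds an element with an
attribute (the ``item`` name and ``id`` value are made up) and then serializes
the new document with :meth:`toxml`::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   # createDocument(namespaceURI, qualifiedName, doctype) also creates the
   # document element named by qualifiedName.
   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   top_element.appendChild(newdoc.createTextNode('Some textual content.'))

   # New nodes are created through the owning document, then attached.
   item = newdoc.createElement('item')
   item.setAttribute('id', '1')
   top_element.appendChild(item)

   print(newdoc.toxml())
   # <?xml version="1.0" ?><some_tag>Some textual content.<item id="1"/></some_tag>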

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.  These properties are defined in
the DOM specification.  The main property of the document object is the
:attr:`documentElement` property.  It gives you the main element in the XML
document: the one that holds all others.  Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"
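
From the document element you can walk the rest of the tree.  The sketch below
uses a made-up ``<catalog>`` document and the standard DOM calls
:meth:`getElementsByTagName`, :meth:`getAttribute` and :attr:`firstChild` to
pull out values::

   from xml.dom.minidom import parseString

   dom = parseString(
       '<catalog>'
       '<book id="b1"><title>Dune</title></book>'
       '<book id="b2"><title>Emma</title></book>'
       '</catalog>'
   )

   root = dom.documentElement
   titles = {}
   for book in root.getElementsByTagName("book"):
       # The text of <title> is stored in a Text child node.
       title = book.getElementsByTagName("title")[0]
       titles[book.getAttribute("id")] = title.firstChild.data

   print(titles)  # {'b1': 'Dune', 'b2': 'Emma'}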

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects.  :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless.  Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation.  This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC.  Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice.  This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.
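
   When only part of a tree is no longer needed, one common pattern is to detach
   the subtree with :meth:`removeChild` and then unlink the node it returns.  A
   minimal sketch with made-up element names::

      from xml.dom.minidom import parseString

      with parseString('<root><keep/><drop><leaf/></drop></root>') as dom:
          root = dom.documentElement
          drop = root.getElementsByTagName('drop')[0]

          # removeChild() detaches the node from the tree and returns it ...
          removed = root.removeChild(drop)
          # ... and unlink() breaks the references inside the detached subtree.
          removed.unlink()

          # Serialize before the with block unlinks the whole document.
          serialized = root.toxml()

      print(serialized)  # <root><keep/></root>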


.. method:: Node.writexml(writer, indent="", addindent="", newl="", \
                          encoding=None, standalone=None)

   Write XML to the writer object.  The writer receives text rather than bytes as
   input; it should have a :meth:`write` method which matches that of the file
   object interface.  The *indent* parameter is the indentation of the current node.
   The *addindent* parameter is the incremental indentation to use for subnodes
   of the current one.  The *newl* parameter specifies the string to use to
   terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   Similarly, explicitly stating the *standalone* argument causes the
   standalone document declaration to be added to the prologue of the XML
   document.
   If the value is set to ``True``, ``standalone="yes"`` is added,
   otherwise it is set to ``"no"``.
   Not stating the argument will omit the declaration from the document.
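
   A sketch of these parameters in action, writing to an :class:`io.StringIO`
   object (the document content is made up; *standalone* needs Python 3.9 or
   later).  The writer only ever receives :class:`str` data, and *encoding*
   merely names the encoding in the XML declaration::

      import io
      from xml.dom.minidom import parseString

      doc = parseString('<root><child>text</child></root>')

      # encoding and standalone only affect the XML declaration.
      declared = io.StringIO()
      doc.writexml(declared, encoding="utf-8", standalone=True)
      print(declared.getvalue())
      # <?xml version="1.0" encoding="utf-8" standalone="yes"?><root><child>text</child></root>

      # addindent and newl put each element on its own, indented line.
      indented = io.StringIO()
      doc.writexml(indented, addindent="  ", newl="\n")
      print(indented.getvalue())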

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toxml(encoding=None, standalone=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

   The *standalone* argument behaves exactly as in :meth:`writexml`.
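
   For example (a sketch with made-up content), the return type depends on
   whether *encoding* is given, and nodes other than the :class:`Document`
   are serialized without an XML declaration::

      from xml.dom.minidom import parseString

      doc = parseString('<root><child>café</child></root>')

      as_text = doc.toxml()                   # str; declaration names no encoding
      as_bytes = doc.toxml(encoding="UTF-8")  # bytes; declaration names UTF-8
      child_only = doc.documentElement.firstChild.toxml()

      print(type(as_text).__name__, type(as_bytes).__name__)  # str bytes
      print(as_bytes)
      # b'<?xml version="1.0" encoding="UTF-8"?><root><child>caf\xc3\xa9</child></root>'
      print(child_only)  # <child>café</child>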

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toprettyxml(indent="\t", newl="\n", encoding=None, \
                             standalone=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   The *standalone* argument behaves exactly as in :meth:`writexml`.
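
   A short sketch (made-up content) using two spaces per level instead of the
   default tab, together with the default ``\n`` line ending::

      from xml.dom.minidom import parseString

      doc = parseString('<root><item>1</item><item>2</item></root>')
      pretty = doc.toprettyxml(indent="  ")
      print(pretty, end="")
      # <?xml version="1.0" ?>
      # <root>
      #   <item>1</item>
      #   <item>2</item>
      # </root>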

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. _dom-example:

DOM Example
-----------

The following is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).
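
As an illustration of the namespace support, the sketch below looks up an
element by namespace URI and local name with :meth:`getElementsByTagNameNS`
(the ``urn:example`` namespace is made up)::

   from xml.dom.minidom import parseString

   doc = parseString('<root xmlns:x="urn:example"><x:item/></root>')

   # Match on (namespaceURI, localName) rather than on the prefixed tag name.
   item = doc.getElementsByTagNameNS("urn:example", "item")[0]

   print(item.tagName)       # x:item
   print(item.prefix)        # x
   print(item.localName)     # item
   print(item.namespaceURI)  # urn:example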

Usage of the DOM interface in Python is straightforward.  The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` wherever the W3C DOM
  specification allows the IDL ``null`` value.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API.  They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
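
Several of these rules can be seen in a short sketch (the element names are made
up; ``_get_firstChild`` is one of the accessor methods minidom defines)::

   from xml.dom.minidom import Node, parseString

   doc = parseString('<root><a/><b/></root>')
   root = doc.documentElement

   # IDL attributes are read as Python attributes; the _get_ accessor
   # returns the same object.
   assert root.tagName == "root"
   assert root._get_firstChild() is root.firstChild

   # const declarations are class attributes of Node.
   assert root.nodeType == Node.ELEMENT_NODE

   # A void operation such as setAttribute() returns None.
   assert root.setAttribute("id", "r1") is None

   # NodeList is a list subclass that also offers the DOM length/item() API;
   # item() returns None for an out-of-range index.
   children = root.childNodes
   assert isinstance(children, list)
   assert children.length == len(children) == 2
   assert children.item(0) is children[0]
   assert children.item(5) is None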

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
276