:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   The SAX parser no longer processes general external entities by default,
   to increase security. To enable processing of external entities,
   pass a custom parser instance in::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

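As a minimal sketch of how events and nodes pair up, the snippet below records
each ``(event, node)`` pair produced for a small in-memory document. The
document text and the ``seen`` list are made up for illustration; only a few
of the event types listed above occur for this input.

```python
from xml.dom import pulldom

# Illustrative input; any small well-formed document would do.
xml_text = '<note id="1">Hi</note>'

# Collect (event constant, DOM node name) for every item pulled
# from the stream.
seen = []
for event, node in pulldom.parseString(xml_text):
    seen.append((event, node.nodeName))

for event, name in seen:
    print(event, name)
```

The first pair carries the :class:`xml.dom.minidom.Document` node, the pairs
for ``<note>`` carry an :class:`xml.dom.minidom.Element`, and the text between
the tags arrives as an :class:`xml.dom.minidom.Text` node.
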
Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes, although if the
context of elements were important, one would either need to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :meth:`DOMEventStream.expandNode` method
and switch to DOM-related processing.


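One possible way to maintain such context-related state is a stack of the
currently open element names. The sketch below is only an illustration: the
sample document and the names ``path`` and ``book_titles`` are assumptions,
not part of the module.

```python
from xml.dom import pulldom

# Illustrative document: <title> appears under both <book> and <film>.
xml_text = (
    '<catalog>'
    '<book><title>Dune</title></book>'
    '<film><title>Alien</title></film>'
    '</catalog>'
)

path = []         # names of the currently open elements, outermost first
book_titles = []  # text chunks found inside <book><title>

for event, node in pulldom.parseString(xml_text):
    if event == pulldom.START_ELEMENT:
        path.append(node.tagName)
    elif event == pulldom.END_ELEMENT:
        path.pop()
    elif event == pulldom.CHARACTERS and path[-2:] == ['book', 'title']:
        # Text may be delivered in more than one chunk, so collect pieces.
        book_titles.append(node.data)

print(''.join(book_titles))
```

The stack lets the loop tell the two ``<title>`` elements apart without
building a DOM tree for either of them.
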
.. class:: PullDOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in
   advance.

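As a short sketch of passing a file-like object rather than a file name to
:func:`parse`, the following uses an in-memory :class:`io.BytesIO` stream. The
sample data and the ``expensive`` list are illustrative assumptions.

```python
import io
from xml.dom import pulldom

# An in-memory binary stream stands in for an open file here.
data = b'<items><item price="60"/><item price="40"/><item price="75"/></items>'
stream = io.BytesIO(data)

expensive = []
for event, node in pulldom.parse(stream):
    if event == pulldom.START_ELEMENT and node.tagName == 'item':
        price = int(node.getAttribute('price'))
        if price > 50:
            expensive.append(price)

print(expensive)
```
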
If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

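The *bufsize* argument can also be given explicitly for a single call. In the
CPython implementation it is the amount of data read from the stream at a
time; the deliberately tiny value below is an assumption chosen only to force
several reads, and the events produced should not depend on it.

```python
import io
from xml.dom import pulldom

stream = io.BytesIO(b'<root><a/><b/><c/></root>')

# bufsize=8 is an artificially small, illustrative value.
names = [
    node.tagName
    for event, node in pulldom.parse(stream, bufsize=8)
    if event == pulldom.START_ELEMENT
]

print(names)
```
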
.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. versionchanged:: 3.11
      Support for the :meth:`__getitem__` method has been removed.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT`, or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children unless
      :meth:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()

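Instead of iterating over a :class:`DOMEventStream`, events can be pulled one
at a time with :meth:`DOMEventStream.getEvent`. The sketch below assumes the
behaviour of the CPython implementation, in which :meth:`getEvent` returns
``None`` once the input is exhausted; the sample document and the ``pulled``
list are illustrative.

```python
from xml.dom import pulldom

stream = pulldom.parseString('<root><child/></root>')

pulled = []
while True:
    item = stream.getEvent()
    if item is None:
        # Assumed end-of-input signal (CPython behaviour).
        break
    event, node = item
    pulled.append((event, node.nodeName))

print(pulled)
```
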