:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   The SAX parser no longer processes general external entities by default,
   to increase security. To enable processing of external entities,
   pass a custom parser instance in::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.
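
As a minimal sketch of how the event constants and node types listed above fit
together, the following loop records the tag name of every
:data:`START_ELEMENT` event and the data of every :data:`CHARACTERS` event.
The sample document and the ``summarize`` helper are made up for illustration
and are not part of the module.

```python
from xml.dom import pulldom

# Hypothetical sample document, used only for this illustration.
XML = "<order><item price='10'>pen</item><item price='75'>lamp</item></order>"


def summarize(xml_text):
    """Return start-tag names in document order and the joined character data."""
    started = []
    text = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            # node is an Element here; its children are not attached yet.
            started.append(node.tagName)
        elif event == pulldom.CHARACTERS:
            # node is a Text node; character data may arrive in several chunks.
            text.append(node.data)
    return started, "".join(text)


started, text = summarize(XML)
print(started)  # ['order', 'item', 'item']
print(text)     # penlamp
```

Joining the chunks, rather than relying on one event per text run, keeps the
sketch independent of how the underlying parser splits character data.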

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes, although if the
context of elements were important, one would either need to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :func:`DOMEventStream.expandNode` method
and switch to DOM-related processing.


.. class:: PullDom(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name, or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in advance.
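
The following sketch shows one way to call :func:`parse` with a file-like
object and a parser that was configured in advance. The in-memory document,
the ``name`` attribute and the ``item_names`` helper are assumptions made for
this illustration. Switching off general external entities simply restates the
default explicitly.

```python
import io
from xml.dom import pulldom
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges


def item_names(stream):
    """Collect the 'name' attribute of every <item> start tag in *stream*."""
    # Configure the parser before handing it to parse(); parse() itself only
    # installs its own content handler and turns on namespace support.
    parser = make_parser()
    parser.setFeature(feature_external_ges, False)

    names = []
    for event, node in pulldom.parse(stream, parser=parser):
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            names.append(node.getAttribute("name"))
    return names


# Hypothetical in-memory document standing in for a real file.
stream = io.BytesIO(b"<items><item name='pen'/><item name='lamp'/></items>")
names = item_names(stream)
print(names)  # ['pen', 'lamp']
```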

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. versionchanged:: 3.11
      Support for :meth:`__getitem__` method has been removed.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT` or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children, unless
      :func:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()
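
Events can also be pulled one at a time with ``getEvent()`` instead of
iterating over the stream. The sketch below does this until it reaches the
first ``p`` element, then expands that element with ``expandNode()``. It
assumes that ``getEvent()`` returns ``None`` once the input is exhausted,
which is not stated in the description above. The ``first_paragraph`` helper
is hypothetical.

```python
from xml.dom import pulldom

XML = "<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>"


def first_paragraph(xml_text):
    """Return the first <p> serialized before and after expansion, or None."""
    stream = pulldom.parseString(xml_text)
    event = stream.getEvent()
    # Assumption: getEvent() returns None when no further events are available.
    while event is not None:
        kind, node = event
        if kind == pulldom.START_ELEMENT and node.tagName == "p":
            before = node.toxml()    # children are not attached yet
            stream.expandNode(node)  # consumes events up to the matching end tag
            return before, node.toxml()
        event = stream.getEvent()
    return None


before, after = first_paragraph(XML)
print(before)  # <p/>
print(after)   # <p>Some text <div>and more</div></p>
```

Because ``expandNode()`` consumes the events belonging to the expanded
element, a loop that continues afterwards resumes with whatever follows that
element's end tag.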