:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

**Source code:** :source:`Lib/email/parser.py`

--------------

Message object structures can be created in one of two ways: they can be
created from whole cloth by creating an :class:`~email.message.EmailMessage`
object, adding headers using the dictionary interface, and adding payload(s)
using :meth:`~email.message.EmailMessage.set_content` and related methods, or
they can be created by parsing a serialized representation of the email
message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a
bytes, string or file object, and the parser will return to you the root
:class:`~email.message.EmailMessage` instance of the object structure. For
simple, non-MIME messages the payload of this root object will likely be a
string containing the text of the message. For MIME messages, the root object
will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
method, and the subparts can be accessed via the payload manipulation methods,
such as :meth:`~email.message.EmailMessage.get_body`,
:meth:`~email.message.EmailMessage.iter_parts`, and
:meth:`~email.message.EmailMessage.walk`.

There are actually two parser interfaces available for use, the :class:`Parser`
API and the incremental :class:`FeedParser` API. The :class:`Parser` API is
most useful if you have the entire text of the message in memory, or if the
entire message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you are reading the message from a stream which might block
waiting for more input (such as reading an email message from a socket). The
:class:`FeedParser` can consume and parse the message incrementally, and only
returns the root object when you close the parser.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. All of the logic that
connects the :mod:`email` package's bundled parser and the
:class:`~email.message.EmailMessage` class is embodied in the
:class:`~email.policy.Policy` class, so a custom parser can create message
object trees any way it finds necessary by implementing custom versions of the
appropriate :class:`~email.policy.Policy` methods.
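
As a rough illustration of the two interfaces described above, the following
sketch parses the same message once with :class:`BytesParser` and once by
feeding it in chunks to :class:`BytesFeedParser`. The message text, the
addresses, the 16-byte chunk size, and the variable names are made up for this
example; the chunking merely stands in for data arriving from a socket.

```python
from email import policy
from email.parser import BytesFeedParser, BytesParser

# A small, hypothetical message used only for illustration.
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Just a short note.\r\n"
)

# Parser API: the complete message is already in memory.
whole = BytesParser(policy=policy.default).parsebytes(raw)

# FeedParser API: hand the data over in arbitrary chunks, as if it had
# been read from a blocking source, then close the parser to get the
# root message object.
feeder = BytesFeedParser(policy=policy.default)
for start in range(0, len(raw), 16):
    feeder.feed(raw[start:start + 16])
incremental = feeder.close()

print(whole["Subject"], incremental["Subject"])
print(whole.is_multipart())
```

Both calls pass *policy* explicitly, so the root object is an
:class:`~email.message.EmailMessage` rather than the legacy
:class:`~email.message.Message`.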


FeedParser API
^^^^^^^^^^^^^^

The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages,
such as would be necessary when reading the text of an email message from a
source that can block (such as a socket). The :class:`BytesFeedParser` can of
course be used to parse an email message fully contained in a :term:`bytes-like
object`, string, or file, but the :class:`BytesParser` API may be more
convenient for such use cases. The semantics and results of the two parser
APIs are identical.

The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
bunch of bytes until there's no more to feed it, then close the parser to
retrieve the root message object. The :class:`BytesFeedParser` is extremely
accurate when parsing standards-compliant messages, and it does a very good job
of parsing non-compliant messages, providing information about how a message
was deemed broken. It will populate a message object's
:attr:`~email.message.EmailMessage.defects` attribute with a list of any
problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

Here is the API for the :class:`BytesFeedParser`:


.. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)

   Create a :class:`BytesFeedParser` instance. Optional *_factory* is a
   no-argument callable; if not specified use the
   :attr:`~email.policy.Policy.message_factory` from the *policy*. Call
   *_factory* whenever a new message object is needed.

   If *policy* is specified use the rules it specifies to update the
   representation of the message. If *policy* is not set, use the
   :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
   compatibility with the Python 3.2 version of the email package and provides
   :class:`~email.message.Message` as the default factory. All other policies
   provide :class:`~email.message.EmailMessage` as the default *_factory*. For
   more information on what else *policy* controls, see the
   :mod:`~email.policy` documentation.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionadded:: 3.2

   .. versionchanged:: 3.3 Added the *policy* keyword.
   .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.


   .. method:: feed(data)

      Feed the parser some more data. *data* should be a :term:`bytes-like
      object` containing one or more lines.
      The lines can be partial and the
      parser will stitch such partial lines together properly. The lines can
      have any of the three common line endings: carriage return, newline, or
      carriage return and newline (they can even be mixed).


   .. method:: close()

      Complete the parsing of all previously fed data and return the root
      message object. It is undefined what happens if :meth:`~feed` is called
      after this method has been called.


.. class:: FeedParser(_factory=None, *, policy=policy.compat32)

   Works like :class:`BytesFeedParser` except that the input to the
   :meth:`~BytesFeedParser.feed` method must be a string. This is of limited
   utility, since the only way for such a message to be valid is for it to
   contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
   ``True``, no binary attachments.

   .. versionchanged:: 3.3 Added the *policy* keyword.


Parser API
^^^^^^^^^^

The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a :term:`bytes-like object` or file.
The
:mod:`email.parser` module also provides :class:`Parser` for parsing strings,
and header-only parsers, :class:`BytesHeaderParser` and
:class:`HeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`BytesHeaderParser` and :class:`HeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body.


.. class:: BytesParser(_class=None, *, policy=policy.compat32)

   Create a :class:`BytesParser` instance. The *_class* and *policy*
   arguments have the same meaning and semantics as the *_factory*
   and *policy* arguments of :class:`BytesFeedParser`.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object.
      *fp* must support
      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
      methods.

      The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
      (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.


   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
      object` instead of a file-like object. Calling this method on a
      :term:`bytes-like object` is equivalent to wrapping *bytes* in a
      :class:`~io.BytesIO` instance first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

   .. versionadded:: 3.2


.. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`BytesParser`, except that *headersonly*
   defaults to ``True``.

   .. versionadded:: 3.3


.. class:: Parser(_class=None, *, policy=policy.compat32)

   This class is parallel to :class:`BytesParser`, but handles string input.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the text-mode file-like object *fp*, parse the
      resulting text, and return the root message object. *fp* must support
      both the :meth:`~io.TextIOBase.readline` and the
      :meth:`~io.TextIOBase.read` methods on file-like objects.

      Other than the text mode requirement, this method operates like
      :meth:`BytesParser.parse`.


   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object.
      Calling this method on a string is
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
      and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.


.. class:: HeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`Parser`, except that *headersonly*
   defaults to ``True``.


Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email


.. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a :term:`bytes-like object`. This is
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_binary_file(fp, _class=None, *, \
                                       policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`. This is equivalent to ``BytesParser().parse(fp)``. *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_string(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a string. This is equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is equivalent to ``Parser().parse(fp)``. *_class* and *policy* are
   interpreted as with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


Here's an example of how you might use :func:`message_from_bytes` at an
interactive Python prompt::

   >>> import email
   >>> msg = email.message_from_bytes(myBytes)  # doctest: +SKIP


Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield no subparts.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield the subparts.

* Most messages with a content type of :mimetype:`message/\*` (such as
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
  be parsed as a container object containing a list payload of length 1.
  Their
  :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
  The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
  will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.
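
The notes above can be checked with a short sketch. The two messages below are
invented for illustration (the boundary string ``XYZ``, the body text, and the
variable names are assumptions, not part of the package): one is a well-formed
:mimetype:`multipart` message, the other declares a :mimetype:`multipart`
content type but supplies no boundary, which is one way a message can end up
internally inconsistent.

```python
from email import errors, message_from_bytes, policy

# A well-formed two-part message (hypothetical content).
well_formed = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"first part\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"second part\r\n"
    b"--XYZ--\r\n"
)

# Claims to be multipart but gives no boundary parameter.
inconsistent = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/mixed\r\n"
    b"\r\n"
    b"body without any boundary\r\n"
)

good = message_from_bytes(well_formed, policy=policy.default)
bad = message_from_bytes(inconsistent, policy=policy.default)

# The container reports itself as multipart and yields its subparts.
parts = list(good.iter_parts())
print(good.is_multipart(), len(parts))

# The inconsistent message is not treated as multipart; the parser
# records what it found wrong in the defects list instead of raising.
print(bad.is_multipart())
for defect in bad.defects:
    print(type(defect).__name__)
```

With a policy whose ``raise_on_defect`` is left at its default of ``False``,
as here, defects are collected on the message object rather than raised, so
inspecting *defects* after parsing is a reasonable way to detect malformed
input.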