:mod:`email.header`: Internationalized headers
----------------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers

**Source code:** :source:`Lib/email/header.py`

--------------

This module is part of the legacy (``Compat32``) email API. In the current API
encoding and decoding of headers is handled transparently by the
dictionary-like API of the :class:`~email.message.EmailMessage` class. In
addition to uses in legacy code, this module can be useful in applications that
need to completely control the character sets used when encoding headers.

The remaining text in this section is the original documentation of the module.

:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard which came into widespread use at
a time when most email was composed of ASCII characters only. :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language specific character sets can now be used in
email messages.
The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format. These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`. The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value. Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> msg.as_string()
   'Subject: =?iso-8859-1?q?p=F6stal?=\n\n'



Notice how the :mailheader:`Subject` field ends up containing a non-ASCII
character. We did this by creating a :class:`Header` instance and passing in
the character set in which the string should be encoded.
When the subsequent
:class:`~email.message.Message` instance was flattened, the :mailheader:`Subject`
field was properly :rfc:`2047` encoded. MIME-aware mail readers would show this
header using the embedded ISO-8859-1 character.

Here is the :class:`Header` class description:


.. class:: Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value. If ``None`` (the default), the
   initial header value is not set. You can later append to the header with
   :meth:`append` method calls. *s* may be an instance of :class:`bytes` or
   :class:`str`, but see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method. It also sets the default character set
   for all subsequent :meth:`append` calls that omit the *charset* argument. If
   *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

   The maximum line length can be specified explicitly via *maxlinelen*.
   For
   splitting the first line to a shorter value (to account for the field header
   which isn't included in *s*, e.g. :mailheader:`Subject`) pass in the name of the
   field in *header_name*. The default *maxlinelen* is 78, and the default value
   for *header_name* is ``None``, meaning it is not taken into account for the
   first line of a long, split header.

   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding
   whitespace, and is usually either a space or a hard tab character. This
   character will be prepended to continuation lines. *continuation_ws*
   defaults to a single space character.

   Optional *errors* is passed straight through to the :meth:`append` method.


   .. method:: append(s, charset=None, errors='strict')

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance. A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

      *s* may be an instance of :class:`bytes` or :class:`str`.
      If it is an
      instance of :class:`bytes`, then *charset* is the encoding of that byte
      string, and a :exc:`UnicodeError` will be raised if the string cannot be
      decoded with that character set.

      If *s* is an instance of :class:`str`, then *charset* is a hint specifying
      the character set of the characters in the string.

      In either case, when producing an :rfc:`2822`\ -compliant header using
      :rfc:`2047` rules, the string will be encoded using the output codec of
      the charset. If the string cannot be encoded using the output codec, a
      :exc:`UnicodeError` will be raised.

      Optional *errors* is passed as the errors argument to the decode call
      if *s* is a byte string.


   .. method:: encode(splitchars=';, \t', maxlinelen=None, linesep='\n')

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings.

      Optional *splitchars* is a string containing characters which should be
      given extra weight by the splitting algorithm during normal header
      wrapping.
      This is in very rough support of :rfc:`2822`\'s 'higher level
      syntactic breaks': split points preceded by a splitchar are preferred
      during line splitting, with the characters preferred in the order in
      which they appear in the string. Space and tab may be included in the
      string to indicate whether preference should be given to one over the
      other as a split point when other split chars do not appear in the line
      being split. *splitchars* does not affect :rfc:`2047` encoded lines.

      *maxlinelen*, if given, overrides the instance's value for the maximum
      line length.

      *linesep* specifies the characters used to separate the lines of the
      folded header. It defaults to the most useful value for Python
      application code (``\n``), but ``\r\n`` can be specified in order
      to produce headers with RFC-compliant line separators.

      .. versionchanged:: 3.2
         Added the *linesep* argument.


   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      Returns an approximation of the :class:`Header` as a string, using an
      unlimited line length. All pieces are converted to Unicode using the
      specified encoding and joined together appropriately.
      Any pieces with a
      charset of ``'unknown-8bit'`` are decoded as ASCII using the ``'replace'``
      error handler.

      .. versionchanged:: 3.2
         Added handling for the ``'unknown-8bit'`` charset.


   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.


   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

The :mod:`email.header` module also provides the following convenient functions.


.. function:: decode_header(header)

   Decode a message header value without converting the character set. The header
   value is in *header*.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header. *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lower case string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [(b'p\xf6stal', 'iso-8859-1')]


.. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes one of those sequences of pairs and returns a
   :class:`Header` instance. Optional *maxlinelen*, *header_name*, and
   *continuation_ws* are as in the :class:`Header` constructor.

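
   As a minimal sketch of the round trip (the variable names are illustrative
   and not part of the module), the pairs returned by :func:`decode_header` can
   be passed straight to :func:`make_header`, and the resulting :class:`Header`
   can then be converted to text with :class:`str` or re-encoded with
   :meth:`Header.encode`::

      from email.header import decode_header, make_header

      # The RFC 2047 encoded word used in the decode_header() example.
      raw = '=?iso-8859-1?q?p=F6stal?='

      # A list of (bytes, charset) pairs; nothing is converted to str yet.
      pairs = decode_header(raw)

      # Rebuild a Header object from the decoded pairs.
      header = make_header(pairs)

      text = str(header)         # decoded, human-readable text
      encoded = header.encode()  # RFC 2047 encoded form again

   With this input, ``text`` should be ``'p\xf6stal'`` and ``encoded`` should
   equal ``raw``.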