:mod:`email.header`: Internationalized headers
----------------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers

**Source code:** :source:`Lib/email/header.py`

--------------

This module is part of the legacy (``Compat32``) email API.  In the current API
encoding and decoding of headers is handled transparently by the
dictionary-like API of the :class:`~email.message.EmailMessage` class.  In
addition to uses in legacy code, this module can be useful in applications that
need to completely control the character sets used when encoding headers.

The remaining text in this section is the original documentation of the module.

:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard, which came into widespread use at
a time when most email was composed of ASCII characters only.  :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now be used in
email messages.  The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a number of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format.  These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`.  The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value.  Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> msg.as_string()
   'Subject: =?iso-8859-1?q?p=F6stal?=\n\n'



In this example the :mailheader:`Subject` field needed to contain a non-ASCII
character.  We did this by creating a :class:`Header` instance and passing in
the character set in which the string should be encoded.  When the subsequent
:class:`~email.message.Message` instance was flattened, the :mailheader:`Subject`
field was properly :rfc:`2047` encoded.  MIME-aware mail readers would show this
header using the embedded ISO-8859-1 character.

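As a minimal sketch of the same idea, the initial value can also be given as
:class:`bytes`, in which case the charset names the encoding of those bytes.
The variable names below are illustrative only and are not part of the module.

```python
from email.header import Header
from email.message import Message

# The same header value, once as str and once as bytes that are
# already encoded in ISO-8859-1.
h_from_str = Header('p\xf6stal', 'iso-8859-1')
h_from_bytes = Header(b'p\xf6stal', 'iso-8859-1')

# Both should yield the same RFC 2047 encoded word.
encoded = h_from_str.encode()
same = encoded == h_from_bytes.encode()

# str() gives the decoded, human-readable form.
readable = str(h_from_str)

# Assigning the Header to a legacy Message works as in the example above.
msg = Message()
msg['Subject'] = h_from_bytes
flattened = msg.as_string()
```

Here ``encoded`` holds the :rfc:`2047` form of the value and ``readable``
holds the plain text, so code can pick whichever form it needs.
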
Here is the :class:`Header` class description:


.. class:: Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value.  If ``None`` (the default), the
   initial header value is not set.  You can later append to the header with
   :meth:`append` method calls.  *s* may be an instance of :class:`bytes` or
   :class:`str`, but see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method.  It also sets the default character set
   for all subsequent :meth:`append` calls that omit the *charset* argument.  If
   *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

   The maximum line length can be specified explicitly via *maxlinelen*.  For
   splitting the first line to a shorter value (to account for the field header
   which isn't included in *s*, e.g. :mailheader:`Subject`) pass in the name of the
   field in *header_name*.  The default *maxlinelen* is 76, and the default value
   for *header_name* is ``None``, meaning it is not taken into account for the
   first line of a long, split header.

   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding
   whitespace, and is usually either a space or a hard tab character.  This
   character will be prepended to continuation lines.  *continuation_ws*
   defaults to a single space character.

   Optional *errors* is passed straight through to the :meth:`append` method.


   .. method:: append(s, charset=None, errors='strict')

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance.  A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

      *s* may be an instance of :class:`bytes` or :class:`str`.  If it is an
      instance of :class:`bytes`, then *charset* is the encoding of that byte
      string, and a :exc:`UnicodeError` will be raised if the string cannot be
      decoded with that character set.

      If *s* is an instance of :class:`str`, then *charset* is a hint specifying
      the character set of the characters in the string.

      In either case, when producing an :rfc:`2822`\ -compliant header using
      :rfc:`2047` rules, the string will be encoded using the output codec of
      the charset.  If the string cannot be encoded using the output codec, a
      :exc:`UnicodeError` will be raised.

      Optional *errors* is passed as the *errors* argument to the decode call
      if *s* is a byte string.


   .. method:: encode(splitchars=';, \t', maxlinelen=None, linesep='\n')

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings.

      Optional *splitchars* is a string containing characters which should be
      given extra weight by the splitting algorithm during normal header
      wrapping.  This is in very rough support of :rfc:`2822`\'s 'higher level
      syntactic breaks':  split points preceded by a splitchar are preferred
      during line splitting, with the characters preferred in the order in
      which they appear in the string.  Space and tab may be included in the
      string to indicate whether preference should be given to one over the
      other as a split point when other split chars do not appear in the line
      being split.  *splitchars* does not affect :rfc:`2047` encoded lines.

      *maxlinelen*, if given, overrides the instance's value for the maximum
      line length.

      *linesep* specifies the characters used to separate the lines of the
      folded header.  It defaults to the most useful value for Python
      application code (``\n``), but ``\r\n`` can be specified in order
      to produce headers with RFC-compliant line separators.

      .. versionchanged:: 3.2
         Added the *linesep* argument.


   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      Returns an approximation of the :class:`Header` as a string, using an
      unlimited line length.  All pieces are converted to Unicode using the
      specified encoding and joined together appropriately.  Any pieces with a
      charset of ``'unknown-8bit'`` are decoded as ASCII using the ``'replace'``
      error handler.

      .. versionchanged:: 3.2
         Added handling for the ``'unknown-8bit'`` charset.


   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.


   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

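The :meth:`~Header.append` and :meth:`~Header.encode` behaviour described
above can be sketched as follows.  This is an illustration under simple
assumptions (short ASCII words, a 40-character limit chosen arbitrarily), and
the variable names are not part of the module.

```python
from email.header import Header

# Build a header from pieces in different character sets.
h = Header('Hello', header_name='Subject')   # no charset given: us-ascii
h.append('p\xf6stal', 'iso-8859-1')          # str plus a charset hint
mixed = h.encode()

# Bytes that cannot be decoded with the given charset raise UnicodeError.
try:
    Header(b'p\xf6stal', 'us-ascii')
    decode_failed = False
except UnicodeError:
    decode_failed = True

# Fold a long ASCII value.  The maxlinelen passed to encode() overrides
# the value stored on the instance.
long_value = ' '.join('word%d' % i for i in range(20))
long_header = Header(long_value, header_name='Subject')
folded = long_header.encode(maxlinelen=40)
lines = folded.split('\n')

# Same folding, but with RFC-style CRLF line separators.
folded_crlf = long_header.encode(maxlinelen=40, linesep='\r\n')
```

Only the iso-8859-1 piece of ``mixed`` is wrapped in an encoded word; the
ASCII piece is left as it is.  Each continuation line in ``lines`` starts with
folding whitespace.
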
The :mod:`email.header` module also provides the following convenience functions.


.. function:: decode_header(header)

   Decode a message header value without converting the character set. The header
   value is in *header*.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header.  *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lower case string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [(b'p\xf6stal', 'iso-8859-1')]

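Because :func:`decode_header` does not convert the character set, callers
usually need one more step to get text.  The helper below is a hypothetical
sketch, not part of the module.  It assumes that parts with no charset are
ASCII, and it joins the parts without adding whitespace, which is a
simplification.

```python
from email.header import decode_header


def header_to_text(value):
    """Hypothetical helper: decode a raw header value into one str."""
    pieces = []
    for decoded, charset in decode_header(value):
        if isinstance(decoded, bytes):
            # Encoded parts come back as bytes plus the charset name;
            # parts with no charset are assumed here to be ASCII.
            decoded = decoded.decode(charset or 'ascii', errors='replace')
        pieces.append(decoded)
    # Simplification: pieces are concatenated exactly as returned.
    return ''.join(pieces)


subject = header_to_text('=?iso-8859-1?q?p=F6stal?=')
plain = header_to_text('Hello')
```

The ``isinstance`` check is there because a header with no encoded words comes
back as a single :class:`str` part rather than :class:`bytes`.
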

.. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes one of those sequences of pairs and returns a
   :class:`Header` instance.  Optional *maxlinelen*, *header_name*, and
   *continuation_ws* are as in the :class:`Header` constructor.

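A short round trip shows how the two functions fit together.  This is a
sketch only: the ``raw`` value is sample data and the variable names are
illustrative.

```python
from email.header import decode_header, make_header

# A raw header value as it might appear in a received message.
raw = '=?iso-8859-1?q?p=F6stal?='

# decode_header() yields (bytes, charset) pairs; make_header() turns
# them back into a Header instance.
header = make_header(decode_header(raw), header_name='Subject')

text = str(header)            # human-readable form
reencoded = header.encode()   # RFC 2047 form again
```

Going through :func:`make_header` leaves the joining of multiple parts to the
:class:`Header` class instead of to the caller.
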