"""Header value parser implementing various email-related RFC parsing rules.

The parsing methods defined in this module implement various email-related
parsing rules.  Principal among them is RFC 5322, which is the follow-on
to RFC 2822 and primarily a clarification of it.  The module also implements
RFC 2047 encoded word decoding.

RFC 5322 goes to considerable trouble to maintain backward compatibility with
RFC 822 in the parse phase, while cleaning up the structure on the generation
phase.  This parser supports correct RFC 5322 generation by tagging white space
as folding white space only when folding is allowed in the non-obsolete rule
sets.  Actually, the parser is even more generous when accepting input than RFC
5322 mandates, following the spirit of Postel's Law, which RFC 5322 encourages.
Where possible, deviations from the standard are annotated on the 'defects'
attribute of tokens that deviate.

The general structure of the parser follows RFC 5322, and uses its terminology
where there is a direct correspondence.  Where the implementation requires a
somewhat different structure than that used by the formal grammar, new terms
that mimic the closest existing terms are used.  Thus, it really helps to have
a copy of RFC 5322 handy when studying this code.

Input to the parser is a string that has already been unfolded according to
RFC 5322 rules.  According to the RFC this unfolding is the very first step, and
this parser leaves the unfolding step to a higher level message parser, which
will have already detected the line breaks that need unfolding while
determining the beginning and end of each header.

The output of the parser is a TokenList object, which is a list subclass.  A
TokenList is a recursive data structure.  The terminal nodes of the structure
are Terminal objects, which are subclasses of str.  These do not correspond
directly to terminal objects in the formal grammar, but are instead more
practical higher level combinations of true terminals.

All TokenList and Terminal objects have a 'value' attribute, which produces the
semantically meaningful value of that part of the parse subtree.  The value of
all whitespace tokens (no matter how many sub-tokens they may contain) is a
single space, as per the RFC rules.  This includes 'CFWS', which is herein
included in the general class of whitespace tokens.  There is one exception to
the rule that whitespace tokens are collapsed into single spaces in values: in
the value of a 'bare-quoted-string' (a quoted-string with no leading or
trailing whitespace), any whitespace that appeared between the quotation marks
is preserved in the returned value.  Note that in all Terminal strings quoted
pairs are turned into their unquoted values.

All TokenList and Terminal objects also have a string value, which attempts to
be a "canonical" representation of the RFC-compliant form of the substring that
produced the parsed subtree, including minimal use of quoted pair quoting.
Whitespace runs are not collapsed.

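A minimal sketch of the difference between 'value' and the string value,
assuming this module is importable as email._header_value_parser (a private
module) and using its get_quoted_string function, which is defined further
down in this file.  The input string is made up for illustration:

```python
from email import _header_value_parser as parser

# Parse a quoted-string surrounded by runs of whitespace.
token, rest = parser.get_quoted_string('  "a   b"  tail')

print(repr(token.value))    # surrounding CFWS collapses; the inner run is kept
print(repr(token.content))  # only the text between the quotation marks
print(repr(str(token)))     # the string form does not collapse whitespace
print(repr(rest))           # the unparsed remainder of the input
```

The function returns a (token, remainder) pair, so parsing can continue from
the remainder.
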
Comment tokens also have a 'content' attribute providing the string found
between the parens (including any nested comments) with whitespace preserved.

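As a hedged illustration (same import assumption as any other use of this
private module; get_comment is defined further down in this file and the
input is invented), a nested comment might be inspected like this:

```python
from email import _header_value_parser as parser

comment, rest = parser.get_comment('(a (nested) note) rest')

print(comment.content)                  # text between the outer parens
print([t.token_type for t in comment])  # children, including the nested comment
print(repr(comment.value))              # a comment counts as whitespace
```
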
All TokenList and Terminal objects have a 'defects' attribute which is a
possibly empty list of all the defects found while creating the token.  Defects
may appear on any token in the tree, and a composite list of all defects in the
subtree is available through the 'all_defects' attribute of any node.  (For
Terminal nodes x.defects == x.all_defects.)

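A small sketch of how 'defects' and 'all_defects' differ, assuming the
get_mailbox function defined further down in this file and an invented
address whose display name contains a period (valid only under the obsolete
syntax):

```python
from email import _header_value_parser as parser
from email import errors

mailbox, rest = parser.get_mailbox('Fred A. Bear <dinsdale@example.com>')

print(mailbox.display_name, mailbox.addr_spec)
print(mailbox.defects)      # defects recorded on this node only
print(mailbox.all_defects)  # defects collected from the whole subtree

# The obsolete-syntax defect is attached deeper in the tree, so it is
# visible here only through all_defects.
found = any(isinstance(d, errors.ObsoleteHeaderDefect)
            for d in mailbox.all_defects)
print(found)
```
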
Each object in a parse tree is called a 'token', and each has a 'token_type'
attribute that gives the name from the RFC 5322 grammar that it represents.
Not all RFC 5322 nodes are produced, and there is one non-RFC 5322 node that
may be produced: 'ptext'.  A 'ptext' is a string of printable ASCII characters.
It is returned in place of lists of (ctext/quoted-pair) and
(qtext/quoted-pair).

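For illustration only, token types can be read straight off a parse tree.
This sketch assumes the get_address_list function defined further down in
this file; the addresses are invented:

```python
from email import _header_value_parser as parser

address_list, rest = parser.get_address_list('a@b.example, d@e.example')

print(address_list.token_type)
print([t.token_type for t in address_list])           # top-level children
print([m.addr_spec for m in address_list.mailboxes])  # parsed addresses
```
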
XXX: provide complete list of token types.
"""

import re
import sys
import urllib.parse   # For urllib.parse.unquote
from string import hexdigits
from operator import itemgetter
from email import _encoded_words as _ew
from email import errors
from email import utils

#
# Useful constants and functions
#

WSP = set(' \t')
CFWS_LEADER = WSP | set('(')
SPECIALS = set(r'()<>@,:;.\"[]')
ATOM_ENDS = SPECIALS | WSP
DOT_ATOM_ENDS = ATOM_ENDS - set('.')
# '.', '"', and '(' do not end phrases in order to support obs-phrase
PHRASE_ENDS = SPECIALS - set('."(')
TSPECIALS = (SPECIALS | set('/?=')) - set('.')
TOKEN_ENDS = TSPECIALS | WSP
ASPECIALS = TSPECIALS | set("*'%")
ATTRIBUTE_ENDS = ASPECIALS | WSP
EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')

def quote_string(value):
    # Backslash-escape backslashes and double quotes, then wrap in quotes.
    return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'

# Match an RFC 2047 word, looks like =?utf-8?q?someword?=
rfc2047_matcher = re.compile(r'''
   =\?            # literal =?
   [^?]*          # charset
   \?             # literal ?
   [qQbB]         # literal 'q' or 'b', case insensitive
   \?             # literal ?
  .*?             # encoded word
  \?=             # literal ?=
''', re.VERBOSE | re.MULTILINE)


#
# TokenList and its subclasses
#

class TokenList(list):

    token_type = None
    syntactic_break = True
    ew_combine_allowed = True

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.defects = []

    def __str__(self):
        return ''.join(str(x) for x in self)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__,
                             super().__repr__())

    @property
    def value(self):
        return ''.join(x.value for x in self if x.value)

    @property
    def all_defects(self):
        return sum((x.all_defects for x in self), self.defects)

    def startswith_fws(self):
        return self[0].startswith_fws()

    @property
    def as_ew_allowed(self):
        """True if all top level tokens of this part may be RFC2047 encoded."""
        return all(part.as_ew_allowed for part in self)

    @property
    def comments(self):
        comments = []
        for token in self:
            comments.extend(token.comments)
        return comments

    def fold(self, *, policy):
        return _refold_parse_tree(self, policy=policy)

    def pprint(self, indent=''):
        print(self.ppstr(indent=indent))

    def ppstr(self, indent=''):
        return '\n'.join(self._pp(indent=indent))

    def _pp(self, indent=''):
        yield '{}{}/{}('.format(
            indent,
            self.__class__.__name__,
            self.token_type)
        for token in self:
            if not hasattr(token, '_pp'):
                yield (indent + '    !! invalid element in token '
                                        'list: {!r}'.format(token))
            else:
                yield from token._pp(indent+'    ')
        if self.defects:
            extra = ' Defects: {}'.format(self.defects)
        else:
            extra = ''
        yield '{}){}'.format(indent, extra)


class WhiteSpaceTokenList(TokenList):

    @property
    def value(self):
        return ' '

    @property
    def comments(self):
        return [x.content for x in self if x.token_type=='comment']


class UnstructuredTokenList(TokenList):
    token_type = 'unstructured'


class Phrase(TokenList):
    token_type = 'phrase'

class Word(TokenList):
    token_type = 'word'


class CFWSList(WhiteSpaceTokenList):
    token_type = 'cfws'


class Atom(TokenList):
    token_type = 'atom'


class Token(TokenList):
    token_type = 'token'
    encode_as_ew = False


class EncodedWord(TokenList):
    token_type = 'encoded-word'
    cte = None
    charset = None
    lang = None


class QuotedString(TokenList):

    token_type = 'quoted-string'

    @property
    def content(self):
        for x in self:
            if x.token_type == 'bare-quoted-string':
                return x.value

    @property
    def quoted_value(self):
        res = []
        for x in self:
            if x.token_type == 'bare-quoted-string':
                res.append(str(x))
            else:
                res.append(x.value)
        return ''.join(res)

    @property
    def stripped_value(self):
        for token in self:
            if token.token_type == 'bare-quoted-string':
                return token.value


class BareQuotedString(QuotedString):

    token_type = 'bare-quoted-string'

    def __str__(self):
        return quote_string(''.join(str(x) for x in self))

    @property
    def value(self):
        return ''.join(str(x) for x in self)


class Comment(WhiteSpaceTokenList):

    token_type = 'comment'

    def __str__(self):
        return ''.join(sum([
                            ["("],
                            [self.quote(x) for x in self],
                            [")"],
                            ], []))

    def quote(self, value):
        if value.token_type == 'comment':
            return str(value)
        return str(value).replace('\\', '\\\\').replace(
                                  '(', r'\(').replace(
                                  ')', r'\)')

    @property
    def content(self):
        return ''.join(str(x) for x in self)

    @property
    def comments(self):
        return [self.content]

class AddressList(TokenList):

    token_type = 'address-list'

    @property
    def addresses(self):
        return [x for x in self if x.token_type=='address']

    @property
    def mailboxes(self):
        return sum((x.mailboxes
                    for x in self if x.token_type=='address'), [])

    @property
    def all_mailboxes(self):
        return sum((x.all_mailboxes
                    for x in self if x.token_type=='address'), [])


class Address(TokenList):

    token_type = 'address'

    @property
    def display_name(self):
        if self[0].token_type == 'group':
            return self[0].display_name

    @property
    def mailboxes(self):
        if self[0].token_type == 'mailbox':
            return [self[0]]
        elif self[0].token_type == 'invalid-mailbox':
            return []
        return self[0].mailboxes

    @property
    def all_mailboxes(self):
        if self[0].token_type == 'mailbox':
            return [self[0]]
        elif self[0].token_type == 'invalid-mailbox':
            return [self[0]]
        return self[0].all_mailboxes

class MailboxList(TokenList):

    token_type = 'mailbox-list'

    @property
    def mailboxes(self):
        return [x for x in self if x.token_type=='mailbox']

    @property
    def all_mailboxes(self):
        return [x for x in self
            if x.token_type in ('mailbox', 'invalid-mailbox')]


class GroupList(TokenList):

    token_type = 'group-list'

    @property
    def mailboxes(self):
        if not self or self[0].token_type != 'mailbox-list':
            return []
        return self[0].mailboxes

    @property
    def all_mailboxes(self):
        if not self or self[0].token_type != 'mailbox-list':
            return []
        return self[0].all_mailboxes


class Group(TokenList):

    token_type = "group"

    @property
    def mailboxes(self):
        if self[2].token_type != 'group-list':
            return []
        return self[2].mailboxes

    @property
    def all_mailboxes(self):
        if self[2].token_type != 'group-list':
            return []
        return self[2].all_mailboxes

    @property
    def display_name(self):
        return self[0].display_name


class NameAddr(TokenList):

    token_type = 'name-addr'

    @property
    def display_name(self):
        if len(self) == 1:
            return None
        return self[0].display_name

    @property
    def local_part(self):
        return self[-1].local_part

    @property
    def domain(self):
        return self[-1].domain

    @property
    def route(self):
        return self[-1].route

    @property
    def addr_spec(self):
        return self[-1].addr_spec


class AngleAddr(TokenList):

    token_type = 'angle-addr'

    @property
    def local_part(self):
        for x in self:
            if x.token_type == 'addr-spec':
                return x.local_part

    @property
    def domain(self):
        for x in self:
            if x.token_type == 'addr-spec':
                return x.domain

    @property
    def route(self):
        for x in self:
            if x.token_type == 'obs-route':
                return x.domains

    @property
    def addr_spec(self):
        for x in self:
            if x.token_type == 'addr-spec':
                if x.local_part:
                    return x.addr_spec
                else:
                    # An empty local part is rendered as an empty quoted string.
                    return quote_string(x.local_part) + x.addr_spec
        else:
            # No addr-spec token was found: this is a null address.
            return '<>'


class ObsRoute(TokenList):

    token_type = 'obs-route'

    @property
    def domains(self):
        return [x.domain for x in self if x.token_type == 'domain']


class Mailbox(TokenList):

    token_type = 'mailbox'

    @property
    def display_name(self):
        if self[0].token_type == 'name-addr':
            return self[0].display_name

    @property
    def local_part(self):
        return self[0].local_part

    @property
    def domain(self):
        return self[0].domain

    @property
    def route(self):
        if self[0].token_type == 'name-addr':
            return self[0].route

    @property
    def addr_spec(self):
        return self[0].addr_spec


class InvalidMailbox(TokenList):

    token_type = 'invalid-mailbox'

    @property
    def display_name(self):
        return None

    local_part = domain = route = addr_spec = display_name


class Domain(TokenList):

    token_type = 'domain'
    as_ew_allowed = False

    @property
    def domain(self):
        return ''.join(super().value.split())


class DotAtom(TokenList):
    token_type = 'dot-atom'


class DotAtomText(TokenList):
    token_type = 'dot-atom-text'
    as_ew_allowed = True


class NoFoldLiteral(TokenList):
    token_type = 'no-fold-literal'
    as_ew_allowed = False


class AddrSpec(TokenList):

    token_type = 'addr-spec'
    as_ew_allowed = False

    @property
    def local_part(self):
        return self[0].local_part

    @property
    def domain(self):
        if len(self) < 3:
            return None
        return self[-1].domain

    @property
    def value(self):
        if len(self) < 3:
            return self[0].value
        return self[0].value.rstrip()+self[1].value+self[2].value.lstrip()

    @property
    def addr_spec(self):
        nameset = set(self.local_part)
        if len(nameset) > len(nameset-DOT_ATOM_ENDS):
            lp = quote_string(self.local_part)
        else:
            lp = self.local_part
        if self.domain is not None:
            return lp + '@' + self.domain
        return lp


5507db96d56Sopenharmony_ciclass ObsLocalPart(TokenList):
5517db96d56Sopenharmony_ci
5527db96d56Sopenharmony_ci    token_type = 'obs-local-part'
5537db96d56Sopenharmony_ci    as_ew_allowed = False
5547db96d56Sopenharmony_ci
5557db96d56Sopenharmony_ci
5567db96d56Sopenharmony_ciclass DisplayName(Phrase):
5577db96d56Sopenharmony_ci
5587db96d56Sopenharmony_ci    token_type = 'display-name'
5597db96d56Sopenharmony_ci    ew_combine_allowed = False
5607db96d56Sopenharmony_ci
5617db96d56Sopenharmony_ci    @property
5627db96d56Sopenharmony_ci    def display_name(self):
5637db96d56Sopenharmony_ci        res = TokenList(self)
5647db96d56Sopenharmony_ci        if len(res) == 0:
5657db96d56Sopenharmony_ci            return res.value
5667db96d56Sopenharmony_ci        if res[0].token_type == 'cfws':
5677db96d56Sopenharmony_ci            res.pop(0)
5687db96d56Sopenharmony_ci        else:
5697db96d56Sopenharmony_ci            if res[0][0].token_type == 'cfws':
5707db96d56Sopenharmony_ci                res[0] = TokenList(res[0][1:])
5717db96d56Sopenharmony_ci        if res[-1].token_type == 'cfws':
5727db96d56Sopenharmony_ci            res.pop()
5737db96d56Sopenharmony_ci        else:
5747db96d56Sopenharmony_ci            if res[-1][-1].token_type == 'cfws':
5757db96d56Sopenharmony_ci                res[-1] = TokenList(res[-1][:-1])
5767db96d56Sopenharmony_ci        return res.value
5777db96d56Sopenharmony_ci
5787db96d56Sopenharmony_ci    @property
5797db96d56Sopenharmony_ci    def value(self):
5807db96d56Sopenharmony_ci        quote = False
5817db96d56Sopenharmony_ci        if self.defects:
5827db96d56Sopenharmony_ci            quote = True
5837db96d56Sopenharmony_ci        else:
5847db96d56Sopenharmony_ci            for x in self:
5857db96d56Sopenharmony_ci                if x.token_type == 'quoted-string':
5867db96d56Sopenharmony_ci                    quote = True
5877db96d56Sopenharmony_ci        if len(self) != 0 and quote:
5887db96d56Sopenharmony_ci            pre = post = ''
5897db96d56Sopenharmony_ci            if self[0].token_type=='cfws' or self[0][0].token_type=='cfws':
5907db96d56Sopenharmony_ci                pre = ' '
5917db96d56Sopenharmony_ci            if self[-1].token_type=='cfws' or self[-1][-1].token_type=='cfws':
5927db96d56Sopenharmony_ci                post = ' '
5937db96d56Sopenharmony_ci            return pre+quote_string(self.display_name)+post
5947db96d56Sopenharmony_ci        else:
5957db96d56Sopenharmony_ci            return super().value
5967db96d56Sopenharmony_ci
5977db96d56Sopenharmony_ci
class LocalPart(TokenList):

    token_type = 'local-part'
    as_ew_allowed = False

    @property
    def value(self):
        if self[0].token_type == "quoted-string":
            return self[0].quoted_value
        else:
            return self[0].value

    @property
    def local_part(self):
        # Strip whitespace from front, back, and around dots.
        res = [DOT]
        last = DOT
        last_is_tl = False
        for tok in self[0] + [DOT]:
            if tok.token_type == 'cfws':
                continue
            if (last_is_tl and tok.token_type == 'dot' and
                    last[-1].token_type == 'cfws'):
                res[-1] = TokenList(last[:-1])
            is_tl = isinstance(tok, TokenList)
            if (is_tl and last.token_type == 'dot' and
                    tok[0].token_type == 'cfws'):
                res.append(TokenList(tok[1:]))
            else:
                res.append(tok)
            last = res[-1]
            last_is_tl = is_tl
        res = TokenList(res[1:-1])
        return res.value


class DomainLiteral(TokenList):

    token_type = 'domain-literal'
    as_ew_allowed = False

    @property
    def domain(self):
        return ''.join(super().value.split())

    @property
    def ip(self):
        for x in self:
            if x.token_type == 'ptext':
                return x.value


class MIMEVersion(TokenList):

    token_type = 'mime-version'
    major = None
    minor = None


class Parameter(TokenList):

    token_type = 'parameter'
    sectioned = False
    extended = False
    charset = 'us-ascii'

    @property
    def section_number(self):
        # Because the first token, the attribute (name) eats CFWS, the second
        # token is always the section if there is one.
        return self[1].number if self.sectioned else 0

    @property
    def param_value(self):
        # This is part of the "handle quoted extended parameters" hack.
        for token in self:
            if token.token_type == 'value':
                return token.stripped_value
            if token.token_type == 'quoted-string':
                for inner in token:
                    if inner.token_type == 'bare-quoted-string':
                        for part in inner:
                            if part.token_type == 'value':
                                return part.stripped_value
        return ''


class InvalidParameter(Parameter):

    token_type = 'invalid-parameter'


class Attribute(TokenList):

    token_type = 'attribute'

    @property
    def stripped_value(self):
        for token in self:
            if token.token_type.endswith('attrtext'):
                return token.value


class Section(TokenList):

    token_type = 'section'
    number = None


class Value(TokenList):

    token_type = 'value'

    @property
    def stripped_value(self):
        token = self[0]
        if token.token_type == 'cfws':
            token = self[1]
        if token.token_type.endswith(
                ('quoted-string', 'attribute', 'extended-attribute')):
            return token.stripped_value
        return self.value


class MimeParameters(TokenList):

    token_type = 'mime-parameters'
    syntactic_break = False

    @property
    def params(self):
        # The RFC specifically states that the ordering of parameters is not
        # guaranteed and may be reordered by the transport layer.  So we have
        # to assume the RFC 2231 pieces can come in any order.  However, we
        # output them in the order that we first see a given name, which gives
        # us a stable __str__.
        params = {}  # Using order preserving dict from Python 3.7+
        for token in self:
            if not token.token_type.endswith('parameter'):
                continue
            if token[0].token_type != 'attribute':
                continue
            name = token[0].value.strip()
            if name not in params:
                params[name] = []
            params[name].append((token.section_number, token))
        for name, parts in params.items():
            parts = sorted(parts, key=itemgetter(0))
            first_param = parts[0][1]
            charset = first_param.charset
            # Our arbitrary error recovery is to ignore duplicate parameters,
            # to use appearance order if there are duplicate RFC 2231 parts,
            # and to ignore gaps.  This mimics the error recovery of get_param.
            if not first_param.extended and len(parts) > 1:
                if parts[1][0] == 0:
                    parts[1][1].defects.append(errors.InvalidHeaderDefect(
                        'duplicate parameter name; duplicate(s) ignored'))
                    parts = parts[:1]
                # Else assume the *0* was missing.  Note that this is different
                # from get_param, but we registered a defect for this earlier.
            value_parts = []
            i = 0
            for section_number, param in parts:
                if section_number != i:
                    # We could get fancier here and look for a complete
                    # duplicate extended parameter and ignore the second one
                    # seen.  But we're not doing that.  The old code didn't.
                    if not param.extended:
                        param.defects.append(errors.InvalidHeaderDefect(
                            'duplicate parameter name; duplicate ignored'))
                        continue
                    else:
                        param.defects.append(errors.InvalidHeaderDefect(
                            "inconsistent RFC2231 parameter numbering"))
                i += 1
                value = param.param_value
                if param.extended:
                    try:
                        value = urllib.parse.unquote_to_bytes(value)
                    except UnicodeEncodeError:
                        # Source had surrogate-escaped bytes.  What to do
                        # here is an open question; this may not be the best
                        # choice, but it is what the old algorithm did.
                        value = urllib.parse.unquote(value, encoding='latin-1')
                    else:
                        try:
                            value = value.decode(charset, 'surrogateescape')
                        except (LookupError, UnicodeEncodeError):
                            # XXX: there should really be a custom defect for
                            # unknown character set to make it easy to find,
                            # because otherwise unknown charset is a silent
                            # failure.
                            value = value.decode('us-ascii', 'surrogateescape')
                        if utils._has_surrogates(value):
                            param.defects.append(errors.UndecodableBytesDefect())
                value_parts.append(value)
            value = ''.join(value_parts)
            yield name, value

    def __str__(self):
        params = []
        for name, value in self.params:
            if value:
                params.append('{}={}'.format(name, quote_string(value)))
            else:
                params.append(name)
        params = '; '.join(params)
        return ' ' + params if params else ''
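

# Illustrative sketch, not part of the upstream module: it shows how the
# MimeParameters.params property reassembles RFC 2231 continuation sections
# even when they arrive out of order.  The helper name
# _example_rfc2231_reassembly is hypothetical; parse_mime_parameters is the
# parser entry point defined later in this module, and the input string is
# made up for the example.
def _example_rfc2231_reassembly():
    from email._header_value_parser import parse_mime_parameters

    # Section 1 deliberately precedes section 0 in the input.
    tokens = parse_mime_parameters(' name*1=bar; name*0=foo')
    # params sorts the pieces by section number before joining them, and
    # __str__ renders the joined value as a single quoted parameter.
    return list(tokens.params), str(tokens)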


class ParameterizedHeaderValue(TokenList):

    # Set this false so that the value doesn't wind up on a new line even
    # if it and the parameters would fit there but not on the first line.
    syntactic_break = False

    @property
    def params(self):
        for token in reversed(self):
            if token.token_type == 'mime-parameters':
                return token.params
        return {}


class ContentType(ParameterizedHeaderValue):
    token_type = 'content-type'
    as_ew_allowed = False
    maintype = 'text'
    subtype = 'plain'


class ContentDisposition(ParameterizedHeaderValue):
    token_type = 'content-disposition'
    as_ew_allowed = False
    content_disposition = None


class ContentTransferEncoding(TokenList):
    token_type = 'content-transfer-encoding'
    as_ew_allowed = False
    cte = '7bit'


class HeaderLabel(TokenList):
    token_type = 'header-label'
    as_ew_allowed = False


class MsgID(TokenList):
    token_type = 'msg-id'
    as_ew_allowed = False

    def fold(self, policy):
        # message-id tokens may not be folded.
        return str(self) + policy.linesep


class MessageID(MsgID):
    token_type = 'message-id'


class InvalidMessageID(MessageID):
    token_type = 'invalid-message-id'


class Header(TokenList):
    token_type = 'header'


#
# Terminal classes and instances
#

class Terminal(str):

    as_ew_allowed = True
    ew_combine_allowed = True
    syntactic_break = True

    def __new__(cls, value, token_type):
        self = super().__new__(cls, value)
        self.token_type = token_type
        self.defects = []
        return self

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, super().__repr__())

    def pprint(self):
        print(self.__class__.__name__ + '/' + self.token_type)

    @property
    def all_defects(self):
        return list(self.defects)

    def _pp(self, indent=''):
        return ["{}{}/{}({}){}".format(
            indent,
            self.__class__.__name__,
            self.token_type,
            super().__repr__(),
            '' if not self.defects else ' {}'.format(self.defects),
            )]

    def pop_trailing_ws(self):
        # This terminates the recursion.
        return None

    @property
    def comments(self):
        return []

    def __getnewargs__(self):
        return (str(self), self.token_type)


class WhiteSpaceTerminal(Terminal):

    @property
    def value(self):
        return ' '

    def startswith_fws(self):
        return True


class ValueTerminal(Terminal):

    @property
    def value(self):
        return self

    def startswith_fws(self):
        return False


class EWWhiteSpaceTerminal(WhiteSpaceTerminal):

    @property
    def value(self):
        return ''

    def __str__(self):
        return ''
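

# Illustrative sketch, not part of the upstream module: the terminal classes
# are str subclasses, so str() keeps the original text while .value gives the
# normalized form.  The helper name _example_terminal_values is hypothetical.
def _example_terminal_values():
    from email._header_value_parser import (
        EWWhiteSpaceTerminal, ValueTerminal, WhiteSpaceTerminal)

    dot = ValueTerminal('.', 'dot')
    fws = WhiteSpaceTerminal(' \t ', 'fws')
    ew_fws = EWWhiteSpaceTerminal(' ', 'fws')
    # A whitespace run keeps its text but collapses to one space as a value;
    # whitespace between two encoded words disappears in both forms.
    return dot.value, str(fws), fws.value, str(ew_fws), ew_fws.value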


class _InvalidEwError(errors.HeaderParseError):
    """Invalid encoded word found while parsing headers."""


# XXX these need to become classes and be used as instances so
# that a program can't change them in a parse tree and screw
# up other parse trees.  Maybe we should have tests for that, too.
DOT = ValueTerminal('.', 'dot')
ListSeparator = ValueTerminal(',', 'list-separator')
RouteComponentMarker = ValueTerminal('@', 'route-component-marker')

#
# Parser
#

# Parse strings according to RFC822/2047/2822/5322 rules.
#
# This is a stateless parser.  Each get_XXX function accepts a string and
# returns either a Terminal or a TokenList representing the RFC object named
# by the method and a string containing the remaining unparsed characters
# from the input.  Thus a parser method consumes the next syntactic construct
# of a given type and returns a token representing the construct plus the
# unparsed remainder of the input string.
#
# For example, if the first element of a structured header is a 'phrase',
# then:
#
#     phrase, value = get_phrase(value)
#
# returns the complete phrase from the start of the string value, plus any
# characters left in the string after the phrase is removed.
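
# Illustrative sketch, not part of the upstream module: it exercises the
# (token, remainder) convention described above with get_phrase, which is
# defined later in this module.  The helper name _example_get_phrase and the
# sample address are hypothetical.
def _example_get_phrase():
    from email._header_value_parser import get_phrase

    original = 'Fred Bloggs <fred@example.com>'
    phrase, rest = get_phrase(original)
    # The token keeps the text it consumed (including trailing whitespace),
    # so the token text plus the remainder rebuilds the input.
    assert str(phrase) + rest == original
    return str(phrase), rest, phrase.token_type
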

_wsp_splitter = re.compile(r'([{}]+)'.format(''.join(WSP))).split
_non_atom_end_matcher = re.compile(r"[^{}]+".format(
    re.escape(''.join(ATOM_ENDS)))).match
_non_printable_finder = re.compile(r"[\x00-\x20\x7F]").findall
_non_token_end_matcher = re.compile(r"[^{}]+".format(
    re.escape(''.join(TOKEN_ENDS)))).match
_non_attribute_end_matcher = re.compile(r"[^{}]+".format(
    re.escape(''.join(ATTRIBUTE_ENDS)))).match
_non_extended_attribute_end_matcher = re.compile(r"[^{}]+".format(
    re.escape(''.join(EXTENDED_ATTRIBUTE_ENDS)))).match

def _validate_xtext(xtext):
    """If input token contains ASCII non-printables, register a defect."""

    non_printables = _non_printable_finder(xtext)
    if non_printables:
        xtext.defects.append(errors.NonPrintableDefect(non_printables))
    if utils._has_surrogates(xtext):
        xtext.defects.append(errors.UndecodableBytesDefect(
            "Non-ASCII characters found in header token"))

def _get_ptext_to_endchars(value, endchars):
    """Scan printables/quoted-pairs until endchars and return unquoted ptext.

    This function turns a run of qcontent, ccontent-without-comments, or
    dtext-with-quoted-printables into a single string by unquoting any
    quoted printables (backslash-escaped characters).  It returns the
    string, the remaining value, and the had_qp flag.

    """
    fragment, *remainder = _wsp_splitter(value, 1)
    vchars = []
    escape = False
    had_qp = False
    for pos in range(len(fragment)):
        if fragment[pos] == '\\':
            if escape:
                escape = False
                had_qp = True
            else:
                escape = True
                continue
        if escape:
            escape = False
        elif fragment[pos] in endchars:
            break
        vchars.append(fragment[pos])
    else:
        # No end character was found, so the whole fragment was consumed.
        # Using len() also covers an empty fragment, where the loop never
        # binds pos.
        pos = len(fragment)
    return ''.join(vchars), ''.join([fragment[pos:]] + remainder), had_qp

def get_fws(value):
    """FWS = 1*WSP

    This isn't the RFC definition.  We're using fws to represent tokens where
    folding can be done, but when we are parsing the *un*folding has already
    been done so we don't need to watch out for CRLF.

    """
    newvalue = value.lstrip()
    fws = WhiteSpaceTerminal(value[:len(value)-len(newvalue)], 'fws')
    return fws, newvalue
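

# Illustrative sketch, not part of the upstream module: get_fws consumes the
# leading whitespace run and returns it as a single 'fws' terminal together
# with the unparsed remainder.  The helper name _example_get_fws is
# hypothetical.
def _example_get_fws():
    from email._header_value_parser import get_fws

    fws, rest = get_fws(' \t text')
    # str(fws) is the whitespace as written; fws.value is one space.
    return str(fws), fws.value, fws.token_type, rest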

def get_encoded_word(value):
    """encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

    """
    ew = EncodedWord()
    if not value.startswith('=?'):
        raise errors.HeaderParseError(
            "expected encoded word but found {}".format(value))
    tok, *remainder = value[2:].split('?=', 1)
    if tok == value[2:]:
        raise errors.HeaderParseError(
            "expected encoded word but found {}".format(value))
    remstr = ''.join(remainder)
    if (len(remstr) > 1 and
        remstr[0] in hexdigits and
        remstr[1] in hexdigits and
        tok.count('?') < 2):
        # The ? after the CTE was followed by an encoded word escape (=XX).
        rest, *remainder = remstr.split('?=', 1)
        tok = tok + '?=' + rest
    if len(tok.split()) > 1:
        ew.defects.append(errors.InvalidHeaderDefect(
            "whitespace inside encoded word"))
    ew.cte = value
    value = ''.join(remainder)
    try:
        text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
    except (ValueError, KeyError):
        raise _InvalidEwError(
            "encoded word format invalid: '{}'".format(ew.cte))
    ew.charset = charset
    ew.lang = lang
    ew.defects.extend(defects)
    while text:
        if text[0] in WSP:
            token, text = get_fws(text)
            ew.append(token)
            continue
        chars, *remainder = _wsp_splitter(text, 1)
        vtext = ValueTerminal(chars, 'vtext')
        _validate_xtext(vtext)
        ew.append(vtext)
        text = ''.join(remainder)
    # Encoded words should be followed by whitespace.
    if value and value[0] not in WSP:
        ew.defects.append(errors.InvalidHeaderDefect(
            "missing trailing whitespace after encoded-word"))
    return ew, value
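

# Illustrative sketch, not part of the upstream module: get_encoded_word
# decodes one RFC 2047 encoded word from the front of the input and returns
# the unparsed remainder.  The helper name _example_get_encoded_word and the
# sample input are hypothetical.
def _example_get_encoded_word():
    from email._header_value_parser import get_encoded_word

    ew, rest = get_encoded_word('=?utf-8?q?caf=C3=A9?= rest')
    # str(ew) is the decoded text; the charset is recorded on the token.
    return str(ew), ew.charset, rest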

def get_unstructured(value):
    """unstructured = (*([FWS] VCHAR) *WSP) / obs-unstruct
       obs-unstruct = *((*LF *CR *(obs-utext *LF *CR)) / FWS)
       obs-utext = %d0 / obs-NO-WS-CTL / VCHAR

       obs-NO-WS-CTL is control characters except WSP/CR/LF.

    So, basically, we have printable runs, plus control characters or nulls in
    the obsolete syntax, separated by whitespace.  Since RFC 2047 uses the
    obsolete syntax in its specification, but requires whitespace on either
    side of the encoded words, I can see no reason to need to separate the
    non-printable-non-whitespace from the printable runs if they occur, so we
    parse this into vtext tokens separated by fws tokens.

    Because an 'unstructured' value must by definition constitute the entire
    value, this 'get' routine does not return a remaining value, only the
    parsed TokenList.

    """
    # XXX: but what about bare CR and LF?  They might signal the start or
    # end of an encoded word.  YAGNI for now, since our current parsers
    # will never send us strings with bare CR or LF.

    unstructured = UnstructuredTokenList()
    while value:
        if value[0] in WSP:
            token, value = get_fws(value)
            unstructured.append(token)
            continue
        valid_ew = True
        if value.startswith('=?'):
            try:
                token, value = get_encoded_word(value)
            except _InvalidEwError:
                valid_ew = False
            except errors.HeaderParseError:
                # XXX: Need to figure out how to register defects when
                # appropriate here.
                pass
            else:
                have_ws = True
                if len(unstructured) > 0:
                    if unstructured[-1].token_type != 'fws':
                        unstructured.defects.append(errors.InvalidHeaderDefect(
                            "missing whitespace before encoded word"))
                        have_ws = False
                if have_ws and len(unstructured) > 1:
                    if unstructured[-2].token_type == 'encoded-word':
                        unstructured[-1] = EWWhiteSpaceTerminal(
                            unstructured[-1], 'fws')
                unstructured.append(token)
                continue
        tok, *remainder = _wsp_splitter(value, 1)
        # Split in the middle of an atom if there is an RFC 2047 encoded word
        # which does not have WSP on both sides.  The defect will be registered
        # the next time through the loop.
        # This needs to only be performed when the encoded word is valid;
        # otherwise, performing it on an invalid encoded word can cause
        # the parser to go into an infinite loop.
        if valid_ew and rfc2047_matcher.search(tok):
            tok, *remainder = value.partition('=?')
        vtext = ValueTerminal(tok, 'vtext')
        _validate_xtext(vtext)
        unstructured.append(vtext)
        value = ''.join(remainder)
    return unstructured

def get_qp_ctext(value):
    r"""ctext = <printable ascii except \ ( )>

    This is not the RFC ctext, since we are handling nested comments in comment
    and unquoting quoted-pairs here.  We allow anything except the '()'
    characters, but if we find any ASCII other than the RFC defined printable
    ASCII, a NonPrintableDefect is added to the token's defects list.  Since
    quoted pairs are converted to their unquoted values, what is returned is
    a 'ptext' token.  In this case it is a WhiteSpaceTerminal, so its value
    is ' '.

    """
    ptext, value, _ = _get_ptext_to_endchars(value, '()')
    ptext = WhiteSpaceTerminal(ptext, 'ptext')
    _validate_xtext(ptext)
    return ptext, value

def get_qcontent(value):
    """qcontent = qtext / quoted-pair

    We allow anything except the DQUOTE character, but if we find any ASCII
    other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list.  Any quoted pairs are converted to their
    unquoted values, so what is returned is a 'ptext' token.  In this case it
    is a ValueTerminal.

    """
    ptext, value, _ = _get_ptext_to_endchars(value, '"')
    ptext = ValueTerminal(ptext, 'ptext')
    _validate_xtext(ptext)
    return ptext, value

def get_atext(value):
    """atext = <matches _atext_matcher>

    We allow any non-ATOM_ENDS in atext, but add an InvalidATextDefect to
    the token's defects list if we find non-atext characters.
    """
    m = _non_atom_end_matcher(value)
    if not m:
        raise errors.HeaderParseError(
            "expected atext but found '{}'".format(value))
    atext = m.group()
    value = value[len(atext):]
    atext = ValueTerminal(atext, 'atext')
    _validate_xtext(atext)
    return atext, value

def get_bare_quoted_string(value):
    """bare-quoted-string = DQUOTE *([FWS] qcontent) [FWS] DQUOTE

    A quoted-string without the leading or trailing white space.  Its
    value is the text between the quote marks, with whitespace
    preserved and quoted pairs decoded.
    """
    if value[0] != '"':
        raise errors.HeaderParseError(
            "expected '\"' but found '{}'".format(value))
    bare_quoted_string = BareQuotedString()
    value = value[1:]
    if value and value[0] == '"':
        token, value = get_qcontent(value)
        bare_quoted_string.append(token)
    while value and value[0] != '"':
        if value[0] in WSP:
            token, value = get_fws(value)
        elif value[:2] == '=?':
            valid_ew = False
            try:
                token, value = get_encoded_word(value)
                bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
                    "encoded word inside quoted string"))
                valid_ew = True
            except errors.HeaderParseError:
                token, value = get_qcontent(value)
            # Collapse the whitespace between two encoded words that occur in a
            # bare-quoted-string.
            if valid_ew and len(bare_quoted_string) > 1:
                if (bare_quoted_string[-1].token_type == 'fws' and
                        bare_quoted_string[-2].token_type == 'encoded-word'):
                    bare_quoted_string[-1] = EWWhiteSpaceTerminal(
                        bare_quoted_string[-1], 'fws')
        else:
            token, value = get_qcontent(value)
        bare_quoted_string.append(token)
    if not value:
        bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
            "end of header inside quoted string"))
        return bare_quoted_string, value
    return bare_quoted_string, value[1:]

def get_comment(value):
    """comment = "(" *([FWS] ccontent) [FWS] ")"
       ccontent = ctext / quoted-pair / comment

    We handle nested comments here, and quoted-pair in our qp-ctext routine.
    """
    if value and value[0] != '(':
        raise errors.HeaderParseError(
            "expected '(' but found '{}'".format(value))
    comment = Comment()
    value = value[1:]
    while value and value[0] != ")":
        if value[0] in WSP:
            token, value = get_fws(value)
        elif value[0] == '(':
            token, value = get_comment(value)
        else:
            token, value = get_qp_ctext(value)
        comment.append(token)
    if not value:
        comment.defects.append(errors.InvalidHeaderDefect(
            "end of header inside comment"))
        return comment, value
    return comment, value[1:]
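
# Illustrative sketch, not part of the parser itself.  It assumes this module
# is importable as email._header_value_parser (a private CPython module, so
# details may differ between versions).  It shows get_comment() keeping a
# nested comment as a child token, and registering a defect instead of raising
# when the closing ")" is missing.

```python
from email import errors
from email import _header_value_parser as parser

# A comment that contains another comment.
comment, rest = parser.get_comment('(outer (inner) text) rest')
nested = [tok for tok in comment if tok.token_type == 'comment']

# A comment that is never closed: parsing succeeds, but a defect is recorded.
unterminated, leftover = parser.get_comment('(never closed')
has_defect = any(isinstance(d, errors.InvalidHeaderDefect)
                 for d in unterminated.defects)
```

# The unconsumed input is handed back as the second element of the returned
# tuple, so callers can continue parsing from that point.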

def get_cfws(value):
    """CFWS = (1*([FWS] comment) [FWS]) / FWS

    """
    cfws = CFWSList()
    while value and value[0] in CFWS_LEADER:
        if value[0] in WSP:
            token, value = get_fws(value)
        else:
            token, value = get_comment(value)
        cfws.append(token)
    return cfws, value

def get_quoted_string(value):
    """quoted-string = [CFWS] <bare-quoted-string> [CFWS]

    'bare-quoted-string' is an intermediate class defined by this
    parser and not by the RFC grammar.  It is the quoted string
    without any attached CFWS.
    """
    quoted_string = QuotedString()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        quoted_string.append(token)
    token, value = get_bare_quoted_string(value)
    quoted_string.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        quoted_string.append(token)
    return quoted_string, value

def get_atom(value):
    """atom = [CFWS] 1*atext [CFWS]

    An atom could be an rfc2047 encoded word.
    """
    atom = Atom()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        atom.append(token)
    if value and value[0] in ATOM_ENDS:
        raise errors.HeaderParseError(
            "expected atom but found '{}'".format(value))
    if value.startswith('=?'):
        try:
            token, value = get_encoded_word(value)
        except errors.HeaderParseError:
            # XXX: need to figure out how to register defects when
            # appropriate here.
            token, value = get_atext(value)
    else:
        token, value = get_atext(value)
    atom.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        atom.append(token)
    return atom, value

def get_dot_atom_text(value):
    """ dot-atom-text = 1*atext *("." 1*atext)

    """
    dot_atom_text = DotAtomText()
    if not value or value[0] in ATOM_ENDS:
        raise errors.HeaderParseError("expected atom at a start of "
            "dot-atom-text but found '{}'".format(value))
    while value and value[0] not in ATOM_ENDS:
        token, value = get_atext(value)
        dot_atom_text.append(token)
        if value and value[0] == '.':
            dot_atom_text.append(DOT)
            value = value[1:]
    if dot_atom_text[-1] is DOT:
        raise errors.HeaderParseError("expected atom at end of dot-atom-text "
            "but found '{}'".format('.'+value))
    return dot_atom_text, value
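
# Illustrative sketch, not part of the parser itself.  It assumes this module
# is importable as email._header_value_parser (a private CPython module, so
# details may differ between versions).  It shows get_dot_atom_text() stopping
# at the first special character and rejecting two consecutive dots.

```python
from email import errors
from email import _header_value_parser as parser

# Parsing stops at '>', which is returned as the unconsumed remainder.
token, rest = parser.get_dot_atom_text('example.com>')

# Two dots in a row leave the token ending in a dot, which is an error here.
try:
    parser.get_dot_atom_text('a..b')
except errors.HeaderParseError:
    double_dot_rejected = True
else:
    double_dot_rejected = False
```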

def get_dot_atom(value):
    """ dot-atom = [CFWS] dot-atom-text [CFWS]

    Any place we can have a dot atom, we could instead have an rfc2047 encoded
    word.
    """
    dot_atom = DotAtom()
    if value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        dot_atom.append(token)
    if value.startswith('=?'):
        try:
            token, value = get_encoded_word(value)
        except errors.HeaderParseError:
            # XXX: need to figure out how to register defects when
            # appropriate here.
            token, value = get_dot_atom_text(value)
    else:
        token, value = get_dot_atom_text(value)
    dot_atom.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        dot_atom.append(token)
    return dot_atom, value

def get_word(value):
    """word = atom / quoted-string

    Either atom or quoted-string may start with CFWS.  We have to peel off this
    CFWS first to determine which type of word to parse.  Afterward we splice
    the leading CFWS, if any, into the parsed sub-token.

    If neither an atom nor a quoted-string is found before the next special, a
    HeaderParseError is raised.

    The token returned is either an Atom or a QuotedString, as appropriate.
    This means the 'word' level of the formal grammar is not represented in the
    parse tree; this is because having that extra layer when manipulating the
    parse tree is more confusing than it is helpful.

    """
    if value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
    else:
        leader = None
    if not value:
        raise errors.HeaderParseError(
            "Expected 'atom' or 'quoted-string' but found nothing.")
    if value[0]=='"':
        token, value = get_quoted_string(value)
    elif value[0] in SPECIALS:
        raise errors.HeaderParseError("Expected 'atom' or 'quoted-string' "
                                      "but found '{}'".format(value))
    else:
        token, value = get_atom(value)
    if leader is not None:
        token[:0] = [leader]
    return token, value

def get_phrase(value):
    """ phrase = 1*word / obs-phrase
        obs-phrase = word *(word / "." / CFWS)

    This means a phrase can be a sequence of words, periods, and CFWS in any
    order as long as it starts with at least one word.  If anything other than
    words is detected, an ObsoleteHeaderDefect is added to the token's defect
    list.  We also accept a phrase that starts with CFWS followed by a dot;
    this is registered as an InvalidHeaderDefect, since it is not supported by
    even the obsolete grammar.

    """
    phrase = Phrase()
    try:
        token, value = get_word(value)
        phrase.append(token)
    except errors.HeaderParseError:
        phrase.defects.append(errors.InvalidHeaderDefect(
            "phrase does not start with word"))
    while value and value[0] not in PHRASE_ENDS:
        if value[0]=='.':
            phrase.append(DOT)
            phrase.defects.append(errors.ObsoleteHeaderDefect(
                "period in 'phrase'"))
            value = value[1:]
        else:
            try:
                token, value = get_word(value)
            except errors.HeaderParseError:
                if value[0] in CFWS_LEADER:
                    token, value = get_cfws(value)
                    phrase.defects.append(errors.ObsoleteHeaderDefect(
                        "comment found without atom"))
                else:
                    raise
            phrase.append(token)
    return phrase, value
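
# Illustrative sketch, not part of the parser itself.  It assumes this module
# is importable as email._header_value_parser (a private CPython module, so
# details may differ between versions).  It shows get_phrase() accepting a
# period inside a display name while flagging it as obsolete syntax.

```python
from email import errors
from email import _header_value_parser as parser

# The phrase ends at '<', which belongs to PHRASE_ENDS.
phrase, rest = parser.get_phrase('J. Random Hacker <jrh@example.com>')

# The period is kept in the parse tree but recorded as an obsolete construct.
obsolete = [d for d in phrase.defects
            if isinstance(d, errors.ObsoleteHeaderDefect)]
```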

def get_local_part(value):
    """ local-part = dot-atom / quoted-string / obs-local-part

    """
    local_part = LocalPart()
    leader = None
    if value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
    if not value:
        raise errors.HeaderParseError(
            "expected local-part but found '{}'".format(value))
    try:
        token, value = get_dot_atom(value)
    except errors.HeaderParseError:
        try:
            token, value = get_word(value)
        except errors.HeaderParseError:
            if value[0] != '\\' and value[0] in PHRASE_ENDS:
                raise
            token = TokenList()
    if leader is not None:
        token[:0] = [leader]
    local_part.append(token)
    if value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        obs_local_part, value = get_obs_local_part(str(local_part) + value)
        if obs_local_part.token_type == 'invalid-obs-local-part':
            local_part.defects.append(errors.InvalidHeaderDefect(
                "local-part is not dot-atom, quoted-string, or obs-local-part"))
        else:
            local_part.defects.append(errors.ObsoleteHeaderDefect(
                "local-part is not a dot-atom (contains CFWS)"))
        local_part[0] = obs_local_part
    try:
        local_part.value.encode('ascii')
    except UnicodeEncodeError:
        local_part.defects.append(errors.NonASCIILocalPartDefect(
                "local-part contains non-ASCII characters)"))
    return local_part, value

def get_obs_local_part(value):
    """ obs-local-part = word *("." word)
    """
    obs_local_part = ObsLocalPart()
    last_non_ws_was_dot = False
    while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        if value[0] == '.':
            if last_non_ws_was_dot:
                obs_local_part.defects.append(errors.InvalidHeaderDefect(
                    "invalid repeated '.'"))
            obs_local_part.append(DOT)
            last_non_ws_was_dot = True
            value = value[1:]
            continue
        elif value[0]=='\\':
            obs_local_part.append(ValueTerminal(value[0],
                                                'misplaced-special'))
            value = value[1:]
            obs_local_part.defects.append(errors.InvalidHeaderDefect(
                "'\\' character outside of quoted-string/ccontent"))
            last_non_ws_was_dot = False
            continue
        if obs_local_part and obs_local_part[-1].token_type != 'dot':
            obs_local_part.defects.append(errors.InvalidHeaderDefect(
                "missing '.' between words"))
        try:
            token, value = get_word(value)
            last_non_ws_was_dot = False
        except errors.HeaderParseError:
            if value[0] not in CFWS_LEADER:
                raise
            token, value = get_cfws(value)
        obs_local_part.append(token)
    if (obs_local_part[0].token_type == 'dot' or
            obs_local_part[0].token_type=='cfws' and
            obs_local_part[1].token_type=='dot'):
        obs_local_part.defects.append(errors.InvalidHeaderDefect(
            "Invalid leading '.' in local part"))
    if (obs_local_part[-1].token_type == 'dot' or
            obs_local_part[-1].token_type=='cfws' and
            obs_local_part[-2].token_type=='dot'):
        obs_local_part.defects.append(errors.InvalidHeaderDefect(
            "Invalid trailing '.' in local part"))
    if obs_local_part.defects:
        obs_local_part.token_type = 'invalid-obs-local-part'
    return obs_local_part, value

def get_dtext(value):
    r""" dtext = <printable ascii except \ [ ]> / obs-dtext
        obs-dtext = obs-NO-WS-CTL / quoted-pair

    We allow anything except the excluded characters, but if we find any
    ASCII other than the RFC defined printable ASCII, a NonPrintableDefect is
    added to the token's defects list.  Quoted pairs are converted to their
    unquoted values, so what is returned is a ptext token, in this case a
    ValueTerminal.  If there were quoted-pairs, an ObsoleteHeaderDefect is
    added to the returned token's defect list.

    """
    ptext, value, had_qp = _get_ptext_to_endchars(value, '[]')
    ptext = ValueTerminal(ptext, 'ptext')
    if had_qp:
        ptext.defects.append(errors.ObsoleteHeaderDefect(
            "quoted printable found in domain-literal"))
    _validate_xtext(ptext)
    return ptext, value

def _check_for_early_dl_end(value, domain_literal):
    if value:
        return False
    # Record the defect on the token's defects list rather than appending the
    # defect object as a child token, then close the literal ourselves.
    domain_literal.defects.append(errors.InvalidHeaderDefect(
        "end of input inside domain-literal"))
    domain_literal.append(ValueTerminal(']', 'domain-literal-end'))
    return True

def get_domain_literal(value):
    """ domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

    """
    domain_literal = DomainLiteral()
    if value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        domain_literal.append(token)
    if not value:
        raise errors.HeaderParseError("expected domain-literal")
    if value[0] != '[':
        raise errors.HeaderParseError("expected '[' at start of domain-literal "
                "but found '{}'".format(value))
    value = value[1:]
    if _check_for_early_dl_end(value, domain_literal):
        return domain_literal, value
    domain_literal.append(ValueTerminal('[', 'domain-literal-start'))
    if value[0] in WSP:
        token, value = get_fws(value)
        domain_literal.append(token)
    token, value = get_dtext(value)
    domain_literal.append(token)
    if _check_for_early_dl_end(value, domain_literal):
        return domain_literal, value
    if value[0] in WSP:
        token, value = get_fws(value)
        domain_literal.append(token)
    if _check_for_early_dl_end(value, domain_literal):
        return domain_literal, value
    if value[0] != ']':
        raise errors.HeaderParseError("expected ']' at end of domain-literal "
                "but found '{}'".format(value))
    domain_literal.append(ValueTerminal(']', 'domain-literal-end'))
    value = value[1:]
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        domain_literal.append(token)
    return domain_literal, value
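
# Illustrative sketch, not part of the parser itself.  It assumes this module
# is importable as email._header_value_parser (a private CPython module, so
# details may differ between versions).  It shows get_domain_literal() parsing
# a bracketed address, with and without folding white space inside the
# brackets.

```python
from email import _header_value_parser as parser

# A plain domain-literal: the text between the brackets is exposed as .ip.
literal, rest = parser.get_domain_literal('[127.0.0.1]')

# White space inside the brackets is kept in the tree; .domain strips it.
spaced, spaced_rest = parser.get_domain_literal('[ 127.0.0.1 ]')
```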

def get_domain(value):
    """ domain = dot-atom / domain-literal / obs-domain
        obs-domain = atom *("." atom)

    """
    domain = Domain()
    leader = None
    if value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
    if not value:
        raise errors.HeaderParseError(
            "expected domain but found '{}'".format(value))
    if value[0] == '[':
        token, value = get_domain_literal(value)
        if leader is not None:
            token[:0] = [leader]
        domain.append(token)
        return domain, value
    try:
        token, value = get_dot_atom(value)
    except errors.HeaderParseError:
        token, value = get_atom(value)
    if value and value[0] == '@':
        raise errors.HeaderParseError('Invalid Domain')
    if leader is not None:
        token[:0] = [leader]
    domain.append(token)
    if value and value[0] == '.':
        domain.defects.append(errors.ObsoleteHeaderDefect(
            "domain is not a dot-atom (contains CFWS)"))
        if domain[0].token_type == 'dot-atom':
            domain[:] = domain[0]
        while value and value[0] == '.':
            domain.append(DOT)
            token, value = get_atom(value[1:])
            domain.append(token)
    return domain, value

def get_addr_spec(value):
    """ addr-spec = local-part "@" domain

    """
    addr_spec = AddrSpec()
    token, value = get_local_part(value)
    addr_spec.append(token)
    if not value or value[0] != '@':
        addr_spec.defects.append(errors.InvalidHeaderDefect(
            "addr-spec local part with no domain"))
        return addr_spec, value
    addr_spec.append(ValueTerminal('@', 'address-at-symbol'))
    token, value = get_domain(value[1:])
    addr_spec.append(token)
    return addr_spec, value
16507db96d56Sopenharmony_ci
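# Usage sketch for get_addr_spec.  This assumes the stock CPython
# email._header_value_parser module, which is a private API and may change
# between versions; the sample addresses are illustrative only.  It shows
# that a missing domain is reported as a defect rather than as an exception.

```python
from email import _header_value_parser as parser
from email import errors

# A complete addr-spec: the local part and the domain are both available.
addr_spec, rest = parser.get_addr_spec("dinsdale@example.com")
print(addr_spec.local_part, addr_spec.domain, repr(rest))

# A bare local part still parses, but the token carries a defect and
# its domain attribute is None.
no_domain, _ = parser.get_addr_spec("dinsdale")
print(no_domain.domain, [type(d).__name__ for d in no_domain.all_defects])
```
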
def get_obs_route(value):
    """ obs-route = obs-domain-list ":"
        obs-domain-list = *(CFWS / ",") "@" domain *("," [CFWS] ["@" domain])

        Returns an obs-route token with the appropriate sub-tokens (that is,
        there is no obs-domain-list in the parse tree).
    """
    obs_route = ObsRoute()
    while value and (value[0] == ',' or value[0] in CFWS_LEADER):
        if value[0] in CFWS_LEADER:
            token, value = get_cfws(value)
            obs_route.append(token)
        elif value[0] == ',':
            obs_route.append(ListSeparator)
            value = value[1:]
    if not value or value[0] != '@':
        raise errors.HeaderParseError(
            "expected obs-route domain but found '{}'".format(value))
    obs_route.append(RouteComponentMarker)
    token, value = get_domain(value[1:])
    obs_route.append(token)
    while value and value[0] == ',':
        obs_route.append(ListSeparator)
        value = value[1:]
        if not value:
            break
        if value[0] in CFWS_LEADER:
            token, value = get_cfws(value)
            obs_route.append(token)
        # get_cfws may have consumed the rest of the value, so check that
        # something is left before indexing into it.
        if value and value[0] == '@':
            obs_route.append(RouteComponentMarker)
            token, value = get_domain(value[1:])
            obs_route.append(token)
    if not value:
        raise errors.HeaderParseError("end of header while parsing obs-route")
    if value[0] != ':':
        raise errors.HeaderParseError("expected ':' marking end of "
            "obs-route but found '{}'".format(value))
    obs_route.append(ValueTerminal(':', 'end-of-obs-route-marker'))
    return obs_route, value[1:]

def get_angle_addr(value):
    """ angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
        obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS]

    """
    angle_addr = AngleAddr()
    # Guard against an empty value before looking at its first character.
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        angle_addr.append(token)
    if not value or value[0] != '<':
        raise errors.HeaderParseError(
            "expected angle-addr but found '{}'".format(value))
    angle_addr.append(ValueTerminal('<', 'angle-addr-start'))
    value = value[1:]
    # Although it is not legal per RFC5322, SMTP uses '<>' in certain
    # circumstances.
    if value and value[0] == '>':
        angle_addr.append(ValueTerminal('>', 'angle-addr-end'))
        angle_addr.defects.append(errors.InvalidHeaderDefect(
            "null addr-spec in angle-addr"))
        value = value[1:]
        return angle_addr, value
    try:
        token, value = get_addr_spec(value)
    except errors.HeaderParseError:
        try:
            token, value = get_obs_route(value)
            angle_addr.defects.append(errors.ObsoleteHeaderDefect(
                "obsolete route specification in angle-addr"))
        except errors.HeaderParseError:
            raise errors.HeaderParseError(
                "expected addr-spec or obs-route but found '{}'".format(value))
        angle_addr.append(token)
        token, value = get_addr_spec(value)
    angle_addr.append(token)
    if value and value[0] == '>':
        value = value[1:]
    else:
        angle_addr.defects.append(errors.InvalidHeaderDefect(
            "missing trailing '>' on angle-addr"))
    angle_addr.append(ValueTerminal('>', 'angle-addr-end'))
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        angle_addr.append(token)
    return angle_addr, value

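# Usage sketch for get_angle_addr, again assuming the stock CPython
# email._header_value_parser module (private API).  The host names are made
# up for illustration.  It exercises the two lenient paths described above:
# the SMTP-style null address '<>' and an obsolete source route.

```python
from email import _header_value_parser as parser
from email import errors

# The null address is accepted, but flagged as invalid.
null_addr, _ = parser.get_angle_addr("<>")
print(null_addr.addr_spec, [type(d).__name__ for d in null_addr.defects])

# An obsolete route is parsed and exposed through the 'route' attribute.
routed, _ = parser.get_angle_addr("<@a.example,@b.example:user@example.com>")
print(routed.route, routed.addr_spec)
```
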
def get_display_name(value):
    """ display-name = phrase

    Because this is simply a name-rule, we don't return a display-name
    token containing a phrase, but rather a display-name token with
    the content of the phrase.

    """
    display_name = DisplayName()
    token, value = get_phrase(value)
    display_name.extend(token[:])
    display_name.defects = token.defects[:]
    return display_name, value


def get_name_addr(value):
    """ name-addr = [display-name] angle-addr

    """
    name_addr = NameAddr()
    # Both the optional display name and the angle-addr can start with cfws.
    leader = None
    # An empty value cannot be a name-addr; fail with a parse error instead
    # of an IndexError on value[0].
    if not value:
        raise errors.HeaderParseError(
            "expected name-addr but found '{}'".format(value))
    if value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
        if not value:
            raise errors.HeaderParseError(
                "expected name-addr but found '{}'".format(leader))
    if value[0] != '<':
        if value[0] in PHRASE_ENDS:
            raise errors.HeaderParseError(
                "expected name-addr but found '{}'".format(value))
        token, value = get_display_name(value)
        if not value:
            raise errors.HeaderParseError(
                "expected name-addr but found '{}'".format(token))
        if leader is not None:
            token[0][:0] = [leader]
            leader = None
        name_addr.append(token)
    token, value = get_angle_addr(value)
    if leader is not None:
        token[:0] = [leader]
    name_addr.append(token)
    return name_addr, value

def get_mailbox(value):
    """ mailbox = name-addr / addr-spec

    """
    # The only way to figure out if we are dealing with a name-addr or an
    # addr-spec is to try parsing each one.
    mailbox = Mailbox()
    try:
        token, value = get_name_addr(value)
    except errors.HeaderParseError:
        try:
            token, value = get_addr_spec(value)
        except errors.HeaderParseError:
            raise errors.HeaderParseError(
                "expected mailbox but found '{}'".format(value))
    if any(isinstance(x, errors.InvalidHeaderDefect)
                       for x in token.all_defects):
        mailbox.token_type = 'invalid-mailbox'
    mailbox.append(token)
    return mailbox, value

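# Usage sketch for get_mailbox, assuming the stock CPython
# email._header_value_parser module (private API); names and addresses are
# illustrative.  It shows both alternatives of the rule: a name-addr yields
# a display name, while a bare addr-spec has none.

```python
from email import _header_value_parser as parser

# name-addr form: a quoted display name followed by an angle-addr.
named, _ = parser.get_mailbox('"Roy A. Bear" <dinsdale@example.com>')
print(named.token_type, named.display_name, named.addr_spec)

# addr-spec form: get_name_addr fails first, then get_addr_spec succeeds.
bare, _ = parser.get_mailbox("dinsdale@example.com")
print(bare.token_type, bare.display_name, bare.addr_spec)
```
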
def get_invalid_mailbox(value, endchars):
    """ Read everything up to one of the chars in endchars.

    This is outside the formal grammar.  The InvalidMailbox TokenList that is
    returned acts like a Mailbox, but the data attributes are None.

    """
    invalid_mailbox = InvalidMailbox()
    while value and value[0] not in endchars:
        if value[0] in PHRASE_ENDS:
            invalid_mailbox.append(ValueTerminal(value[0],
                                                 'misplaced-special'))
            value = value[1:]
        else:
            token, value = get_phrase(value)
            invalid_mailbox.append(token)
    return invalid_mailbox, value

def get_mailbox_list(value):
    """ mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
        obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS])

    For this routine we go outside the formal grammar in order to improve error
    handling.  We recognize the end of the mailbox list only at the end of the
    value or at a ';' (the group terminator).  This is so that we can turn
    invalid mailboxes into InvalidMailbox tokens and continue parsing any
    remaining valid mailboxes.  We also allow all mailbox entries to be null,
    and this condition is handled appropriately at a higher level.

    """
    mailbox_list = MailboxList()
    while value and value[0] != ';':
        try:
            token, value = get_mailbox(value)
            mailbox_list.append(token)
        except errors.HeaderParseError:
            leader = None
            if value[0] in CFWS_LEADER:
                leader, value = get_cfws(value)
                if not value or value[0] in ',;':
                    mailbox_list.append(leader)
                    mailbox_list.defects.append(errors.ObsoleteHeaderDefect(
                        "empty element in mailbox-list"))
                else:
                    token, value = get_invalid_mailbox(value, ',;')
                    if leader is not None:
                        token[:0] = [leader]
                    mailbox_list.append(token)
                    mailbox_list.defects.append(errors.InvalidHeaderDefect(
                        "invalid mailbox in mailbox-list"))
            elif value[0] == ',':
                mailbox_list.defects.append(errors.ObsoleteHeaderDefect(
                    "empty element in mailbox-list"))
            else:
                token, value = get_invalid_mailbox(value, ',;')
                if leader is not None:
                    token[:0] = [leader]
                mailbox_list.append(token)
                mailbox_list.defects.append(errors.InvalidHeaderDefect(
                    "invalid mailbox in mailbox-list"))
        if value and value[0] not in ',;':
            # Crap after mailbox; treat it as an invalid mailbox.
            # The mailbox info will still be available.
            mailbox = mailbox_list[-1]
            mailbox.token_type = 'invalid-mailbox'
            token, value = get_invalid_mailbox(value, ',;')
            mailbox.extend(token)
            mailbox_list.defects.append(errors.InvalidHeaderDefect(
                "invalid mailbox in mailbox-list"))
        if value and value[0] == ',':
            mailbox_list.append(ListSeparator)
            value = value[1:]
    return mailbox_list, value


def get_group_list(value):
    """ group-list = mailbox-list / CFWS / obs-group-list
        obs-group-list = 1*([CFWS] ",") [CFWS]

    """
    group_list = GroupList()
    if not value:
        group_list.defects.append(errors.InvalidHeaderDefect(
            "end of header before group-list"))
        return group_list, value
    leader = None
    if value and value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
        if not value:
            # This should never happen in email parsing, since CFWS-only is a
            # legal alternative to group-list in a group, which is the only
            # place group-list appears.
            group_list.defects.append(errors.InvalidHeaderDefect(
                "end of header in group-list"))
            group_list.append(leader)
            return group_list, value
        if value[0] == ';':
            group_list.append(leader)
            return group_list, value
    token, value = get_mailbox_list(value)
    if len(token.all_mailboxes) == 0:
        if leader is not None:
            group_list.append(leader)
        group_list.extend(token)
        group_list.defects.append(errors.ObsoleteHeaderDefect(
            "group-list with empty entries"))
        return group_list, value
    if leader is not None:
        token[:0] = [leader]
    group_list.append(token)
    return group_list, value

def get_group(value):
    """ group = display-name ":" [group-list] ";" [CFWS]

    """
    group = Group()
    token, value = get_display_name(value)
    if not value or value[0] != ':':
        raise errors.HeaderParseError("expected ':' at end of group "
            "display name but found '{}'".format(value))
    group.append(token)
    group.append(ValueTerminal(':', 'group-display-name-terminator'))
    value = value[1:]
    if value and value[0] == ';':
        group.append(ValueTerminal(';', 'group-terminator'))
        return group, value[1:]
    token, value = get_group_list(value)
    group.append(token)
    if not value:
        group.defects.append(errors.InvalidHeaderDefect(
            "end of header in group"))
    elif value[0] != ';':
        raise errors.HeaderParseError(
            "expected ';' at end of group but found '{}'".format(value))
    group.append(ValueTerminal(';', 'group-terminator'))
    value = value[1:]
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        group.append(token)
    return group, value

def get_address(value):
    """ address = mailbox / group

    Note that counter-intuitively, an address can be either a single address or
    a list of addresses (a group).  This is why the returned Address object has
    a 'mailboxes' attribute which treats a single address as a list of length
    one.  When you need to differentiate between the two cases, extract the
    single element, which is either a mailbox or a group token.

    """
    # The formal grammar isn't very helpful when parsing an address.  mailbox
    # and group, especially when allowing for obsolete forms, start off very
    # similarly.  It is only when you reach one of @, <, or : that you know
    # what you've got.  So, we try each one in turn, starting with the more
    # likely of the two.  We could perhaps make this more efficient by looking
    # for a phrase and then branching based on the next character, but that
    # would be a premature optimization.
    address = Address()
    try:
        token, value = get_group(value)
    except errors.HeaderParseError:
        try:
            token, value = get_mailbox(value)
        except errors.HeaderParseError:
            raise errors.HeaderParseError(
                "expected address but found '{}'".format(value))
    address.append(token)
    return address, value

def get_address_list(value):
    """ address-list = (address *("," address)) / obs-addr-list
        obs-addr-list = *([CFWS] ",") address *("," [address / CFWS])

    We depart from the formal grammar here by continuing to parse until the end
    of the input, assuming the input to be entirely composed of an
    address-list.  This is always true in email parsing, and allows us
    to skip invalid addresses to parse additional valid ones.

    """
    address_list = AddressList()
    while value:
        try:
            token, value = get_address(value)
            address_list.append(token)
        except errors.HeaderParseError:
            leader = None
            if value[0] in CFWS_LEADER:
                leader, value = get_cfws(value)
                if not value or value[0] == ',':
                    address_list.append(leader)
                    address_list.defects.append(errors.ObsoleteHeaderDefect(
                        "address-list entry with no content"))
                else:
                    token, value = get_invalid_mailbox(value, ',')
                    if leader is not None:
                        token[:0] = [leader]
                    address_list.append(Address([token]))
                    address_list.defects.append(errors.InvalidHeaderDefect(
                        "invalid address in address-list"))
            elif value[0] == ',':
                address_list.defects.append(errors.ObsoleteHeaderDefect(
                    "empty element in address-list"))
            else:
                token, value = get_invalid_mailbox(value, ',')
                if leader is not None:
                    token[:0] = [leader]
                address_list.append(Address([token]))
                address_list.defects.append(errors.InvalidHeaderDefect(
                    "invalid address in address-list"))
        if value and value[0] != ',':
            # Crap after address; treat it as an invalid mailbox.
            # The mailbox info will still be available.
            mailbox = address_list[-1][0]
            mailbox.token_type = 'invalid-mailbox'
            token, value = get_invalid_mailbox(value, ',')
            mailbox.extend(token)
            address_list.defects.append(errors.InvalidHeaderDefect(
                "invalid address in address-list"))
        if value:  # Must be a , at this point.
            address_list.append(ValueTerminal(',', 'list-separator'))
            value = value[1:]
    return address_list, value


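# Usage sketch for get_address_list, assuming the stock CPython
# email._header_value_parser module (private API); the addresses and the
# group name are illustrative.  It shows that mailboxes inside a group are
# flattened into 'mailboxes', and that an empty list element is recorded as
# an obsolete-syntax defect while parsing continues.

```python
from email import _header_value_parser as parser
from email import errors

# One plain mailbox followed by a group containing two mailboxes.
address_list, rest = parser.get_address_list(
    "Fred <fred@example.com>, Team: a@example.com, b@example.com;")
print([m.addr_spec for m in address_list.mailboxes], repr(rest))

# The doubled comma produces an empty element; both addresses survive.
sloppy, _ = parser.get_address_list("a@example.com,, b@example.com")
print([m.addr_spec for m in sloppy.mailboxes])
print([type(d).__name__ for d in sloppy.all_defects])
```
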
def get_no_fold_literal(value):
    """ no-fold-literal = "[" *dtext "]"
    """
    no_fold_literal = NoFoldLiteral()
    if not value:
        raise errors.HeaderParseError(
            "expected no-fold-literal but found '{}'".format(value))
    if value[0] != '[':
        raise errors.HeaderParseError(
            "expected '[' at the start of no-fold-literal "
            "but found '{}'".format(value))
    no_fold_literal.append(ValueTerminal('[', 'no-fold-literal-start'))
    value = value[1:]
    token, value = get_dtext(value)
    no_fold_literal.append(token)
    if not value or value[0] != ']':
        raise errors.HeaderParseError(
            "expected ']' at the end of no-fold-literal "
            "but found '{}'".format(value))
    no_fold_literal.append(ValueTerminal(']', 'no-fold-literal-end'))
    return no_fold_literal, value[1:]

def get_msg_id(value):
    """msg-id = [CFWS] "<" id-left '@' id-right  ">" [CFWS]
       id-left = dot-atom-text / obs-id-left
       id-right = dot-atom-text / no-fold-literal / obs-id-right
       no-fold-literal = "[" *dtext "]"
    """
    msg_id = MsgID()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        msg_id.append(token)
    if not value or value[0] != '<':
        raise errors.HeaderParseError(
            "expected msg-id but found '{}'".format(value))
    msg_id.append(ValueTerminal('<', 'msg-id-start'))
    value = value[1:]
    # Parse id-left.
    try:
        token, value = get_dot_atom_text(value)
    except errors.HeaderParseError:
        try:
            # obs-id-left is the same as the local-part of an addr-spec.
            token, value = get_obs_local_part(value)
            msg_id.defects.append(errors.ObsoleteHeaderDefect(
                "obsolete id-left in msg-id"))
        except errors.HeaderParseError:
            raise errors.HeaderParseError(
                "expected dot-atom-text or obs-id-left"
                " but found '{}'".format(value))
    msg_id.append(token)
    if not value or value[0] != '@':
        msg_id.defects.append(errors.InvalidHeaderDefect(
            "msg-id with no id-right"))
        # Even though there is no id-right, if the local part
        # ends with `>` let's just parse it too and return
        # along with the defect.
        if value and value[0] == '>':
            msg_id.append(ValueTerminal('>', 'msg-id-end'))
            value = value[1:]
        return msg_id, value
    msg_id.append(ValueTerminal('@', 'address-at-symbol'))
    value = value[1:]
    # Parse id-right.
    try:
        token, value = get_dot_atom_text(value)
    except errors.HeaderParseError:
        try:
            token, value = get_no_fold_literal(value)
        except errors.HeaderParseError:
            try:
                token, value = get_domain(value)
                msg_id.defects.append(errors.ObsoleteHeaderDefect(
                    "obsolete id-right in msg-id"))
            except errors.HeaderParseError:
                raise errors.HeaderParseError(
                    "expected dot-atom-text, no-fold-literal or obs-id-right"
                    " but found '{}'".format(value))
    msg_id.append(token)
    if value and value[0] == '>':
        value = value[1:]
    else:
        msg_id.defects.append(errors.InvalidHeaderDefect(
            "missing trailing '>' on msg-id"))
    msg_id.append(ValueTerminal('>', 'msg-id-end'))
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        msg_id.append(token)
    return msg_id, value


21217db96d56Sopenharmony_cidef parse_message_id(value):
21227db96d56Sopenharmony_ci    """message-id      =   "Message-ID:" msg-id CRLF
21237db96d56Sopenharmony_ci    """
21247db96d56Sopenharmony_ci    message_id = MessageID()
21257db96d56Sopenharmony_ci    try:
21267db96d56Sopenharmony_ci        token, value = get_msg_id(value)
21277db96d56Sopenharmony_ci        message_id.append(token)
21287db96d56Sopenharmony_ci    except errors.HeaderParseError as ex:
21297db96d56Sopenharmony_ci        token = get_unstructured(value)
21307db96d56Sopenharmony_ci        message_id = InvalidMessageID(token)
21317db96d56Sopenharmony_ci        message_id.defects.append(
21327db96d56Sopenharmony_ci            errors.InvalidHeaderDefect("Invalid msg-id: {!r}".format(ex)))
21337db96d56Sopenharmony_ci    else:
21347db96d56Sopenharmony_ci        # Value after parsing a valid msg_id should be None.
21357db96d56Sopenharmony_ci        if value:
21367db96d56Sopenharmony_ci            message_id.defects.append(errors.InvalidHeaderDefect(
21377db96d56Sopenharmony_ci                "Unexpected {!r}".format(value)))
21387db96d56Sopenharmony_ci
21397db96d56Sopenharmony_ci    return message_id

#
# XXX: As I begin to add additional header parsers, I'm realizing we probably
# have two levels of parser routines: the get_XXX methods that get a token in
# the grammar, and parse_XXX methods that parse an entire field value.  So
# get_address_list above should really be a parse_ method, as probably should
# get_unstructured.
#
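
# Illustrative sketch only, not part of the parser: the two calling
# conventions described in the note above, shown on a toy whitespace grammar.
# The names _sketch_get_word and _sketch_parse_words are hypothetical and are
# not used anywhere in this module.
def _sketch_get_word(value):
    # get_ style: consume one token, return (token, unparsed remainder).
    head, _, rest = value.partition(' ')
    return head, rest

def _sketch_parse_words(value):
    # parse_ style: consume the entire value and return only the result.
    words = []
    while value:
        word, value = _sketch_get_word(value)
        if word:
            words.append(word)
    return words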

def parse_mime_version(value):
    """ mime-version = [CFWS] 1*digit [CFWS] "." [CFWS] 1*digit [CFWS]

    """
    # The [CFWS] is implicit in the RFC 2045 BNF.
    # XXX: This routine is a bit verbose, should factor out a get_int method.
    mime_version = MIMEVersion()
    if not value:
        mime_version.defects.append(errors.HeaderMissingRequiredValue(
            "Missing MIME version number (eg: 1.0)"))
        return mime_version
    if value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mime_version.append(token)
        if not value:
            mime_version.defects.append(errors.HeaderMissingRequiredValue(
                "Expected MIME version number but found only CFWS"))
    digits = ''
    while value and value[0] != '.' and value[0] not in CFWS_LEADER:
        digits += value[0]
        value = value[1:]
    if not digits.isdigit():
        mime_version.defects.append(errors.InvalidHeaderDefect(
            "Expected MIME major version number but found {!r}".format(digits)))
        mime_version.append(ValueTerminal(digits, 'xtext'))
    else:
        mime_version.major = int(digits)
        mime_version.append(ValueTerminal(digits, 'digits'))
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mime_version.append(token)
    if not value or value[0] != '.':
        if mime_version.major is not None:
            mime_version.defects.append(errors.InvalidHeaderDefect(
                "Incomplete MIME version; found only major number"))
        if value:
            mime_version.append(ValueTerminal(value, 'xtext'))
        return mime_version
    mime_version.append(ValueTerminal('.', 'version-separator'))
    value = value[1:]
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mime_version.append(token)
    if not value:
        if mime_version.major is not None:
            mime_version.defects.append(errors.InvalidHeaderDefect(
                "Incomplete MIME version; found only major number"))
        return mime_version
    digits = ''
    while value and value[0] not in CFWS_LEADER:
        digits += value[0]
        value = value[1:]
    if not digits.isdigit():
        mime_version.defects.append(errors.InvalidHeaderDefect(
            "Expected MIME minor version number but found {!r}".format(digits)))
        mime_version.append(ValueTerminal(digits, 'xtext'))
    else:
        mime_version.minor = int(digits)
        mime_version.append(ValueTerminal(digits, 'digits'))
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mime_version.append(token)
    if value:
        mime_version.defects.append(errors.InvalidHeaderDefect(
            "Excess non-CFWS text after MIME version"))
        mime_version.append(ValueTerminal(value, 'xtext'))
    return mime_version
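
# Illustrative sketch only: the XXX note in parse_mime_version suggests
# factoring its two digit-collecting loops into a helper.  One possible
# shape is shown here; the name _sketch_get_digit_run, its signature, and
# the use of a plain string of stop characters (standing in for '.' plus
# CFWS_LEADER) are assumptions, and nothing in this module calls it.
def _sketch_get_digit_run(value, stops):
    # Collect characters up to the first stop character.
    end = 0
    while end < len(value) and value[end] not in stops:
        end += 1
    run, rest = value[:end], value[end:]
    # Only a non-empty run of ASCII digits yields a number.
    number = int(run) if run.isascii() and run.isdigit() else None
    return run, number, rest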

def get_invalid_parameter(value):
    """ Read everything up to the next ';'.

    This is outside the formal grammar.  The InvalidParameter TokenList that is
    returned acts like a Parameter, but the data attributes are None.

    """
    invalid_parameter = InvalidParameter()
    while value and value[0] != ';':
        if value[0] in PHRASE_ENDS:
            invalid_parameter.append(ValueTerminal(value[0],
                                                   'misplaced-special'))
            value = value[1:]
        else:
            token, value = get_phrase(value)
            invalid_parameter.append(token)
    return invalid_parameter, value

def get_ttext(value):
    """ttext = <matches _ttext_matcher>

    We allow any non-TOKEN_ENDS in ttext, but add defects to the token's
    defects list if we find non-ttext characters.  We also register defects for
    *any* non-printables even though the RFC doesn't exclude all of them,
    because we follow the spirit of RFC 5322.

    """
    m = _non_token_end_matcher(value)
    if not m:
        raise errors.HeaderParseError(
            "expected ttext but found '{}'".format(value))
    ttext = m.group()
    value = value[len(ttext):]
    ttext = ValueTerminal(ttext, 'ttext')
    _validate_xtext(ttext)
    return ttext, value

def get_token(value):
    """token = [CFWS] 1*ttext [CFWS]

    The RFC equivalent of ttext is any US-ASCII chars except space, ctls, or
    tspecials.  We also exclude tabs even though the RFC doesn't.

    The RFC implies the CFWS but is not explicit about it in the BNF.

    """
    mtoken = Token()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mtoken.append(token)
    if value and value[0] in TOKEN_ENDS:
        raise errors.HeaderParseError(
            "expected token but found '{}'".format(value))
    token, value = get_ttext(value)
    mtoken.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        mtoken.append(token)
    return mtoken, value

def get_attrtext(value):
    """attrtext = 1*(any non-ATTRIBUTE_ENDS character)

    We allow any non-ATTRIBUTE_ENDS in attrtext, but add defects to the
    token's defects list if we find non-attrtext characters.  We also register
    defects for *any* non-printables even though the RFC doesn't exclude all of
    them, because we follow the spirit of RFC 5322.

    """
    m = _non_attribute_end_matcher(value)
    if not m:
        raise errors.HeaderParseError(
            "expected attrtext but found {!r}".format(value))
    attrtext = m.group()
    value = value[len(attrtext):]
    attrtext = ValueTerminal(attrtext, 'attrtext')
    _validate_xtext(attrtext)
    return attrtext, value

def get_attribute(value):
    """ [CFWS] 1*attrtext [CFWS]

    This version of the BNF makes the CFWS explicit, and as usual we use a
    value terminal for the actual run of characters.  The RFC equivalent of
    attrtext is the token characters, with the subtraction of '*', "'", and '%'.
    We include tab in the excluded set just as we do for token.

    """
    attribute = Attribute()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        attribute.append(token)
    if value and value[0] in ATTRIBUTE_ENDS:
        raise errors.HeaderParseError(
            "expected token but found '{}'".format(value))
    token, value = get_attrtext(value)
    attribute.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        attribute.append(token)
    return attribute, value

def get_extended_attrtext(value):
    """attrtext = 1*(any non-ATTRIBUTE_ENDS character plus '%')

    This is a special parsing routine so that we get a value that
    includes % escapes as a single string (which we decode as a single
    string later).

    """
    m = _non_extended_attribute_end_matcher(value)
    if not m:
        raise errors.HeaderParseError(
            "expected extended attrtext but found {!r}".format(value))
    attrtext = m.group()
    value = value[len(attrtext):]
    attrtext = ValueTerminal(attrtext, 'extended-attrtext')
    _validate_xtext(attrtext)
    return attrtext, value

def get_extended_attribute(value):
    """ [CFWS] 1*extended_attrtext [CFWS]

    This is like the non-extended version except we allow % characters, so that
    we can pick up an encoded value as a single string.

    """
    # XXX: should we have an ExtendedAttribute TokenList?
    attribute = Attribute()
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        attribute.append(token)
    if value and value[0] in EXTENDED_ATTRIBUTE_ENDS:
        raise errors.HeaderParseError(
            "expected token but found '{}'".format(value))
    token, value = get_extended_attrtext(value)
    attribute.append(token)
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        attribute.append(token)
    return attribute, value

def get_section(value):
    """ '*' digits

    The formal BNF is more complicated because leading 0s are not allowed.  We
    check for that and add a defect.  We also assume no CFWS is allowed between
    the '*' and the digits, though the RFC is not crystal clear on that.
    The caller should already have dealt with leading CFWS.

    """
    section = Section()
    if not value or value[0] != '*':
        raise errors.HeaderParseError("Expected section but found {}".format(
                                        value))
    section.append(ValueTerminal('*', 'section-marker'))
    value = value[1:]
    if not value or not value[0].isdigit():
        raise errors.HeaderParseError("Expected section number but "
                                      "found {}".format(value))
    digits = ''
    while value and value[0].isdigit():
        digits += value[0]
        value = value[1:]
    if digits[0] == '0' and digits != '0':
        section.defects.append(errors.InvalidHeaderDefect(
                "section number has an invalid leading 0"))
    section.number = int(digits)
    section.append(ValueTerminal(digits, 'digits'))
    return section, value


def get_value(value):
    """ quoted-string / attribute

    """
    v = Value()
    if not value:
        raise errors.HeaderParseError("Expected value but found end of string")
    leader = None
    if value[0] in CFWS_LEADER:
        leader, value = get_cfws(value)
    if not value:
        raise errors.HeaderParseError("Expected value but found "
                                      "only {}".format(leader))
    if value[0] == '"':
        token, value = get_quoted_string(value)
    else:
        token, value = get_extended_attribute(value)
    if leader is not None:
        token[:0] = [leader]
    v.append(token)
    return v, value

def get_parameter(value):
    """ attribute [section] ["*"] [CFWS] "=" value

    The CFWS is implied by the RFC but not made explicit in the BNF.  This
    simplified form of the BNF from the RFC is made to conform with the RFC BNF
    through some extra checks.  We do it this way because it makes both error
    recovery and working with the resulting parse tree easier.
    """
    # It is possible CFWS would also be implicitly allowed between the section
    # and the 'extended-attribute' marker (the '*'), but we've never seen that
    # in the wild and we will therefore ignore the possibility.
    param = Parameter()
    token, value = get_attribute(value)
    param.append(token)
    if not value or value[0] == ';':
        param.defects.append(errors.InvalidHeaderDefect("Parameter contains "
            "name ({}) but no value".format(token)))
        return param, value
    if value[0] == '*':
        try:
            token, value = get_section(value)
            param.sectioned = True
            param.append(token)
        except errors.HeaderParseError:
            pass
        if not value:
            raise errors.HeaderParseError("Incomplete parameter")
        if value[0] == '*':
            param.append(ValueTerminal('*', 'extended-parameter-marker'))
            value = value[1:]
            param.extended = True
    if value[0] != '=':
        raise errors.HeaderParseError("Parameter not followed by '='")
    param.append(ValueTerminal('=', 'parameter-separator'))
    value = value[1:]
    leader = None
    if value and value[0] in CFWS_LEADER:
        token, value = get_cfws(value)
        param.append(token)
    remainder = None
    appendto = param
    if param.extended and value and value[0] == '"':
        # Now for some serious hackery to handle the common invalid case of
        # double quotes around an extended value.  We also accept (with defect)
        # a value marked as encoded that isn't really.
        qstring, remainder = get_quoted_string(value)
        inner_value = qstring.stripped_value
        semi_valid = False
        if param.section_number == 0:
            if inner_value and inner_value[0] == "'":
                semi_valid = True
            else:
                token, rest = get_attrtext(inner_value)
                if rest and rest[0] == "'":
                    semi_valid = True
        else:
            try:
                token, rest = get_extended_attrtext(inner_value)
            except errors.HeaderParseError:
                # Not extended attrtext, so not semi-valid.
                pass
            else:
                if not rest:
                    semi_valid = True
        if semi_valid:
            param.defects.append(errors.InvalidHeaderDefect(
                "Quoted string value for extended parameter is invalid"))
            param.append(qstring)
            for t in qstring:
                if t.token_type == 'bare-quoted-string':
                    t[:] = []
                    appendto = t
                    break
            value = inner_value
        else:
            remainder = None
            param.defects.append(errors.InvalidHeaderDefect(
                "Parameter marked as extended but appears to have a "
                "quoted string value that is non-encoded"))
    if value and value[0] == "'":
        token = None
    else:
        token, value = get_value(value)
    if not param.extended or param.section_number > 0:
        if not value or value[0] != "'":
            appendto.append(token)
            if remainder is not None:
                assert not value, value
                value = remainder
            return param, value
        param.defects.append(errors.InvalidHeaderDefect(
            "Apparent initial-extended-value but attribute "
            "was not marked as extended or was not initial section"))
    if not value:
        # Assume the charset/lang is missing and the token is the value.
        param.defects.append(errors.InvalidHeaderDefect(
            "Missing required charset/lang delimiters"))
        appendto.append(token)
        if remainder is None:
            return param, value
    else:
        if token is not None:
            for t in token:
                if t.token_type == 'extended-attrtext':
                    break
            # XXX: the next line is a comparison, not an assignment, so it
            # has no effect; the token keeps its original token_type.
            t.token_type == 'attrtext'
            appendto.append(t)
            param.charset = t.value
        if value[0] != "'":
            raise errors.HeaderParseError("Expected RFC2231 char/lang encoding "
                                          "delimiter, but found {!r}".format(value))
        appendto.append(ValueTerminal("'", 'RFC2231-delimiter'))
        value = value[1:]
        if value and value[0] != "'":
            token, value = get_attrtext(value)
            appendto.append(token)
            param.lang = token.value
            if not value or value[0] != "'":
                raise errors.HeaderParseError("Expected RFC2231 char/lang encoding "
                                  "delimiter, but found {}".format(value))
        appendto.append(ValueTerminal("'", 'RFC2231-delimiter'))
        value = value[1:]
    if remainder is not None:
        # Treat the rest of value as bare quoted string content.
        v = Value()
        while value:
            if value[0] in WSP:
                token, value = get_fws(value)
            elif value[0] == '"':
                token = ValueTerminal('"', 'DQUOTE')
                value = value[1:]
            else:
                token, value = get_qcontent(value)
            v.append(token)
        token = v
    else:
        token, value = get_value(value)
    appendto.append(token)
    if remainder is not None:
        assert not value, value
        value = remainder
    return param, value
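
# Illustrative sketch only: the shape of the RFC 2231 initial extended value
# that get_parameter picks apart above, charset'language'percent-encoded-text.
# The helper name _sketch_split_extended_value is hypothetical; unlike the
# parser it records no defects and does not decode the % escapes.
def _sketch_split_extended_value(text):
    # Split on the first two "'" delimiters only; the encoded text keeps
    # any further quote characters.
    parts = text.split("'", 2)
    if len(parts) != 3:
        raise ValueError(
            "expected charset'lang'value but found {!r}".format(text))
    charset, lang, encoded = parts
    return charset, lang, encoded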

def parse_mime_parameters(value):
    """ parameter *( ";" parameter )

    That BNF is meant to indicate this routine should only be called after
    finding and handling the leading ';'.  There is no corresponding rule in
    the formal RFC grammar, but it is more convenient for us for the set of
    parameters to be treated as its own TokenList.

    This is a 'parse' routine because it consumes the remaining value, but it
    would never be called to parse a full header.  Instead it is called to
    parse everything after the non-parameter value of a specific MIME header.

    """
    mime_parameters = MimeParameters()
    while value:
        try:
            token, value = get_parameter(value)
            mime_parameters.append(token)
        except errors.HeaderParseError as err:
            leader = None
            if value[0] in CFWS_LEADER:
                leader, value = get_cfws(value)
            if not value:
                mime_parameters.append(leader)
                return mime_parameters
            if value[0] == ';':
                if leader is not None:
                    mime_parameters.append(leader)
                mime_parameters.defects.append(errors.InvalidHeaderDefect(
                    "parameter entry with no content"))
            else:
                token, value = get_invalid_parameter(value)
                if leader:
                    token[:0] = [leader]
                mime_parameters.append(token)
                mime_parameters.defects.append(errors.InvalidHeaderDefect(
                    "invalid parameter {!r}".format(token)))
        if value and value[0] != ';':
            # Junk after the otherwise valid parameter.  Mark it as
            # invalid, but it will have a value.
            param = mime_parameters[-1]
            param.token_type = 'invalid-parameter'
            token, value = get_invalid_parameter(value)
            param.extend(token)
            mime_parameters.defects.append(errors.InvalidHeaderDefect(
                "parameter with invalid trailing text {!r}".format(token)))
        if value:
            # Must be a ';' at this point.
            mime_parameters.append(ValueTerminal(';', 'parameter-separator'))
            value = value[1:]
    return mime_parameters

def _find_mime_parameters(tokenlist, value):
    """Do our best to find the parameters in an invalid MIME header

    """
    while value and value[0] != ';':
        if value[0] in PHRASE_ENDS:
            tokenlist.append(ValueTerminal(value[0], 'misplaced-special'))
            value = value[1:]
        else:
            token, value = get_phrase(value)
            tokenlist.append(token)
    if not value:
        return
    tokenlist.append(ValueTerminal(';', 'parameter-separator'))
    tokenlist.append(parse_mime_parameters(value[1:]))
26207db96d56Sopenharmony_ci
def parse_content_type_header(value):
    """ maintype "/" subtype *( ";" parameter )

    The maintype and subtype are tokens.  Theoretically they could
    be checked against the official IANA list + x-token, but we
    don't do that.
    """
    ctype = ContentType()
    recover = False
    if not value:
        ctype.defects.append(errors.HeaderMissingRequiredValue(
            "Missing content type specification"))
        return ctype
    try:
        token, value = get_token(value)
    except errors.HeaderParseError:
        ctype.defects.append(errors.InvalidHeaderDefect(
            "Expected content maintype but found {!r}".format(value)))
        _find_mime_parameters(ctype, value)
        return ctype
    ctype.append(token)
    # XXX: If we really want to follow the formal grammar we should make
    # maintype and subtype specialized TokenLists here.  Probably not worth it.
    if not value or value[0] != '/':
        ctype.defects.append(errors.InvalidHeaderDefect(
            "Invalid content type"))
        if value:
            _find_mime_parameters(ctype, value)
        return ctype
    ctype.maintype = token.value.strip().lower()
    ctype.append(ValueTerminal('/', 'content-type-separator'))
    value = value[1:]
    try:
        token, value = get_token(value)
    except errors.HeaderParseError:
        ctype.defects.append(errors.InvalidHeaderDefect(
            "Expected content subtype but found {!r}".format(value)))
        _find_mime_parameters(ctype, value)
        return ctype
    ctype.append(token)
    ctype.subtype = token.value.strip().lower()
    if not value:
        return ctype
    if value[0] != ';':
        ctype.defects.append(errors.InvalidHeaderDefect(
            "Only parameters are valid after content type, but "
            "found {!r}".format(value)))
        # The RFC requires that a syntactically invalid content-type be treated
        # as text/plain.  Perhaps we should postel this, but we should probably
        # only do that if we were checking the subtype value against IANA.
        del ctype.maintype, ctype.subtype
        _find_mime_parameters(ctype, value)
        return ctype
    ctype.append(ValueTerminal(';', 'parameter-separator'))
    ctype.append(parse_mime_parameters(value[1:]))
    return ctype

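# Illustrative sketch, not part of the original module.  The helper name
# _example_content_type_fields is hypothetical; it only shows how the result
# of parse_content_type_header might be inspected.  The assumption here is
# that problems are never raised to the caller but recorded as defects, and
# that maintype/subtype keep whatever defaults the ContentType class defines
# when the value is not syntactically valid.
def _example_content_type_fields(header_value):
    """Return (maintype, subtype, params, defect names) for header_value."""
    ctype = parse_content_type_header(header_value)
    defect_names = [type(defect).__name__ for defect in ctype.all_defects]
    return ctype.maintype, ctype.subtype, list(ctype.params), defect_names
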
def parse_content_disposition_header(value):
    """ disposition-type *( ";" parameter )

    """
    disp_header = ContentDisposition()
    if not value:
        disp_header.defects.append(errors.HeaderMissingRequiredValue(
            "Missing content disposition"))
        return disp_header
    try:
        token, value = get_token(value)
    except errors.HeaderParseError:
        disp_header.defects.append(errors.InvalidHeaderDefect(
            "Expected content disposition but found {!r}".format(value)))
        _find_mime_parameters(disp_header, value)
        return disp_header
    disp_header.append(token)
    disp_header.content_disposition = token.value.strip().lower()
    if not value:
        return disp_header
    if value[0] != ';':
        disp_header.defects.append(errors.InvalidHeaderDefect(
            "Only parameters are valid after content disposition, but "
            "found {!r}".format(value)))
        _find_mime_parameters(disp_header, value)
        return disp_header
    disp_header.append(ValueTerminal(';', 'parameter-separator'))
    disp_header.append(parse_mime_parameters(value[1:]))
    return disp_header

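# Illustrative sketch, not part of the original module.  The helper name
# _example_disposition is hypothetical; it shows that the disposition type is
# normalized to lower case and that the parameters can be collected into a
# plain dict, assuming 'params' yields (name, value) pairs.
def _example_disposition(header_value):
    """Return (disposition type, parameter dict) for header_value."""
    disp = parse_content_disposition_header(header_value)
    return disp.content_disposition, dict(disp.params)
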
def parse_content_transfer_encoding_header(value):
    """ mechanism

    """
    # We should probably validate the values, since the list is fixed.
    cte_header = ContentTransferEncoding()
    if not value:
        cte_header.defects.append(errors.HeaderMissingRequiredValue(
            "Missing content transfer encoding"))
        return cte_header
    try:
        token, value = get_token(value)
    except errors.HeaderParseError:
        cte_header.defects.append(errors.InvalidHeaderDefect(
            "Expected content transfer encoding but found {!r}".format(value)))
    else:
        cte_header.append(token)
        cte_header.cte = token.value.strip().lower()
    if not value:
        return cte_header
    while value:
        cte_header.defects.append(errors.InvalidHeaderDefect(
            "Extra text after content transfer encoding"))
        if value[0] in PHRASE_ENDS:
            cte_header.append(ValueTerminal(value[0], 'misplaced-special'))
            value = value[1:]
        else:
            token, value = get_phrase(value)
            cte_header.append(token)
    return cte_header


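# Illustrative sketch, not part of the original module.  The helper name
# _example_cte is hypothetical; it shows that the mechanism is lower-cased
# and that trailing junk is kept in the token list but flagged as a defect
# rather than raising an error.
def _example_cte(header_value):
    """Return (mechanism, defect names) for header_value."""
    cte_header = parse_content_transfer_encoding_header(header_value)
    defect_names = [type(defect).__name__
                    for defect in cte_header.all_defects]
    return cte_header.cte, defect_names
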
#
# Header folding
#
# Header folding is complex, with lots of rules and corner cases.  The
# following code does its best to obey the rules and handle the corner
# cases, but you can be sure there are a few bugs :)
#
# This folder generally canonicalizes as it goes, preferring the stringified
# version of each token.  The tokens contain information that supports the
# folder, including which tokens can be encoded in which ways.
#
# Folded text is accumulated in a simple list of strings ('lines'), each
# one of which should be no longer than policy.max_line_length ('maxlen').
#

def _steal_trailing_WSP_if_exists(lines):
    wsp = ''
    if lines and lines[-1] and lines[-1][-1] in WSP:
        wsp = lines[-1][-1]
        lines[-1] = lines[-1][:-1]
    return wsp

def _refold_parse_tree(parse_tree, *, policy):
    """Return string of contents of parse_tree folded according to RFC rules.

    """
    # max_line_length 0/None means no limit, i.e. infinitely long.
    maxlen = policy.max_line_length or sys.maxsize
    encoding = 'utf-8' if policy.utf8 else 'us-ascii'
    lines = ['']
    last_ew = None
    wrap_as_ew_blocked = 0
    want_encoding = False
    end_ew_not_allowed = Terminal('', 'wrap_as_ew_blocked')
    parts = list(parse_tree)
    while parts:
        part = parts.pop(0)
        if part is end_ew_not_allowed:
            wrap_as_ew_blocked -= 1
            continue
        tstr = str(part)
        if part.token_type == 'ptext' and set(tstr) & SPECIALS:
            # Encode if tstr contains special characters.
            want_encoding = True
        try:
            tstr.encode(encoding)
            charset = encoding
        except UnicodeEncodeError:
            if any(isinstance(x, errors.UndecodableBytesDefect)
                   for x in part.all_defects):
                charset = 'unknown-8bit'
            else:
                # If policy.utf8 is false this should really be taken from a
                # 'charset' property on the policy.
                charset = 'utf-8'
            want_encoding = True
        if part.token_type == 'mime-parameters':
            # Mime parameter folding (using RFC2231) is extra special.
            _fold_mime_parameters(part, lines, maxlen, encoding)
            continue
        if want_encoding and not wrap_as_ew_blocked:
            if not part.as_ew_allowed:
                want_encoding = False
                last_ew = None
                if part.syntactic_break:
                    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)]
                    if policy.linesep not in encoded_part:
                        # It fits on a single line
                        if len(encoded_part) > maxlen - len(lines[-1]):
                            # But not on this one, so start a new one.
                            newline = _steal_trailing_WSP_if_exists(lines)
                            # XXX what if encoded_part has no leading FWS?
                            lines.append(newline)
                        lines[-1] += encoded_part
                        continue
                # Either this is not a major syntactic break, so we don't
                # want it on a line by itself even if it fits, or it
                # doesn't fit on a line by itself.  Either way, fall through
                # to unpacking the subparts and wrapping them.
            if not hasattr(part, 'encode'):
                # It's not a Terminal, do each piece individually.
                parts = list(part) + parts
            else:
                # It's a terminal, wrap it as an encoded word, possibly
                # combining it with previously encoded words if allowed.
                last_ew = _fold_as_ew(tstr, lines, maxlen, last_ew,
                                      part.ew_combine_allowed, charset)
            want_encoding = False
            continue
        if len(tstr) <= maxlen - len(lines[-1]):
            lines[-1] += tstr
            continue
        # This part is too long to fit.  The RFC wants us to break at
        # "major syntactic breaks", so unless we don't consider this
        # to be one, check if it will fit on the next line by itself.
        if (part.syntactic_break and
                len(tstr) + 1 <= maxlen):
            newline = _steal_trailing_WSP_if_exists(lines)
            if newline or part.startswith_fws():
                lines.append(newline + tstr)
                last_ew = None
                continue
        if not hasattr(part, 'encode'):
            # It's not a terminal, try folding the subparts.
            newparts = list(part)
            if not part.as_ew_allowed:
                wrap_as_ew_blocked += 1
                newparts.append(end_ew_not_allowed)
            parts = newparts + parts
            continue
        if part.as_ew_allowed and not wrap_as_ew_blocked:
            # It doesn't need CTE encoding, but encode it anyway so we can
            # wrap it.
            parts.insert(0, part)
            want_encoding = True
            continue
        # We can't figure out how to wrap it, so give up.
        newline = _steal_trailing_WSP_if_exists(lines)
        if newline or part.startswith_fws():
            lines.append(newline + tstr)
        else:
            # We can't fold it onto the next line either...
            lines[-1] += tstr
    return policy.linesep.join(lines) + policy.linesep

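# Illustrative sketch, not part of the original module.  The helper name
# _example_refold_unstructured is hypothetical.  It parses plain ASCII text
# with get_unstructured and folds it under a cloned copy of
# email.policy.default, which is assumed to supply the max_line_length, utf8
# and linesep attributes that _refold_parse_tree reads.  For such text the
# folder should break only at whitespace, so removing the line separators
# from the result gives back the original text.
def _example_refold_unstructured(text, max_line_length=20):
    """Fold an unstructured header value to at most max_line_length."""
    # Imported here because email.policy itself depends on this module.
    from email import policy as email_policy
    tree = get_unstructured(text)
    folding_policy = email_policy.default.clone(
        max_line_length=max_line_length)
    return _refold_parse_tree(tree, policy=folding_policy)
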
def _fold_as_ew(to_encode, lines, maxlen, last_ew, ew_combine_allowed, charset):
    """Fold string to_encode into lines as encoded word, combining if allowed.
    Return the new value for last_ew, or None if ew_combine_allowed is False.

    If there is already an encoded word in the last line of lines (indicated by
    a non-None value for last_ew) and ew_combine_allowed is true, decode the
    existing ew, combine it with to_encode, and re-encode.  Otherwise, encode
    to_encode.  In either case, split to_encode as necessary so that the
    encoded segments fit within maxlen.

    """
    if last_ew is not None and ew_combine_allowed:
        to_encode = str(
            get_unstructured(lines[-1][last_ew:] + to_encode))
        lines[-1] = lines[-1][:last_ew]
    if to_encode[0] in WSP:
        # We're joining this to non-encoded text, so don't encode
        # the leading blank.
        leading_wsp = to_encode[0]
        to_encode = to_encode[1:]
        if (len(lines[-1]) == maxlen):
            lines.append(_steal_trailing_WSP_if_exists(lines))
        lines[-1] += leading_wsp
    trailing_wsp = ''
    if to_encode[-1] in WSP:
        # Likewise for the trailing space.
        trailing_wsp = to_encode[-1]
        to_encode = to_encode[:-1]
    new_last_ew = len(lines[-1]) if last_ew is None else last_ew

    encode_as = 'utf-8' if charset == 'us-ascii' else charset

    # The RFC2047 chrome takes up 7 characters plus the length
    # of the charset name.
    chrome_len = len(encode_as) + 7

    if (chrome_len + 1) >= maxlen:
        raise errors.HeaderParseError(
            "max_line_length is too small to fit an encoded word")

    while to_encode:
        remaining_space = maxlen - len(lines[-1])
        text_space = remaining_space - chrome_len
        if text_space <= 0:
            lines.append(' ')
            continue

        to_encode_word = to_encode[:text_space]
        encoded_word = _ew.encode(to_encode_word, charset=encode_as)
        excess = len(encoded_word) - remaining_space
        while excess > 0:
            # Since the chunk to encode is guaranteed to fit into less than
            # 100 characters, shrinking it by one at a time shouldn't take
            # long.
            to_encode_word = to_encode_word[:-1]
            encoded_word = _ew.encode(to_encode_word, charset=encode_as)
            excess = len(encoded_word) - remaining_space
        lines[-1] += encoded_word
        to_encode = to_encode[len(to_encode_word):]

        if to_encode:
            lines.append(' ')
            new_last_ew = len(lines[-1])
    lines[-1] += trailing_wsp
    return new_last_ew if ew_combine_allowed else None

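# Illustrative sketch, not part of the original module.  The helper name
# _example_ew_chrome_len is hypothetical.  It measures the fixed overhead
# ("chrome") of an RFC 2047 encoded word, '=?' charset '?' cte '?' ... '?=',
# by encoding some text and subtracting the payload length.  The result is
# expected to match the len(charset) + 7 figure used by _fold_as_ew, on the
# assumption that the one-letter 'q' or 'b' encoding marker is emitted.
def _example_ew_chrome_len(text, charset='utf-8'):
    """Return the number of non-payload characters in an encoded word."""
    encoded_word = _ew.encode(text, charset=charset)
    # Fields are separated by '?': '=', charset, cte, payload, '='.
    payload = encoded_word.split('?')[3]
    return len(encoded_word) - len(payload)
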
def _fold_mime_parameters(part, lines, maxlen, encoding):
    """Fold TokenList 'part' into the 'lines' list as mime parameters.

    Using the decoded list of parameters and values, format them according to
    the RFC rules, including using RFC2231 encoding if the value cannot be
    expressed in 'encoding' and/or the parameter+value is too long to fit
    within 'maxlen'.

    """
    # Special case for RFC2231 encoding: start from decoded values and use
    # RFC2231 encoding iff needed.
    #
    # Note that the 1s and 2s being added to the length calculations are
    # accounting for the possibly-needed spaces and semicolons we'll be adding.
    #
    for name, value in part.params:
        # XXX What if this ';' puts us over maxlen the first time through the
        # loop?  We should split the header value onto a newline in that case,
        # but to do that we need to recognize the need earlier or reparse the
        # header, so I'm going to ignore that bug for now.  It'll only put us
        # one character over.
        if not lines[-1].rstrip().endswith(';'):
            lines[-1] += ';'
        charset = encoding
        error_handler = 'strict'
        try:
            value.encode(encoding)
            encoding_required = False
        except UnicodeEncodeError:
            encoding_required = True
            if utils._has_surrogates(value):
                charset = 'unknown-8bit'
                error_handler = 'surrogateescape'
            else:
                charset = 'utf-8'
        if encoding_required:
            encoded_value = urllib.parse.quote(
                value, safe='', errors=error_handler)
            tstr = "{}*={}''{}".format(name, charset, encoded_value)
        else:
            tstr = '{}={}'.format(name, quote_string(value))
        if len(lines[-1]) + len(tstr) + 1 < maxlen:
            lines[-1] = lines[-1] + ' ' + tstr
            continue
        elif len(tstr) + 2 <= maxlen:
            lines.append(' ' + tstr)
            continue
        # We need multiple sections.  We are allowed to mix encoded and
        # non-encoded sections, but we aren't going to.  We'll encode them all.
        section = 0
        extra_chrome = charset + "''"
        while value:
            chrome_len = len(name) + len(str(section)) + 3 + len(extra_chrome)
            if maxlen <= chrome_len + 3:
                # We need room for the leading blank, the trailing semicolon,
                # and at least one character of the value.  If we don't
                # have that, we'd be stuck, so in that case fall back to
                # the RFC standard width.
                maxlen = 78
            splitpoint = maxchars = maxlen - chrome_len - 2
            while True:
                partial = value[:splitpoint]
                encoded_value = urllib.parse.quote(
                    partial, safe='', errors=error_handler)
                if len(encoded_value) <= maxchars:
                    break
                splitpoint -= 1
            lines.append(" {}*{}*={}{}".format(
                name, section, extra_chrome, encoded_value))
            extra_chrome = ''
            section += 1
            value = value[splitpoint:]
            if value:
                lines[-1] += ';'

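# Illustrative sketch, not part of the original module.  The helper name
# _example_rfc2231_param is hypothetical.  It reproduces only the
# single-section case of the RFC2231 formatting used above: the value is
# percent-encoded with no characters marked safe and prefixed with the
# charset and an empty language tag.  Splitting into numbered sections is
# left out.
def _example_rfc2231_param(name, value, charset='utf-8'):
    """Return 'name*=charset''percent-encoded-value' for a parameter."""
    encoded_value = urllib.parse.quote(value, safe='', errors='strict')
    return "{}*={}''{}".format(name, charset, encoded_value)
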