17db96d56Sopenharmony_ci"""Header value parser implementing various email-related RFC parsing rules. 27db96d56Sopenharmony_ci 37db96d56Sopenharmony_ciThe parsing methods defined in this module implement various email related 47db96d56Sopenharmony_ciparsing rules. Principal among them is RFC 5322, which is the followon 57db96d56Sopenharmony_cito RFC 2822 and primarily a clarification of the former. It also implements 67db96d56Sopenharmony_ciRFC 2047 encoded word decoding. 77db96d56Sopenharmony_ci 87db96d56Sopenharmony_ciRFC 5322 goes to considerable trouble to maintain backward compatibility with 97db96d56Sopenharmony_ciRFC 822 in the parse phase, while cleaning up the structure on the generation 107db96d56Sopenharmony_ciphase. This parser supports correct RFC 5322 generation by tagging white space 117db96d56Sopenharmony_cias folding white space only when folding is allowed in the non-obsolete rule 127db96d56Sopenharmony_cisets. Actually, the parser is even more generous when accepting input than RFC 137db96d56Sopenharmony_ci5322 mandates, following the spirit of Postel's Law, which RFC 5322 encourages. 147db96d56Sopenharmony_ciWhere possible deviations from the standard are annotated on the 'defects' 157db96d56Sopenharmony_ciattribute of tokens that deviate. 167db96d56Sopenharmony_ci 177db96d56Sopenharmony_ciThe general structure of the parser follows RFC 5322, and uses its terminology 187db96d56Sopenharmony_ciwhere there is a direct correspondence. Where the implementation requires a 197db96d56Sopenharmony_cisomewhat different structure than that used by the formal grammar, new terms 207db96d56Sopenharmony_cithat mimic the closest existing terms are used. Thus, it really helps to have 217db96d56Sopenharmony_cia copy of RFC 5322 handy when studying this code. 227db96d56Sopenharmony_ci 237db96d56Sopenharmony_ciInput to the parser is a string that has already been unfolded according to 247db96d56Sopenharmony_ciRFC 5322 rules. According to the RFC this unfolding is the very first step, and 257db96d56Sopenharmony_cithis parser leaves the unfolding step to a higher level message parser, which 267db96d56Sopenharmony_ciwill have already detected the line breaks that need unfolding while 277db96d56Sopenharmony_cidetermining the beginning and end of each header. 287db96d56Sopenharmony_ci 297db96d56Sopenharmony_ciThe output of the parser is a TokenList object, which is a list subclass. A 307db96d56Sopenharmony_ciTokenList is a recursive data structure. The terminal nodes of the structure 317db96d56Sopenharmony_ciare Terminal objects, which are subclasses of str. These do not correspond 327db96d56Sopenharmony_cidirectly to terminal objects in the formal grammar, but are instead more 337db96d56Sopenharmony_cipractical higher level combinations of true terminals. 347db96d56Sopenharmony_ci 357db96d56Sopenharmony_ciAll TokenList and Terminal objects have a 'value' attribute, which produces the 367db96d56Sopenharmony_cisemantically meaningful value of that part of the parse subtree. The value of 377db96d56Sopenharmony_ciall whitespace tokens (no matter how many sub-tokens they may contain) is a 387db96d56Sopenharmony_cisingle space, as per the RFC rules. This includes 'CFWS', which is herein 397db96d56Sopenharmony_ciincluded in the general class of whitespace tokens. There is one exception to 407db96d56Sopenharmony_cithe rule that whitespace tokens are collapsed into single spaces in values: in 417db96d56Sopenharmony_cithe value of a 'bare-quoted-string' (a quoted-string with no leading or 427db96d56Sopenharmony_citrailing whitespace), any whitespace that appeared between the quotation marks 437db96d56Sopenharmony_ciis preserved in the returned value. Note that in all Terminal strings quoted 447db96d56Sopenharmony_cipairs are turned into their unquoted values. 457db96d56Sopenharmony_ci 467db96d56Sopenharmony_ciAll TokenList and Terminal objects also have a string value, which attempts to 477db96d56Sopenharmony_cibe a "canonical" representation of the RFC-compliant form of the substring that 487db96d56Sopenharmony_ciproduced the parsed subtree, including minimal use of quoted pair quoting. 497db96d56Sopenharmony_ciWhitespace runs are not collapsed. 507db96d56Sopenharmony_ci 517db96d56Sopenharmony_ciComment tokens also have a 'content' attribute providing the string found 527db96d56Sopenharmony_cibetween the parens (including any nested comments) with whitespace preserved. 537db96d56Sopenharmony_ci 547db96d56Sopenharmony_ciAll TokenList and Terminal objects have a 'defects' attribute which is a 557db96d56Sopenharmony_cipossibly empty list all of the defects found while creating the token. Defects 567db96d56Sopenharmony_cimay appear on any token in the tree, and a composite list of all defects in the 577db96d56Sopenharmony_cisubtree is available through the 'all_defects' attribute of any node. (For 587db96d56Sopenharmony_ciTerminal notes x.defects == x.all_defects.) 597db96d56Sopenharmony_ci 607db96d56Sopenharmony_ciEach object in a parse tree is called a 'token', and each has a 'token_type' 617db96d56Sopenharmony_ciattribute that gives the name from the RFC 5322 grammar that it represents. 627db96d56Sopenharmony_ciNot all RFC 5322 nodes are produced, and there is one non-RFC 5322 node that 637db96d56Sopenharmony_cimay be produced: 'ptext'. A 'ptext' is a string of printable ascii characters. 647db96d56Sopenharmony_ciIt is returned in place of lists of (ctext/quoted-pair) and 657db96d56Sopenharmony_ci(qtext/quoted-pair). 667db96d56Sopenharmony_ci 677db96d56Sopenharmony_ciXXX: provide complete list of token types. 687db96d56Sopenharmony_ci""" 697db96d56Sopenharmony_ci 707db96d56Sopenharmony_ciimport re 717db96d56Sopenharmony_ciimport sys 727db96d56Sopenharmony_ciimport urllib # For urllib.parse.unquote 737db96d56Sopenharmony_cifrom string import hexdigits 747db96d56Sopenharmony_cifrom operator import itemgetter 757db96d56Sopenharmony_cifrom email import _encoded_words as _ew 767db96d56Sopenharmony_cifrom email import errors 777db96d56Sopenharmony_cifrom email import utils 787db96d56Sopenharmony_ci 797db96d56Sopenharmony_ci# 807db96d56Sopenharmony_ci# Useful constants and functions 817db96d56Sopenharmony_ci# 827db96d56Sopenharmony_ci 837db96d56Sopenharmony_ciWSP = set(' \t') 847db96d56Sopenharmony_ciCFWS_LEADER = WSP | set('(') 857db96d56Sopenharmony_ciSPECIALS = set(r'()<>@,:;.\"[]') 867db96d56Sopenharmony_ciATOM_ENDS = SPECIALS | WSP 877db96d56Sopenharmony_ciDOT_ATOM_ENDS = ATOM_ENDS - set('.') 887db96d56Sopenharmony_ci# '.', '"', and '(' do not end phrases in order to support obs-phrase 897db96d56Sopenharmony_ciPHRASE_ENDS = SPECIALS - set('."(') 907db96d56Sopenharmony_ciTSPECIALS = (SPECIALS | set('/?=')) - set('.') 917db96d56Sopenharmony_ciTOKEN_ENDS = TSPECIALS | WSP 927db96d56Sopenharmony_ciASPECIALS = TSPECIALS | set("*'%") 937db96d56Sopenharmony_ciATTRIBUTE_ENDS = ASPECIALS | WSP 947db96d56Sopenharmony_ciEXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%') 957db96d56Sopenharmony_ci 967db96d56Sopenharmony_cidef quote_string(value): 977db96d56Sopenharmony_ci return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"' 987db96d56Sopenharmony_ci 997db96d56Sopenharmony_ci# Match a RFC 2047 word, looks like =?utf-8?q?someword?= 1007db96d56Sopenharmony_cirfc2047_matcher = re.compile(r''' 1017db96d56Sopenharmony_ci =\? # literal =? 1027db96d56Sopenharmony_ci [^?]* # charset 1037db96d56Sopenharmony_ci \? # literal ? 1047db96d56Sopenharmony_ci [qQbB] # literal 'q' or 'b', case insensitive 1057db96d56Sopenharmony_ci \? # literal ? 1067db96d56Sopenharmony_ci .*? # encoded word 1077db96d56Sopenharmony_ci \?= # literal ?= 1087db96d56Sopenharmony_ci''', re.VERBOSE | re.MULTILINE) 1097db96d56Sopenharmony_ci 1107db96d56Sopenharmony_ci 1117db96d56Sopenharmony_ci# 1127db96d56Sopenharmony_ci# TokenList and its subclasses 1137db96d56Sopenharmony_ci# 1147db96d56Sopenharmony_ci 1157db96d56Sopenharmony_ciclass TokenList(list): 1167db96d56Sopenharmony_ci 1177db96d56Sopenharmony_ci token_type = None 1187db96d56Sopenharmony_ci syntactic_break = True 1197db96d56Sopenharmony_ci ew_combine_allowed = True 1207db96d56Sopenharmony_ci 1217db96d56Sopenharmony_ci def __init__(self, *args, **kw): 1227db96d56Sopenharmony_ci super().__init__(*args, **kw) 1237db96d56Sopenharmony_ci self.defects = [] 1247db96d56Sopenharmony_ci 1257db96d56Sopenharmony_ci def __str__(self): 1267db96d56Sopenharmony_ci return ''.join(str(x) for x in self) 1277db96d56Sopenharmony_ci 1287db96d56Sopenharmony_ci def __repr__(self): 1297db96d56Sopenharmony_ci return '{}({})'.format(self.__class__.__name__, 1307db96d56Sopenharmony_ci super().__repr__()) 1317db96d56Sopenharmony_ci 1327db96d56Sopenharmony_ci @property 1337db96d56Sopenharmony_ci def value(self): 1347db96d56Sopenharmony_ci return ''.join(x.value for x in self if x.value) 1357db96d56Sopenharmony_ci 1367db96d56Sopenharmony_ci @property 1377db96d56Sopenharmony_ci def all_defects(self): 1387db96d56Sopenharmony_ci return sum((x.all_defects for x in self), self.defects) 1397db96d56Sopenharmony_ci 1407db96d56Sopenharmony_ci def startswith_fws(self): 1417db96d56Sopenharmony_ci return self[0].startswith_fws() 1427db96d56Sopenharmony_ci 1437db96d56Sopenharmony_ci @property 1447db96d56Sopenharmony_ci def as_ew_allowed(self): 1457db96d56Sopenharmony_ci """True if all top level tokens of this part may be RFC2047 encoded.""" 1467db96d56Sopenharmony_ci return all(part.as_ew_allowed for part in self) 1477db96d56Sopenharmony_ci 1487db96d56Sopenharmony_ci @property 1497db96d56Sopenharmony_ci def comments(self): 1507db96d56Sopenharmony_ci comments = [] 1517db96d56Sopenharmony_ci for token in self: 1527db96d56Sopenharmony_ci comments.extend(token.comments) 1537db96d56Sopenharmony_ci return comments 1547db96d56Sopenharmony_ci 1557db96d56Sopenharmony_ci def fold(self, *, policy): 1567db96d56Sopenharmony_ci return _refold_parse_tree(self, policy=policy) 1577db96d56Sopenharmony_ci 1587db96d56Sopenharmony_ci def pprint(self, indent=''): 1597db96d56Sopenharmony_ci print(self.ppstr(indent=indent)) 1607db96d56Sopenharmony_ci 1617db96d56Sopenharmony_ci def ppstr(self, indent=''): 1627db96d56Sopenharmony_ci return '\n'.join(self._pp(indent=indent)) 1637db96d56Sopenharmony_ci 1647db96d56Sopenharmony_ci def _pp(self, indent=''): 1657db96d56Sopenharmony_ci yield '{}{}/{}('.format( 1667db96d56Sopenharmony_ci indent, 1677db96d56Sopenharmony_ci self.__class__.__name__, 1687db96d56Sopenharmony_ci self.token_type) 1697db96d56Sopenharmony_ci for token in self: 1707db96d56Sopenharmony_ci if not hasattr(token, '_pp'): 1717db96d56Sopenharmony_ci yield (indent + ' !! invalid element in token ' 1727db96d56Sopenharmony_ci 'list: {!r}'.format(token)) 1737db96d56Sopenharmony_ci else: 1747db96d56Sopenharmony_ci yield from token._pp(indent+' ') 1757db96d56Sopenharmony_ci if self.defects: 1767db96d56Sopenharmony_ci extra = ' Defects: {}'.format(self.defects) 1777db96d56Sopenharmony_ci else: 1787db96d56Sopenharmony_ci extra = '' 1797db96d56Sopenharmony_ci yield '{}){}'.format(indent, extra) 1807db96d56Sopenharmony_ci 1817db96d56Sopenharmony_ci 1827db96d56Sopenharmony_ciclass WhiteSpaceTokenList(TokenList): 1837db96d56Sopenharmony_ci 1847db96d56Sopenharmony_ci @property 1857db96d56Sopenharmony_ci def value(self): 1867db96d56Sopenharmony_ci return ' ' 1877db96d56Sopenharmony_ci 1887db96d56Sopenharmony_ci @property 1897db96d56Sopenharmony_ci def comments(self): 1907db96d56Sopenharmony_ci return [x.content for x in self if x.token_type=='comment'] 1917db96d56Sopenharmony_ci 1927db96d56Sopenharmony_ci 1937db96d56Sopenharmony_ciclass UnstructuredTokenList(TokenList): 1947db96d56Sopenharmony_ci token_type = 'unstructured' 1957db96d56Sopenharmony_ci 1967db96d56Sopenharmony_ci 1977db96d56Sopenharmony_ciclass Phrase(TokenList): 1987db96d56Sopenharmony_ci token_type = 'phrase' 1997db96d56Sopenharmony_ci 2007db96d56Sopenharmony_ciclass Word(TokenList): 2017db96d56Sopenharmony_ci token_type = 'word' 2027db96d56Sopenharmony_ci 2037db96d56Sopenharmony_ci 2047db96d56Sopenharmony_ciclass CFWSList(WhiteSpaceTokenList): 2057db96d56Sopenharmony_ci token_type = 'cfws' 2067db96d56Sopenharmony_ci 2077db96d56Sopenharmony_ci 2087db96d56Sopenharmony_ciclass Atom(TokenList): 2097db96d56Sopenharmony_ci token_type = 'atom' 2107db96d56Sopenharmony_ci 2117db96d56Sopenharmony_ci 2127db96d56Sopenharmony_ciclass Token(TokenList): 2137db96d56Sopenharmony_ci token_type = 'token' 2147db96d56Sopenharmony_ci encode_as_ew = False 2157db96d56Sopenharmony_ci 2167db96d56Sopenharmony_ci 2177db96d56Sopenharmony_ciclass EncodedWord(TokenList): 2187db96d56Sopenharmony_ci token_type = 'encoded-word' 2197db96d56Sopenharmony_ci cte = None 2207db96d56Sopenharmony_ci charset = None 2217db96d56Sopenharmony_ci lang = None 2227db96d56Sopenharmony_ci 2237db96d56Sopenharmony_ci 2247db96d56Sopenharmony_ciclass QuotedString(TokenList): 2257db96d56Sopenharmony_ci 2267db96d56Sopenharmony_ci token_type = 'quoted-string' 2277db96d56Sopenharmony_ci 2287db96d56Sopenharmony_ci @property 2297db96d56Sopenharmony_ci def content(self): 2307db96d56Sopenharmony_ci for x in self: 2317db96d56Sopenharmony_ci if x.token_type == 'bare-quoted-string': 2327db96d56Sopenharmony_ci return x.value 2337db96d56Sopenharmony_ci 2347db96d56Sopenharmony_ci @property 2357db96d56Sopenharmony_ci def quoted_value(self): 2367db96d56Sopenharmony_ci res = [] 2377db96d56Sopenharmony_ci for x in self: 2387db96d56Sopenharmony_ci if x.token_type == 'bare-quoted-string': 2397db96d56Sopenharmony_ci res.append(str(x)) 2407db96d56Sopenharmony_ci else: 2417db96d56Sopenharmony_ci res.append(x.value) 2427db96d56Sopenharmony_ci return ''.join(res) 2437db96d56Sopenharmony_ci 2447db96d56Sopenharmony_ci @property 2457db96d56Sopenharmony_ci def stripped_value(self): 2467db96d56Sopenharmony_ci for token in self: 2477db96d56Sopenharmony_ci if token.token_type == 'bare-quoted-string': 2487db96d56Sopenharmony_ci return token.value 2497db96d56Sopenharmony_ci 2507db96d56Sopenharmony_ci 2517db96d56Sopenharmony_ciclass BareQuotedString(QuotedString): 2527db96d56Sopenharmony_ci 2537db96d56Sopenharmony_ci token_type = 'bare-quoted-string' 2547db96d56Sopenharmony_ci 2557db96d56Sopenharmony_ci def __str__(self): 2567db96d56Sopenharmony_ci return quote_string(''.join(str(x) for x in self)) 2577db96d56Sopenharmony_ci 2587db96d56Sopenharmony_ci @property 2597db96d56Sopenharmony_ci def value(self): 2607db96d56Sopenharmony_ci return ''.join(str(x) for x in self) 2617db96d56Sopenharmony_ci 2627db96d56Sopenharmony_ci 2637db96d56Sopenharmony_ciclass Comment(WhiteSpaceTokenList): 2647db96d56Sopenharmony_ci 2657db96d56Sopenharmony_ci token_type = 'comment' 2667db96d56Sopenharmony_ci 2677db96d56Sopenharmony_ci def __str__(self): 2687db96d56Sopenharmony_ci return ''.join(sum([ 2697db96d56Sopenharmony_ci ["("], 2707db96d56Sopenharmony_ci [self.quote(x) for x in self], 2717db96d56Sopenharmony_ci [")"], 2727db96d56Sopenharmony_ci ], [])) 2737db96d56Sopenharmony_ci 2747db96d56Sopenharmony_ci def quote(self, value): 2757db96d56Sopenharmony_ci if value.token_type == 'comment': 2767db96d56Sopenharmony_ci return str(value) 2777db96d56Sopenharmony_ci return str(value).replace('\\', '\\\\').replace( 2787db96d56Sopenharmony_ci '(', r'\(').replace( 2797db96d56Sopenharmony_ci ')', r'\)') 2807db96d56Sopenharmony_ci 2817db96d56Sopenharmony_ci @property 2827db96d56Sopenharmony_ci def content(self): 2837db96d56Sopenharmony_ci return ''.join(str(x) for x in self) 2847db96d56Sopenharmony_ci 2857db96d56Sopenharmony_ci @property 2867db96d56Sopenharmony_ci def comments(self): 2877db96d56Sopenharmony_ci return [self.content] 2887db96d56Sopenharmony_ci 2897db96d56Sopenharmony_ciclass AddressList(TokenList): 2907db96d56Sopenharmony_ci 2917db96d56Sopenharmony_ci token_type = 'address-list' 2927db96d56Sopenharmony_ci 2937db96d56Sopenharmony_ci @property 2947db96d56Sopenharmony_ci def addresses(self): 2957db96d56Sopenharmony_ci return [x for x in self if x.token_type=='address'] 2967db96d56Sopenharmony_ci 2977db96d56Sopenharmony_ci @property 2987db96d56Sopenharmony_ci def mailboxes(self): 2997db96d56Sopenharmony_ci return sum((x.mailboxes 3007db96d56Sopenharmony_ci for x in self if x.token_type=='address'), []) 3017db96d56Sopenharmony_ci 3027db96d56Sopenharmony_ci @property 3037db96d56Sopenharmony_ci def all_mailboxes(self): 3047db96d56Sopenharmony_ci return sum((x.all_mailboxes 3057db96d56Sopenharmony_ci for x in self if x.token_type=='address'), []) 3067db96d56Sopenharmony_ci 3077db96d56Sopenharmony_ci 3087db96d56Sopenharmony_ciclass Address(TokenList): 3097db96d56Sopenharmony_ci 3107db96d56Sopenharmony_ci token_type = 'address' 3117db96d56Sopenharmony_ci 3127db96d56Sopenharmony_ci @property 3137db96d56Sopenharmony_ci def display_name(self): 3147db96d56Sopenharmony_ci if self[0].token_type == 'group': 3157db96d56Sopenharmony_ci return self[0].display_name 3167db96d56Sopenharmony_ci 3177db96d56Sopenharmony_ci @property 3187db96d56Sopenharmony_ci def mailboxes(self): 3197db96d56Sopenharmony_ci if self[0].token_type == 'mailbox': 3207db96d56Sopenharmony_ci return [self[0]] 3217db96d56Sopenharmony_ci elif self[0].token_type == 'invalid-mailbox': 3227db96d56Sopenharmony_ci return [] 3237db96d56Sopenharmony_ci return self[0].mailboxes 3247db96d56Sopenharmony_ci 3257db96d56Sopenharmony_ci @property 3267db96d56Sopenharmony_ci def all_mailboxes(self): 3277db96d56Sopenharmony_ci if self[0].token_type == 'mailbox': 3287db96d56Sopenharmony_ci return [self[0]] 3297db96d56Sopenharmony_ci elif self[0].token_type == 'invalid-mailbox': 3307db96d56Sopenharmony_ci return [self[0]] 3317db96d56Sopenharmony_ci return self[0].all_mailboxes 3327db96d56Sopenharmony_ci 3337db96d56Sopenharmony_ciclass MailboxList(TokenList): 3347db96d56Sopenharmony_ci 3357db96d56Sopenharmony_ci token_type = 'mailbox-list' 3367db96d56Sopenharmony_ci 3377db96d56Sopenharmony_ci @property 3387db96d56Sopenharmony_ci def mailboxes(self): 3397db96d56Sopenharmony_ci return [x for x in self if x.token_type=='mailbox'] 3407db96d56Sopenharmony_ci 3417db96d56Sopenharmony_ci @property 3427db96d56Sopenharmony_ci def all_mailboxes(self): 3437db96d56Sopenharmony_ci return [x for x in self 3447db96d56Sopenharmony_ci if x.token_type in ('mailbox', 'invalid-mailbox')] 3457db96d56Sopenharmony_ci 3467db96d56Sopenharmony_ci 3477db96d56Sopenharmony_ciclass GroupList(TokenList): 3487db96d56Sopenharmony_ci 3497db96d56Sopenharmony_ci token_type = 'group-list' 3507db96d56Sopenharmony_ci 3517db96d56Sopenharmony_ci @property 3527db96d56Sopenharmony_ci def mailboxes(self): 3537db96d56Sopenharmony_ci if not self or self[0].token_type != 'mailbox-list': 3547db96d56Sopenharmony_ci return [] 3557db96d56Sopenharmony_ci return self[0].mailboxes 3567db96d56Sopenharmony_ci 3577db96d56Sopenharmony_ci @property 3587db96d56Sopenharmony_ci def all_mailboxes(self): 3597db96d56Sopenharmony_ci if not self or self[0].token_type != 'mailbox-list': 3607db96d56Sopenharmony_ci return [] 3617db96d56Sopenharmony_ci return self[0].all_mailboxes 3627db96d56Sopenharmony_ci 3637db96d56Sopenharmony_ci 3647db96d56Sopenharmony_ciclass Group(TokenList): 3657db96d56Sopenharmony_ci 3667db96d56Sopenharmony_ci token_type = "group" 3677db96d56Sopenharmony_ci 3687db96d56Sopenharmony_ci @property 3697db96d56Sopenharmony_ci def mailboxes(self): 3707db96d56Sopenharmony_ci if self[2].token_type != 'group-list': 3717db96d56Sopenharmony_ci return [] 3727db96d56Sopenharmony_ci return self[2].mailboxes 3737db96d56Sopenharmony_ci 3747db96d56Sopenharmony_ci @property 3757db96d56Sopenharmony_ci def all_mailboxes(self): 3767db96d56Sopenharmony_ci if self[2].token_type != 'group-list': 3777db96d56Sopenharmony_ci return [] 3787db96d56Sopenharmony_ci return self[2].all_mailboxes 3797db96d56Sopenharmony_ci 3807db96d56Sopenharmony_ci @property 3817db96d56Sopenharmony_ci def display_name(self): 3827db96d56Sopenharmony_ci return self[0].display_name 3837db96d56Sopenharmony_ci 3847db96d56Sopenharmony_ci 3857db96d56Sopenharmony_ciclass NameAddr(TokenList): 3867db96d56Sopenharmony_ci 3877db96d56Sopenharmony_ci token_type = 'name-addr' 3887db96d56Sopenharmony_ci 3897db96d56Sopenharmony_ci @property 3907db96d56Sopenharmony_ci def display_name(self): 3917db96d56Sopenharmony_ci if len(self) == 1: 3927db96d56Sopenharmony_ci return None 3937db96d56Sopenharmony_ci return self[0].display_name 3947db96d56Sopenharmony_ci 3957db96d56Sopenharmony_ci @property 3967db96d56Sopenharmony_ci def local_part(self): 3977db96d56Sopenharmony_ci return self[-1].local_part 3987db96d56Sopenharmony_ci 3997db96d56Sopenharmony_ci @property 4007db96d56Sopenharmony_ci def domain(self): 4017db96d56Sopenharmony_ci return self[-1].domain 4027db96d56Sopenharmony_ci 4037db96d56Sopenharmony_ci @property 4047db96d56Sopenharmony_ci def route(self): 4057db96d56Sopenharmony_ci return self[-1].route 4067db96d56Sopenharmony_ci 4077db96d56Sopenharmony_ci @property 4087db96d56Sopenharmony_ci def addr_spec(self): 4097db96d56Sopenharmony_ci return self[-1].addr_spec 4107db96d56Sopenharmony_ci 4117db96d56Sopenharmony_ci 4127db96d56Sopenharmony_ciclass AngleAddr(TokenList): 4137db96d56Sopenharmony_ci 4147db96d56Sopenharmony_ci token_type = 'angle-addr' 4157db96d56Sopenharmony_ci 4167db96d56Sopenharmony_ci @property 4177db96d56Sopenharmony_ci def local_part(self): 4187db96d56Sopenharmony_ci for x in self: 4197db96d56Sopenharmony_ci if x.token_type == 'addr-spec': 4207db96d56Sopenharmony_ci return x.local_part 4217db96d56Sopenharmony_ci 4227db96d56Sopenharmony_ci @property 4237db96d56Sopenharmony_ci def domain(self): 4247db96d56Sopenharmony_ci for x in self: 4257db96d56Sopenharmony_ci if x.token_type == 'addr-spec': 4267db96d56Sopenharmony_ci return x.domain 4277db96d56Sopenharmony_ci 4287db96d56Sopenharmony_ci @property 4297db96d56Sopenharmony_ci def route(self): 4307db96d56Sopenharmony_ci for x in self: 4317db96d56Sopenharmony_ci if x.token_type == 'obs-route': 4327db96d56Sopenharmony_ci return x.domains 4337db96d56Sopenharmony_ci 4347db96d56Sopenharmony_ci @property 4357db96d56Sopenharmony_ci def addr_spec(self): 4367db96d56Sopenharmony_ci for x in self: 4377db96d56Sopenharmony_ci if x.token_type == 'addr-spec': 4387db96d56Sopenharmony_ci if x.local_part: 4397db96d56Sopenharmony_ci return x.addr_spec 4407db96d56Sopenharmony_ci else: 4417db96d56Sopenharmony_ci return quote_string(x.local_part) + x.addr_spec 4427db96d56Sopenharmony_ci else: 4437db96d56Sopenharmony_ci return '<>' 4447db96d56Sopenharmony_ci 4457db96d56Sopenharmony_ci 4467db96d56Sopenharmony_ciclass ObsRoute(TokenList): 4477db96d56Sopenharmony_ci 4487db96d56Sopenharmony_ci token_type = 'obs-route' 4497db96d56Sopenharmony_ci 4507db96d56Sopenharmony_ci @property 4517db96d56Sopenharmony_ci def domains(self): 4527db96d56Sopenharmony_ci return [x.domain for x in self if x.token_type == 'domain'] 4537db96d56Sopenharmony_ci 4547db96d56Sopenharmony_ci 4557db96d56Sopenharmony_ciclass Mailbox(TokenList): 4567db96d56Sopenharmony_ci 4577db96d56Sopenharmony_ci token_type = 'mailbox' 4587db96d56Sopenharmony_ci 4597db96d56Sopenharmony_ci @property 4607db96d56Sopenharmony_ci def display_name(self): 4617db96d56Sopenharmony_ci if self[0].token_type == 'name-addr': 4627db96d56Sopenharmony_ci return self[0].display_name 4637db96d56Sopenharmony_ci 4647db96d56Sopenharmony_ci @property 4657db96d56Sopenharmony_ci def local_part(self): 4667db96d56Sopenharmony_ci return self[0].local_part 4677db96d56Sopenharmony_ci 4687db96d56Sopenharmony_ci @property 4697db96d56Sopenharmony_ci def domain(self): 4707db96d56Sopenharmony_ci return self[0].domain 4717db96d56Sopenharmony_ci 4727db96d56Sopenharmony_ci @property 4737db96d56Sopenharmony_ci def route(self): 4747db96d56Sopenharmony_ci if self[0].token_type == 'name-addr': 4757db96d56Sopenharmony_ci return self[0].route 4767db96d56Sopenharmony_ci 4777db96d56Sopenharmony_ci @property 4787db96d56Sopenharmony_ci def addr_spec(self): 4797db96d56Sopenharmony_ci return self[0].addr_spec 4807db96d56Sopenharmony_ci 4817db96d56Sopenharmony_ci 4827db96d56Sopenharmony_ciclass InvalidMailbox(TokenList): 4837db96d56Sopenharmony_ci 4847db96d56Sopenharmony_ci token_type = 'invalid-mailbox' 4857db96d56Sopenharmony_ci 4867db96d56Sopenharmony_ci @property 4877db96d56Sopenharmony_ci def display_name(self): 4887db96d56Sopenharmony_ci return None 4897db96d56Sopenharmony_ci 4907db96d56Sopenharmony_ci local_part = domain = route = addr_spec = display_name 4917db96d56Sopenharmony_ci 4927db96d56Sopenharmony_ci 4937db96d56Sopenharmony_ciclass Domain(TokenList): 4947db96d56Sopenharmony_ci 4957db96d56Sopenharmony_ci token_type = 'domain' 4967db96d56Sopenharmony_ci as_ew_allowed = False 4977db96d56Sopenharmony_ci 4987db96d56Sopenharmony_ci @property 4997db96d56Sopenharmony_ci def domain(self): 5007db96d56Sopenharmony_ci return ''.join(super().value.split()) 5017db96d56Sopenharmony_ci 5027db96d56Sopenharmony_ci 5037db96d56Sopenharmony_ciclass DotAtom(TokenList): 5047db96d56Sopenharmony_ci token_type = 'dot-atom' 5057db96d56Sopenharmony_ci 5067db96d56Sopenharmony_ci 5077db96d56Sopenharmony_ciclass DotAtomText(TokenList): 5087db96d56Sopenharmony_ci token_type = 'dot-atom-text' 5097db96d56Sopenharmony_ci as_ew_allowed = True 5107db96d56Sopenharmony_ci 5117db96d56Sopenharmony_ci 5127db96d56Sopenharmony_ciclass NoFoldLiteral(TokenList): 5137db96d56Sopenharmony_ci token_type = 'no-fold-literal' 5147db96d56Sopenharmony_ci as_ew_allowed = False 5157db96d56Sopenharmony_ci 5167db96d56Sopenharmony_ci 5177db96d56Sopenharmony_ciclass AddrSpec(TokenList): 5187db96d56Sopenharmony_ci 5197db96d56Sopenharmony_ci token_type = 'addr-spec' 5207db96d56Sopenharmony_ci as_ew_allowed = False 5217db96d56Sopenharmony_ci 5227db96d56Sopenharmony_ci @property 5237db96d56Sopenharmony_ci def local_part(self): 5247db96d56Sopenharmony_ci return self[0].local_part 5257db96d56Sopenharmony_ci 5267db96d56Sopenharmony_ci @property 5277db96d56Sopenharmony_ci def domain(self): 5287db96d56Sopenharmony_ci if len(self) < 3: 5297db96d56Sopenharmony_ci return None 5307db96d56Sopenharmony_ci return self[-1].domain 5317db96d56Sopenharmony_ci 5327db96d56Sopenharmony_ci @property 5337db96d56Sopenharmony_ci def value(self): 5347db96d56Sopenharmony_ci if len(self) < 3: 5357db96d56Sopenharmony_ci return self[0].value 5367db96d56Sopenharmony_ci return self[0].value.rstrip()+self[1].value+self[2].value.lstrip() 5377db96d56Sopenharmony_ci 5387db96d56Sopenharmony_ci @property 5397db96d56Sopenharmony_ci def addr_spec(self): 5407db96d56Sopenharmony_ci nameset = set(self.local_part) 5417db96d56Sopenharmony_ci if len(nameset) > len(nameset-DOT_ATOM_ENDS): 5427db96d56Sopenharmony_ci lp = quote_string(self.local_part) 5437db96d56Sopenharmony_ci else: 5447db96d56Sopenharmony_ci lp = self.local_part 5457db96d56Sopenharmony_ci if self.domain is not None: 5467db96d56Sopenharmony_ci return lp + '@' + self.domain 5477db96d56Sopenharmony_ci return lp 5487db96d56Sopenharmony_ci 5497db96d56Sopenharmony_ci 5507db96d56Sopenharmony_ciclass ObsLocalPart(TokenList): 5517db96d56Sopenharmony_ci 5527db96d56Sopenharmony_ci token_type = 'obs-local-part' 5537db96d56Sopenharmony_ci as_ew_allowed = False 5547db96d56Sopenharmony_ci 5557db96d56Sopenharmony_ci 5567db96d56Sopenharmony_ciclass DisplayName(Phrase): 5577db96d56Sopenharmony_ci 5587db96d56Sopenharmony_ci token_type = 'display-name' 5597db96d56Sopenharmony_ci ew_combine_allowed = False 5607db96d56Sopenharmony_ci 5617db96d56Sopenharmony_ci @property 5627db96d56Sopenharmony_ci def display_name(self): 5637db96d56Sopenharmony_ci res = TokenList(self) 5647db96d56Sopenharmony_ci if len(res) == 0: 5657db96d56Sopenharmony_ci return res.value 5667db96d56Sopenharmony_ci if res[0].token_type == 'cfws': 5677db96d56Sopenharmony_ci res.pop(0) 5687db96d56Sopenharmony_ci else: 5697db96d56Sopenharmony_ci if res[0][0].token_type == 'cfws': 5707db96d56Sopenharmony_ci res[0] = TokenList(res[0][1:]) 5717db96d56Sopenharmony_ci if res[-1].token_type == 'cfws': 5727db96d56Sopenharmony_ci res.pop() 5737db96d56Sopenharmony_ci else: 5747db96d56Sopenharmony_ci if res[-1][-1].token_type == 'cfws': 5757db96d56Sopenharmony_ci res[-1] = TokenList(res[-1][:-1]) 5767db96d56Sopenharmony_ci return res.value 5777db96d56Sopenharmony_ci 5787db96d56Sopenharmony_ci @property 5797db96d56Sopenharmony_ci def value(self): 5807db96d56Sopenharmony_ci quote = False 5817db96d56Sopenharmony_ci if self.defects: 5827db96d56Sopenharmony_ci quote = True 5837db96d56Sopenharmony_ci else: 5847db96d56Sopenharmony_ci for x in self: 5857db96d56Sopenharmony_ci if x.token_type == 'quoted-string': 5867db96d56Sopenharmony_ci quote = True 5877db96d56Sopenharmony_ci if len(self) != 0 and quote: 5887db96d56Sopenharmony_ci pre = post = '' 5897db96d56Sopenharmony_ci if self[0].token_type=='cfws' or self[0][0].token_type=='cfws': 5907db96d56Sopenharmony_ci pre = ' ' 5917db96d56Sopenharmony_ci if self[-1].token_type=='cfws' or self[-1][-1].token_type=='cfws': 5927db96d56Sopenharmony_ci post = ' ' 5937db96d56Sopenharmony_ci return pre+quote_string(self.display_name)+post 5947db96d56Sopenharmony_ci else: 5957db96d56Sopenharmony_ci return super().value 5967db96d56Sopenharmony_ci 5977db96d56Sopenharmony_ci 5987db96d56Sopenharmony_ciclass LocalPart(TokenList): 5997db96d56Sopenharmony_ci 6007db96d56Sopenharmony_ci token_type = 'local-part' 6017db96d56Sopenharmony_ci as_ew_allowed = False 6027db96d56Sopenharmony_ci 6037db96d56Sopenharmony_ci @property 6047db96d56Sopenharmony_ci def value(self): 6057db96d56Sopenharmony_ci if self[0].token_type == "quoted-string": 6067db96d56Sopenharmony_ci return self[0].quoted_value 6077db96d56Sopenharmony_ci else: 6087db96d56Sopenharmony_ci return self[0].value 6097db96d56Sopenharmony_ci 6107db96d56Sopenharmony_ci @property 6117db96d56Sopenharmony_ci def local_part(self): 6127db96d56Sopenharmony_ci # Strip whitespace from front, back, and around dots. 6137db96d56Sopenharmony_ci res = [DOT] 6147db96d56Sopenharmony_ci last = DOT 6157db96d56Sopenharmony_ci last_is_tl = False 6167db96d56Sopenharmony_ci for tok in self[0] + [DOT]: 6177db96d56Sopenharmony_ci if tok.token_type == 'cfws': 6187db96d56Sopenharmony_ci continue 6197db96d56Sopenharmony_ci if (last_is_tl and tok.token_type == 'dot' and 6207db96d56Sopenharmony_ci last[-1].token_type == 'cfws'): 6217db96d56Sopenharmony_ci res[-1] = TokenList(last[:-1]) 6227db96d56Sopenharmony_ci is_tl = isinstance(tok, TokenList) 6237db96d56Sopenharmony_ci if (is_tl and last.token_type == 'dot' and 6247db96d56Sopenharmony_ci tok[0].token_type == 'cfws'): 6257db96d56Sopenharmony_ci res.append(TokenList(tok[1:])) 6267db96d56Sopenharmony_ci else: 6277db96d56Sopenharmony_ci res.append(tok) 6287db96d56Sopenharmony_ci last = res[-1] 6297db96d56Sopenharmony_ci last_is_tl = is_tl 6307db96d56Sopenharmony_ci res = TokenList(res[1:-1]) 6317db96d56Sopenharmony_ci return res.value 6327db96d56Sopenharmony_ci 6337db96d56Sopenharmony_ci 6347db96d56Sopenharmony_ciclass DomainLiteral(TokenList): 6357db96d56Sopenharmony_ci 6367db96d56Sopenharmony_ci token_type = 'domain-literal' 6377db96d56Sopenharmony_ci as_ew_allowed = False 6387db96d56Sopenharmony_ci 6397db96d56Sopenharmony_ci @property 6407db96d56Sopenharmony_ci def domain(self): 6417db96d56Sopenharmony_ci return ''.join(super().value.split()) 6427db96d56Sopenharmony_ci 6437db96d56Sopenharmony_ci @property 6447db96d56Sopenharmony_ci def ip(self): 6457db96d56Sopenharmony_ci for x in self: 6467db96d56Sopenharmony_ci if x.token_type == 'ptext': 6477db96d56Sopenharmony_ci return x.value 6487db96d56Sopenharmony_ci 6497db96d56Sopenharmony_ci 6507db96d56Sopenharmony_ciclass MIMEVersion(TokenList): 6517db96d56Sopenharmony_ci 6527db96d56Sopenharmony_ci token_type = 'mime-version' 6537db96d56Sopenharmony_ci major = None 6547db96d56Sopenharmony_ci minor = None 6557db96d56Sopenharmony_ci 6567db96d56Sopenharmony_ci 6577db96d56Sopenharmony_ciclass Parameter(TokenList): 6587db96d56Sopenharmony_ci 6597db96d56Sopenharmony_ci token_type = 'parameter' 6607db96d56Sopenharmony_ci sectioned = False 6617db96d56Sopenharmony_ci extended = False 6627db96d56Sopenharmony_ci charset = 'us-ascii' 6637db96d56Sopenharmony_ci 6647db96d56Sopenharmony_ci @property 6657db96d56Sopenharmony_ci def section_number(self): 6667db96d56Sopenharmony_ci # Because the first token, the attribute (name) eats CFWS, the second 6677db96d56Sopenharmony_ci # token is always the section if there is one. 6687db96d56Sopenharmony_ci return self[1].number if self.sectioned else 0 6697db96d56Sopenharmony_ci 6707db96d56Sopenharmony_ci @property 6717db96d56Sopenharmony_ci def param_value(self): 6727db96d56Sopenharmony_ci # This is part of the "handle quoted extended parameters" hack. 6737db96d56Sopenharmony_ci for token in self: 6747db96d56Sopenharmony_ci if token.token_type == 'value': 6757db96d56Sopenharmony_ci return token.stripped_value 6767db96d56Sopenharmony_ci if token.token_type == 'quoted-string': 6777db96d56Sopenharmony_ci for token in token: 6787db96d56Sopenharmony_ci if token.token_type == 'bare-quoted-string': 6797db96d56Sopenharmony_ci for token in token: 6807db96d56Sopenharmony_ci if token.token_type == 'value': 6817db96d56Sopenharmony_ci return token.stripped_value 6827db96d56Sopenharmony_ci return '' 6837db96d56Sopenharmony_ci 6847db96d56Sopenharmony_ci 6857db96d56Sopenharmony_ciclass InvalidParameter(Parameter): 6867db96d56Sopenharmony_ci 6877db96d56Sopenharmony_ci token_type = 'invalid-parameter' 6887db96d56Sopenharmony_ci 6897db96d56Sopenharmony_ci 6907db96d56Sopenharmony_ciclass Attribute(TokenList): 6917db96d56Sopenharmony_ci 6927db96d56Sopenharmony_ci token_type = 'attribute' 6937db96d56Sopenharmony_ci 6947db96d56Sopenharmony_ci @property 6957db96d56Sopenharmony_ci def stripped_value(self): 6967db96d56Sopenharmony_ci for token in self: 6977db96d56Sopenharmony_ci if token.token_type.endswith('attrtext'): 6987db96d56Sopenharmony_ci return token.value 6997db96d56Sopenharmony_ci 7007db96d56Sopenharmony_ciclass Section(TokenList): 7017db96d56Sopenharmony_ci 7027db96d56Sopenharmony_ci token_type = 'section' 7037db96d56Sopenharmony_ci number = None 7047db96d56Sopenharmony_ci 7057db96d56Sopenharmony_ci 7067db96d56Sopenharmony_ciclass Value(TokenList): 7077db96d56Sopenharmony_ci 7087db96d56Sopenharmony_ci token_type = 'value' 7097db96d56Sopenharmony_ci 7107db96d56Sopenharmony_ci @property 7117db96d56Sopenharmony_ci def stripped_value(self): 7127db96d56Sopenharmony_ci token = self[0] 7137db96d56Sopenharmony_ci if token.token_type == 'cfws': 7147db96d56Sopenharmony_ci token = self[1] 7157db96d56Sopenharmony_ci if token.token_type.endswith( 7167db96d56Sopenharmony_ci ('quoted-string', 'attribute', 'extended-attribute')): 7177db96d56Sopenharmony_ci return token.stripped_value 7187db96d56Sopenharmony_ci return self.value 7197db96d56Sopenharmony_ci 7207db96d56Sopenharmony_ci 7217db96d56Sopenharmony_ciclass MimeParameters(TokenList): 7227db96d56Sopenharmony_ci 7237db96d56Sopenharmony_ci token_type = 'mime-parameters' 7247db96d56Sopenharmony_ci syntactic_break = False 7257db96d56Sopenharmony_ci 7267db96d56Sopenharmony_ci @property 7277db96d56Sopenharmony_ci def params(self): 7287db96d56Sopenharmony_ci # The RFC specifically states that the ordering of parameters is not 7297db96d56Sopenharmony_ci # guaranteed and may be reordered by the transport layer. So we have 7307db96d56Sopenharmony_ci # to assume the RFC 2231 pieces can come in any order. However, we 7317db96d56Sopenharmony_ci # output them in the order that we first see a given name, which gives 7327db96d56Sopenharmony_ci # us a stable __str__. 7337db96d56Sopenharmony_ci params = {} # Using order preserving dict from Python 3.7+ 7347db96d56Sopenharmony_ci for token in self: 7357db96d56Sopenharmony_ci if not token.token_type.endswith('parameter'): 7367db96d56Sopenharmony_ci continue 7377db96d56Sopenharmony_ci if token[0].token_type != 'attribute': 7387db96d56Sopenharmony_ci continue 7397db96d56Sopenharmony_ci name = token[0].value.strip() 7407db96d56Sopenharmony_ci if name not in params: 7417db96d56Sopenharmony_ci params[name] = [] 7427db96d56Sopenharmony_ci params[name].append((token.section_number, token)) 7437db96d56Sopenharmony_ci for name, parts in params.items(): 7447db96d56Sopenharmony_ci parts = sorted(parts, key=itemgetter(0)) 7457db96d56Sopenharmony_ci first_param = parts[0][1] 7467db96d56Sopenharmony_ci charset = first_param.charset 7477db96d56Sopenharmony_ci # Our arbitrary error recovery is to ignore duplicate parameters, 7487db96d56Sopenharmony_ci # to use appearance order if there are duplicate rfc 2231 parts, 7497db96d56Sopenharmony_ci # and to ignore gaps. This mimics the error recovery of get_param. 7507db96d56Sopenharmony_ci if not first_param.extended and len(parts) > 1: 7517db96d56Sopenharmony_ci if parts[1][0] == 0: 7527db96d56Sopenharmony_ci parts[1][1].defects.append(errors.InvalidHeaderDefect( 7537db96d56Sopenharmony_ci 'duplicate parameter name; duplicate(s) ignored')) 7547db96d56Sopenharmony_ci parts = parts[:1] 7557db96d56Sopenharmony_ci # Else assume the *0* was missing...note that this is different 7567db96d56Sopenharmony_ci # from get_param, but we registered a defect for this earlier. 7577db96d56Sopenharmony_ci value_parts = [] 7587db96d56Sopenharmony_ci i = 0 7597db96d56Sopenharmony_ci for section_number, param in parts: 7607db96d56Sopenharmony_ci if section_number != i: 7617db96d56Sopenharmony_ci # We could get fancier here and look for a complete 7627db96d56Sopenharmony_ci # duplicate extended parameter and ignore the second one 7637db96d56Sopenharmony_ci # seen. But we're not doing that. The old code didn't. 7647db96d56Sopenharmony_ci if not param.extended: 7657db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 7667db96d56Sopenharmony_ci 'duplicate parameter name; duplicate ignored')) 7677db96d56Sopenharmony_ci continue 7687db96d56Sopenharmony_ci else: 7697db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 7707db96d56Sopenharmony_ci "inconsistent RFC2231 parameter numbering")) 7717db96d56Sopenharmony_ci i += 1 7727db96d56Sopenharmony_ci value = param.param_value 7737db96d56Sopenharmony_ci if param.extended: 7747db96d56Sopenharmony_ci try: 7757db96d56Sopenharmony_ci value = urllib.parse.unquote_to_bytes(value) 7767db96d56Sopenharmony_ci except UnicodeEncodeError: 7777db96d56Sopenharmony_ci # source had surrogate escaped bytes. What we do now 7787db96d56Sopenharmony_ci # is a bit of an open question. I'm not sure this is 7797db96d56Sopenharmony_ci # the best choice, but it is what the old algorithm did 7807db96d56Sopenharmony_ci value = urllib.parse.unquote(value, encoding='latin-1') 7817db96d56Sopenharmony_ci else: 7827db96d56Sopenharmony_ci try: 7837db96d56Sopenharmony_ci value = value.decode(charset, 'surrogateescape') 7847db96d56Sopenharmony_ci except (LookupError, UnicodeEncodeError): 7857db96d56Sopenharmony_ci # XXX: there should really be a custom defect for 7867db96d56Sopenharmony_ci # unknown character set to make it easy to find, 7877db96d56Sopenharmony_ci # because otherwise unknown charset is a silent 7887db96d56Sopenharmony_ci # failure. 7897db96d56Sopenharmony_ci value = value.decode('us-ascii', 'surrogateescape') 7907db96d56Sopenharmony_ci if utils._has_surrogates(value): 7917db96d56Sopenharmony_ci param.defects.append(errors.UndecodableBytesDefect()) 7927db96d56Sopenharmony_ci value_parts.append(value) 7937db96d56Sopenharmony_ci value = ''.join(value_parts) 7947db96d56Sopenharmony_ci yield name, value 7957db96d56Sopenharmony_ci 7967db96d56Sopenharmony_ci def __str__(self): 7977db96d56Sopenharmony_ci params = [] 7987db96d56Sopenharmony_ci for name, value in self.params: 7997db96d56Sopenharmony_ci if value: 8007db96d56Sopenharmony_ci params.append('{}={}'.format(name, quote_string(value))) 8017db96d56Sopenharmony_ci else: 8027db96d56Sopenharmony_ci params.append(name) 8037db96d56Sopenharmony_ci params = '; '.join(params) 8047db96d56Sopenharmony_ci return ' ' + params if params else '' 8057db96d56Sopenharmony_ci 8067db96d56Sopenharmony_ci 8077db96d56Sopenharmony_ciclass ParameterizedHeaderValue(TokenList): 8087db96d56Sopenharmony_ci 8097db96d56Sopenharmony_ci # Set this false so that the value doesn't wind up on a new line even 8107db96d56Sopenharmony_ci # if it and the parameters would fit there but not on the first line. 8117db96d56Sopenharmony_ci syntactic_break = False 8127db96d56Sopenharmony_ci 8137db96d56Sopenharmony_ci @property 8147db96d56Sopenharmony_ci def params(self): 8157db96d56Sopenharmony_ci for token in reversed(self): 8167db96d56Sopenharmony_ci if token.token_type == 'mime-parameters': 8177db96d56Sopenharmony_ci return token.params 8187db96d56Sopenharmony_ci return {} 8197db96d56Sopenharmony_ci 8207db96d56Sopenharmony_ci 8217db96d56Sopenharmony_ciclass ContentType(ParameterizedHeaderValue): 8227db96d56Sopenharmony_ci token_type = 'content-type' 8237db96d56Sopenharmony_ci as_ew_allowed = False 8247db96d56Sopenharmony_ci maintype = 'text' 8257db96d56Sopenharmony_ci subtype = 'plain' 8267db96d56Sopenharmony_ci 8277db96d56Sopenharmony_ci 8287db96d56Sopenharmony_ciclass ContentDisposition(ParameterizedHeaderValue): 8297db96d56Sopenharmony_ci token_type = 'content-disposition' 8307db96d56Sopenharmony_ci as_ew_allowed = False 8317db96d56Sopenharmony_ci content_disposition = None 8327db96d56Sopenharmony_ci 8337db96d56Sopenharmony_ci 8347db96d56Sopenharmony_ciclass ContentTransferEncoding(TokenList): 8357db96d56Sopenharmony_ci token_type = 'content-transfer-encoding' 8367db96d56Sopenharmony_ci as_ew_allowed = False 8377db96d56Sopenharmony_ci cte = '7bit' 8387db96d56Sopenharmony_ci 8397db96d56Sopenharmony_ci 8407db96d56Sopenharmony_ciclass HeaderLabel(TokenList): 8417db96d56Sopenharmony_ci token_type = 'header-label' 8427db96d56Sopenharmony_ci as_ew_allowed = False 8437db96d56Sopenharmony_ci 8447db96d56Sopenharmony_ci 8457db96d56Sopenharmony_ciclass MsgID(TokenList): 8467db96d56Sopenharmony_ci token_type = 'msg-id' 8477db96d56Sopenharmony_ci as_ew_allowed = False 8487db96d56Sopenharmony_ci 8497db96d56Sopenharmony_ci def fold(self, policy): 8507db96d56Sopenharmony_ci # message-id tokens may not be folded. 8517db96d56Sopenharmony_ci return str(self) + policy.linesep 8527db96d56Sopenharmony_ci 8537db96d56Sopenharmony_ci 8547db96d56Sopenharmony_ciclass MessageID(MsgID): 8557db96d56Sopenharmony_ci token_type = 'message-id' 8567db96d56Sopenharmony_ci 8577db96d56Sopenharmony_ci 8587db96d56Sopenharmony_ciclass InvalidMessageID(MessageID): 8597db96d56Sopenharmony_ci token_type = 'invalid-message-id' 8607db96d56Sopenharmony_ci 8617db96d56Sopenharmony_ci 8627db96d56Sopenharmony_ciclass Header(TokenList): 8637db96d56Sopenharmony_ci token_type = 'header' 8647db96d56Sopenharmony_ci 8657db96d56Sopenharmony_ci 8667db96d56Sopenharmony_ci# 8677db96d56Sopenharmony_ci# Terminal classes and instances 8687db96d56Sopenharmony_ci# 8697db96d56Sopenharmony_ci 8707db96d56Sopenharmony_ciclass Terminal(str): 8717db96d56Sopenharmony_ci 8727db96d56Sopenharmony_ci as_ew_allowed = True 8737db96d56Sopenharmony_ci ew_combine_allowed = True 8747db96d56Sopenharmony_ci syntactic_break = True 8757db96d56Sopenharmony_ci 8767db96d56Sopenharmony_ci def __new__(cls, value, token_type): 8777db96d56Sopenharmony_ci self = super().__new__(cls, value) 8787db96d56Sopenharmony_ci self.token_type = token_type 8797db96d56Sopenharmony_ci self.defects = [] 8807db96d56Sopenharmony_ci return self 8817db96d56Sopenharmony_ci 8827db96d56Sopenharmony_ci def __repr__(self): 8837db96d56Sopenharmony_ci return "{}({})".format(self.__class__.__name__, super().__repr__()) 8847db96d56Sopenharmony_ci 8857db96d56Sopenharmony_ci def pprint(self): 8867db96d56Sopenharmony_ci print(self.__class__.__name__ + '/' + self.token_type) 8877db96d56Sopenharmony_ci 8887db96d56Sopenharmony_ci @property 8897db96d56Sopenharmony_ci def all_defects(self): 8907db96d56Sopenharmony_ci return list(self.defects) 8917db96d56Sopenharmony_ci 8927db96d56Sopenharmony_ci def _pp(self, indent=''): 8937db96d56Sopenharmony_ci return ["{}{}/{}({}){}".format( 8947db96d56Sopenharmony_ci indent, 8957db96d56Sopenharmony_ci self.__class__.__name__, 8967db96d56Sopenharmony_ci self.token_type, 8977db96d56Sopenharmony_ci super().__repr__(), 8987db96d56Sopenharmony_ci '' if not self.defects else ' {}'.format(self.defects), 8997db96d56Sopenharmony_ci )] 9007db96d56Sopenharmony_ci 9017db96d56Sopenharmony_ci def pop_trailing_ws(self): 9027db96d56Sopenharmony_ci # This terminates the recursion. 9037db96d56Sopenharmony_ci return None 9047db96d56Sopenharmony_ci 9057db96d56Sopenharmony_ci @property 9067db96d56Sopenharmony_ci def comments(self): 9077db96d56Sopenharmony_ci return [] 9087db96d56Sopenharmony_ci 9097db96d56Sopenharmony_ci def __getnewargs__(self): 9107db96d56Sopenharmony_ci return(str(self), self.token_type) 9117db96d56Sopenharmony_ci 9127db96d56Sopenharmony_ci 9137db96d56Sopenharmony_ciclass WhiteSpaceTerminal(Terminal): 9147db96d56Sopenharmony_ci 9157db96d56Sopenharmony_ci @property 9167db96d56Sopenharmony_ci def value(self): 9177db96d56Sopenharmony_ci return ' ' 9187db96d56Sopenharmony_ci 9197db96d56Sopenharmony_ci def startswith_fws(self): 9207db96d56Sopenharmony_ci return True 9217db96d56Sopenharmony_ci 9227db96d56Sopenharmony_ci 9237db96d56Sopenharmony_ciclass ValueTerminal(Terminal): 9247db96d56Sopenharmony_ci 9257db96d56Sopenharmony_ci @property 9267db96d56Sopenharmony_ci def value(self): 9277db96d56Sopenharmony_ci return self 9287db96d56Sopenharmony_ci 9297db96d56Sopenharmony_ci def startswith_fws(self): 9307db96d56Sopenharmony_ci return False 9317db96d56Sopenharmony_ci 9327db96d56Sopenharmony_ci 9337db96d56Sopenharmony_ciclass EWWhiteSpaceTerminal(WhiteSpaceTerminal): 9347db96d56Sopenharmony_ci 9357db96d56Sopenharmony_ci @property 9367db96d56Sopenharmony_ci def value(self): 9377db96d56Sopenharmony_ci return '' 9387db96d56Sopenharmony_ci 9397db96d56Sopenharmony_ci def __str__(self): 9407db96d56Sopenharmony_ci return '' 9417db96d56Sopenharmony_ci 9427db96d56Sopenharmony_ci 9437db96d56Sopenharmony_ciclass _InvalidEwError(errors.HeaderParseError): 9447db96d56Sopenharmony_ci """Invalid encoded word found while parsing headers.""" 9457db96d56Sopenharmony_ci 9467db96d56Sopenharmony_ci 9477db96d56Sopenharmony_ci# XXX these need to become classes and used as instances so 9487db96d56Sopenharmony_ci# that a program can't change them in a parse tree and screw 9497db96d56Sopenharmony_ci# up other parse trees. Maybe should have tests for that, too. 9507db96d56Sopenharmony_ciDOT = ValueTerminal('.', 'dot') 9517db96d56Sopenharmony_ciListSeparator = ValueTerminal(',', 'list-separator') 9527db96d56Sopenharmony_ciRouteComponentMarker = ValueTerminal('@', 'route-component-marker') 9537db96d56Sopenharmony_ci 9547db96d56Sopenharmony_ci# 9557db96d56Sopenharmony_ci# Parser 9567db96d56Sopenharmony_ci# 9577db96d56Sopenharmony_ci 9587db96d56Sopenharmony_ci# Parse strings according to RFC822/2047/2822/5322 rules. 9597db96d56Sopenharmony_ci# 9607db96d56Sopenharmony_ci# This is a stateless parser. Each get_XXX function accepts a string and 9617db96d56Sopenharmony_ci# returns either a Terminal or a TokenList representing the RFC object named 9627db96d56Sopenharmony_ci# by the method and a string containing the remaining unparsed characters 9637db96d56Sopenharmony_ci# from the input. Thus a parser method consumes the next syntactic construct 9647db96d56Sopenharmony_ci# of a given type and returns a token representing the construct plus the 9657db96d56Sopenharmony_ci# unparsed remainder of the input string. 9667db96d56Sopenharmony_ci# 9677db96d56Sopenharmony_ci# For example, if the first element of a structured header is a 'phrase', 9687db96d56Sopenharmony_ci# then: 9697db96d56Sopenharmony_ci# 9707db96d56Sopenharmony_ci# phrase, value = get_phrase(value) 9717db96d56Sopenharmony_ci# 9727db96d56Sopenharmony_ci# returns the complete phrase from the start of the string value, plus any 9737db96d56Sopenharmony_ci# characters left in the string after the phrase is removed. 9747db96d56Sopenharmony_ci 9757db96d56Sopenharmony_ci_wsp_splitter = re.compile(r'([{}]+)'.format(''.join(WSP))).split 9767db96d56Sopenharmony_ci_non_atom_end_matcher = re.compile(r"[^{}]+".format( 9777db96d56Sopenharmony_ci re.escape(''.join(ATOM_ENDS)))).match 9787db96d56Sopenharmony_ci_non_printable_finder = re.compile(r"[\x00-\x20\x7F]").findall 9797db96d56Sopenharmony_ci_non_token_end_matcher = re.compile(r"[^{}]+".format( 9807db96d56Sopenharmony_ci re.escape(''.join(TOKEN_ENDS)))).match 9817db96d56Sopenharmony_ci_non_attribute_end_matcher = re.compile(r"[^{}]+".format( 9827db96d56Sopenharmony_ci re.escape(''.join(ATTRIBUTE_ENDS)))).match 9837db96d56Sopenharmony_ci_non_extended_attribute_end_matcher = re.compile(r"[^{}]+".format( 9847db96d56Sopenharmony_ci re.escape(''.join(EXTENDED_ATTRIBUTE_ENDS)))).match 9857db96d56Sopenharmony_ci 9867db96d56Sopenharmony_cidef _validate_xtext(xtext): 9877db96d56Sopenharmony_ci """If input token contains ASCII non-printables, register a defect.""" 9887db96d56Sopenharmony_ci 9897db96d56Sopenharmony_ci non_printables = _non_printable_finder(xtext) 9907db96d56Sopenharmony_ci if non_printables: 9917db96d56Sopenharmony_ci xtext.defects.append(errors.NonPrintableDefect(non_printables)) 9927db96d56Sopenharmony_ci if utils._has_surrogates(xtext): 9937db96d56Sopenharmony_ci xtext.defects.append(errors.UndecodableBytesDefect( 9947db96d56Sopenharmony_ci "Non-ASCII characters found in header token")) 9957db96d56Sopenharmony_ci 9967db96d56Sopenharmony_cidef _get_ptext_to_endchars(value, endchars): 9977db96d56Sopenharmony_ci """Scan printables/quoted-pairs until endchars and return unquoted ptext. 9987db96d56Sopenharmony_ci 9997db96d56Sopenharmony_ci This function turns a run of qcontent, ccontent-without-comments, or 10007db96d56Sopenharmony_ci dtext-with-quoted-printables into a single string by unquoting any 10017db96d56Sopenharmony_ci quoted printables. It returns the string, the remaining value, and 10027db96d56Sopenharmony_ci a flag that is True iff there were any quoted printables decoded. 10037db96d56Sopenharmony_ci 10047db96d56Sopenharmony_ci """ 10057db96d56Sopenharmony_ci fragment, *remainder = _wsp_splitter(value, 1) 10067db96d56Sopenharmony_ci vchars = [] 10077db96d56Sopenharmony_ci escape = False 10087db96d56Sopenharmony_ci had_qp = False 10097db96d56Sopenharmony_ci for pos in range(len(fragment)): 10107db96d56Sopenharmony_ci if fragment[pos] == '\\': 10117db96d56Sopenharmony_ci if escape: 10127db96d56Sopenharmony_ci escape = False 10137db96d56Sopenharmony_ci had_qp = True 10147db96d56Sopenharmony_ci else: 10157db96d56Sopenharmony_ci escape = True 10167db96d56Sopenharmony_ci continue 10177db96d56Sopenharmony_ci if escape: 10187db96d56Sopenharmony_ci escape = False 10197db96d56Sopenharmony_ci elif fragment[pos] in endchars: 10207db96d56Sopenharmony_ci break 10217db96d56Sopenharmony_ci vchars.append(fragment[pos]) 10227db96d56Sopenharmony_ci else: 10237db96d56Sopenharmony_ci pos = pos + 1 10247db96d56Sopenharmony_ci return ''.join(vchars), ''.join([fragment[pos:]] + remainder), had_qp 10257db96d56Sopenharmony_ci 10267db96d56Sopenharmony_cidef get_fws(value): 10277db96d56Sopenharmony_ci """FWS = 1*WSP 10287db96d56Sopenharmony_ci 10297db96d56Sopenharmony_ci This isn't the RFC definition. We're using fws to represent tokens where 10307db96d56Sopenharmony_ci folding can be done, but when we are parsing the *un*folding has already 10317db96d56Sopenharmony_ci been done so we don't need to watch out for CRLF. 10327db96d56Sopenharmony_ci 10337db96d56Sopenharmony_ci """ 10347db96d56Sopenharmony_ci newvalue = value.lstrip() 10357db96d56Sopenharmony_ci fws = WhiteSpaceTerminal(value[:len(value)-len(newvalue)], 'fws') 10367db96d56Sopenharmony_ci return fws, newvalue 10377db96d56Sopenharmony_ci 10387db96d56Sopenharmony_cidef get_encoded_word(value): 10397db96d56Sopenharmony_ci """ encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" 10407db96d56Sopenharmony_ci 10417db96d56Sopenharmony_ci """ 10427db96d56Sopenharmony_ci ew = EncodedWord() 10437db96d56Sopenharmony_ci if not value.startswith('=?'): 10447db96d56Sopenharmony_ci raise errors.HeaderParseError( 10457db96d56Sopenharmony_ci "expected encoded word but found {}".format(value)) 10467db96d56Sopenharmony_ci tok, *remainder = value[2:].split('?=', 1) 10477db96d56Sopenharmony_ci if tok == value[2:]: 10487db96d56Sopenharmony_ci raise errors.HeaderParseError( 10497db96d56Sopenharmony_ci "expected encoded word but found {}".format(value)) 10507db96d56Sopenharmony_ci remstr = ''.join(remainder) 10517db96d56Sopenharmony_ci if (len(remstr) > 1 and 10527db96d56Sopenharmony_ci remstr[0] in hexdigits and 10537db96d56Sopenharmony_ci remstr[1] in hexdigits and 10547db96d56Sopenharmony_ci tok.count('?') < 2): 10557db96d56Sopenharmony_ci # The ? after the CTE was followed by an encoded word escape (=XX). 10567db96d56Sopenharmony_ci rest, *remainder = remstr.split('?=', 1) 10577db96d56Sopenharmony_ci tok = tok + '?=' + rest 10587db96d56Sopenharmony_ci if len(tok.split()) > 1: 10597db96d56Sopenharmony_ci ew.defects.append(errors.InvalidHeaderDefect( 10607db96d56Sopenharmony_ci "whitespace inside encoded word")) 10617db96d56Sopenharmony_ci ew.cte = value 10627db96d56Sopenharmony_ci value = ''.join(remainder) 10637db96d56Sopenharmony_ci try: 10647db96d56Sopenharmony_ci text, charset, lang, defects = _ew.decode('=?' + tok + '?=') 10657db96d56Sopenharmony_ci except (ValueError, KeyError): 10667db96d56Sopenharmony_ci raise _InvalidEwError( 10677db96d56Sopenharmony_ci "encoded word format invalid: '{}'".format(ew.cte)) 10687db96d56Sopenharmony_ci ew.charset = charset 10697db96d56Sopenharmony_ci ew.lang = lang 10707db96d56Sopenharmony_ci ew.defects.extend(defects) 10717db96d56Sopenharmony_ci while text: 10727db96d56Sopenharmony_ci if text[0] in WSP: 10737db96d56Sopenharmony_ci token, text = get_fws(text) 10747db96d56Sopenharmony_ci ew.append(token) 10757db96d56Sopenharmony_ci continue 10767db96d56Sopenharmony_ci chars, *remainder = _wsp_splitter(text, 1) 10777db96d56Sopenharmony_ci vtext = ValueTerminal(chars, 'vtext') 10787db96d56Sopenharmony_ci _validate_xtext(vtext) 10797db96d56Sopenharmony_ci ew.append(vtext) 10807db96d56Sopenharmony_ci text = ''.join(remainder) 10817db96d56Sopenharmony_ci # Encoded words should be followed by a WS 10827db96d56Sopenharmony_ci if value and value[0] not in WSP: 10837db96d56Sopenharmony_ci ew.defects.append(errors.InvalidHeaderDefect( 10847db96d56Sopenharmony_ci "missing trailing whitespace after encoded-word")) 10857db96d56Sopenharmony_ci return ew, value 10867db96d56Sopenharmony_ci 10877db96d56Sopenharmony_cidef get_unstructured(value): 10887db96d56Sopenharmony_ci """unstructured = (*([FWS] vchar) *WSP) / obs-unstruct 10897db96d56Sopenharmony_ci obs-unstruct = *((*LF *CR *(obs-utext) *LF *CR)) / FWS) 10907db96d56Sopenharmony_ci obs-utext = %d0 / obs-NO-WS-CTL / LF / CR 10917db96d56Sopenharmony_ci 10927db96d56Sopenharmony_ci obs-NO-WS-CTL is control characters except WSP/CR/LF. 10937db96d56Sopenharmony_ci 10947db96d56Sopenharmony_ci So, basically, we have printable runs, plus control characters or nulls in 10957db96d56Sopenharmony_ci the obsolete syntax, separated by whitespace. Since RFC 2047 uses the 10967db96d56Sopenharmony_ci obsolete syntax in its specification, but requires whitespace on either 10977db96d56Sopenharmony_ci side of the encoded words, I can see no reason to need to separate the 10987db96d56Sopenharmony_ci non-printable-non-whitespace from the printable runs if they occur, so we 10997db96d56Sopenharmony_ci parse this into xtext tokens separated by WSP tokens. 11007db96d56Sopenharmony_ci 11017db96d56Sopenharmony_ci Because an 'unstructured' value must by definition constitute the entire 11027db96d56Sopenharmony_ci value, this 'get' routine does not return a remaining value, only the 11037db96d56Sopenharmony_ci parsed TokenList. 11047db96d56Sopenharmony_ci 11057db96d56Sopenharmony_ci """ 11067db96d56Sopenharmony_ci # XXX: but what about bare CR and LF? They might signal the start or 11077db96d56Sopenharmony_ci # end of an encoded word. YAGNI for now, since our current parsers 11087db96d56Sopenharmony_ci # will never send us strings with bare CR or LF. 11097db96d56Sopenharmony_ci 11107db96d56Sopenharmony_ci unstructured = UnstructuredTokenList() 11117db96d56Sopenharmony_ci while value: 11127db96d56Sopenharmony_ci if value[0] in WSP: 11137db96d56Sopenharmony_ci token, value = get_fws(value) 11147db96d56Sopenharmony_ci unstructured.append(token) 11157db96d56Sopenharmony_ci continue 11167db96d56Sopenharmony_ci valid_ew = True 11177db96d56Sopenharmony_ci if value.startswith('=?'): 11187db96d56Sopenharmony_ci try: 11197db96d56Sopenharmony_ci token, value = get_encoded_word(value) 11207db96d56Sopenharmony_ci except _InvalidEwError: 11217db96d56Sopenharmony_ci valid_ew = False 11227db96d56Sopenharmony_ci except errors.HeaderParseError: 11237db96d56Sopenharmony_ci # XXX: Need to figure out how to register defects when 11247db96d56Sopenharmony_ci # appropriate here. 11257db96d56Sopenharmony_ci pass 11267db96d56Sopenharmony_ci else: 11277db96d56Sopenharmony_ci have_ws = True 11287db96d56Sopenharmony_ci if len(unstructured) > 0: 11297db96d56Sopenharmony_ci if unstructured[-1].token_type != 'fws': 11307db96d56Sopenharmony_ci unstructured.defects.append(errors.InvalidHeaderDefect( 11317db96d56Sopenharmony_ci "missing whitespace before encoded word")) 11327db96d56Sopenharmony_ci have_ws = False 11337db96d56Sopenharmony_ci if have_ws and len(unstructured) > 1: 11347db96d56Sopenharmony_ci if unstructured[-2].token_type == 'encoded-word': 11357db96d56Sopenharmony_ci unstructured[-1] = EWWhiteSpaceTerminal( 11367db96d56Sopenharmony_ci unstructured[-1], 'fws') 11377db96d56Sopenharmony_ci unstructured.append(token) 11387db96d56Sopenharmony_ci continue 11397db96d56Sopenharmony_ci tok, *remainder = _wsp_splitter(value, 1) 11407db96d56Sopenharmony_ci # Split in the middle of an atom if there is a rfc2047 encoded word 11417db96d56Sopenharmony_ci # which does not have WSP on both sides. The defect will be registered 11427db96d56Sopenharmony_ci # the next time through the loop. 11437db96d56Sopenharmony_ci # This needs to only be performed when the encoded word is valid; 11447db96d56Sopenharmony_ci # otherwise, performing it on an invalid encoded word can cause 11457db96d56Sopenharmony_ci # the parser to go in an infinite loop. 11467db96d56Sopenharmony_ci if valid_ew and rfc2047_matcher.search(tok): 11477db96d56Sopenharmony_ci tok, *remainder = value.partition('=?') 11487db96d56Sopenharmony_ci vtext = ValueTerminal(tok, 'vtext') 11497db96d56Sopenharmony_ci _validate_xtext(vtext) 11507db96d56Sopenharmony_ci unstructured.append(vtext) 11517db96d56Sopenharmony_ci value = ''.join(remainder) 11527db96d56Sopenharmony_ci return unstructured 11537db96d56Sopenharmony_ci 11547db96d56Sopenharmony_cidef get_qp_ctext(value): 11557db96d56Sopenharmony_ci r"""ctext = <printable ascii except \ ( )> 11567db96d56Sopenharmony_ci 11577db96d56Sopenharmony_ci This is not the RFC ctext, since we are handling nested comments in comment 11587db96d56Sopenharmony_ci and unquoting quoted-pairs here. We allow anything except the '()' 11597db96d56Sopenharmony_ci characters, but if we find any ASCII other than the RFC defined printable 11607db96d56Sopenharmony_ci ASCII, a NonPrintableDefect is added to the token's defects list. Since 11617db96d56Sopenharmony_ci quoted pairs are converted to their unquoted values, what is returned is 11627db96d56Sopenharmony_ci a 'ptext' token. In this case it is a WhiteSpaceTerminal, so it's value 11637db96d56Sopenharmony_ci is ' '. 11647db96d56Sopenharmony_ci 11657db96d56Sopenharmony_ci """ 11667db96d56Sopenharmony_ci ptext, value, _ = _get_ptext_to_endchars(value, '()') 11677db96d56Sopenharmony_ci ptext = WhiteSpaceTerminal(ptext, 'ptext') 11687db96d56Sopenharmony_ci _validate_xtext(ptext) 11697db96d56Sopenharmony_ci return ptext, value 11707db96d56Sopenharmony_ci 11717db96d56Sopenharmony_cidef get_qcontent(value): 11727db96d56Sopenharmony_ci """qcontent = qtext / quoted-pair 11737db96d56Sopenharmony_ci 11747db96d56Sopenharmony_ci We allow anything except the DQUOTE character, but if we find any ASCII 11757db96d56Sopenharmony_ci other than the RFC defined printable ASCII, a NonPrintableDefect is 11767db96d56Sopenharmony_ci added to the token's defects list. Any quoted pairs are converted to their 11777db96d56Sopenharmony_ci unquoted values, so what is returned is a 'ptext' token. In this case it 11787db96d56Sopenharmony_ci is a ValueTerminal. 11797db96d56Sopenharmony_ci 11807db96d56Sopenharmony_ci """ 11817db96d56Sopenharmony_ci ptext, value, _ = _get_ptext_to_endchars(value, '"') 11827db96d56Sopenharmony_ci ptext = ValueTerminal(ptext, 'ptext') 11837db96d56Sopenharmony_ci _validate_xtext(ptext) 11847db96d56Sopenharmony_ci return ptext, value 11857db96d56Sopenharmony_ci 11867db96d56Sopenharmony_cidef get_atext(value): 11877db96d56Sopenharmony_ci """atext = <matches _atext_matcher> 11887db96d56Sopenharmony_ci 11897db96d56Sopenharmony_ci We allow any non-ATOM_ENDS in atext, but add an InvalidATextDefect to 11907db96d56Sopenharmony_ci the token's defects list if we find non-atext characters. 11917db96d56Sopenharmony_ci """ 11927db96d56Sopenharmony_ci m = _non_atom_end_matcher(value) 11937db96d56Sopenharmony_ci if not m: 11947db96d56Sopenharmony_ci raise errors.HeaderParseError( 11957db96d56Sopenharmony_ci "expected atext but found '{}'".format(value)) 11967db96d56Sopenharmony_ci atext = m.group() 11977db96d56Sopenharmony_ci value = value[len(atext):] 11987db96d56Sopenharmony_ci atext = ValueTerminal(atext, 'atext') 11997db96d56Sopenharmony_ci _validate_xtext(atext) 12007db96d56Sopenharmony_ci return atext, value 12017db96d56Sopenharmony_ci 12027db96d56Sopenharmony_cidef get_bare_quoted_string(value): 12037db96d56Sopenharmony_ci """bare-quoted-string = DQUOTE *([FWS] qcontent) [FWS] DQUOTE 12047db96d56Sopenharmony_ci 12057db96d56Sopenharmony_ci A quoted-string without the leading or trailing white space. Its 12067db96d56Sopenharmony_ci value is the text between the quote marks, with whitespace 12077db96d56Sopenharmony_ci preserved and quoted pairs decoded. 12087db96d56Sopenharmony_ci """ 12097db96d56Sopenharmony_ci if value[0] != '"': 12107db96d56Sopenharmony_ci raise errors.HeaderParseError( 12117db96d56Sopenharmony_ci "expected '\"' but found '{}'".format(value)) 12127db96d56Sopenharmony_ci bare_quoted_string = BareQuotedString() 12137db96d56Sopenharmony_ci value = value[1:] 12147db96d56Sopenharmony_ci if value and value[0] == '"': 12157db96d56Sopenharmony_ci token, value = get_qcontent(value) 12167db96d56Sopenharmony_ci bare_quoted_string.append(token) 12177db96d56Sopenharmony_ci while value and value[0] != '"': 12187db96d56Sopenharmony_ci if value[0] in WSP: 12197db96d56Sopenharmony_ci token, value = get_fws(value) 12207db96d56Sopenharmony_ci elif value[:2] == '=?': 12217db96d56Sopenharmony_ci valid_ew = False 12227db96d56Sopenharmony_ci try: 12237db96d56Sopenharmony_ci token, value = get_encoded_word(value) 12247db96d56Sopenharmony_ci bare_quoted_string.defects.append(errors.InvalidHeaderDefect( 12257db96d56Sopenharmony_ci "encoded word inside quoted string")) 12267db96d56Sopenharmony_ci valid_ew = True 12277db96d56Sopenharmony_ci except errors.HeaderParseError: 12287db96d56Sopenharmony_ci token, value = get_qcontent(value) 12297db96d56Sopenharmony_ci # Collapse the whitespace between two encoded words that occur in a 12307db96d56Sopenharmony_ci # bare-quoted-string. 12317db96d56Sopenharmony_ci if valid_ew and len(bare_quoted_string) > 1: 12327db96d56Sopenharmony_ci if (bare_quoted_string[-1].token_type == 'fws' and 12337db96d56Sopenharmony_ci bare_quoted_string[-2].token_type == 'encoded-word'): 12347db96d56Sopenharmony_ci bare_quoted_string[-1] = EWWhiteSpaceTerminal( 12357db96d56Sopenharmony_ci bare_quoted_string[-1], 'fws') 12367db96d56Sopenharmony_ci else: 12377db96d56Sopenharmony_ci token, value = get_qcontent(value) 12387db96d56Sopenharmony_ci bare_quoted_string.append(token) 12397db96d56Sopenharmony_ci if not value: 12407db96d56Sopenharmony_ci bare_quoted_string.defects.append(errors.InvalidHeaderDefect( 12417db96d56Sopenharmony_ci "end of header inside quoted string")) 12427db96d56Sopenharmony_ci return bare_quoted_string, value 12437db96d56Sopenharmony_ci return bare_quoted_string, value[1:] 12447db96d56Sopenharmony_ci 12457db96d56Sopenharmony_cidef get_comment(value): 12467db96d56Sopenharmony_ci """comment = "(" *([FWS] ccontent) [FWS] ")" 12477db96d56Sopenharmony_ci ccontent = ctext / quoted-pair / comment 12487db96d56Sopenharmony_ci 12497db96d56Sopenharmony_ci We handle nested comments here, and quoted-pair in our qp-ctext routine. 12507db96d56Sopenharmony_ci """ 12517db96d56Sopenharmony_ci if value and value[0] != '(': 12527db96d56Sopenharmony_ci raise errors.HeaderParseError( 12537db96d56Sopenharmony_ci "expected '(' but found '{}'".format(value)) 12547db96d56Sopenharmony_ci comment = Comment() 12557db96d56Sopenharmony_ci value = value[1:] 12567db96d56Sopenharmony_ci while value and value[0] != ")": 12577db96d56Sopenharmony_ci if value[0] in WSP: 12587db96d56Sopenharmony_ci token, value = get_fws(value) 12597db96d56Sopenharmony_ci elif value[0] == '(': 12607db96d56Sopenharmony_ci token, value = get_comment(value) 12617db96d56Sopenharmony_ci else: 12627db96d56Sopenharmony_ci token, value = get_qp_ctext(value) 12637db96d56Sopenharmony_ci comment.append(token) 12647db96d56Sopenharmony_ci if not value: 12657db96d56Sopenharmony_ci comment.defects.append(errors.InvalidHeaderDefect( 12667db96d56Sopenharmony_ci "end of header inside comment")) 12677db96d56Sopenharmony_ci return comment, value 12687db96d56Sopenharmony_ci return comment, value[1:] 12697db96d56Sopenharmony_ci 12707db96d56Sopenharmony_cidef get_cfws(value): 12717db96d56Sopenharmony_ci """CFWS = (1*([FWS] comment) [FWS]) / FWS 12727db96d56Sopenharmony_ci 12737db96d56Sopenharmony_ci """ 12747db96d56Sopenharmony_ci cfws = CFWSList() 12757db96d56Sopenharmony_ci while value and value[0] in CFWS_LEADER: 12767db96d56Sopenharmony_ci if value[0] in WSP: 12777db96d56Sopenharmony_ci token, value = get_fws(value) 12787db96d56Sopenharmony_ci else: 12797db96d56Sopenharmony_ci token, value = get_comment(value) 12807db96d56Sopenharmony_ci cfws.append(token) 12817db96d56Sopenharmony_ci return cfws, value 12827db96d56Sopenharmony_ci 12837db96d56Sopenharmony_cidef get_quoted_string(value): 12847db96d56Sopenharmony_ci """quoted-string = [CFWS] <bare-quoted-string> [CFWS] 12857db96d56Sopenharmony_ci 12867db96d56Sopenharmony_ci 'bare-quoted-string' is an intermediate class defined by this 12877db96d56Sopenharmony_ci parser and not by the RFC grammar. It is the quoted string 12887db96d56Sopenharmony_ci without any attached CFWS. 12897db96d56Sopenharmony_ci """ 12907db96d56Sopenharmony_ci quoted_string = QuotedString() 12917db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 12927db96d56Sopenharmony_ci token, value = get_cfws(value) 12937db96d56Sopenharmony_ci quoted_string.append(token) 12947db96d56Sopenharmony_ci token, value = get_bare_quoted_string(value) 12957db96d56Sopenharmony_ci quoted_string.append(token) 12967db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 12977db96d56Sopenharmony_ci token, value = get_cfws(value) 12987db96d56Sopenharmony_ci quoted_string.append(token) 12997db96d56Sopenharmony_ci return quoted_string, value 13007db96d56Sopenharmony_ci 13017db96d56Sopenharmony_cidef get_atom(value): 13027db96d56Sopenharmony_ci """atom = [CFWS] 1*atext [CFWS] 13037db96d56Sopenharmony_ci 13047db96d56Sopenharmony_ci An atom could be an rfc2047 encoded word. 13057db96d56Sopenharmony_ci """ 13067db96d56Sopenharmony_ci atom = Atom() 13077db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 13087db96d56Sopenharmony_ci token, value = get_cfws(value) 13097db96d56Sopenharmony_ci atom.append(token) 13107db96d56Sopenharmony_ci if value and value[0] in ATOM_ENDS: 13117db96d56Sopenharmony_ci raise errors.HeaderParseError( 13127db96d56Sopenharmony_ci "expected atom but found '{}'".format(value)) 13137db96d56Sopenharmony_ci if value.startswith('=?'): 13147db96d56Sopenharmony_ci try: 13157db96d56Sopenharmony_ci token, value = get_encoded_word(value) 13167db96d56Sopenharmony_ci except errors.HeaderParseError: 13177db96d56Sopenharmony_ci # XXX: need to figure out how to register defects when 13187db96d56Sopenharmony_ci # appropriate here. 13197db96d56Sopenharmony_ci token, value = get_atext(value) 13207db96d56Sopenharmony_ci else: 13217db96d56Sopenharmony_ci token, value = get_atext(value) 13227db96d56Sopenharmony_ci atom.append(token) 13237db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 13247db96d56Sopenharmony_ci token, value = get_cfws(value) 13257db96d56Sopenharmony_ci atom.append(token) 13267db96d56Sopenharmony_ci return atom, value 13277db96d56Sopenharmony_ci 13287db96d56Sopenharmony_cidef get_dot_atom_text(value): 13297db96d56Sopenharmony_ci """ dot-text = 1*atext *("." 1*atext) 13307db96d56Sopenharmony_ci 13317db96d56Sopenharmony_ci """ 13327db96d56Sopenharmony_ci dot_atom_text = DotAtomText() 13337db96d56Sopenharmony_ci if not value or value[0] in ATOM_ENDS: 13347db96d56Sopenharmony_ci raise errors.HeaderParseError("expected atom at a start of " 13357db96d56Sopenharmony_ci "dot-atom-text but found '{}'".format(value)) 13367db96d56Sopenharmony_ci while value and value[0] not in ATOM_ENDS: 13377db96d56Sopenharmony_ci token, value = get_atext(value) 13387db96d56Sopenharmony_ci dot_atom_text.append(token) 13397db96d56Sopenharmony_ci if value and value[0] == '.': 13407db96d56Sopenharmony_ci dot_atom_text.append(DOT) 13417db96d56Sopenharmony_ci value = value[1:] 13427db96d56Sopenharmony_ci if dot_atom_text[-1] is DOT: 13437db96d56Sopenharmony_ci raise errors.HeaderParseError("expected atom at end of dot-atom-text " 13447db96d56Sopenharmony_ci "but found '{}'".format('.'+value)) 13457db96d56Sopenharmony_ci return dot_atom_text, value 13467db96d56Sopenharmony_ci 13477db96d56Sopenharmony_cidef get_dot_atom(value): 13487db96d56Sopenharmony_ci """ dot-atom = [CFWS] dot-atom-text [CFWS] 13497db96d56Sopenharmony_ci 13507db96d56Sopenharmony_ci Any place we can have a dot atom, we could instead have an rfc2047 encoded 13517db96d56Sopenharmony_ci word. 13527db96d56Sopenharmony_ci """ 13537db96d56Sopenharmony_ci dot_atom = DotAtom() 13547db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 13557db96d56Sopenharmony_ci token, value = get_cfws(value) 13567db96d56Sopenharmony_ci dot_atom.append(token) 13577db96d56Sopenharmony_ci if value.startswith('=?'): 13587db96d56Sopenharmony_ci try: 13597db96d56Sopenharmony_ci token, value = get_encoded_word(value) 13607db96d56Sopenharmony_ci except errors.HeaderParseError: 13617db96d56Sopenharmony_ci # XXX: need to figure out how to register defects when 13627db96d56Sopenharmony_ci # appropriate here. 13637db96d56Sopenharmony_ci token, value = get_dot_atom_text(value) 13647db96d56Sopenharmony_ci else: 13657db96d56Sopenharmony_ci token, value = get_dot_atom_text(value) 13667db96d56Sopenharmony_ci dot_atom.append(token) 13677db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 13687db96d56Sopenharmony_ci token, value = get_cfws(value) 13697db96d56Sopenharmony_ci dot_atom.append(token) 13707db96d56Sopenharmony_ci return dot_atom, value 13717db96d56Sopenharmony_ci 13727db96d56Sopenharmony_cidef get_word(value): 13737db96d56Sopenharmony_ci """word = atom / quoted-string 13747db96d56Sopenharmony_ci 13757db96d56Sopenharmony_ci Either atom or quoted-string may start with CFWS. We have to peel off this 13767db96d56Sopenharmony_ci CFWS first to determine which type of word to parse. Afterward we splice 13777db96d56Sopenharmony_ci the leading CFWS, if any, into the parsed sub-token. 13787db96d56Sopenharmony_ci 13797db96d56Sopenharmony_ci If neither an atom or a quoted-string is found before the next special, a 13807db96d56Sopenharmony_ci HeaderParseError is raised. 13817db96d56Sopenharmony_ci 13827db96d56Sopenharmony_ci The token returned is either an Atom or a QuotedString, as appropriate. 13837db96d56Sopenharmony_ci This means the 'word' level of the formal grammar is not represented in the 13847db96d56Sopenharmony_ci parse tree; this is because having that extra layer when manipulating the 13857db96d56Sopenharmony_ci parse tree is more confusing than it is helpful. 13867db96d56Sopenharmony_ci 13877db96d56Sopenharmony_ci """ 13887db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 13897db96d56Sopenharmony_ci leader, value = get_cfws(value) 13907db96d56Sopenharmony_ci else: 13917db96d56Sopenharmony_ci leader = None 13927db96d56Sopenharmony_ci if not value: 13937db96d56Sopenharmony_ci raise errors.HeaderParseError( 13947db96d56Sopenharmony_ci "Expected 'atom' or 'quoted-string' but found nothing.") 13957db96d56Sopenharmony_ci if value[0]=='"': 13967db96d56Sopenharmony_ci token, value = get_quoted_string(value) 13977db96d56Sopenharmony_ci elif value[0] in SPECIALS: 13987db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected 'atom' or 'quoted-string' " 13997db96d56Sopenharmony_ci "but found '{}'".format(value)) 14007db96d56Sopenharmony_ci else: 14017db96d56Sopenharmony_ci token, value = get_atom(value) 14027db96d56Sopenharmony_ci if leader is not None: 14037db96d56Sopenharmony_ci token[:0] = [leader] 14047db96d56Sopenharmony_ci return token, value 14057db96d56Sopenharmony_ci 14067db96d56Sopenharmony_cidef get_phrase(value): 14077db96d56Sopenharmony_ci """ phrase = 1*word / obs-phrase 14087db96d56Sopenharmony_ci obs-phrase = word *(word / "." / CFWS) 14097db96d56Sopenharmony_ci 14107db96d56Sopenharmony_ci This means a phrase can be a sequence of words, periods, and CFWS in any 14117db96d56Sopenharmony_ci order as long as it starts with at least one word. If anything other than 14127db96d56Sopenharmony_ci words is detected, an ObsoleteHeaderDefect is added to the token's defect 14137db96d56Sopenharmony_ci list. We also accept a phrase that starts with CFWS followed by a dot; 14147db96d56Sopenharmony_ci this is registered as an InvalidHeaderDefect, since it is not supported by 14157db96d56Sopenharmony_ci even the obsolete grammar. 14167db96d56Sopenharmony_ci 14177db96d56Sopenharmony_ci """ 14187db96d56Sopenharmony_ci phrase = Phrase() 14197db96d56Sopenharmony_ci try: 14207db96d56Sopenharmony_ci token, value = get_word(value) 14217db96d56Sopenharmony_ci phrase.append(token) 14227db96d56Sopenharmony_ci except errors.HeaderParseError: 14237db96d56Sopenharmony_ci phrase.defects.append(errors.InvalidHeaderDefect( 14247db96d56Sopenharmony_ci "phrase does not start with word")) 14257db96d56Sopenharmony_ci while value and value[0] not in PHRASE_ENDS: 14267db96d56Sopenharmony_ci if value[0]=='.': 14277db96d56Sopenharmony_ci phrase.append(DOT) 14287db96d56Sopenharmony_ci phrase.defects.append(errors.ObsoleteHeaderDefect( 14297db96d56Sopenharmony_ci "period in 'phrase'")) 14307db96d56Sopenharmony_ci value = value[1:] 14317db96d56Sopenharmony_ci else: 14327db96d56Sopenharmony_ci try: 14337db96d56Sopenharmony_ci token, value = get_word(value) 14347db96d56Sopenharmony_ci except errors.HeaderParseError: 14357db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 14367db96d56Sopenharmony_ci token, value = get_cfws(value) 14377db96d56Sopenharmony_ci phrase.defects.append(errors.ObsoleteHeaderDefect( 14387db96d56Sopenharmony_ci "comment found without atom")) 14397db96d56Sopenharmony_ci else: 14407db96d56Sopenharmony_ci raise 14417db96d56Sopenharmony_ci phrase.append(token) 14427db96d56Sopenharmony_ci return phrase, value 14437db96d56Sopenharmony_ci 14447db96d56Sopenharmony_cidef get_local_part(value): 14457db96d56Sopenharmony_ci """ local-part = dot-atom / quoted-string / obs-local-part 14467db96d56Sopenharmony_ci 14477db96d56Sopenharmony_ci """ 14487db96d56Sopenharmony_ci local_part = LocalPart() 14497db96d56Sopenharmony_ci leader = None 14507db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 14517db96d56Sopenharmony_ci leader, value = get_cfws(value) 14527db96d56Sopenharmony_ci if not value: 14537db96d56Sopenharmony_ci raise errors.HeaderParseError( 14547db96d56Sopenharmony_ci "expected local-part but found '{}'".format(value)) 14557db96d56Sopenharmony_ci try: 14567db96d56Sopenharmony_ci token, value = get_dot_atom(value) 14577db96d56Sopenharmony_ci except errors.HeaderParseError: 14587db96d56Sopenharmony_ci try: 14597db96d56Sopenharmony_ci token, value = get_word(value) 14607db96d56Sopenharmony_ci except errors.HeaderParseError: 14617db96d56Sopenharmony_ci if value[0] != '\\' and value[0] in PHRASE_ENDS: 14627db96d56Sopenharmony_ci raise 14637db96d56Sopenharmony_ci token = TokenList() 14647db96d56Sopenharmony_ci if leader is not None: 14657db96d56Sopenharmony_ci token[:0] = [leader] 14667db96d56Sopenharmony_ci local_part.append(token) 14677db96d56Sopenharmony_ci if value and (value[0]=='\\' or value[0] not in PHRASE_ENDS): 14687db96d56Sopenharmony_ci obs_local_part, value = get_obs_local_part(str(local_part) + value) 14697db96d56Sopenharmony_ci if obs_local_part.token_type == 'invalid-obs-local-part': 14707db96d56Sopenharmony_ci local_part.defects.append(errors.InvalidHeaderDefect( 14717db96d56Sopenharmony_ci "local-part is not dot-atom, quoted-string, or obs-local-part")) 14727db96d56Sopenharmony_ci else: 14737db96d56Sopenharmony_ci local_part.defects.append(errors.ObsoleteHeaderDefect( 14747db96d56Sopenharmony_ci "local-part is not a dot-atom (contains CFWS)")) 14757db96d56Sopenharmony_ci local_part[0] = obs_local_part 14767db96d56Sopenharmony_ci try: 14777db96d56Sopenharmony_ci local_part.value.encode('ascii') 14787db96d56Sopenharmony_ci except UnicodeEncodeError: 14797db96d56Sopenharmony_ci local_part.defects.append(errors.NonASCIILocalPartDefect( 14807db96d56Sopenharmony_ci "local-part contains non-ASCII characters)")) 14817db96d56Sopenharmony_ci return local_part, value 14827db96d56Sopenharmony_ci 14837db96d56Sopenharmony_cidef get_obs_local_part(value): 14847db96d56Sopenharmony_ci """ obs-local-part = word *("." word) 14857db96d56Sopenharmony_ci """ 14867db96d56Sopenharmony_ci obs_local_part = ObsLocalPart() 14877db96d56Sopenharmony_ci last_non_ws_was_dot = False 14887db96d56Sopenharmony_ci while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS): 14897db96d56Sopenharmony_ci if value[0] == '.': 14907db96d56Sopenharmony_ci if last_non_ws_was_dot: 14917db96d56Sopenharmony_ci obs_local_part.defects.append(errors.InvalidHeaderDefect( 14927db96d56Sopenharmony_ci "invalid repeated '.'")) 14937db96d56Sopenharmony_ci obs_local_part.append(DOT) 14947db96d56Sopenharmony_ci last_non_ws_was_dot = True 14957db96d56Sopenharmony_ci value = value[1:] 14967db96d56Sopenharmony_ci continue 14977db96d56Sopenharmony_ci elif value[0]=='\\': 14987db96d56Sopenharmony_ci obs_local_part.append(ValueTerminal(value[0], 14997db96d56Sopenharmony_ci 'misplaced-special')) 15007db96d56Sopenharmony_ci value = value[1:] 15017db96d56Sopenharmony_ci obs_local_part.defects.append(errors.InvalidHeaderDefect( 15027db96d56Sopenharmony_ci "'\\' character outside of quoted-string/ccontent")) 15037db96d56Sopenharmony_ci last_non_ws_was_dot = False 15047db96d56Sopenharmony_ci continue 15057db96d56Sopenharmony_ci if obs_local_part and obs_local_part[-1].token_type != 'dot': 15067db96d56Sopenharmony_ci obs_local_part.defects.append(errors.InvalidHeaderDefect( 15077db96d56Sopenharmony_ci "missing '.' between words")) 15087db96d56Sopenharmony_ci try: 15097db96d56Sopenharmony_ci token, value = get_word(value) 15107db96d56Sopenharmony_ci last_non_ws_was_dot = False 15117db96d56Sopenharmony_ci except errors.HeaderParseError: 15127db96d56Sopenharmony_ci if value[0] not in CFWS_LEADER: 15137db96d56Sopenharmony_ci raise 15147db96d56Sopenharmony_ci token, value = get_cfws(value) 15157db96d56Sopenharmony_ci obs_local_part.append(token) 15167db96d56Sopenharmony_ci if (obs_local_part[0].token_type == 'dot' or 15177db96d56Sopenharmony_ci obs_local_part[0].token_type=='cfws' and 15187db96d56Sopenharmony_ci obs_local_part[1].token_type=='dot'): 15197db96d56Sopenharmony_ci obs_local_part.defects.append(errors.InvalidHeaderDefect( 15207db96d56Sopenharmony_ci "Invalid leading '.' in local part")) 15217db96d56Sopenharmony_ci if (obs_local_part[-1].token_type == 'dot' or 15227db96d56Sopenharmony_ci obs_local_part[-1].token_type=='cfws' and 15237db96d56Sopenharmony_ci obs_local_part[-2].token_type=='dot'): 15247db96d56Sopenharmony_ci obs_local_part.defects.append(errors.InvalidHeaderDefect( 15257db96d56Sopenharmony_ci "Invalid trailing '.' in local part")) 15267db96d56Sopenharmony_ci if obs_local_part.defects: 15277db96d56Sopenharmony_ci obs_local_part.token_type = 'invalid-obs-local-part' 15287db96d56Sopenharmony_ci return obs_local_part, value 15297db96d56Sopenharmony_ci 15307db96d56Sopenharmony_cidef get_dtext(value): 15317db96d56Sopenharmony_ci r""" dtext = <printable ascii except \ [ ]> / obs-dtext 15327db96d56Sopenharmony_ci obs-dtext = obs-NO-WS-CTL / quoted-pair 15337db96d56Sopenharmony_ci 15347db96d56Sopenharmony_ci We allow anything except the excluded characters, but if we find any 15357db96d56Sopenharmony_ci ASCII other than the RFC defined printable ASCII, a NonPrintableDefect is 15367db96d56Sopenharmony_ci added to the token's defects list. Quoted pairs are converted to their 15377db96d56Sopenharmony_ci unquoted values, so what is returned is a ptext token, in this case a 15387db96d56Sopenharmony_ci ValueTerminal. If there were quoted-printables, an ObsoleteHeaderDefect is 15397db96d56Sopenharmony_ci added to the returned token's defect list. 15407db96d56Sopenharmony_ci 15417db96d56Sopenharmony_ci """ 15427db96d56Sopenharmony_ci ptext, value, had_qp = _get_ptext_to_endchars(value, '[]') 15437db96d56Sopenharmony_ci ptext = ValueTerminal(ptext, 'ptext') 15447db96d56Sopenharmony_ci if had_qp: 15457db96d56Sopenharmony_ci ptext.defects.append(errors.ObsoleteHeaderDefect( 15467db96d56Sopenharmony_ci "quoted printable found in domain-literal")) 15477db96d56Sopenharmony_ci _validate_xtext(ptext) 15487db96d56Sopenharmony_ci return ptext, value 15497db96d56Sopenharmony_ci 15507db96d56Sopenharmony_cidef _check_for_early_dl_end(value, domain_literal): 15517db96d56Sopenharmony_ci if value: 15527db96d56Sopenharmony_ci return False 15537db96d56Sopenharmony_ci domain_literal.append(errors.InvalidHeaderDefect( 15547db96d56Sopenharmony_ci "end of input inside domain-literal")) 15557db96d56Sopenharmony_ci domain_literal.append(ValueTerminal(']', 'domain-literal-end')) 15567db96d56Sopenharmony_ci return True 15577db96d56Sopenharmony_ci 15587db96d56Sopenharmony_cidef get_domain_literal(value): 15597db96d56Sopenharmony_ci """ domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS] 15607db96d56Sopenharmony_ci 15617db96d56Sopenharmony_ci """ 15627db96d56Sopenharmony_ci domain_literal = DomainLiteral() 15637db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 15647db96d56Sopenharmony_ci token, value = get_cfws(value) 15657db96d56Sopenharmony_ci domain_literal.append(token) 15667db96d56Sopenharmony_ci if not value: 15677db96d56Sopenharmony_ci raise errors.HeaderParseError("expected domain-literal") 15687db96d56Sopenharmony_ci if value[0] != '[': 15697db96d56Sopenharmony_ci raise errors.HeaderParseError("expected '[' at start of domain-literal " 15707db96d56Sopenharmony_ci "but found '{}'".format(value)) 15717db96d56Sopenharmony_ci value = value[1:] 15727db96d56Sopenharmony_ci if _check_for_early_dl_end(value, domain_literal): 15737db96d56Sopenharmony_ci return domain_literal, value 15747db96d56Sopenharmony_ci domain_literal.append(ValueTerminal('[', 'domain-literal-start')) 15757db96d56Sopenharmony_ci if value[0] in WSP: 15767db96d56Sopenharmony_ci token, value = get_fws(value) 15777db96d56Sopenharmony_ci domain_literal.append(token) 15787db96d56Sopenharmony_ci token, value = get_dtext(value) 15797db96d56Sopenharmony_ci domain_literal.append(token) 15807db96d56Sopenharmony_ci if _check_for_early_dl_end(value, domain_literal): 15817db96d56Sopenharmony_ci return domain_literal, value 15827db96d56Sopenharmony_ci if value[0] in WSP: 15837db96d56Sopenharmony_ci token, value = get_fws(value) 15847db96d56Sopenharmony_ci domain_literal.append(token) 15857db96d56Sopenharmony_ci if _check_for_early_dl_end(value, domain_literal): 15867db96d56Sopenharmony_ci return domain_literal, value 15877db96d56Sopenharmony_ci if value[0] != ']': 15887db96d56Sopenharmony_ci raise errors.HeaderParseError("expected ']' at end of domain-literal " 15897db96d56Sopenharmony_ci "but found '{}'".format(value)) 15907db96d56Sopenharmony_ci domain_literal.append(ValueTerminal(']', 'domain-literal-end')) 15917db96d56Sopenharmony_ci value = value[1:] 15927db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 15937db96d56Sopenharmony_ci token, value = get_cfws(value) 15947db96d56Sopenharmony_ci domain_literal.append(token) 15957db96d56Sopenharmony_ci return domain_literal, value 15967db96d56Sopenharmony_ci 15977db96d56Sopenharmony_cidef get_domain(value): 15987db96d56Sopenharmony_ci """ domain = dot-atom / domain-literal / obs-domain 15997db96d56Sopenharmony_ci obs-domain = atom *("." atom)) 16007db96d56Sopenharmony_ci 16017db96d56Sopenharmony_ci """ 16027db96d56Sopenharmony_ci domain = Domain() 16037db96d56Sopenharmony_ci leader = None 16047db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 16057db96d56Sopenharmony_ci leader, value = get_cfws(value) 16067db96d56Sopenharmony_ci if not value: 16077db96d56Sopenharmony_ci raise errors.HeaderParseError( 16087db96d56Sopenharmony_ci "expected domain but found '{}'".format(value)) 16097db96d56Sopenharmony_ci if value[0] == '[': 16107db96d56Sopenharmony_ci token, value = get_domain_literal(value) 16117db96d56Sopenharmony_ci if leader is not None: 16127db96d56Sopenharmony_ci token[:0] = [leader] 16137db96d56Sopenharmony_ci domain.append(token) 16147db96d56Sopenharmony_ci return domain, value 16157db96d56Sopenharmony_ci try: 16167db96d56Sopenharmony_ci token, value = get_dot_atom(value) 16177db96d56Sopenharmony_ci except errors.HeaderParseError: 16187db96d56Sopenharmony_ci token, value = get_atom(value) 16197db96d56Sopenharmony_ci if value and value[0] == '@': 16207db96d56Sopenharmony_ci raise errors.HeaderParseError('Invalid Domain') 16217db96d56Sopenharmony_ci if leader is not None: 16227db96d56Sopenharmony_ci token[:0] = [leader] 16237db96d56Sopenharmony_ci domain.append(token) 16247db96d56Sopenharmony_ci if value and value[0] == '.': 16257db96d56Sopenharmony_ci domain.defects.append(errors.ObsoleteHeaderDefect( 16267db96d56Sopenharmony_ci "domain is not a dot-atom (contains CFWS)")) 16277db96d56Sopenharmony_ci if domain[0].token_type == 'dot-atom': 16287db96d56Sopenharmony_ci domain[:] = domain[0] 16297db96d56Sopenharmony_ci while value and value[0] == '.': 16307db96d56Sopenharmony_ci domain.append(DOT) 16317db96d56Sopenharmony_ci token, value = get_atom(value[1:]) 16327db96d56Sopenharmony_ci domain.append(token) 16337db96d56Sopenharmony_ci return domain, value 16347db96d56Sopenharmony_ci 16357db96d56Sopenharmony_cidef get_addr_spec(value): 16367db96d56Sopenharmony_ci """ addr-spec = local-part "@" domain 16377db96d56Sopenharmony_ci 16387db96d56Sopenharmony_ci """ 16397db96d56Sopenharmony_ci addr_spec = AddrSpec() 16407db96d56Sopenharmony_ci token, value = get_local_part(value) 16417db96d56Sopenharmony_ci addr_spec.append(token) 16427db96d56Sopenharmony_ci if not value or value[0] != '@': 16437db96d56Sopenharmony_ci addr_spec.defects.append(errors.InvalidHeaderDefect( 16447db96d56Sopenharmony_ci "addr-spec local part with no domain")) 16457db96d56Sopenharmony_ci return addr_spec, value 16467db96d56Sopenharmony_ci addr_spec.append(ValueTerminal('@', 'address-at-symbol')) 16477db96d56Sopenharmony_ci token, value = get_domain(value[1:]) 16487db96d56Sopenharmony_ci addr_spec.append(token) 16497db96d56Sopenharmony_ci return addr_spec, value 16507db96d56Sopenharmony_ci 16517db96d56Sopenharmony_cidef get_obs_route(value): 16527db96d56Sopenharmony_ci """ obs-route = obs-domain-list ":" 16537db96d56Sopenharmony_ci obs-domain-list = *(CFWS / ",") "@" domain *("," [CFWS] ["@" domain]) 16547db96d56Sopenharmony_ci 16557db96d56Sopenharmony_ci Returns an obs-route token with the appropriate sub-tokens (that is, 16567db96d56Sopenharmony_ci there is no obs-domain-list in the parse tree). 16577db96d56Sopenharmony_ci """ 16587db96d56Sopenharmony_ci obs_route = ObsRoute() 16597db96d56Sopenharmony_ci while value and (value[0]==',' or value[0] in CFWS_LEADER): 16607db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 16617db96d56Sopenharmony_ci token, value = get_cfws(value) 16627db96d56Sopenharmony_ci obs_route.append(token) 16637db96d56Sopenharmony_ci elif value[0] == ',': 16647db96d56Sopenharmony_ci obs_route.append(ListSeparator) 16657db96d56Sopenharmony_ci value = value[1:] 16667db96d56Sopenharmony_ci if not value or value[0] != '@': 16677db96d56Sopenharmony_ci raise errors.HeaderParseError( 16687db96d56Sopenharmony_ci "expected obs-route domain but found '{}'".format(value)) 16697db96d56Sopenharmony_ci obs_route.append(RouteComponentMarker) 16707db96d56Sopenharmony_ci token, value = get_domain(value[1:]) 16717db96d56Sopenharmony_ci obs_route.append(token) 16727db96d56Sopenharmony_ci while value and value[0]==',': 16737db96d56Sopenharmony_ci obs_route.append(ListSeparator) 16747db96d56Sopenharmony_ci value = value[1:] 16757db96d56Sopenharmony_ci if not value: 16767db96d56Sopenharmony_ci break 16777db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 16787db96d56Sopenharmony_ci token, value = get_cfws(value) 16797db96d56Sopenharmony_ci obs_route.append(token) 16807db96d56Sopenharmony_ci if value[0] == '@': 16817db96d56Sopenharmony_ci obs_route.append(RouteComponentMarker) 16827db96d56Sopenharmony_ci token, value = get_domain(value[1:]) 16837db96d56Sopenharmony_ci obs_route.append(token) 16847db96d56Sopenharmony_ci if not value: 16857db96d56Sopenharmony_ci raise errors.HeaderParseError("end of header while parsing obs-route") 16867db96d56Sopenharmony_ci if value[0] != ':': 16877db96d56Sopenharmony_ci raise errors.HeaderParseError( "expected ':' marking end of " 16887db96d56Sopenharmony_ci "obs-route but found '{}'".format(value)) 16897db96d56Sopenharmony_ci obs_route.append(ValueTerminal(':', 'end-of-obs-route-marker')) 16907db96d56Sopenharmony_ci return obs_route, value[1:] 16917db96d56Sopenharmony_ci 16927db96d56Sopenharmony_cidef get_angle_addr(value): 16937db96d56Sopenharmony_ci """ angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr 16947db96d56Sopenharmony_ci obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS] 16957db96d56Sopenharmony_ci 16967db96d56Sopenharmony_ci """ 16977db96d56Sopenharmony_ci angle_addr = AngleAddr() 16987db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 16997db96d56Sopenharmony_ci token, value = get_cfws(value) 17007db96d56Sopenharmony_ci angle_addr.append(token) 17017db96d56Sopenharmony_ci if not value or value[0] != '<': 17027db96d56Sopenharmony_ci raise errors.HeaderParseError( 17037db96d56Sopenharmony_ci "expected angle-addr but found '{}'".format(value)) 17047db96d56Sopenharmony_ci angle_addr.append(ValueTerminal('<', 'angle-addr-start')) 17057db96d56Sopenharmony_ci value = value[1:] 17067db96d56Sopenharmony_ci # Although it is not legal per RFC5322, SMTP uses '<>' in certain 17077db96d56Sopenharmony_ci # circumstances. 17087db96d56Sopenharmony_ci if value[0] == '>': 17097db96d56Sopenharmony_ci angle_addr.append(ValueTerminal('>', 'angle-addr-end')) 17107db96d56Sopenharmony_ci angle_addr.defects.append(errors.InvalidHeaderDefect( 17117db96d56Sopenharmony_ci "null addr-spec in angle-addr")) 17127db96d56Sopenharmony_ci value = value[1:] 17137db96d56Sopenharmony_ci return angle_addr, value 17147db96d56Sopenharmony_ci try: 17157db96d56Sopenharmony_ci token, value = get_addr_spec(value) 17167db96d56Sopenharmony_ci except errors.HeaderParseError: 17177db96d56Sopenharmony_ci try: 17187db96d56Sopenharmony_ci token, value = get_obs_route(value) 17197db96d56Sopenharmony_ci angle_addr.defects.append(errors.ObsoleteHeaderDefect( 17207db96d56Sopenharmony_ci "obsolete route specification in angle-addr")) 17217db96d56Sopenharmony_ci except errors.HeaderParseError: 17227db96d56Sopenharmony_ci raise errors.HeaderParseError( 17237db96d56Sopenharmony_ci "expected addr-spec or obs-route but found '{}'".format(value)) 17247db96d56Sopenharmony_ci angle_addr.append(token) 17257db96d56Sopenharmony_ci token, value = get_addr_spec(value) 17267db96d56Sopenharmony_ci angle_addr.append(token) 17277db96d56Sopenharmony_ci if value and value[0] == '>': 17287db96d56Sopenharmony_ci value = value[1:] 17297db96d56Sopenharmony_ci else: 17307db96d56Sopenharmony_ci angle_addr.defects.append(errors.InvalidHeaderDefect( 17317db96d56Sopenharmony_ci "missing trailing '>' on angle-addr")) 17327db96d56Sopenharmony_ci angle_addr.append(ValueTerminal('>', 'angle-addr-end')) 17337db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 17347db96d56Sopenharmony_ci token, value = get_cfws(value) 17357db96d56Sopenharmony_ci angle_addr.append(token) 17367db96d56Sopenharmony_ci return angle_addr, value 17377db96d56Sopenharmony_ci 17387db96d56Sopenharmony_cidef get_display_name(value): 17397db96d56Sopenharmony_ci """ display-name = phrase 17407db96d56Sopenharmony_ci 17417db96d56Sopenharmony_ci Because this is simply a name-rule, we don't return a display-name 17427db96d56Sopenharmony_ci token containing a phrase, but rather a display-name token with 17437db96d56Sopenharmony_ci the content of the phrase. 17447db96d56Sopenharmony_ci 17457db96d56Sopenharmony_ci """ 17467db96d56Sopenharmony_ci display_name = DisplayName() 17477db96d56Sopenharmony_ci token, value = get_phrase(value) 17487db96d56Sopenharmony_ci display_name.extend(token[:]) 17497db96d56Sopenharmony_ci display_name.defects = token.defects[:] 17507db96d56Sopenharmony_ci return display_name, value 17517db96d56Sopenharmony_ci 17527db96d56Sopenharmony_ci 17537db96d56Sopenharmony_cidef get_name_addr(value): 17547db96d56Sopenharmony_ci """ name-addr = [display-name] angle-addr 17557db96d56Sopenharmony_ci 17567db96d56Sopenharmony_ci """ 17577db96d56Sopenharmony_ci name_addr = NameAddr() 17587db96d56Sopenharmony_ci # Both the optional display name and the angle-addr can start with cfws. 17597db96d56Sopenharmony_ci leader = None 17607db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 17617db96d56Sopenharmony_ci leader, value = get_cfws(value) 17627db96d56Sopenharmony_ci if not value: 17637db96d56Sopenharmony_ci raise errors.HeaderParseError( 17647db96d56Sopenharmony_ci "expected name-addr but found '{}'".format(leader)) 17657db96d56Sopenharmony_ci if value[0] != '<': 17667db96d56Sopenharmony_ci if value[0] in PHRASE_ENDS: 17677db96d56Sopenharmony_ci raise errors.HeaderParseError( 17687db96d56Sopenharmony_ci "expected name-addr but found '{}'".format(value)) 17697db96d56Sopenharmony_ci token, value = get_display_name(value) 17707db96d56Sopenharmony_ci if not value: 17717db96d56Sopenharmony_ci raise errors.HeaderParseError( 17727db96d56Sopenharmony_ci "expected name-addr but found '{}'".format(token)) 17737db96d56Sopenharmony_ci if leader is not None: 17747db96d56Sopenharmony_ci token[0][:0] = [leader] 17757db96d56Sopenharmony_ci leader = None 17767db96d56Sopenharmony_ci name_addr.append(token) 17777db96d56Sopenharmony_ci token, value = get_angle_addr(value) 17787db96d56Sopenharmony_ci if leader is not None: 17797db96d56Sopenharmony_ci token[:0] = [leader] 17807db96d56Sopenharmony_ci name_addr.append(token) 17817db96d56Sopenharmony_ci return name_addr, value 17827db96d56Sopenharmony_ci 17837db96d56Sopenharmony_cidef get_mailbox(value): 17847db96d56Sopenharmony_ci """ mailbox = name-addr / addr-spec 17857db96d56Sopenharmony_ci 17867db96d56Sopenharmony_ci """ 17877db96d56Sopenharmony_ci # The only way to figure out if we are dealing with a name-addr or an 17887db96d56Sopenharmony_ci # addr-spec is to try parsing each one. 17897db96d56Sopenharmony_ci mailbox = Mailbox() 17907db96d56Sopenharmony_ci try: 17917db96d56Sopenharmony_ci token, value = get_name_addr(value) 17927db96d56Sopenharmony_ci except errors.HeaderParseError: 17937db96d56Sopenharmony_ci try: 17947db96d56Sopenharmony_ci token, value = get_addr_spec(value) 17957db96d56Sopenharmony_ci except errors.HeaderParseError: 17967db96d56Sopenharmony_ci raise errors.HeaderParseError( 17977db96d56Sopenharmony_ci "expected mailbox but found '{}'".format(value)) 17987db96d56Sopenharmony_ci if any(isinstance(x, errors.InvalidHeaderDefect) 17997db96d56Sopenharmony_ci for x in token.all_defects): 18007db96d56Sopenharmony_ci mailbox.token_type = 'invalid-mailbox' 18017db96d56Sopenharmony_ci mailbox.append(token) 18027db96d56Sopenharmony_ci return mailbox, value 18037db96d56Sopenharmony_ci 18047db96d56Sopenharmony_cidef get_invalid_mailbox(value, endchars): 18057db96d56Sopenharmony_ci """ Read everything up to one of the chars in endchars. 18067db96d56Sopenharmony_ci 18077db96d56Sopenharmony_ci This is outside the formal grammar. The InvalidMailbox TokenList that is 18087db96d56Sopenharmony_ci returned acts like a Mailbox, but the data attributes are None. 18097db96d56Sopenharmony_ci 18107db96d56Sopenharmony_ci """ 18117db96d56Sopenharmony_ci invalid_mailbox = InvalidMailbox() 18127db96d56Sopenharmony_ci while value and value[0] not in endchars: 18137db96d56Sopenharmony_ci if value[0] in PHRASE_ENDS: 18147db96d56Sopenharmony_ci invalid_mailbox.append(ValueTerminal(value[0], 18157db96d56Sopenharmony_ci 'misplaced-special')) 18167db96d56Sopenharmony_ci value = value[1:] 18177db96d56Sopenharmony_ci else: 18187db96d56Sopenharmony_ci token, value = get_phrase(value) 18197db96d56Sopenharmony_ci invalid_mailbox.append(token) 18207db96d56Sopenharmony_ci return invalid_mailbox, value 18217db96d56Sopenharmony_ci 18227db96d56Sopenharmony_cidef get_mailbox_list(value): 18237db96d56Sopenharmony_ci """ mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list 18247db96d56Sopenharmony_ci obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS]) 18257db96d56Sopenharmony_ci 18267db96d56Sopenharmony_ci For this routine we go outside the formal grammar in order to improve error 18277db96d56Sopenharmony_ci handling. We recognize the end of the mailbox list only at the end of the 18287db96d56Sopenharmony_ci value or at a ';' (the group terminator). This is so that we can turn 18297db96d56Sopenharmony_ci invalid mailboxes into InvalidMailbox tokens and continue parsing any 18307db96d56Sopenharmony_ci remaining valid mailboxes. We also allow all mailbox entries to be null, 18317db96d56Sopenharmony_ci and this condition is handled appropriately at a higher level. 18327db96d56Sopenharmony_ci 18337db96d56Sopenharmony_ci """ 18347db96d56Sopenharmony_ci mailbox_list = MailboxList() 18357db96d56Sopenharmony_ci while value and value[0] != ';': 18367db96d56Sopenharmony_ci try: 18377db96d56Sopenharmony_ci token, value = get_mailbox(value) 18387db96d56Sopenharmony_ci mailbox_list.append(token) 18397db96d56Sopenharmony_ci except errors.HeaderParseError: 18407db96d56Sopenharmony_ci leader = None 18417db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 18427db96d56Sopenharmony_ci leader, value = get_cfws(value) 18437db96d56Sopenharmony_ci if not value or value[0] in ',;': 18447db96d56Sopenharmony_ci mailbox_list.append(leader) 18457db96d56Sopenharmony_ci mailbox_list.defects.append(errors.ObsoleteHeaderDefect( 18467db96d56Sopenharmony_ci "empty element in mailbox-list")) 18477db96d56Sopenharmony_ci else: 18487db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',;') 18497db96d56Sopenharmony_ci if leader is not None: 18507db96d56Sopenharmony_ci token[:0] = [leader] 18517db96d56Sopenharmony_ci mailbox_list.append(token) 18527db96d56Sopenharmony_ci mailbox_list.defects.append(errors.InvalidHeaderDefect( 18537db96d56Sopenharmony_ci "invalid mailbox in mailbox-list")) 18547db96d56Sopenharmony_ci elif value[0] == ',': 18557db96d56Sopenharmony_ci mailbox_list.defects.append(errors.ObsoleteHeaderDefect( 18567db96d56Sopenharmony_ci "empty element in mailbox-list")) 18577db96d56Sopenharmony_ci else: 18587db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',;') 18597db96d56Sopenharmony_ci if leader is not None: 18607db96d56Sopenharmony_ci token[:0] = [leader] 18617db96d56Sopenharmony_ci mailbox_list.append(token) 18627db96d56Sopenharmony_ci mailbox_list.defects.append(errors.InvalidHeaderDefect( 18637db96d56Sopenharmony_ci "invalid mailbox in mailbox-list")) 18647db96d56Sopenharmony_ci if value and value[0] not in ',;': 18657db96d56Sopenharmony_ci # Crap after mailbox; treat it as an invalid mailbox. 18667db96d56Sopenharmony_ci # The mailbox info will still be available. 18677db96d56Sopenharmony_ci mailbox = mailbox_list[-1] 18687db96d56Sopenharmony_ci mailbox.token_type = 'invalid-mailbox' 18697db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',;') 18707db96d56Sopenharmony_ci mailbox.extend(token) 18717db96d56Sopenharmony_ci mailbox_list.defects.append(errors.InvalidHeaderDefect( 18727db96d56Sopenharmony_ci "invalid mailbox in mailbox-list")) 18737db96d56Sopenharmony_ci if value and value[0] == ',': 18747db96d56Sopenharmony_ci mailbox_list.append(ListSeparator) 18757db96d56Sopenharmony_ci value = value[1:] 18767db96d56Sopenharmony_ci return mailbox_list, value 18777db96d56Sopenharmony_ci 18787db96d56Sopenharmony_ci 18797db96d56Sopenharmony_cidef get_group_list(value): 18807db96d56Sopenharmony_ci """ group-list = mailbox-list / CFWS / obs-group-list 18817db96d56Sopenharmony_ci obs-group-list = 1*([CFWS] ",") [CFWS] 18827db96d56Sopenharmony_ci 18837db96d56Sopenharmony_ci """ 18847db96d56Sopenharmony_ci group_list = GroupList() 18857db96d56Sopenharmony_ci if not value: 18867db96d56Sopenharmony_ci group_list.defects.append(errors.InvalidHeaderDefect( 18877db96d56Sopenharmony_ci "end of header before group-list")) 18887db96d56Sopenharmony_ci return group_list, value 18897db96d56Sopenharmony_ci leader = None 18907db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 18917db96d56Sopenharmony_ci leader, value = get_cfws(value) 18927db96d56Sopenharmony_ci if not value: 18937db96d56Sopenharmony_ci # This should never happen in email parsing, since CFWS-only is a 18947db96d56Sopenharmony_ci # legal alternative to group-list in a group, which is the only 18957db96d56Sopenharmony_ci # place group-list appears. 18967db96d56Sopenharmony_ci group_list.defects.append(errors.InvalidHeaderDefect( 18977db96d56Sopenharmony_ci "end of header in group-list")) 18987db96d56Sopenharmony_ci group_list.append(leader) 18997db96d56Sopenharmony_ci return group_list, value 19007db96d56Sopenharmony_ci if value[0] == ';': 19017db96d56Sopenharmony_ci group_list.append(leader) 19027db96d56Sopenharmony_ci return group_list, value 19037db96d56Sopenharmony_ci token, value = get_mailbox_list(value) 19047db96d56Sopenharmony_ci if len(token.all_mailboxes)==0: 19057db96d56Sopenharmony_ci if leader is not None: 19067db96d56Sopenharmony_ci group_list.append(leader) 19077db96d56Sopenharmony_ci group_list.extend(token) 19087db96d56Sopenharmony_ci group_list.defects.append(errors.ObsoleteHeaderDefect( 19097db96d56Sopenharmony_ci "group-list with empty entries")) 19107db96d56Sopenharmony_ci return group_list, value 19117db96d56Sopenharmony_ci if leader is not None: 19127db96d56Sopenharmony_ci token[:0] = [leader] 19137db96d56Sopenharmony_ci group_list.append(token) 19147db96d56Sopenharmony_ci return group_list, value 19157db96d56Sopenharmony_ci 19167db96d56Sopenharmony_cidef get_group(value): 19177db96d56Sopenharmony_ci """ group = display-name ":" [group-list] ";" [CFWS] 19187db96d56Sopenharmony_ci 19197db96d56Sopenharmony_ci """ 19207db96d56Sopenharmony_ci group = Group() 19217db96d56Sopenharmony_ci token, value = get_display_name(value) 19227db96d56Sopenharmony_ci if not value or value[0] != ':': 19237db96d56Sopenharmony_ci raise errors.HeaderParseError("expected ':' at end of group " 19247db96d56Sopenharmony_ci "display name but found '{}'".format(value)) 19257db96d56Sopenharmony_ci group.append(token) 19267db96d56Sopenharmony_ci group.append(ValueTerminal(':', 'group-display-name-terminator')) 19277db96d56Sopenharmony_ci value = value[1:] 19287db96d56Sopenharmony_ci if value and value[0] == ';': 19297db96d56Sopenharmony_ci group.append(ValueTerminal(';', 'group-terminator')) 19307db96d56Sopenharmony_ci return group, value[1:] 19317db96d56Sopenharmony_ci token, value = get_group_list(value) 19327db96d56Sopenharmony_ci group.append(token) 19337db96d56Sopenharmony_ci if not value: 19347db96d56Sopenharmony_ci group.defects.append(errors.InvalidHeaderDefect( 19357db96d56Sopenharmony_ci "end of header in group")) 19367db96d56Sopenharmony_ci elif value[0] != ';': 19377db96d56Sopenharmony_ci raise errors.HeaderParseError( 19387db96d56Sopenharmony_ci "expected ';' at end of group but found {}".format(value)) 19397db96d56Sopenharmony_ci group.append(ValueTerminal(';', 'group-terminator')) 19407db96d56Sopenharmony_ci value = value[1:] 19417db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 19427db96d56Sopenharmony_ci token, value = get_cfws(value) 19437db96d56Sopenharmony_ci group.append(token) 19447db96d56Sopenharmony_ci return group, value 19457db96d56Sopenharmony_ci 19467db96d56Sopenharmony_cidef get_address(value): 19477db96d56Sopenharmony_ci """ address = mailbox / group 19487db96d56Sopenharmony_ci 19497db96d56Sopenharmony_ci Note that counter-intuitively, an address can be either a single address or 19507db96d56Sopenharmony_ci a list of addresses (a group). This is why the returned Address object has 19517db96d56Sopenharmony_ci a 'mailboxes' attribute which treats a single address as a list of length 19527db96d56Sopenharmony_ci one. When you need to differentiate between to two cases, extract the single 19537db96d56Sopenharmony_ci element, which is either a mailbox or a group token. 19547db96d56Sopenharmony_ci 19557db96d56Sopenharmony_ci """ 19567db96d56Sopenharmony_ci # The formal grammar isn't very helpful when parsing an address. mailbox 19577db96d56Sopenharmony_ci # and group, especially when allowing for obsolete forms, start off very 19587db96d56Sopenharmony_ci # similarly. It is only when you reach one of @, <, or : that you know 19597db96d56Sopenharmony_ci # what you've got. So, we try each one in turn, starting with the more 19607db96d56Sopenharmony_ci # likely of the two. We could perhaps make this more efficient by looking 19617db96d56Sopenharmony_ci # for a phrase and then branching based on the next character, but that 19627db96d56Sopenharmony_ci # would be a premature optimization. 19637db96d56Sopenharmony_ci address = Address() 19647db96d56Sopenharmony_ci try: 19657db96d56Sopenharmony_ci token, value = get_group(value) 19667db96d56Sopenharmony_ci except errors.HeaderParseError: 19677db96d56Sopenharmony_ci try: 19687db96d56Sopenharmony_ci token, value = get_mailbox(value) 19697db96d56Sopenharmony_ci except errors.HeaderParseError: 19707db96d56Sopenharmony_ci raise errors.HeaderParseError( 19717db96d56Sopenharmony_ci "expected address but found '{}'".format(value)) 19727db96d56Sopenharmony_ci address.append(token) 19737db96d56Sopenharmony_ci return address, value 19747db96d56Sopenharmony_ci 19757db96d56Sopenharmony_cidef get_address_list(value): 19767db96d56Sopenharmony_ci """ address_list = (address *("," address)) / obs-addr-list 19777db96d56Sopenharmony_ci obs-addr-list = *([CFWS] ",") address *("," [address / CFWS]) 19787db96d56Sopenharmony_ci 19797db96d56Sopenharmony_ci We depart from the formal grammar here by continuing to parse until the end 19807db96d56Sopenharmony_ci of the input, assuming the input to be entirely composed of an 19817db96d56Sopenharmony_ci address-list. This is always true in email parsing, and allows us 19827db96d56Sopenharmony_ci to skip invalid addresses to parse additional valid ones. 19837db96d56Sopenharmony_ci 19847db96d56Sopenharmony_ci """ 19857db96d56Sopenharmony_ci address_list = AddressList() 19867db96d56Sopenharmony_ci while value: 19877db96d56Sopenharmony_ci try: 19887db96d56Sopenharmony_ci token, value = get_address(value) 19897db96d56Sopenharmony_ci address_list.append(token) 19907db96d56Sopenharmony_ci except errors.HeaderParseError as err: 19917db96d56Sopenharmony_ci leader = None 19927db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 19937db96d56Sopenharmony_ci leader, value = get_cfws(value) 19947db96d56Sopenharmony_ci if not value or value[0] == ',': 19957db96d56Sopenharmony_ci address_list.append(leader) 19967db96d56Sopenharmony_ci address_list.defects.append(errors.ObsoleteHeaderDefect( 19977db96d56Sopenharmony_ci "address-list entry with no content")) 19987db96d56Sopenharmony_ci else: 19997db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',') 20007db96d56Sopenharmony_ci if leader is not None: 20017db96d56Sopenharmony_ci token[:0] = [leader] 20027db96d56Sopenharmony_ci address_list.append(Address([token])) 20037db96d56Sopenharmony_ci address_list.defects.append(errors.InvalidHeaderDefect( 20047db96d56Sopenharmony_ci "invalid address in address-list")) 20057db96d56Sopenharmony_ci elif value[0] == ',': 20067db96d56Sopenharmony_ci address_list.defects.append(errors.ObsoleteHeaderDefect( 20077db96d56Sopenharmony_ci "empty element in address-list")) 20087db96d56Sopenharmony_ci else: 20097db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',') 20107db96d56Sopenharmony_ci if leader is not None: 20117db96d56Sopenharmony_ci token[:0] = [leader] 20127db96d56Sopenharmony_ci address_list.append(Address([token])) 20137db96d56Sopenharmony_ci address_list.defects.append(errors.InvalidHeaderDefect( 20147db96d56Sopenharmony_ci "invalid address in address-list")) 20157db96d56Sopenharmony_ci if value and value[0] != ',': 20167db96d56Sopenharmony_ci # Crap after address; treat it as an invalid mailbox. 20177db96d56Sopenharmony_ci # The mailbox info will still be available. 20187db96d56Sopenharmony_ci mailbox = address_list[-1][0] 20197db96d56Sopenharmony_ci mailbox.token_type = 'invalid-mailbox' 20207db96d56Sopenharmony_ci token, value = get_invalid_mailbox(value, ',') 20217db96d56Sopenharmony_ci mailbox.extend(token) 20227db96d56Sopenharmony_ci address_list.defects.append(errors.InvalidHeaderDefect( 20237db96d56Sopenharmony_ci "invalid address in address-list")) 20247db96d56Sopenharmony_ci if value: # Must be a , at this point. 20257db96d56Sopenharmony_ci address_list.append(ValueTerminal(',', 'list-separator')) 20267db96d56Sopenharmony_ci value = value[1:] 20277db96d56Sopenharmony_ci return address_list, value 20287db96d56Sopenharmony_ci 20297db96d56Sopenharmony_ci 20307db96d56Sopenharmony_cidef get_no_fold_literal(value): 20317db96d56Sopenharmony_ci """ no-fold-literal = "[" *dtext "]" 20327db96d56Sopenharmony_ci """ 20337db96d56Sopenharmony_ci no_fold_literal = NoFoldLiteral() 20347db96d56Sopenharmony_ci if not value: 20357db96d56Sopenharmony_ci raise errors.HeaderParseError( 20367db96d56Sopenharmony_ci "expected no-fold-literal but found '{}'".format(value)) 20377db96d56Sopenharmony_ci if value[0] != '[': 20387db96d56Sopenharmony_ci raise errors.HeaderParseError( 20397db96d56Sopenharmony_ci "expected '[' at the start of no-fold-literal " 20407db96d56Sopenharmony_ci "but found '{}'".format(value)) 20417db96d56Sopenharmony_ci no_fold_literal.append(ValueTerminal('[', 'no-fold-literal-start')) 20427db96d56Sopenharmony_ci value = value[1:] 20437db96d56Sopenharmony_ci token, value = get_dtext(value) 20447db96d56Sopenharmony_ci no_fold_literal.append(token) 20457db96d56Sopenharmony_ci if not value or value[0] != ']': 20467db96d56Sopenharmony_ci raise errors.HeaderParseError( 20477db96d56Sopenharmony_ci "expected ']' at the end of no-fold-literal " 20487db96d56Sopenharmony_ci "but found '{}'".format(value)) 20497db96d56Sopenharmony_ci no_fold_literal.append(ValueTerminal(']', 'no-fold-literal-end')) 20507db96d56Sopenharmony_ci return no_fold_literal, value[1:] 20517db96d56Sopenharmony_ci 20527db96d56Sopenharmony_cidef get_msg_id(value): 20537db96d56Sopenharmony_ci """msg-id = [CFWS] "<" id-left '@' id-right ">" [CFWS] 20547db96d56Sopenharmony_ci id-left = dot-atom-text / obs-id-left 20557db96d56Sopenharmony_ci id-right = dot-atom-text / no-fold-literal / obs-id-right 20567db96d56Sopenharmony_ci no-fold-literal = "[" *dtext "]" 20577db96d56Sopenharmony_ci """ 20587db96d56Sopenharmony_ci msg_id = MsgID() 20597db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 20607db96d56Sopenharmony_ci token, value = get_cfws(value) 20617db96d56Sopenharmony_ci msg_id.append(token) 20627db96d56Sopenharmony_ci if not value or value[0] != '<': 20637db96d56Sopenharmony_ci raise errors.HeaderParseError( 20647db96d56Sopenharmony_ci "expected msg-id but found '{}'".format(value)) 20657db96d56Sopenharmony_ci msg_id.append(ValueTerminal('<', 'msg-id-start')) 20667db96d56Sopenharmony_ci value = value[1:] 20677db96d56Sopenharmony_ci # Parse id-left. 20687db96d56Sopenharmony_ci try: 20697db96d56Sopenharmony_ci token, value = get_dot_atom_text(value) 20707db96d56Sopenharmony_ci except errors.HeaderParseError: 20717db96d56Sopenharmony_ci try: 20727db96d56Sopenharmony_ci # obs-id-left is same as local-part of add-spec. 20737db96d56Sopenharmony_ci token, value = get_obs_local_part(value) 20747db96d56Sopenharmony_ci msg_id.defects.append(errors.ObsoleteHeaderDefect( 20757db96d56Sopenharmony_ci "obsolete id-left in msg-id")) 20767db96d56Sopenharmony_ci except errors.HeaderParseError: 20777db96d56Sopenharmony_ci raise errors.HeaderParseError( 20787db96d56Sopenharmony_ci "expected dot-atom-text or obs-id-left" 20797db96d56Sopenharmony_ci " but found '{}'".format(value)) 20807db96d56Sopenharmony_ci msg_id.append(token) 20817db96d56Sopenharmony_ci if not value or value[0] != '@': 20827db96d56Sopenharmony_ci msg_id.defects.append(errors.InvalidHeaderDefect( 20837db96d56Sopenharmony_ci "msg-id with no id-right")) 20847db96d56Sopenharmony_ci # Even though there is no id-right, if the local part 20857db96d56Sopenharmony_ci # ends with `>` let's just parse it too and return 20867db96d56Sopenharmony_ci # along with the defect. 20877db96d56Sopenharmony_ci if value and value[0] == '>': 20887db96d56Sopenharmony_ci msg_id.append(ValueTerminal('>', 'msg-id-end')) 20897db96d56Sopenharmony_ci value = value[1:] 20907db96d56Sopenharmony_ci return msg_id, value 20917db96d56Sopenharmony_ci msg_id.append(ValueTerminal('@', 'address-at-symbol')) 20927db96d56Sopenharmony_ci value = value[1:] 20937db96d56Sopenharmony_ci # Parse id-right. 20947db96d56Sopenharmony_ci try: 20957db96d56Sopenharmony_ci token, value = get_dot_atom_text(value) 20967db96d56Sopenharmony_ci except errors.HeaderParseError: 20977db96d56Sopenharmony_ci try: 20987db96d56Sopenharmony_ci token, value = get_no_fold_literal(value) 20997db96d56Sopenharmony_ci except errors.HeaderParseError as e: 21007db96d56Sopenharmony_ci try: 21017db96d56Sopenharmony_ci token, value = get_domain(value) 21027db96d56Sopenharmony_ci msg_id.defects.append(errors.ObsoleteHeaderDefect( 21037db96d56Sopenharmony_ci "obsolete id-right in msg-id")) 21047db96d56Sopenharmony_ci except errors.HeaderParseError: 21057db96d56Sopenharmony_ci raise errors.HeaderParseError( 21067db96d56Sopenharmony_ci "expected dot-atom-text, no-fold-literal or obs-id-right" 21077db96d56Sopenharmony_ci " but found '{}'".format(value)) 21087db96d56Sopenharmony_ci msg_id.append(token) 21097db96d56Sopenharmony_ci if value and value[0] == '>': 21107db96d56Sopenharmony_ci value = value[1:] 21117db96d56Sopenharmony_ci else: 21127db96d56Sopenharmony_ci msg_id.defects.append(errors.InvalidHeaderDefect( 21137db96d56Sopenharmony_ci "missing trailing '>' on msg-id")) 21147db96d56Sopenharmony_ci msg_id.append(ValueTerminal('>', 'msg-id-end')) 21157db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 21167db96d56Sopenharmony_ci token, value = get_cfws(value) 21177db96d56Sopenharmony_ci msg_id.append(token) 21187db96d56Sopenharmony_ci return msg_id, value 21197db96d56Sopenharmony_ci 21207db96d56Sopenharmony_ci 21217db96d56Sopenharmony_cidef parse_message_id(value): 21227db96d56Sopenharmony_ci """message-id = "Message-ID:" msg-id CRLF 21237db96d56Sopenharmony_ci """ 21247db96d56Sopenharmony_ci message_id = MessageID() 21257db96d56Sopenharmony_ci try: 21267db96d56Sopenharmony_ci token, value = get_msg_id(value) 21277db96d56Sopenharmony_ci message_id.append(token) 21287db96d56Sopenharmony_ci except errors.HeaderParseError as ex: 21297db96d56Sopenharmony_ci token = get_unstructured(value) 21307db96d56Sopenharmony_ci message_id = InvalidMessageID(token) 21317db96d56Sopenharmony_ci message_id.defects.append( 21327db96d56Sopenharmony_ci errors.InvalidHeaderDefect("Invalid msg-id: {!r}".format(ex))) 21337db96d56Sopenharmony_ci else: 21347db96d56Sopenharmony_ci # Value after parsing a valid msg_id should be None. 21357db96d56Sopenharmony_ci if value: 21367db96d56Sopenharmony_ci message_id.defects.append(errors.InvalidHeaderDefect( 21377db96d56Sopenharmony_ci "Unexpected {!r}".format(value))) 21387db96d56Sopenharmony_ci 21397db96d56Sopenharmony_ci return message_id 21407db96d56Sopenharmony_ci 21417db96d56Sopenharmony_ci# 21427db96d56Sopenharmony_ci# XXX: As I begin to add additional header parsers, I'm realizing we probably 21437db96d56Sopenharmony_ci# have two level of parser routines: the get_XXX methods that get a token in 21447db96d56Sopenharmony_ci# the grammar, and parse_XXX methods that parse an entire field value. So 21457db96d56Sopenharmony_ci# get_address_list above should really be a parse_ method, as probably should 21467db96d56Sopenharmony_ci# be get_unstructured. 21477db96d56Sopenharmony_ci# 21487db96d56Sopenharmony_ci 21497db96d56Sopenharmony_cidef parse_mime_version(value): 21507db96d56Sopenharmony_ci """ mime-version = [CFWS] 1*digit [CFWS] "." [CFWS] 1*digit [CFWS] 21517db96d56Sopenharmony_ci 21527db96d56Sopenharmony_ci """ 21537db96d56Sopenharmony_ci # The [CFWS] is implicit in the RFC 2045 BNF. 21547db96d56Sopenharmony_ci # XXX: This routine is a bit verbose, should factor out a get_int method. 21557db96d56Sopenharmony_ci mime_version = MIMEVersion() 21567db96d56Sopenharmony_ci if not value: 21577db96d56Sopenharmony_ci mime_version.defects.append(errors.HeaderMissingRequiredValue( 21587db96d56Sopenharmony_ci "Missing MIME version number (eg: 1.0)")) 21597db96d56Sopenharmony_ci return mime_version 21607db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 21617db96d56Sopenharmony_ci token, value = get_cfws(value) 21627db96d56Sopenharmony_ci mime_version.append(token) 21637db96d56Sopenharmony_ci if not value: 21647db96d56Sopenharmony_ci mime_version.defects.append(errors.HeaderMissingRequiredValue( 21657db96d56Sopenharmony_ci "Expected MIME version number but found only CFWS")) 21667db96d56Sopenharmony_ci digits = '' 21677db96d56Sopenharmony_ci while value and value[0] != '.' and value[0] not in CFWS_LEADER: 21687db96d56Sopenharmony_ci digits += value[0] 21697db96d56Sopenharmony_ci value = value[1:] 21707db96d56Sopenharmony_ci if not digits.isdigit(): 21717db96d56Sopenharmony_ci mime_version.defects.append(errors.InvalidHeaderDefect( 21727db96d56Sopenharmony_ci "Expected MIME major version number but found {!r}".format(digits))) 21737db96d56Sopenharmony_ci mime_version.append(ValueTerminal(digits, 'xtext')) 21747db96d56Sopenharmony_ci else: 21757db96d56Sopenharmony_ci mime_version.major = int(digits) 21767db96d56Sopenharmony_ci mime_version.append(ValueTerminal(digits, 'digits')) 21777db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 21787db96d56Sopenharmony_ci token, value = get_cfws(value) 21797db96d56Sopenharmony_ci mime_version.append(token) 21807db96d56Sopenharmony_ci if not value or value[0] != '.': 21817db96d56Sopenharmony_ci if mime_version.major is not None: 21827db96d56Sopenharmony_ci mime_version.defects.append(errors.InvalidHeaderDefect( 21837db96d56Sopenharmony_ci "Incomplete MIME version; found only major number")) 21847db96d56Sopenharmony_ci if value: 21857db96d56Sopenharmony_ci mime_version.append(ValueTerminal(value, 'xtext')) 21867db96d56Sopenharmony_ci return mime_version 21877db96d56Sopenharmony_ci mime_version.append(ValueTerminal('.', 'version-separator')) 21887db96d56Sopenharmony_ci value = value[1:] 21897db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 21907db96d56Sopenharmony_ci token, value = get_cfws(value) 21917db96d56Sopenharmony_ci mime_version.append(token) 21927db96d56Sopenharmony_ci if not value: 21937db96d56Sopenharmony_ci if mime_version.major is not None: 21947db96d56Sopenharmony_ci mime_version.defects.append(errors.InvalidHeaderDefect( 21957db96d56Sopenharmony_ci "Incomplete MIME version; found only major number")) 21967db96d56Sopenharmony_ci return mime_version 21977db96d56Sopenharmony_ci digits = '' 21987db96d56Sopenharmony_ci while value and value[0] not in CFWS_LEADER: 21997db96d56Sopenharmony_ci digits += value[0] 22007db96d56Sopenharmony_ci value = value[1:] 22017db96d56Sopenharmony_ci if not digits.isdigit(): 22027db96d56Sopenharmony_ci mime_version.defects.append(errors.InvalidHeaderDefect( 22037db96d56Sopenharmony_ci "Expected MIME minor version number but found {!r}".format(digits))) 22047db96d56Sopenharmony_ci mime_version.append(ValueTerminal(digits, 'xtext')) 22057db96d56Sopenharmony_ci else: 22067db96d56Sopenharmony_ci mime_version.minor = int(digits) 22077db96d56Sopenharmony_ci mime_version.append(ValueTerminal(digits, 'digits')) 22087db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 22097db96d56Sopenharmony_ci token, value = get_cfws(value) 22107db96d56Sopenharmony_ci mime_version.append(token) 22117db96d56Sopenharmony_ci if value: 22127db96d56Sopenharmony_ci mime_version.defects.append(errors.InvalidHeaderDefect( 22137db96d56Sopenharmony_ci "Excess non-CFWS text after MIME version")) 22147db96d56Sopenharmony_ci mime_version.append(ValueTerminal(value, 'xtext')) 22157db96d56Sopenharmony_ci return mime_version 22167db96d56Sopenharmony_ci 22177db96d56Sopenharmony_cidef get_invalid_parameter(value): 22187db96d56Sopenharmony_ci """ Read everything up to the next ';'. 22197db96d56Sopenharmony_ci 22207db96d56Sopenharmony_ci This is outside the formal grammar. The InvalidParameter TokenList that is 22217db96d56Sopenharmony_ci returned acts like a Parameter, but the data attributes are None. 22227db96d56Sopenharmony_ci 22237db96d56Sopenharmony_ci """ 22247db96d56Sopenharmony_ci invalid_parameter = InvalidParameter() 22257db96d56Sopenharmony_ci while value and value[0] != ';': 22267db96d56Sopenharmony_ci if value[0] in PHRASE_ENDS: 22277db96d56Sopenharmony_ci invalid_parameter.append(ValueTerminal(value[0], 22287db96d56Sopenharmony_ci 'misplaced-special')) 22297db96d56Sopenharmony_ci value = value[1:] 22307db96d56Sopenharmony_ci else: 22317db96d56Sopenharmony_ci token, value = get_phrase(value) 22327db96d56Sopenharmony_ci invalid_parameter.append(token) 22337db96d56Sopenharmony_ci return invalid_parameter, value 22347db96d56Sopenharmony_ci 22357db96d56Sopenharmony_cidef get_ttext(value): 22367db96d56Sopenharmony_ci """ttext = <matches _ttext_matcher> 22377db96d56Sopenharmony_ci 22387db96d56Sopenharmony_ci We allow any non-TOKEN_ENDS in ttext, but add defects to the token's 22397db96d56Sopenharmony_ci defects list if we find non-ttext characters. We also register defects for 22407db96d56Sopenharmony_ci *any* non-printables even though the RFC doesn't exclude all of them, 22417db96d56Sopenharmony_ci because we follow the spirit of RFC 5322. 22427db96d56Sopenharmony_ci 22437db96d56Sopenharmony_ci """ 22447db96d56Sopenharmony_ci m = _non_token_end_matcher(value) 22457db96d56Sopenharmony_ci if not m: 22467db96d56Sopenharmony_ci raise errors.HeaderParseError( 22477db96d56Sopenharmony_ci "expected ttext but found '{}'".format(value)) 22487db96d56Sopenharmony_ci ttext = m.group() 22497db96d56Sopenharmony_ci value = value[len(ttext):] 22507db96d56Sopenharmony_ci ttext = ValueTerminal(ttext, 'ttext') 22517db96d56Sopenharmony_ci _validate_xtext(ttext) 22527db96d56Sopenharmony_ci return ttext, value 22537db96d56Sopenharmony_ci 22547db96d56Sopenharmony_cidef get_token(value): 22557db96d56Sopenharmony_ci """token = [CFWS] 1*ttext [CFWS] 22567db96d56Sopenharmony_ci 22577db96d56Sopenharmony_ci The RFC equivalent of ttext is any US-ASCII chars except space, ctls, or 22587db96d56Sopenharmony_ci tspecials. We also exclude tabs even though the RFC doesn't. 22597db96d56Sopenharmony_ci 22607db96d56Sopenharmony_ci The RFC implies the CFWS but is not explicit about it in the BNF. 22617db96d56Sopenharmony_ci 22627db96d56Sopenharmony_ci """ 22637db96d56Sopenharmony_ci mtoken = Token() 22647db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 22657db96d56Sopenharmony_ci token, value = get_cfws(value) 22667db96d56Sopenharmony_ci mtoken.append(token) 22677db96d56Sopenharmony_ci if value and value[0] in TOKEN_ENDS: 22687db96d56Sopenharmony_ci raise errors.HeaderParseError( 22697db96d56Sopenharmony_ci "expected token but found '{}'".format(value)) 22707db96d56Sopenharmony_ci token, value = get_ttext(value) 22717db96d56Sopenharmony_ci mtoken.append(token) 22727db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 22737db96d56Sopenharmony_ci token, value = get_cfws(value) 22747db96d56Sopenharmony_ci mtoken.append(token) 22757db96d56Sopenharmony_ci return mtoken, value 22767db96d56Sopenharmony_ci 22777db96d56Sopenharmony_cidef get_attrtext(value): 22787db96d56Sopenharmony_ci """attrtext = 1*(any non-ATTRIBUTE_ENDS character) 22797db96d56Sopenharmony_ci 22807db96d56Sopenharmony_ci We allow any non-ATTRIBUTE_ENDS in attrtext, but add defects to the 22817db96d56Sopenharmony_ci token's defects list if we find non-attrtext characters. We also register 22827db96d56Sopenharmony_ci defects for *any* non-printables even though the RFC doesn't exclude all of 22837db96d56Sopenharmony_ci them, because we follow the spirit of RFC 5322. 22847db96d56Sopenharmony_ci 22857db96d56Sopenharmony_ci """ 22867db96d56Sopenharmony_ci m = _non_attribute_end_matcher(value) 22877db96d56Sopenharmony_ci if not m: 22887db96d56Sopenharmony_ci raise errors.HeaderParseError( 22897db96d56Sopenharmony_ci "expected attrtext but found {!r}".format(value)) 22907db96d56Sopenharmony_ci attrtext = m.group() 22917db96d56Sopenharmony_ci value = value[len(attrtext):] 22927db96d56Sopenharmony_ci attrtext = ValueTerminal(attrtext, 'attrtext') 22937db96d56Sopenharmony_ci _validate_xtext(attrtext) 22947db96d56Sopenharmony_ci return attrtext, value 22957db96d56Sopenharmony_ci 22967db96d56Sopenharmony_cidef get_attribute(value): 22977db96d56Sopenharmony_ci """ [CFWS] 1*attrtext [CFWS] 22987db96d56Sopenharmony_ci 22997db96d56Sopenharmony_ci This version of the BNF makes the CFWS explicit, and as usual we use a 23007db96d56Sopenharmony_ci value terminal for the actual run of characters. The RFC equivalent of 23017db96d56Sopenharmony_ci attrtext is the token characters, with the subtraction of '*', "'", and '%'. 23027db96d56Sopenharmony_ci We include tab in the excluded set just as we do for token. 23037db96d56Sopenharmony_ci 23047db96d56Sopenharmony_ci """ 23057db96d56Sopenharmony_ci attribute = Attribute() 23067db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 23077db96d56Sopenharmony_ci token, value = get_cfws(value) 23087db96d56Sopenharmony_ci attribute.append(token) 23097db96d56Sopenharmony_ci if value and value[0] in ATTRIBUTE_ENDS: 23107db96d56Sopenharmony_ci raise errors.HeaderParseError( 23117db96d56Sopenharmony_ci "expected token but found '{}'".format(value)) 23127db96d56Sopenharmony_ci token, value = get_attrtext(value) 23137db96d56Sopenharmony_ci attribute.append(token) 23147db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 23157db96d56Sopenharmony_ci token, value = get_cfws(value) 23167db96d56Sopenharmony_ci attribute.append(token) 23177db96d56Sopenharmony_ci return attribute, value 23187db96d56Sopenharmony_ci 23197db96d56Sopenharmony_cidef get_extended_attrtext(value): 23207db96d56Sopenharmony_ci """attrtext = 1*(any non-ATTRIBUTE_ENDS character plus '%') 23217db96d56Sopenharmony_ci 23227db96d56Sopenharmony_ci This is a special parsing routine so that we get a value that 23237db96d56Sopenharmony_ci includes % escapes as a single string (which we decode as a single 23247db96d56Sopenharmony_ci string later). 23257db96d56Sopenharmony_ci 23267db96d56Sopenharmony_ci """ 23277db96d56Sopenharmony_ci m = _non_extended_attribute_end_matcher(value) 23287db96d56Sopenharmony_ci if not m: 23297db96d56Sopenharmony_ci raise errors.HeaderParseError( 23307db96d56Sopenharmony_ci "expected extended attrtext but found {!r}".format(value)) 23317db96d56Sopenharmony_ci attrtext = m.group() 23327db96d56Sopenharmony_ci value = value[len(attrtext):] 23337db96d56Sopenharmony_ci attrtext = ValueTerminal(attrtext, 'extended-attrtext') 23347db96d56Sopenharmony_ci _validate_xtext(attrtext) 23357db96d56Sopenharmony_ci return attrtext, value 23367db96d56Sopenharmony_ci 23377db96d56Sopenharmony_cidef get_extended_attribute(value): 23387db96d56Sopenharmony_ci """ [CFWS] 1*extended_attrtext [CFWS] 23397db96d56Sopenharmony_ci 23407db96d56Sopenharmony_ci This is like the non-extended version except we allow % characters, so that 23417db96d56Sopenharmony_ci we can pick up an encoded value as a single string. 23427db96d56Sopenharmony_ci 23437db96d56Sopenharmony_ci """ 23447db96d56Sopenharmony_ci # XXX: should we have an ExtendedAttribute TokenList? 23457db96d56Sopenharmony_ci attribute = Attribute() 23467db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 23477db96d56Sopenharmony_ci token, value = get_cfws(value) 23487db96d56Sopenharmony_ci attribute.append(token) 23497db96d56Sopenharmony_ci if value and value[0] in EXTENDED_ATTRIBUTE_ENDS: 23507db96d56Sopenharmony_ci raise errors.HeaderParseError( 23517db96d56Sopenharmony_ci "expected token but found '{}'".format(value)) 23527db96d56Sopenharmony_ci token, value = get_extended_attrtext(value) 23537db96d56Sopenharmony_ci attribute.append(token) 23547db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 23557db96d56Sopenharmony_ci token, value = get_cfws(value) 23567db96d56Sopenharmony_ci attribute.append(token) 23577db96d56Sopenharmony_ci return attribute, value 23587db96d56Sopenharmony_ci 23597db96d56Sopenharmony_cidef get_section(value): 23607db96d56Sopenharmony_ci """ '*' digits 23617db96d56Sopenharmony_ci 23627db96d56Sopenharmony_ci The formal BNF is more complicated because leading 0s are not allowed. We 23637db96d56Sopenharmony_ci check for that and add a defect. We also assume no CFWS is allowed between 23647db96d56Sopenharmony_ci the '*' and the digits, though the RFC is not crystal clear on that. 23657db96d56Sopenharmony_ci The caller should already have dealt with leading CFWS. 23667db96d56Sopenharmony_ci 23677db96d56Sopenharmony_ci """ 23687db96d56Sopenharmony_ci section = Section() 23697db96d56Sopenharmony_ci if not value or value[0] != '*': 23707db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected section but found {}".format( 23717db96d56Sopenharmony_ci value)) 23727db96d56Sopenharmony_ci section.append(ValueTerminal('*', 'section-marker')) 23737db96d56Sopenharmony_ci value = value[1:] 23747db96d56Sopenharmony_ci if not value or not value[0].isdigit(): 23757db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected section number but " 23767db96d56Sopenharmony_ci "found {}".format(value)) 23777db96d56Sopenharmony_ci digits = '' 23787db96d56Sopenharmony_ci while value and value[0].isdigit(): 23797db96d56Sopenharmony_ci digits += value[0] 23807db96d56Sopenharmony_ci value = value[1:] 23817db96d56Sopenharmony_ci if digits[0] == '0' and digits != '0': 23827db96d56Sopenharmony_ci section.defects.append(errors.InvalidHeaderDefect( 23837db96d56Sopenharmony_ci "section number has an invalid leading 0")) 23847db96d56Sopenharmony_ci section.number = int(digits) 23857db96d56Sopenharmony_ci section.append(ValueTerminal(digits, 'digits')) 23867db96d56Sopenharmony_ci return section, value 23877db96d56Sopenharmony_ci 23887db96d56Sopenharmony_ci 23897db96d56Sopenharmony_cidef get_value(value): 23907db96d56Sopenharmony_ci """ quoted-string / attribute 23917db96d56Sopenharmony_ci 23927db96d56Sopenharmony_ci """ 23937db96d56Sopenharmony_ci v = Value() 23947db96d56Sopenharmony_ci if not value: 23957db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected value but found end of string") 23967db96d56Sopenharmony_ci leader = None 23977db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 23987db96d56Sopenharmony_ci leader, value = get_cfws(value) 23997db96d56Sopenharmony_ci if not value: 24007db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected value but found " 24017db96d56Sopenharmony_ci "only {}".format(leader)) 24027db96d56Sopenharmony_ci if value[0] == '"': 24037db96d56Sopenharmony_ci token, value = get_quoted_string(value) 24047db96d56Sopenharmony_ci else: 24057db96d56Sopenharmony_ci token, value = get_extended_attribute(value) 24067db96d56Sopenharmony_ci if leader is not None: 24077db96d56Sopenharmony_ci token[:0] = [leader] 24087db96d56Sopenharmony_ci v.append(token) 24097db96d56Sopenharmony_ci return v, value 24107db96d56Sopenharmony_ci 24117db96d56Sopenharmony_cidef get_parameter(value): 24127db96d56Sopenharmony_ci """ attribute [section] ["*"] [CFWS] "=" value 24137db96d56Sopenharmony_ci 24147db96d56Sopenharmony_ci The CFWS is implied by the RFC but not made explicit in the BNF. This 24157db96d56Sopenharmony_ci simplified form of the BNF from the RFC is made to conform with the RFC BNF 24167db96d56Sopenharmony_ci through some extra checks. We do it this way because it makes both error 24177db96d56Sopenharmony_ci recovery and working with the resulting parse tree easier. 24187db96d56Sopenharmony_ci """ 24197db96d56Sopenharmony_ci # It is possible CFWS would also be implicitly allowed between the section 24207db96d56Sopenharmony_ci # and the 'extended-attribute' marker (the '*') , but we've never seen that 24217db96d56Sopenharmony_ci # in the wild and we will therefore ignore the possibility. 24227db96d56Sopenharmony_ci param = Parameter() 24237db96d56Sopenharmony_ci token, value = get_attribute(value) 24247db96d56Sopenharmony_ci param.append(token) 24257db96d56Sopenharmony_ci if not value or value[0] == ';': 24267db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect("Parameter contains " 24277db96d56Sopenharmony_ci "name ({}) but no value".format(token))) 24287db96d56Sopenharmony_ci return param, value 24297db96d56Sopenharmony_ci if value[0] == '*': 24307db96d56Sopenharmony_ci try: 24317db96d56Sopenharmony_ci token, value = get_section(value) 24327db96d56Sopenharmony_ci param.sectioned = True 24337db96d56Sopenharmony_ci param.append(token) 24347db96d56Sopenharmony_ci except errors.HeaderParseError: 24357db96d56Sopenharmony_ci pass 24367db96d56Sopenharmony_ci if not value: 24377db96d56Sopenharmony_ci raise errors.HeaderParseError("Incomplete parameter") 24387db96d56Sopenharmony_ci if value[0] == '*': 24397db96d56Sopenharmony_ci param.append(ValueTerminal('*', 'extended-parameter-marker')) 24407db96d56Sopenharmony_ci value = value[1:] 24417db96d56Sopenharmony_ci param.extended = True 24427db96d56Sopenharmony_ci if value[0] != '=': 24437db96d56Sopenharmony_ci raise errors.HeaderParseError("Parameter not followed by '='") 24447db96d56Sopenharmony_ci param.append(ValueTerminal('=', 'parameter-separator')) 24457db96d56Sopenharmony_ci value = value[1:] 24467db96d56Sopenharmony_ci leader = None 24477db96d56Sopenharmony_ci if value and value[0] in CFWS_LEADER: 24487db96d56Sopenharmony_ci token, value = get_cfws(value) 24497db96d56Sopenharmony_ci param.append(token) 24507db96d56Sopenharmony_ci remainder = None 24517db96d56Sopenharmony_ci appendto = param 24527db96d56Sopenharmony_ci if param.extended and value and value[0] == '"': 24537db96d56Sopenharmony_ci # Now for some serious hackery to handle the common invalid case of 24547db96d56Sopenharmony_ci # double quotes around an extended value. We also accept (with defect) 24557db96d56Sopenharmony_ci # a value marked as encoded that isn't really. 24567db96d56Sopenharmony_ci qstring, remainder = get_quoted_string(value) 24577db96d56Sopenharmony_ci inner_value = qstring.stripped_value 24587db96d56Sopenharmony_ci semi_valid = False 24597db96d56Sopenharmony_ci if param.section_number == 0: 24607db96d56Sopenharmony_ci if inner_value and inner_value[0] == "'": 24617db96d56Sopenharmony_ci semi_valid = True 24627db96d56Sopenharmony_ci else: 24637db96d56Sopenharmony_ci token, rest = get_attrtext(inner_value) 24647db96d56Sopenharmony_ci if rest and rest[0] == "'": 24657db96d56Sopenharmony_ci semi_valid = True 24667db96d56Sopenharmony_ci else: 24677db96d56Sopenharmony_ci try: 24687db96d56Sopenharmony_ci token, rest = get_extended_attrtext(inner_value) 24697db96d56Sopenharmony_ci except: 24707db96d56Sopenharmony_ci pass 24717db96d56Sopenharmony_ci else: 24727db96d56Sopenharmony_ci if not rest: 24737db96d56Sopenharmony_ci semi_valid = True 24747db96d56Sopenharmony_ci if semi_valid: 24757db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 24767db96d56Sopenharmony_ci "Quoted string value for extended parameter is invalid")) 24777db96d56Sopenharmony_ci param.append(qstring) 24787db96d56Sopenharmony_ci for t in qstring: 24797db96d56Sopenharmony_ci if t.token_type == 'bare-quoted-string': 24807db96d56Sopenharmony_ci t[:] = [] 24817db96d56Sopenharmony_ci appendto = t 24827db96d56Sopenharmony_ci break 24837db96d56Sopenharmony_ci value = inner_value 24847db96d56Sopenharmony_ci else: 24857db96d56Sopenharmony_ci remainder = None 24867db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 24877db96d56Sopenharmony_ci "Parameter marked as extended but appears to have a " 24887db96d56Sopenharmony_ci "quoted string value that is non-encoded")) 24897db96d56Sopenharmony_ci if value and value[0] == "'": 24907db96d56Sopenharmony_ci token = None 24917db96d56Sopenharmony_ci else: 24927db96d56Sopenharmony_ci token, value = get_value(value) 24937db96d56Sopenharmony_ci if not param.extended or param.section_number > 0: 24947db96d56Sopenharmony_ci if not value or value[0] != "'": 24957db96d56Sopenharmony_ci appendto.append(token) 24967db96d56Sopenharmony_ci if remainder is not None: 24977db96d56Sopenharmony_ci assert not value, value 24987db96d56Sopenharmony_ci value = remainder 24997db96d56Sopenharmony_ci return param, value 25007db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 25017db96d56Sopenharmony_ci "Apparent initial-extended-value but attribute " 25027db96d56Sopenharmony_ci "was not marked as extended or was not initial section")) 25037db96d56Sopenharmony_ci if not value: 25047db96d56Sopenharmony_ci # Assume the charset/lang is missing and the token is the value. 25057db96d56Sopenharmony_ci param.defects.append(errors.InvalidHeaderDefect( 25067db96d56Sopenharmony_ci "Missing required charset/lang delimiters")) 25077db96d56Sopenharmony_ci appendto.append(token) 25087db96d56Sopenharmony_ci if remainder is None: 25097db96d56Sopenharmony_ci return param, value 25107db96d56Sopenharmony_ci else: 25117db96d56Sopenharmony_ci if token is not None: 25127db96d56Sopenharmony_ci for t in token: 25137db96d56Sopenharmony_ci if t.token_type == 'extended-attrtext': 25147db96d56Sopenharmony_ci break 25157db96d56Sopenharmony_ci t.token_type == 'attrtext' 25167db96d56Sopenharmony_ci appendto.append(t) 25177db96d56Sopenharmony_ci param.charset = t.value 25187db96d56Sopenharmony_ci if value[0] != "'": 25197db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected RFC2231 char/lang encoding " 25207db96d56Sopenharmony_ci "delimiter, but found {!r}".format(value)) 25217db96d56Sopenharmony_ci appendto.append(ValueTerminal("'", 'RFC2231-delimiter')) 25227db96d56Sopenharmony_ci value = value[1:] 25237db96d56Sopenharmony_ci if value and value[0] != "'": 25247db96d56Sopenharmony_ci token, value = get_attrtext(value) 25257db96d56Sopenharmony_ci appendto.append(token) 25267db96d56Sopenharmony_ci param.lang = token.value 25277db96d56Sopenharmony_ci if not value or value[0] != "'": 25287db96d56Sopenharmony_ci raise errors.HeaderParseError("Expected RFC2231 char/lang encoding " 25297db96d56Sopenharmony_ci "delimiter, but found {}".format(value)) 25307db96d56Sopenharmony_ci appendto.append(ValueTerminal("'", 'RFC2231-delimiter')) 25317db96d56Sopenharmony_ci value = value[1:] 25327db96d56Sopenharmony_ci if remainder is not None: 25337db96d56Sopenharmony_ci # Treat the rest of value as bare quoted string content. 25347db96d56Sopenharmony_ci v = Value() 25357db96d56Sopenharmony_ci while value: 25367db96d56Sopenharmony_ci if value[0] in WSP: 25377db96d56Sopenharmony_ci token, value = get_fws(value) 25387db96d56Sopenharmony_ci elif value[0] == '"': 25397db96d56Sopenharmony_ci token = ValueTerminal('"', 'DQUOTE') 25407db96d56Sopenharmony_ci value = value[1:] 25417db96d56Sopenharmony_ci else: 25427db96d56Sopenharmony_ci token, value = get_qcontent(value) 25437db96d56Sopenharmony_ci v.append(token) 25447db96d56Sopenharmony_ci token = v 25457db96d56Sopenharmony_ci else: 25467db96d56Sopenharmony_ci token, value = get_value(value) 25477db96d56Sopenharmony_ci appendto.append(token) 25487db96d56Sopenharmony_ci if remainder is not None: 25497db96d56Sopenharmony_ci assert not value, value 25507db96d56Sopenharmony_ci value = remainder 25517db96d56Sopenharmony_ci return param, value 25527db96d56Sopenharmony_ci 25537db96d56Sopenharmony_cidef parse_mime_parameters(value): 25547db96d56Sopenharmony_ci """ parameter *( ";" parameter ) 25557db96d56Sopenharmony_ci 25567db96d56Sopenharmony_ci That BNF is meant to indicate this routine should only be called after 25577db96d56Sopenharmony_ci finding and handling the leading ';'. There is no corresponding rule in 25587db96d56Sopenharmony_ci the formal RFC grammar, but it is more convenient for us for the set of 25597db96d56Sopenharmony_ci parameters to be treated as its own TokenList. 25607db96d56Sopenharmony_ci 25617db96d56Sopenharmony_ci This is 'parse' routine because it consumes the remaining value, but it 25627db96d56Sopenharmony_ci would never be called to parse a full header. Instead it is called to 25637db96d56Sopenharmony_ci parse everything after the non-parameter value of a specific MIME header. 25647db96d56Sopenharmony_ci 25657db96d56Sopenharmony_ci """ 25667db96d56Sopenharmony_ci mime_parameters = MimeParameters() 25677db96d56Sopenharmony_ci while value: 25687db96d56Sopenharmony_ci try: 25697db96d56Sopenharmony_ci token, value = get_parameter(value) 25707db96d56Sopenharmony_ci mime_parameters.append(token) 25717db96d56Sopenharmony_ci except errors.HeaderParseError as err: 25727db96d56Sopenharmony_ci leader = None 25737db96d56Sopenharmony_ci if value[0] in CFWS_LEADER: 25747db96d56Sopenharmony_ci leader, value = get_cfws(value) 25757db96d56Sopenharmony_ci if not value: 25767db96d56Sopenharmony_ci mime_parameters.append(leader) 25777db96d56Sopenharmony_ci return mime_parameters 25787db96d56Sopenharmony_ci if value[0] == ';': 25797db96d56Sopenharmony_ci if leader is not None: 25807db96d56Sopenharmony_ci mime_parameters.append(leader) 25817db96d56Sopenharmony_ci mime_parameters.defects.append(errors.InvalidHeaderDefect( 25827db96d56Sopenharmony_ci "parameter entry with no content")) 25837db96d56Sopenharmony_ci else: 25847db96d56Sopenharmony_ci token, value = get_invalid_parameter(value) 25857db96d56Sopenharmony_ci if leader: 25867db96d56Sopenharmony_ci token[:0] = [leader] 25877db96d56Sopenharmony_ci mime_parameters.append(token) 25887db96d56Sopenharmony_ci mime_parameters.defects.append(errors.InvalidHeaderDefect( 25897db96d56Sopenharmony_ci "invalid parameter {!r}".format(token))) 25907db96d56Sopenharmony_ci if value and value[0] != ';': 25917db96d56Sopenharmony_ci # Junk after the otherwise valid parameter. Mark it as 25927db96d56Sopenharmony_ci # invalid, but it will have a value. 25937db96d56Sopenharmony_ci param = mime_parameters[-1] 25947db96d56Sopenharmony_ci param.token_type = 'invalid-parameter' 25957db96d56Sopenharmony_ci token, value = get_invalid_parameter(value) 25967db96d56Sopenharmony_ci param.extend(token) 25977db96d56Sopenharmony_ci mime_parameters.defects.append(errors.InvalidHeaderDefect( 25987db96d56Sopenharmony_ci "parameter with invalid trailing text {!r}".format(token))) 25997db96d56Sopenharmony_ci if value: 26007db96d56Sopenharmony_ci # Must be a ';' at this point. 26017db96d56Sopenharmony_ci mime_parameters.append(ValueTerminal(';', 'parameter-separator')) 26027db96d56Sopenharmony_ci value = value[1:] 26037db96d56Sopenharmony_ci return mime_parameters 26047db96d56Sopenharmony_ci 26057db96d56Sopenharmony_cidef _find_mime_parameters(tokenlist, value): 26067db96d56Sopenharmony_ci """Do our best to find the parameters in an invalid MIME header 26077db96d56Sopenharmony_ci 26087db96d56Sopenharmony_ci """ 26097db96d56Sopenharmony_ci while value and value[0] != ';': 26107db96d56Sopenharmony_ci if value[0] in PHRASE_ENDS: 26117db96d56Sopenharmony_ci tokenlist.append(ValueTerminal(value[0], 'misplaced-special')) 26127db96d56Sopenharmony_ci value = value[1:] 26137db96d56Sopenharmony_ci else: 26147db96d56Sopenharmony_ci token, value = get_phrase(value) 26157db96d56Sopenharmony_ci tokenlist.append(token) 26167db96d56Sopenharmony_ci if not value: 26177db96d56Sopenharmony_ci return 26187db96d56Sopenharmony_ci tokenlist.append(ValueTerminal(';', 'parameter-separator')) 26197db96d56Sopenharmony_ci tokenlist.append(parse_mime_parameters(value[1:])) 26207db96d56Sopenharmony_ci 26217db96d56Sopenharmony_cidef parse_content_type_header(value): 26227db96d56Sopenharmony_ci """ maintype "/" subtype *( ";" parameter ) 26237db96d56Sopenharmony_ci 26247db96d56Sopenharmony_ci The maintype and substype are tokens. Theoretically they could 26257db96d56Sopenharmony_ci be checked against the official IANA list + x-token, but we 26267db96d56Sopenharmony_ci don't do that. 26277db96d56Sopenharmony_ci """ 26287db96d56Sopenharmony_ci ctype = ContentType() 26297db96d56Sopenharmony_ci recover = False 26307db96d56Sopenharmony_ci if not value: 26317db96d56Sopenharmony_ci ctype.defects.append(errors.HeaderMissingRequiredValue( 26327db96d56Sopenharmony_ci "Missing content type specification")) 26337db96d56Sopenharmony_ci return ctype 26347db96d56Sopenharmony_ci try: 26357db96d56Sopenharmony_ci token, value = get_token(value) 26367db96d56Sopenharmony_ci except errors.HeaderParseError: 26377db96d56Sopenharmony_ci ctype.defects.append(errors.InvalidHeaderDefect( 26387db96d56Sopenharmony_ci "Expected content maintype but found {!r}".format(value))) 26397db96d56Sopenharmony_ci _find_mime_parameters(ctype, value) 26407db96d56Sopenharmony_ci return ctype 26417db96d56Sopenharmony_ci ctype.append(token) 26427db96d56Sopenharmony_ci # XXX: If we really want to follow the formal grammar we should make 26437db96d56Sopenharmony_ci # mantype and subtype specialized TokenLists here. Probably not worth it. 26447db96d56Sopenharmony_ci if not value or value[0] != '/': 26457db96d56Sopenharmony_ci ctype.defects.append(errors.InvalidHeaderDefect( 26467db96d56Sopenharmony_ci "Invalid content type")) 26477db96d56Sopenharmony_ci if value: 26487db96d56Sopenharmony_ci _find_mime_parameters(ctype, value) 26497db96d56Sopenharmony_ci return ctype 26507db96d56Sopenharmony_ci ctype.maintype = token.value.strip().lower() 26517db96d56Sopenharmony_ci ctype.append(ValueTerminal('/', 'content-type-separator')) 26527db96d56Sopenharmony_ci value = value[1:] 26537db96d56Sopenharmony_ci try: 26547db96d56Sopenharmony_ci token, value = get_token(value) 26557db96d56Sopenharmony_ci except errors.HeaderParseError: 26567db96d56Sopenharmony_ci ctype.defects.append(errors.InvalidHeaderDefect( 26577db96d56Sopenharmony_ci "Expected content subtype but found {!r}".format(value))) 26587db96d56Sopenharmony_ci _find_mime_parameters(ctype, value) 26597db96d56Sopenharmony_ci return ctype 26607db96d56Sopenharmony_ci ctype.append(token) 26617db96d56Sopenharmony_ci ctype.subtype = token.value.strip().lower() 26627db96d56Sopenharmony_ci if not value: 26637db96d56Sopenharmony_ci return ctype 26647db96d56Sopenharmony_ci if value[0] != ';': 26657db96d56Sopenharmony_ci ctype.defects.append(errors.InvalidHeaderDefect( 26667db96d56Sopenharmony_ci "Only parameters are valid after content type, but " 26677db96d56Sopenharmony_ci "found {!r}".format(value))) 26687db96d56Sopenharmony_ci # The RFC requires that a syntactically invalid content-type be treated 26697db96d56Sopenharmony_ci # as text/plain. Perhaps we should postel this, but we should probably 26707db96d56Sopenharmony_ci # only do that if we were checking the subtype value against IANA. 26717db96d56Sopenharmony_ci del ctype.maintype, ctype.subtype 26727db96d56Sopenharmony_ci _find_mime_parameters(ctype, value) 26737db96d56Sopenharmony_ci return ctype 26747db96d56Sopenharmony_ci ctype.append(ValueTerminal(';', 'parameter-separator')) 26757db96d56Sopenharmony_ci ctype.append(parse_mime_parameters(value[1:])) 26767db96d56Sopenharmony_ci return ctype 26777db96d56Sopenharmony_ci 26787db96d56Sopenharmony_cidef parse_content_disposition_header(value): 26797db96d56Sopenharmony_ci """ disposition-type *( ";" parameter ) 26807db96d56Sopenharmony_ci 26817db96d56Sopenharmony_ci """ 26827db96d56Sopenharmony_ci disp_header = ContentDisposition() 26837db96d56Sopenharmony_ci if not value: 26847db96d56Sopenharmony_ci disp_header.defects.append(errors.HeaderMissingRequiredValue( 26857db96d56Sopenharmony_ci "Missing content disposition")) 26867db96d56Sopenharmony_ci return disp_header 26877db96d56Sopenharmony_ci try: 26887db96d56Sopenharmony_ci token, value = get_token(value) 26897db96d56Sopenharmony_ci except errors.HeaderParseError: 26907db96d56Sopenharmony_ci disp_header.defects.append(errors.InvalidHeaderDefect( 26917db96d56Sopenharmony_ci "Expected content disposition but found {!r}".format(value))) 26927db96d56Sopenharmony_ci _find_mime_parameters(disp_header, value) 26937db96d56Sopenharmony_ci return disp_header 26947db96d56Sopenharmony_ci disp_header.append(token) 26957db96d56Sopenharmony_ci disp_header.content_disposition = token.value.strip().lower() 26967db96d56Sopenharmony_ci if not value: 26977db96d56Sopenharmony_ci return disp_header 26987db96d56Sopenharmony_ci if value[0] != ';': 26997db96d56Sopenharmony_ci disp_header.defects.append(errors.InvalidHeaderDefect( 27007db96d56Sopenharmony_ci "Only parameters are valid after content disposition, but " 27017db96d56Sopenharmony_ci "found {!r}".format(value))) 27027db96d56Sopenharmony_ci _find_mime_parameters(disp_header, value) 27037db96d56Sopenharmony_ci return disp_header 27047db96d56Sopenharmony_ci disp_header.append(ValueTerminal(';', 'parameter-separator')) 27057db96d56Sopenharmony_ci disp_header.append(parse_mime_parameters(value[1:])) 27067db96d56Sopenharmony_ci return disp_header 27077db96d56Sopenharmony_ci 27087db96d56Sopenharmony_cidef parse_content_transfer_encoding_header(value): 27097db96d56Sopenharmony_ci """ mechanism 27107db96d56Sopenharmony_ci 27117db96d56Sopenharmony_ci """ 27127db96d56Sopenharmony_ci # We should probably validate the values, since the list is fixed. 27137db96d56Sopenharmony_ci cte_header = ContentTransferEncoding() 27147db96d56Sopenharmony_ci if not value: 27157db96d56Sopenharmony_ci cte_header.defects.append(errors.HeaderMissingRequiredValue( 27167db96d56Sopenharmony_ci "Missing content transfer encoding")) 27177db96d56Sopenharmony_ci return cte_header 27187db96d56Sopenharmony_ci try: 27197db96d56Sopenharmony_ci token, value = get_token(value) 27207db96d56Sopenharmony_ci except errors.HeaderParseError: 27217db96d56Sopenharmony_ci cte_header.defects.append(errors.InvalidHeaderDefect( 27227db96d56Sopenharmony_ci "Expected content transfer encoding but found {!r}".format(value))) 27237db96d56Sopenharmony_ci else: 27247db96d56Sopenharmony_ci cte_header.append(token) 27257db96d56Sopenharmony_ci cte_header.cte = token.value.strip().lower() 27267db96d56Sopenharmony_ci if not value: 27277db96d56Sopenharmony_ci return cte_header 27287db96d56Sopenharmony_ci while value: 27297db96d56Sopenharmony_ci cte_header.defects.append(errors.InvalidHeaderDefect( 27307db96d56Sopenharmony_ci "Extra text after content transfer encoding")) 27317db96d56Sopenharmony_ci if value[0] in PHRASE_ENDS: 27327db96d56Sopenharmony_ci cte_header.append(ValueTerminal(value[0], 'misplaced-special')) 27337db96d56Sopenharmony_ci value = value[1:] 27347db96d56Sopenharmony_ci else: 27357db96d56Sopenharmony_ci token, value = get_phrase(value) 27367db96d56Sopenharmony_ci cte_header.append(token) 27377db96d56Sopenharmony_ci return cte_header 27387db96d56Sopenharmony_ci 27397db96d56Sopenharmony_ci 27407db96d56Sopenharmony_ci# 27417db96d56Sopenharmony_ci# Header folding 27427db96d56Sopenharmony_ci# 27437db96d56Sopenharmony_ci# Header folding is complex, with lots of rules and corner cases. The 27447db96d56Sopenharmony_ci# following code does its best to obey the rules and handle the corner 27457db96d56Sopenharmony_ci# cases, but you can be sure there are few bugs:) 27467db96d56Sopenharmony_ci# 27477db96d56Sopenharmony_ci# This folder generally canonicalizes as it goes, preferring the stringified 27487db96d56Sopenharmony_ci# version of each token. The tokens contain information that supports the 27497db96d56Sopenharmony_ci# folder, including which tokens can be encoded in which ways. 27507db96d56Sopenharmony_ci# 27517db96d56Sopenharmony_ci# Folded text is accumulated in a simple list of strings ('lines'), each 27527db96d56Sopenharmony_ci# one of which should be less than policy.max_line_length ('maxlen'). 27537db96d56Sopenharmony_ci# 27547db96d56Sopenharmony_ci 27557db96d56Sopenharmony_cidef _steal_trailing_WSP_if_exists(lines): 27567db96d56Sopenharmony_ci wsp = '' 27577db96d56Sopenharmony_ci if lines and lines[-1] and lines[-1][-1] in WSP: 27587db96d56Sopenharmony_ci wsp = lines[-1][-1] 27597db96d56Sopenharmony_ci lines[-1] = lines[-1][:-1] 27607db96d56Sopenharmony_ci return wsp 27617db96d56Sopenharmony_ci 27627db96d56Sopenharmony_cidef _refold_parse_tree(parse_tree, *, policy): 27637db96d56Sopenharmony_ci """Return string of contents of parse_tree folded according to RFC rules. 27647db96d56Sopenharmony_ci 27657db96d56Sopenharmony_ci """ 27667db96d56Sopenharmony_ci # max_line_length 0/None means no limit, ie: infinitely long. 27677db96d56Sopenharmony_ci maxlen = policy.max_line_length or sys.maxsize 27687db96d56Sopenharmony_ci encoding = 'utf-8' if policy.utf8 else 'us-ascii' 27697db96d56Sopenharmony_ci lines = [''] 27707db96d56Sopenharmony_ci last_ew = None 27717db96d56Sopenharmony_ci wrap_as_ew_blocked = 0 27727db96d56Sopenharmony_ci want_encoding = False 27737db96d56Sopenharmony_ci end_ew_not_allowed = Terminal('', 'wrap_as_ew_blocked') 27747db96d56Sopenharmony_ci parts = list(parse_tree) 27757db96d56Sopenharmony_ci while parts: 27767db96d56Sopenharmony_ci part = parts.pop(0) 27777db96d56Sopenharmony_ci if part is end_ew_not_allowed: 27787db96d56Sopenharmony_ci wrap_as_ew_blocked -= 1 27797db96d56Sopenharmony_ci continue 27807db96d56Sopenharmony_ci tstr = str(part) 27817db96d56Sopenharmony_ci if part.token_type == 'ptext' and set(tstr) & SPECIALS: 27827db96d56Sopenharmony_ci # Encode if tstr contains special characters. 27837db96d56Sopenharmony_ci want_encoding = True 27847db96d56Sopenharmony_ci try: 27857db96d56Sopenharmony_ci tstr.encode(encoding) 27867db96d56Sopenharmony_ci charset = encoding 27877db96d56Sopenharmony_ci except UnicodeEncodeError: 27887db96d56Sopenharmony_ci if any(isinstance(x, errors.UndecodableBytesDefect) 27897db96d56Sopenharmony_ci for x in part.all_defects): 27907db96d56Sopenharmony_ci charset = 'unknown-8bit' 27917db96d56Sopenharmony_ci else: 27927db96d56Sopenharmony_ci # If policy.utf8 is false this should really be taken from a 27937db96d56Sopenharmony_ci # 'charset' property on the policy. 27947db96d56Sopenharmony_ci charset = 'utf-8' 27957db96d56Sopenharmony_ci want_encoding = True 27967db96d56Sopenharmony_ci if part.token_type == 'mime-parameters': 27977db96d56Sopenharmony_ci # Mime parameter folding (using RFC2231) is extra special. 27987db96d56Sopenharmony_ci _fold_mime_parameters(part, lines, maxlen, encoding) 27997db96d56Sopenharmony_ci continue 28007db96d56Sopenharmony_ci if want_encoding and not wrap_as_ew_blocked: 28017db96d56Sopenharmony_ci if not part.as_ew_allowed: 28027db96d56Sopenharmony_ci want_encoding = False 28037db96d56Sopenharmony_ci last_ew = None 28047db96d56Sopenharmony_ci if part.syntactic_break: 28057db96d56Sopenharmony_ci encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] 28067db96d56Sopenharmony_ci if policy.linesep not in encoded_part: 28077db96d56Sopenharmony_ci # It fits on a single line 28087db96d56Sopenharmony_ci if len(encoded_part) > maxlen - len(lines[-1]): 28097db96d56Sopenharmony_ci # But not on this one, so start a new one. 28107db96d56Sopenharmony_ci newline = _steal_trailing_WSP_if_exists(lines) 28117db96d56Sopenharmony_ci # XXX what if encoded_part has no leading FWS? 28127db96d56Sopenharmony_ci lines.append(newline) 28137db96d56Sopenharmony_ci lines[-1] += encoded_part 28147db96d56Sopenharmony_ci continue 28157db96d56Sopenharmony_ci # Either this is not a major syntactic break, so we don't 28167db96d56Sopenharmony_ci # want it on a line by itself even if it fits, or it 28177db96d56Sopenharmony_ci # doesn't fit on a line by itself. Either way, fall through 28187db96d56Sopenharmony_ci # to unpacking the subparts and wrapping them. 28197db96d56Sopenharmony_ci if not hasattr(part, 'encode'): 28207db96d56Sopenharmony_ci # It's not a Terminal, do each piece individually. 28217db96d56Sopenharmony_ci parts = list(part) + parts 28227db96d56Sopenharmony_ci else: 28237db96d56Sopenharmony_ci # It's a terminal, wrap it as an encoded word, possibly 28247db96d56Sopenharmony_ci # combining it with previously encoded words if allowed. 28257db96d56Sopenharmony_ci last_ew = _fold_as_ew(tstr, lines, maxlen, last_ew, 28267db96d56Sopenharmony_ci part.ew_combine_allowed, charset) 28277db96d56Sopenharmony_ci want_encoding = False 28287db96d56Sopenharmony_ci continue 28297db96d56Sopenharmony_ci if len(tstr) <= maxlen - len(lines[-1]): 28307db96d56Sopenharmony_ci lines[-1] += tstr 28317db96d56Sopenharmony_ci continue 28327db96d56Sopenharmony_ci # This part is too long to fit. The RFC wants us to break at 28337db96d56Sopenharmony_ci # "major syntactic breaks", so unless we don't consider this 28347db96d56Sopenharmony_ci # to be one, check if it will fit on the next line by itself. 28357db96d56Sopenharmony_ci if (part.syntactic_break and 28367db96d56Sopenharmony_ci len(tstr) + 1 <= maxlen): 28377db96d56Sopenharmony_ci newline = _steal_trailing_WSP_if_exists(lines) 28387db96d56Sopenharmony_ci if newline or part.startswith_fws(): 28397db96d56Sopenharmony_ci lines.append(newline + tstr) 28407db96d56Sopenharmony_ci last_ew = None 28417db96d56Sopenharmony_ci continue 28427db96d56Sopenharmony_ci if not hasattr(part, 'encode'): 28437db96d56Sopenharmony_ci # It's not a terminal, try folding the subparts. 28447db96d56Sopenharmony_ci newparts = list(part) 28457db96d56Sopenharmony_ci if not part.as_ew_allowed: 28467db96d56Sopenharmony_ci wrap_as_ew_blocked += 1 28477db96d56Sopenharmony_ci newparts.append(end_ew_not_allowed) 28487db96d56Sopenharmony_ci parts = newparts + parts 28497db96d56Sopenharmony_ci continue 28507db96d56Sopenharmony_ci if part.as_ew_allowed and not wrap_as_ew_blocked: 28517db96d56Sopenharmony_ci # It doesn't need CTE encoding, but encode it anyway so we can 28527db96d56Sopenharmony_ci # wrap it. 28537db96d56Sopenharmony_ci parts.insert(0, part) 28547db96d56Sopenharmony_ci want_encoding = True 28557db96d56Sopenharmony_ci continue 28567db96d56Sopenharmony_ci # We can't figure out how to wrap, it, so give up. 28577db96d56Sopenharmony_ci newline = _steal_trailing_WSP_if_exists(lines) 28587db96d56Sopenharmony_ci if newline or part.startswith_fws(): 28597db96d56Sopenharmony_ci lines.append(newline + tstr) 28607db96d56Sopenharmony_ci else: 28617db96d56Sopenharmony_ci # We can't fold it onto the next line either... 28627db96d56Sopenharmony_ci lines[-1] += tstr 28637db96d56Sopenharmony_ci return policy.linesep.join(lines) + policy.linesep 28647db96d56Sopenharmony_ci 28657db96d56Sopenharmony_cidef _fold_as_ew(to_encode, lines, maxlen, last_ew, ew_combine_allowed, charset): 28667db96d56Sopenharmony_ci """Fold string to_encode into lines as encoded word, combining if allowed. 28677db96d56Sopenharmony_ci Return the new value for last_ew, or None if ew_combine_allowed is False. 28687db96d56Sopenharmony_ci 28697db96d56Sopenharmony_ci If there is already an encoded word in the last line of lines (indicated by 28707db96d56Sopenharmony_ci a non-None value for last_ew) and ew_combine_allowed is true, decode the 28717db96d56Sopenharmony_ci existing ew, combine it with to_encode, and re-encode. Otherwise, encode 28727db96d56Sopenharmony_ci to_encode. In either case, split to_encode as necessary so that the 28737db96d56Sopenharmony_ci encoded segments fit within maxlen. 28747db96d56Sopenharmony_ci 28757db96d56Sopenharmony_ci """ 28767db96d56Sopenharmony_ci if last_ew is not None and ew_combine_allowed: 28777db96d56Sopenharmony_ci to_encode = str( 28787db96d56Sopenharmony_ci get_unstructured(lines[-1][last_ew:] + to_encode)) 28797db96d56Sopenharmony_ci lines[-1] = lines[-1][:last_ew] 28807db96d56Sopenharmony_ci if to_encode[0] in WSP: 28817db96d56Sopenharmony_ci # We're joining this to non-encoded text, so don't encode 28827db96d56Sopenharmony_ci # the leading blank. 28837db96d56Sopenharmony_ci leading_wsp = to_encode[0] 28847db96d56Sopenharmony_ci to_encode = to_encode[1:] 28857db96d56Sopenharmony_ci if (len(lines[-1]) == maxlen): 28867db96d56Sopenharmony_ci lines.append(_steal_trailing_WSP_if_exists(lines)) 28877db96d56Sopenharmony_ci lines[-1] += leading_wsp 28887db96d56Sopenharmony_ci trailing_wsp = '' 28897db96d56Sopenharmony_ci if to_encode[-1] in WSP: 28907db96d56Sopenharmony_ci # Likewise for the trailing space. 28917db96d56Sopenharmony_ci trailing_wsp = to_encode[-1] 28927db96d56Sopenharmony_ci to_encode = to_encode[:-1] 28937db96d56Sopenharmony_ci new_last_ew = len(lines[-1]) if last_ew is None else last_ew 28947db96d56Sopenharmony_ci 28957db96d56Sopenharmony_ci encode_as = 'utf-8' if charset == 'us-ascii' else charset 28967db96d56Sopenharmony_ci 28977db96d56Sopenharmony_ci # The RFC2047 chrome takes up 7 characters plus the length 28987db96d56Sopenharmony_ci # of the charset name. 28997db96d56Sopenharmony_ci chrome_len = len(encode_as) + 7 29007db96d56Sopenharmony_ci 29017db96d56Sopenharmony_ci if (chrome_len + 1) >= maxlen: 29027db96d56Sopenharmony_ci raise errors.HeaderParseError( 29037db96d56Sopenharmony_ci "max_line_length is too small to fit an encoded word") 29047db96d56Sopenharmony_ci 29057db96d56Sopenharmony_ci while to_encode: 29067db96d56Sopenharmony_ci remaining_space = maxlen - len(lines[-1]) 29077db96d56Sopenharmony_ci text_space = remaining_space - chrome_len 29087db96d56Sopenharmony_ci if text_space <= 0: 29097db96d56Sopenharmony_ci lines.append(' ') 29107db96d56Sopenharmony_ci continue 29117db96d56Sopenharmony_ci 29127db96d56Sopenharmony_ci to_encode_word = to_encode[:text_space] 29137db96d56Sopenharmony_ci encoded_word = _ew.encode(to_encode_word, charset=encode_as) 29147db96d56Sopenharmony_ci excess = len(encoded_word) - remaining_space 29157db96d56Sopenharmony_ci while excess > 0: 29167db96d56Sopenharmony_ci # Since the chunk to encode is guaranteed to fit into less than 100 characters, 29177db96d56Sopenharmony_ci # shrinking it by one at a time shouldn't take long. 29187db96d56Sopenharmony_ci to_encode_word = to_encode_word[:-1] 29197db96d56Sopenharmony_ci encoded_word = _ew.encode(to_encode_word, charset=encode_as) 29207db96d56Sopenharmony_ci excess = len(encoded_word) - remaining_space 29217db96d56Sopenharmony_ci lines[-1] += encoded_word 29227db96d56Sopenharmony_ci to_encode = to_encode[len(to_encode_word):] 29237db96d56Sopenharmony_ci 29247db96d56Sopenharmony_ci if to_encode: 29257db96d56Sopenharmony_ci lines.append(' ') 29267db96d56Sopenharmony_ci new_last_ew = len(lines[-1]) 29277db96d56Sopenharmony_ci lines[-1] += trailing_wsp 29287db96d56Sopenharmony_ci return new_last_ew if ew_combine_allowed else None 29297db96d56Sopenharmony_ci 29307db96d56Sopenharmony_cidef _fold_mime_parameters(part, lines, maxlen, encoding): 29317db96d56Sopenharmony_ci """Fold TokenList 'part' into the 'lines' list as mime parameters. 29327db96d56Sopenharmony_ci 29337db96d56Sopenharmony_ci Using the decoded list of parameters and values, format them according to 29347db96d56Sopenharmony_ci the RFC rules, including using RFC2231 encoding if the value cannot be 29357db96d56Sopenharmony_ci expressed in 'encoding' and/or the parameter+value is too long to fit 29367db96d56Sopenharmony_ci within 'maxlen'. 29377db96d56Sopenharmony_ci 29387db96d56Sopenharmony_ci """ 29397db96d56Sopenharmony_ci # Special case for RFC2231 encoding: start from decoded values and use 29407db96d56Sopenharmony_ci # RFC2231 encoding iff needed. 29417db96d56Sopenharmony_ci # 29427db96d56Sopenharmony_ci # Note that the 1 and 2s being added to the length calculations are 29437db96d56Sopenharmony_ci # accounting for the possibly-needed spaces and semicolons we'll be adding. 29447db96d56Sopenharmony_ci # 29457db96d56Sopenharmony_ci for name, value in part.params: 29467db96d56Sopenharmony_ci # XXX What if this ';' puts us over maxlen the first time through the 29477db96d56Sopenharmony_ci # loop? We should split the header value onto a newline in that case, 29487db96d56Sopenharmony_ci # but to do that we need to recognize the need earlier or reparse the 29497db96d56Sopenharmony_ci # header, so I'm going to ignore that bug for now. It'll only put us 29507db96d56Sopenharmony_ci # one character over. 29517db96d56Sopenharmony_ci if not lines[-1].rstrip().endswith(';'): 29527db96d56Sopenharmony_ci lines[-1] += ';' 29537db96d56Sopenharmony_ci charset = encoding 29547db96d56Sopenharmony_ci error_handler = 'strict' 29557db96d56Sopenharmony_ci try: 29567db96d56Sopenharmony_ci value.encode(encoding) 29577db96d56Sopenharmony_ci encoding_required = False 29587db96d56Sopenharmony_ci except UnicodeEncodeError: 29597db96d56Sopenharmony_ci encoding_required = True 29607db96d56Sopenharmony_ci if utils._has_surrogates(value): 29617db96d56Sopenharmony_ci charset = 'unknown-8bit' 29627db96d56Sopenharmony_ci error_handler = 'surrogateescape' 29637db96d56Sopenharmony_ci else: 29647db96d56Sopenharmony_ci charset = 'utf-8' 29657db96d56Sopenharmony_ci if encoding_required: 29667db96d56Sopenharmony_ci encoded_value = urllib.parse.quote( 29677db96d56Sopenharmony_ci value, safe='', errors=error_handler) 29687db96d56Sopenharmony_ci tstr = "{}*={}''{}".format(name, charset, encoded_value) 29697db96d56Sopenharmony_ci else: 29707db96d56Sopenharmony_ci tstr = '{}={}'.format(name, quote_string(value)) 29717db96d56Sopenharmony_ci if len(lines[-1]) + len(tstr) + 1 < maxlen: 29727db96d56Sopenharmony_ci lines[-1] = lines[-1] + ' ' + tstr 29737db96d56Sopenharmony_ci continue 29747db96d56Sopenharmony_ci elif len(tstr) + 2 <= maxlen: 29757db96d56Sopenharmony_ci lines.append(' ' + tstr) 29767db96d56Sopenharmony_ci continue 29777db96d56Sopenharmony_ci # We need multiple sections. We are allowed to mix encoded and 29787db96d56Sopenharmony_ci # non-encoded sections, but we aren't going to. We'll encode them all. 29797db96d56Sopenharmony_ci section = 0 29807db96d56Sopenharmony_ci extra_chrome = charset + "''" 29817db96d56Sopenharmony_ci while value: 29827db96d56Sopenharmony_ci chrome_len = len(name) + len(str(section)) + 3 + len(extra_chrome) 29837db96d56Sopenharmony_ci if maxlen <= chrome_len + 3: 29847db96d56Sopenharmony_ci # We need room for the leading blank, the trailing semicolon, 29857db96d56Sopenharmony_ci # and at least one character of the value. If we don't 29867db96d56Sopenharmony_ci # have that, we'd be stuck, so in that case fall back to 29877db96d56Sopenharmony_ci # the RFC standard width. 29887db96d56Sopenharmony_ci maxlen = 78 29897db96d56Sopenharmony_ci splitpoint = maxchars = maxlen - chrome_len - 2 29907db96d56Sopenharmony_ci while True: 29917db96d56Sopenharmony_ci partial = value[:splitpoint] 29927db96d56Sopenharmony_ci encoded_value = urllib.parse.quote( 29937db96d56Sopenharmony_ci partial, safe='', errors=error_handler) 29947db96d56Sopenharmony_ci if len(encoded_value) <= maxchars: 29957db96d56Sopenharmony_ci break 29967db96d56Sopenharmony_ci splitpoint -= 1 29977db96d56Sopenharmony_ci lines.append(" {}*{}*={}{}".format( 29987db96d56Sopenharmony_ci name, section, extra_chrome, encoded_value)) 29997db96d56Sopenharmony_ci extra_chrome = '' 30007db96d56Sopenharmony_ci section += 1 30017db96d56Sopenharmony_ci value = value[splitpoint:] 30027db96d56Sopenharmony_ci if value: 30037db96d56Sopenharmony_ci lines[-1] += ';' 3004