.. _lexical:

****************
Lexical analysis
****************

.. index:: lexical analysis, parser, token

A Python program is read by a *parser*. Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file
can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
for details. If the source file cannot be decoded, a :exc:`SyntaxError` is
raised.


.. _line-structure:

Line structure
==============

.. index:: line structure

A Python program is divided into a number of *logical lines*.


.. _logical-lines:

Logical lines
-------------

.. index:: logical line, physical line, line joining, NEWLINE token

The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements).
A logical line is
constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.


.. _physical-lines:

Physical lines
--------------

A physical line is a sequence of characters terminated by an end-of-line
sequence. In source files and strings, any of the standard platform line
termination sequences can be used - the Unix form using ASCII LF (linefeed),
the Windows form using the ASCII sequence CR LF (return followed by linefeed),
or the old Macintosh form using the ASCII CR (return) character. All of these
forms can be used equally, regardless of platform. The end of input also serves
as an implicit terminator for the final physical line.

When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the ``\n`` character,
representing ASCII LF, is the line terminator).


.. _comments:

Comments
--------

.. index:: comment, hash character
   single: # (hash); comment

A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line.
A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax.


.. _encodings:

Encoding declarations
---------------------

.. index:: source character set, encoding declarations (source file)
   single: # (hash); source encoding declaration

If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The encoding declaration must appear on a line of its
own. If it is the second line, the first line must also be a comment-only line.
The recommended forms of an encoding expression are ::

   # -*- coding: <encoding-name> -*-

which is recognized also by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM.

If no encoding declaration is found, the default encoding is UTF-8. In
addition, if the first bytes of the file are the UTF-8 byte-order mark
(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
among others, by Microsoft's :program:`notepad`).
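
As a rough illustration of the matching rule, the regular expression quoted
above can be applied by hand to the first two lines of a source text. The helper
name ``declared_encoding`` is hypothetical and exists only for this sketch; it
ignores the byte-order mark and does not check that the named encoding is known
to Python. The standard library function :func:`tokenize.detect_encoding`
performs the complete detection.

```python
import re

# The pattern quoted in the text above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or 'utf-8'.

    Hypothetical helper for illustration: BOM handling and validation of
    the encoding name are deliberately left out.
    """
    for line in source_lines[:2]:
        stripped = line.strip()
        if not stripped.startswith("#"):
            # A declaration must sit on a comment-only line, and a
            # second-line declaration requires a comment-only first line.
            break
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return "utf-8"


emacs_style = declared_encoding(["# -*- coding: latin-1 -*-", "x = 1"])
vim_style = declared_encoding(["#!/usr/bin/env python",
                               "# vim:fileencoding=iso-8859-15"])
no_declaration = declared_encoding(["x = 1", "# coding: ascii"])
```

In the last call the first line is not a comment, so the second line is never
consulted and the default applies.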

If an encoding is declared, the encoding name must be recognized by Python
(see :ref:`standard-encodings`). The
encoding is used for all lexical analysis, including string literals, comments
and identifiers.


.. _explicit-joining:

Explicit line joining
---------------------

.. index:: physical line, line joining, line continuation, backslash character

Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following forming
a single logical line, deleting the backslash and the following end-of-line
character. For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
      and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
           return 1

A line ending in a backslash cannot carry a comment. A backslash does not
continue a comment. A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash). A backslash is illegal elsewhere on a line
outside a string literal.
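
One way to observe explicit joining is to count the NEWLINE tokens reported by
the standard :mod:`tokenize` module, since each logical line ends with exactly
one of them. The function name ``count_logical_lines`` below is an assumption
made for this sketch, not part of any library.

```python
import io
import tokenize


def count_logical_lines(source):
    """Count NEWLINE tokens, i.e. logical lines, in a source string."""
    readline = io.StringIO(source).readline
    return sum(
        1
        for token in tokenize.generate_tokens(readline)
        if token.type == tokenize.NEWLINE
    )


# Two physical lines joined by a trailing backslash: one logical line.
joined = count_logical_lines("total = 1 + \\\n        2\n")

# Two physical lines without a backslash: two logical lines.
separate = count_logical_lines("a = 1\nb = 2\n")
```

The indentation of the second physical line in the joined example plays no
role, because it is not the start of a logical line.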

.. _implicit-joining:

Implicit line joining
---------------------

Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example::

   month_names = ['Januari', 'Februari', 'Maart',      # These are the
                  'April',   'Mei',      'Juni',       # Dutch names
                  'Juli',    'Augustus', 'September',  # for the months
                  'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines. Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.


.. _blank-lines:

Blank lines
-----------

.. index:: single: blank line

A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop.
In the standard interactive
interpreter, an entirely blank logical line (i.e. one containing not even
whitespace or a comment) terminates a multi-line statement.


.. _indentation:

Indentation
-----------

.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping

Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.

Tabs are replaced (from left to right) by one to eight spaces such that the
total number of characters up to and including the replacement is a multiple of
eight (this is intended to be the same rule as used by Unix). The total number
of spaces preceding the first non-blank character then determines the line's
indentation. Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.

Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning dependent on the worth of a tab in spaces; a
:exc:`TabError` is raised in that case.
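
The tab replacement rule can be sketched in a few lines. The function name
``indentation_width`` and its ``tabsize`` parameter are assumptions made for
this illustration; the sketch models only the column computation and does not
reproduce the consistency check that leads to :exc:`TabError`.

```python
def indentation_width(line, tabsize=8):
    """Return the indentation of ``line`` in columns.

    Illustrative only: each tab advances the count to the next multiple
    of ``tabsize``, each space advances it by one, and scanning stops at
    the first character that is neither.
    """
    width = 0
    for char in line:
        if char == " ":
            width += 1
        elif char == "\t":
            # Round up to the next multiple of tabsize.
            width = (width // tabsize + 1) * tabsize
        else:
            break
    return width


only_tab = indentation_width("\tx = 1")
spaces_then_tab = indentation_width("  \tx = 1")
eight_spaces_then_tab = indentation_width("        \tx = 1")
```

For leading whitespace made up of spaces and tabs this gives the same column
as ``str.expandtabs(8)`` applied to that prefix.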

**Cross-platform compatibility note:** because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file. It should also be noted that different
platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above. Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).

.. index:: INDENT token, DEDENT token

The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again. The numbers pushed on the stack will
always be strictly increasing from bottom to top. At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated. If it is smaller, it *must* be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated.
At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.

Here is an example of a correctly (though confusingly) indented piece of Python
code::

   def perm(l):
           # Compute the list of all permutations of l
       if len(l) <= 1:
                     return [l]
       r = []
       for i in range(len(l)):
                s = l[:i] + l[i+1:]
                p = perm(s)
                for x in p:
                 r.append(l[i:i+1] + x)
       return r

The following example shows various indentation errors::

    def perm(l):                       # error: first line indented
   for i in range(len(l)):             # error: not indented
       s = l[:i] + l[i+1:]
           p = perm(l[:i] + l[i+1:])   # error: unexpected indent
           for x in p:
                   r.append(l[i:i+1] + x)
               return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer --- the indentation of ``return r`` does
not match a level popped off the stack.)


.. _whitespace:

Whitespace between tokens
-------------------------

Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ab is one token, but
a b is two tokens).


.. _other-tokens:

Other tokens
============

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.


.. _identifiers:

Identifiers and keywords
========================

.. index:: identifier, name

Identifiers (also referred to as *names*) are described by the following lexical
definitions.

The syntax of identifiers in Python is based on the Unicode standard annex
UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
further details.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.x: the uppercase and lowercase letters ``A`` through
``Z``, the underscore ``_`` and, except for the first character, the digits
``0`` through ``9``.

Python 3.0 introduces additional characters from outside the ASCII range (see
:pep:`3131`). For these characters, the classification uses the version of the
Unicode Character Database as included in the :mod:`unicodedata` module.

Identifiers are unlimited in length. Case is significant.

.. productionlist:: python-grammar
   identifier: `xid_start` `xid_continue`*
   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
   id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
   xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
   xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

* *Lu* - uppercase letters
* *Ll* - lowercase letters
* *Lt* - titlecase letters
* *Lm* - modifier letters
* *Lo* - other letters
* *Nl* - letter numbers
* *Mn* - nonspacing marks
* *Mc* - spacing combining marks
* *Nd* - decimal numbers
* *Pc* - connector punctuations
* *Other_ID_Start* - explicit list of characters in `PropList.txt
  <https://www.unicode.org/Public/14.0.0/ucd/PropList.txt>`_ to support backwards
  compatibility
* *Other_ID_Continue* - likewise

All identifiers are converted into the normal form NFKC while parsing; comparison
of identifiers is based on NFKC.
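
The effect of NFKC normalization can be seen with a compatibility character
such as U+FB01 (LATIN SMALL LIGATURE FI). The sketch below uses
:func:`unicodedata.normalize` and :meth:`str.isidentifier` from the standard
library; the variable names are chosen only for this illustration.

```python
import unicodedata

# "file" spelled with the single ligature character U+FB01 for "fi".
ligature_name = "\ufb01le"

# The ligature spelling is accepted as an identifier.
is_valid = ligature_name.isidentifier()

# NFKC maps the ligature to the two ASCII letters "f" and "i".
normalized = unicodedata.normalize("NFKC", ligature_name)

# Executing source that uses the ligature spelling binds the normalized
# name, because identifiers are converted to NFKC while parsing.
namespace = {}
exec(ligature_name + " = 1", namespace)
bound_names = [key for key in namespace if key != "__builtins__"]
```

After this runs, ``bound_names`` holds the plain ASCII spelling, so the two
spellings refer to the same variable.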

A non-normative text file listing all valid identifier characters for Unicode
14.0.0 can be found at
https://www.unicode.org/Public/14.0.0/ucd/DerivedCoreProperties.txt


.. _keywords:

Keywords
--------

.. index::
   single: keyword
   single: reserved word

The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here:

.. sourcecode:: text

   False      await      else       import     pass
   None       break      except     in         raise
   True       class      finally    is         return
   and        continue   for        lambda     try
   as         def        from       nonlocal   while
   assert     del        global     not        with
   async      elif       if         or         yield


.. _soft-keywords:

Soft Keywords
-------------

.. index:: soft keyword, keyword

.. versionadded:: 3.10

Some identifiers are only reserved under specific contexts. These are known as
*soft keywords*.
The identifiers ``match``, ``case`` and ``_`` can
syntactically act as keywords in contexts related to the pattern matching
statement, but this distinction is made at the parser level, not when
tokenizing.

As soft keywords, their use with pattern matching is possible while still
preserving compatibility with existing code that uses ``match``, ``case`` and ``_`` as
identifier names.


.. index::
   single: _, identifiers
   single: __, identifiers
.. _id-classes:

Reserved classes of identifiers
-------------------------------

Certain classes of identifiers (besides keywords) have special meanings. These
classes are identified by the patterns of leading and trailing underscore
characters:

``_*``
   Not imported by ``from module import *``.

``_``
   In a ``case`` pattern within a :keyword:`match` statement, ``_`` is a
   :ref:`soft keyword <soft-keywords>` that denotes a
   :ref:`wildcard <wildcard-patterns>`.

   Separately, the interactive interpreter makes the result of the last evaluation
   available in the variable ``_``.
   (It is stored in the :mod:`builtins` module, alongside built-in
   functions like ``print``.)

   Elsewhere, ``_`` is a regular identifier. It is often used to name
   "special" items, but it is not special to Python itself.

   .. note::

      The name ``_`` is often used in conjunction with internationalization;
      refer to the documentation for the :mod:`gettext` module for more
      information on this convention.

      It is also commonly used for unused variables.

``__*__``
   System-defined names, informally known as "dunder" names. These names are
   defined by the interpreter and its implementation (including the standard library).
   Current system names are discussed in the :ref:`specialnames` section and elsewhere.
   More will likely be defined in future versions of Python. *Any* use of ``__*__`` names,
   in any context, that does not follow explicitly documented use, is subject to
   breakage without warning.

``__*``
   Class-private names. Names in this category, when used within the context of a
   class definition, are re-written to use a mangled form to help avoid name
   clashes between "private" attributes of base and derived classes. See section
   :ref:`atom-identifiers`.


.. _literals:

Literals
========

.. index:: literal, constant

Literals are notations for constant values of some built-in types.


.. index:: string literal, bytes literal, ASCII
   single: ' (single quote); string literal
   single: " (double quote); string literal
   single: u'; string literal
   single: u"; string literal
.. _strings:

String and Bytes literals
-------------------------

String literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
   stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
               : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
   shortstringitem: `shortstringchar` | `stringescapeseq`
   longstringitem: `longstringchar` | `stringescapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
   stringescapeseq: "\" <any source character>

.. productionlist:: python-grammar
   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
   bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
   longbytesitem: `longbyteschar` | `bytesescapeseq`
   shortbyteschar: <any ASCII character except "\" or newline or the quote>
   longbyteschar: <any ASCII character except "\">
   bytesescapeseq: "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the :token:`~python-grammar:stringprefix` or
:token:`~python-grammar:bytesprefix` and the rest of the literal. The source
character set is defined by the encoding declaration; it is UTF-8 if no encoding
declaration is given in the source file; see section :ref:`encodings`.

.. index:: triple-quoted string, Unicode Consortium, raw string
   single: """; string literal
   single: '''; string literal

In plain English: Both types of literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``). They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*).
The backslash (``\``) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character.

.. index::
   single: b'; bytes literal
   single: b"; bytes literal

Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type. They
may only contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

.. index::
   single: r'; raw string literal
   single: r"; raw string literal

Both string and bytes literals may optionally be prefixed with a letter ``'r'``
or ``'R'``; such strings are called :dfn:`raw strings` and treat backslashes as
literal characters. As a result, in string literals, ``'\U'`` and ``'\u'``
escapes in raw strings are not treated specially. Given that Python 2.x's raw
unicode literals behave differently than Python 3.x's, the ``'ur'`` syntax
is not supported.

.. versionadded:: 3.3
   The ``'rb'`` prefix of raw bytes literals has been added as a synonym
   of ``'br'``.

.. versionadded:: 3.3
   Support for the unicode legacy literal (``u'value'``) was reintroduced
   to simplify the maintenance of dual Python 2.x and 3.x codebases.
   See :pep:`414` for more information.

.. index::
   single: f'; formatted string literal
   single: f"; formatted string literal

A string literal with ``'f'`` or ``'F'`` in its prefix is a
:dfn:`formatted string literal`; see :ref:`f-strings`. The ``'f'`` may be
combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw
formatted strings are possible, but formatted bytes literals are not.

In triple-quoted literals, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the literal. (A
"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.)

.. index:: physical line, escape sequence, Standard C, C
   single: \ (backslash); escape sequence
   single: \\; escape sequence
   single: \a; escape sequence
   single: \b; escape sequence
   single: \f; escape sequence
   single: \n; escape sequence
   single: \r; escape sequence
   single: \t; escape sequence
   single: \v; escape sequence
   single: \x; escape sequence
   single: \N; escape sequence
   single: \u; escape sequence
   single: \U; escape sequence

Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
bytes literals are interpreted according to rules similar to those used by
Standard C.
The recognized escape sequences are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\``\ <newline>| Backslash and newline ignored   | \(1)  |
+-----------------+---------------------------------+-------+
| ``\\``          | Backslash (``\``)               |       |
+-----------------+---------------------------------+-------+
| ``\'``          | Single quote (``'``)            |       |
+-----------------+---------------------------------+-------+
| ``\"``          | Double quote (``"``)            |       |
+-----------------+---------------------------------+-------+
| ``\a``          | ASCII Bell (BEL)                |       |
+-----------------+---------------------------------+-------+
| ``\b``          | ASCII Backspace (BS)            |       |
+-----------------+---------------------------------+-------+
| ``\f``          | ASCII Formfeed (FF)             |       |
+-----------------+---------------------------------+-------+
| ``\n``          | ASCII Linefeed (LF)             |       |
+-----------------+---------------------------------+-------+
| ``\r``          | ASCII Carriage Return (CR)      |       |
+-----------------+---------------------------------+-------+
| ``\t``          | ASCII Horizontal Tab (TAB)      |       |
+-----------------+---------------------------------+-------+
| ``\v``          | ASCII Vertical Tab (VT)         |       |
+-----------------+---------------------------------+-------+
| ``\ooo``        | Character with octal value      | (2,4) |
|                 | *ooo*                           |       |
+-----------------+---------------------------------+-------+
| ``\xhh``        | Character with hex value *hh*   | (3,4) |
+-----------------+---------------------------------+-------+

Escape sequences only recognized in string literals are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\N{name}``    | Character named *name* in the   | \(5)  |
|                 | Unicode database                |       |
+-----------------+---------------------------------+-------+
| ``\uxxxx``      | Character with 16-bit hex value | \(6)  |
|                 | *xxxx*                          |       |
+-----------------+---------------------------------+-------+
| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(7)  |
|                 | *xxxxxxxx*                      |       |
+-----------------+---------------------------------+-------+

Notes:

(1)
   A backslash can be added at the end of a line to ignore the newline::

      >>> 'This string will not include \
      ... backslashes or newline characters.'
      'This string will not include backslashes or newline characters.'

   The same result can be achieved using :ref:`triple-quoted strings <strings>`,
   or parentheses and :ref:`string literal concatenation <string-concatenation>`.


(2)
   As in Standard C, up to three octal digits are accepted.

   .. versionchanged:: 3.11
      Octal escapes with value larger than ``0o377`` produce a :exc:`DeprecationWarning`.
      In a future Python version they will be a :exc:`SyntaxWarning` and
      eventually a :exc:`SyntaxError`.

(3)
   Unlike in Standard C, exactly two hex digits are required.

(4)
   In a bytes literal, hexadecimal and octal escapes denote the byte with the
   given value. In a string literal, these escapes denote a Unicode character
   with the given value.

(5)
   .. versionchanged:: 3.3
      Support for name aliases [#]_ has been added.

(6)
   Exactly four hex digits are required.

(7)
   Any Unicode character can be encoded this way.  Exactly eight hex digits
   are required.


.. index:: unrecognized escape sequence

Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the result*.  (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.)  It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.

   .. versionchanged:: 3.6
      Unrecognized escape sequences produce a :exc:`DeprecationWarning`.  In
      a future Python version they will be a :exc:`SyntaxWarning` and
      eventually a :exc:`SyntaxError`.

Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, ``r"\""`` is a valid string
literal consisting of two characters: a backslash and a double quote; ``r"\"``
is not a valid string literal (even a raw string cannot end in an odd number of
backslashes).  Specifically, *a raw literal cannot end in a single backslash*
(since the backslash would escape the following quote character).  Note also
that a single backslash followed by a newline is interpreted as those two
characters as part of the literal, *not* as a line continuation.


.. _string-concatenation:

String literal concatenation
----------------------------

Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation.  Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``.  This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )

Note that this feature is defined at the syntactical level, but implemented at
compile time.  The '+' operator must be used to concatenate string expressions
at run time.  Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings),
and formatted string literals may be concatenated with plain string literals.


.. index::
   single: formatted string literal
   single: interpolated string literal
   single: string; formatted literal
   single: string; interpolated literal
   single: f-string
   single: fstring
   single: {} (curly brackets); in formatted string literal
   single: ! (exclamation); in formatted string literal
   single: : (colon); in formatted string literal
   single: = (equals); for help in debugging using string literals
.. _f-strings:

Formatted string literals
-------------------------

.. versionadded:: 3.6

A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
that is prefixed with ``'f'`` or ``'F'``.  These strings may contain
replacement fields, which are expressions delimited by curly braces ``{}``.
While other string literals always have a constant value, formatted strings
are really expressions evaluated at run time.

Escape sequences are decoded like in ordinary string literals (except when
a literal is also marked as a raw string).  After decoding, the grammar
for the contents of the string is:

.. productionlist:: python-grammar
   f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)*
   replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}"
   f_expression: (`conditional_expression` | "*" `or_expr`)
               :   ("," `conditional_expression` | "," "*" `or_expr`)* [","]
               : | `yield_expression`
   conversion: "s" | "r" | "a"
   format_spec: (`literal_char` | NULL | `replacement_field`)*
   literal_char: <any code point except "{", "}" or NULL>

The parts of the string outside curly braces are treated literally,
except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
with the corresponding single curly brace.  A single opening curly
bracket ``'{'`` marks a replacement field, which starts with a
Python expression. To display both the expression text and its value after
evaluation, (useful in debugging), an equal sign ``'='`` may be added after the
expression. A conversion field, introduced by an exclamation point ``'!'`` may
follow.  A format specifier may also be appended, introduced by a colon ``':'``.
A replacement field ends with a closing curly bracket ``'}'``.

Expressions in formatted string literals are treated like regular
Python expressions surrounded by parentheses, with a few exceptions.
An empty expression is not allowed, and both :keyword:`lambda` and
assignment expressions ``:=`` must be surrounded by explicit parentheses.
Replacement expressions can contain line breaks (e.g. in triple-quoted
strings), but they cannot contain comments.
Each expression is evaluated
in the context where the formatted string literal appears, in order from
left to right.

.. versionchanged:: 3.7
   Prior to Python 3.7, an :keyword:`await` expression and comprehensions
   containing an :keyword:`async for` clause were illegal in the expressions
   in formatted string literals due to a problem with the implementation.

When the equal sign ``'='`` is provided, the output will have the expression
text, the ``'='`` and the evaluated value. Spaces after the opening brace
``'{'``, within the expression and after the ``'='`` are all retained in the
output. By default, the ``'='`` causes the :func:`repr` of the expression to be
provided, unless there is a format specified. When a format is specified it
defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is
declared.

.. versionadded:: 3.8
   The equal sign ``'='``.

If a conversion is specified, the result of evaluating the expression
is converted before formatting.  Conversion ``'!s'`` calls :func:`str` on
the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`.

The result is then formatted using the :func:`format` protocol.  The
format specifier is passed to the :meth:`__format__` method of the
expression or conversion result.
An empty string is passed when the
format specifier is omitted.  The formatted result is then included in
the final value of the whole string.

Top-level format specifiers may include nested replacement fields. These nested
fields may include their own conversion fields and :ref:`format specifiers
<formatspec>`, but may not include more deeply nested replacement fields. The
:ref:`format specifier mini-language <formatspec>` is the same as that used by
the :meth:`str.format` method.

Formatted string literals may be concatenated, but replacement fields
cannot be split across literals.

Some examples of formatted string literals::

   >>> import decimal
   >>> from datetime import datetime
   >>> name = "Fred"
   >>> f"He said his name is {name!r}."
   "He said his name is 'Fred'."
   >>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
   "He said his name is 'Fred'."
   >>> width = 10
   >>> precision = 4
   >>> value = decimal.Decimal("12.34567")
   >>> f"result: {value:{width}.{precision}}"  # nested fields
   'result:      12.35'
   >>> today = datetime(year=2017, month=1, day=27)
   >>> f"{today:%B %d, %Y}"  # using date format specifier
   'January 27, 2017'
   >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
   'today=January 27, 2017'
   >>> number = 1024
   >>> f"{number:#0x}"  # using integer format specifier
   '0x400'
   >>> foo = "bar"
   >>> f"{ foo = }" # preserves whitespace
   " foo = 'bar'"
   >>> line = "The mill's closed"
   >>> f"{line = }"
   'line = "The mill\'s closed"'
   >>> f"{line = :20}"
   "line = The mill's closed   "
   >>> f"{line = !r:20}"
   'line = "The mill\'s closed" '


A consequence of sharing the same syntax as regular string literals is
that characters in the replacement fields must not conflict with the
quoting used in the outer formatted string literal::

   f"abc {a["x"]} def"    # error: outer string literal ended prematurely
   f"abc {a['x']} def"    # workaround: use different quoting

Backslashes are not allowed in format expressions and will raise
an error::

   f"newline: {ord('\n')}"  # raises SyntaxError

To include a value in which a backslash escape is required, create
a temporary variable.

   >>> newline = ord('\n')
   >>> f"newline: {newline}"
   'newline: 10'

Formatted string literals cannot be used as docstrings, even if they do not
include expressions.

::

   >>> def foo():
   ...     f"Not a docstring"
   ...
   >>> foo.__doc__ is None
   True

See also :pep:`498` for the proposal that added formatted string literals,
and :meth:`str.format`, which uses a related format string mechanism.


.. _numbers:

Numeric literals
----------------

.. index:: number, numeric literal, integer literal
   floating point literal, hexadecimal literal
   octal literal, binary literal, decimal literal, imaginary literal, complex literal

There are three types of numeric literals: integers, floating point numbers, and
imaginary numbers.  There are no complex literals (complex numbers can be formed
by adding a real number and an imaginary number).
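
As a minimal sketch of the point above (the names ``z`` and ``imag_only`` are
chosen only for this illustration), a complex value is obtained from an
expression that adds a real literal and an imaginary literal:

.. code-block:: python

   # ``3`` is an integer literal and ``4j`` is an imaginary literal;
   # ``3 + 4j`` is an addition expression, not a single complex literal.
   z = 3 + 4j
   imag_only = 4j

   print(type(z).__name__)     # complex
   print(z.real, z.imag)       # 3.0 4.0
   print(imag_only.real)       # 0.0
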

Note that numeric literals do not include a sign; a phrase like ``-1`` is
actually an expression composed of the unary operator '``-``' and the literal
``1``.


.. index::
   single: 0b; integer literal
   single: 0o; integer literal
   single: 0x; integer literal
   single: _ (underscore); in numeric literal

.. _integers:

Integer literals
----------------

Integer literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger`
   decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")*
   bininteger: "0" ("b" | "B") (["_"] `bindigit`)+
   octinteger: "0" ("o" | "O") (["_"] `octdigit`)+
   hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+
   nonzerodigit: "1"..."9"
   digit: "0"..."9"
   bindigit: "0" | "1"
   octdigit: "0"..."7"
   hexdigit: `digit` | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be
stored in available memory.

Underscores are ignored for determining the numeric value of the literal.
They can be used to group digits for enhanced readability.  One underscore can
occur between digits, and after base specifiers like ``0x``.

Note that leading zeros in a non-zero decimal number are not allowed. This is
for disambiguation with C-style octal literals, which Python used before version
3.0.

Some examples of integer literals::

   7     2147483647                        0o177    0b100110111
   3     79228162514264337593543950336     0o377    0xdeadbeef
         100_000_000_000                   0b_1110_0101

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: . (dot); in numeric literal
   single: e; in numeric literal
   single: _ (underscore); in numeric literal
.. _floating:

Floating point literals
-----------------------

Floating point literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   floatnumber: `pointfloat` | `exponentfloat`
   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
   digitpart: `digit` (["_"] `digit`)*
   fraction: "." `digitpart`
   exponent: ("e" | "E") ["+" | "-"] `digitpart`

Note that the integer and exponent parts are always interpreted using radix 10.
For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
allowed range of floating point literals is implementation-dependent.  As in
integer literals, underscores are supported for digit grouping.

Some examples of floating point literals::

   3.14    10.    .001    1e100    3.14e-10    0e0    3.14_15_93

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: j; in numeric literal
.. _imaginary:

Imaginary literals
------------------

Imaginary literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   imagnumber: (`floatnumber` | `digitpart`) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0.  Complex
numbers are represented as a pair of floating point numbers and have the same
restrictions on their range.  To create a complex number with a nonzero real
part, add a floating point number to it, e.g., ``(3+4j)``.
Some examples of imaginary literals::

   3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j


.. _operators:

Operators
=========

.. index:: single: operators

The following tokens are operators:

.. code-block:: none


   +       -       *       **      /       //      %      @
   <<      >>      &       |       ^       ~       :=
   <       >       <=      >=      ==      !=


.. _delimiters:

Delimiters
==========

.. index:: single: delimiters

The following tokens serve as delimiters in the grammar:

.. code-block:: none

   (       )       [       ]       {       }
   ,       :       .       ;       @       =       ->
   +=      -=      *=      /=      //=     %=      @=
   &=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals.  A sequence
of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.
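
The paragraph above can be checked with the standard :mod:`tokenize` module.
The following is a sketch only; the sample source text and the names
``source`` and ``pairs`` are assumptions made for this illustration:

.. code-block:: python

   import io
   import tokenize

   # Hypothetical two-line sample: an augmented assignment with a float
   # literal, followed by an assignment of the ellipsis literal.
   source = "x += 1.5\ny = ...\n"

   # Pair each token's exact type name with its text.
   pairs = [
       (tokenize.tok_name[tok.exact_type], tok.string)
       for tok in tokenize.generate_tokens(io.StringIO(source).readline)
   ]

   for name, text in pairs:
       print(f"{name:<10} {text!r}")

In CPython, ``+=`` should be reported as a single ``PLUSEQUAL`` token, ``1.5``
as one ``NUMBER`` token (the period is part of the literal), and ``...`` as one
``ELLIPSIS`` token.
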

The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:

.. code-block:: none

   '       "       #       \

The following printing ASCII characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional error:

.. code-block:: none

   $       ?       `


.. rubric:: Footnotes

.. [#] https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt