:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.

.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax.  If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.
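
   For illustration, here is a minimal sketch of :func:`split` in three
   configurations: the defaults, with *comments* enabled, and in non-POSIX
   mode.  The command strings are invented examples.  The results shown in the
   comments follow the rules listed under :ref:`shlex-parsing-rules`.

   .. code-block:: python

      import shlex

      # POSIX mode (the default): quotes are removed, and a backslash
      # preserves the literal value of the next character (here a space).
      print(shlex.split('cp "my file.txt" backup\\ dir/'))
      # ['cp', 'my file.txt', 'backup dir/']

      # comments=True keeps the default commenters ('#'), so the rest of
      # the line after '#' is ignored.
      print(shlex.split('ls -l  # list files', comments=True))
      # ['ls', '-l']

      # posix=False selects compatibility mode: quotes stay in the token.
      print(shlex.split('echo "hello world"', posix=False))
      # ['echo', '"hello world"']
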

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

   .. deprecated:: 3.9
      Passing ``None`` for *s* will raise an exception in future Python
      versions.

.. function:: join(split_command)

   Concatenate the tokens of the list *split_command* and return a string.
   This function is the inverse of :func:`split`.

      >>> from shlex import join
      >>> print(join(['echo', '-n', 'Multiple words']))
      echo -n 'Multiple words'

   The returned value is shell-escaped to protect against injection
   vulnerabilities (see :func:`quote`).

   .. versionadded:: 3.8


.. function:: quote(s)

   Return a shell-escaped version of the string *s*.  The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   .. _shlex-quote-warning:

   .. warning::

      The ``shlex`` module is **only designed for Unix shells**.

      The :func:`quote` function is not guaranteed to be correct on
      non-POSIX-compliant shells or shells from other operating systems such as
      Windows.  Executing commands quoted by this module on such shells can open
      up the possibility of a command injection vulnerability.

      Consider using functions that pass command arguments as lists, such as
      :func:`subprocess.run` with ``shell=False``.

   This idiom would be unsafe:

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole:

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']

   .. versionadded:: 3.3

The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False, punctuation_chars=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The first argument, *instream*, if present, specifies where to read
   characters from.  It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string.  If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument, *infile*, is a filename string, which sets the
   initial value of the :attr:`~shlex.infile` attribute.  If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin".

   The *posix* argument defines the operational mode: when *posix* is not true
   (the default), the :class:`~shlex.shlex` instance will operate in
   compatibility mode.  When operating in POSIX mode, :class:`~shlex.shlex`
   will try to be as close as possible to the POSIX shell parsing rules.

   The *punctuation_chars* argument provides a way to make the behaviour even
   closer to how real shells parse.  It can take a number of values: the
   default value, ``False``, preserves the behaviour seen under Python 3.5 and
   earlier.  If set to ``True``, then parsing of the characters ``();<>|&`` is
   changed: any run of these characters (considered punctuation characters) is
   returned as a single token.  If set to a non-empty string of characters,
   those characters will be used as the punctuation characters.  Any characters
   in the :attr:`wordchars` attribute that appear in *punctuation_chars* will be
   removed from :attr:`wordchars`.  See :ref:`improved-shell-compatibility` for
   more information.  *punctuation_chars* can be set only upon
   :class:`~shlex.shlex` instance creation and can't be modified later.

   .. versionchanged:: 3.6
      The *punctuation_chars* parameter was added.

.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as argument and is expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles
   ``#include "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component.  (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks.  There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack.  If the filename argument is
   specified, it will later be available for use in error messages.  This is the
   same method used internally to push the stream returned by :meth:`sourcehook`
   when a source request is processed.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack.  This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

Instances of :class:`~shlex.shlex` and its subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners.  All
   characters from the comment beginner to end of line are ignored.  Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.  In POSIX mode, the
   accented characters in the Latin-1 set are also included.  If
   :attr:`punctuation_chars` is not empty, the characters ``~-./*?=``, which can
   appear in filename specifications and command line parameters, will also be
   included in this attribute, and any characters which appear in
   ``punctuation_chars`` will be removed from ``wordchars`` if they are present
   there.  If :attr:`whitespace_split` is set to ``True``, this attribute has no
   effect.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered escape characters.  This is only used in
   POSIX mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other, as in the shell).  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace.  This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.  When used in combination with
   :attr:`punctuation_chars`, tokens will be split on whitespace in addition to
   those characters.

   .. versionchanged:: 3.8
      The :attr:`punctuation_chars` attribute was made compatible with the
      :attr:`whitespace_split` attribute.


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream.  Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file.  This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.


.. attribute:: shlex.punctuation_chars

   A read-only property.  Characters that will be considered punctuation.  Runs of
   punctuation characters will be returned as a single token.  However, note that no
   semantic validity checking will be performed: for example, ``'>>>'`` could be
   returned as a token, even though it may not be recognised as such by shells.

   .. versionadded:: 3.6


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules:

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token.  If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules:

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.  The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself.  Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

.. _improved-shell-compatibility:

Improved Compatibility with Shells
----------------------------------

.. versionadded:: 3.6

The :class:`~shlex.shlex` class provides compatibility with the parsing performed by
common Unix shells like ``bash``, ``dash``, and ``sh``.  To take advantage of
this compatibility, specify the ``punctuation_chars`` argument in the
constructor.  This defaults to ``False``, which preserves pre-3.6 behaviour.
However, if it is set to ``True``, then parsing of the characters ``();<>|&``
is changed: any run of these characters is returned as a single token.  While
this is short of a full parser for shells (which would be out of scope for the
standard library, given the multiplicity of shells out there), it does allow
you to perform processing of command lines more easily than you could
otherwise.  To illustrate, you can see the difference in the following snippet:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> import shlex
   >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
   >>> s = shlex.shlex(text, posix=True)
   >>> s.whitespace_split = True
   >>> list(s)
   ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
   >>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
   >>> s.whitespace_split = True
   >>> list(s)
   ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
   '(', 'def', 'ghi', ')']

Of course, tokens will be returned which are not valid for shells, and you'll
need to implement your own error checks on the returned tokens.

Instead of passing ``True`` as the value for the ``punctuation_chars`` parameter,
you can pass a string with specific characters, which will be used to determine
which characters constitute punctuation.  For example::

    >>> import shlex
    >>> s = shlex.shlex("a && b || c", punctuation_chars="|")
    >>> list(s)
    ['a', '&', '&', 'b', '||', 'c']

.. note:: When ``punctuation_chars`` is specified, the :attr:`~shlex.wordchars`
   attribute is augmented with the characters ``~-./*?=``.  That is because these
   characters can appear in file names (including wildcards) and command-line
   arguments (e.g. ``--color=auto``).  Hence::

      >>> import shlex
      >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?',
      ...                 punctuation_chars=True)
      >>> list(s)
      ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']

   However, to match the shell as closely as possible, it is recommended to
   always use ``posix`` and :attr:`~shlex.whitespace_split` when using
   :attr:`~shlex.punctuation_chars`, which will negate
   :attr:`~shlex.wordchars` entirely.

For best effect, ``punctuation_chars`` should be set in conjunction with
``posix=True``.  (Note that ``posix=False`` is the default for
:class:`~shlex.shlex`.)
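
As a sketch of that recommendation, the following compares the default
compatibility mode with ``posix=True`` plus :attr:`~shlex.whitespace_split`.
Both lexers use ``punctuation_chars=True``.  The command line (a quoted
argument followed directly by a redirection) is an invented example:

.. code-block:: python

   import shlex

   text = "ls 'my dir'>out.txt"

   # Default compatibility mode (posix=False): quotes stay in the token.
   compat = shlex.shlex(text, punctuation_chars=True)
   compat_tokens = list(compat)
   print(compat_tokens)  # ['ls', "'my dir'", '>', 'out.txt']

   # POSIX mode with whitespace_split: quotes are stripped, while '>'
   # is still returned as a separate punctuation token.
   posix_lexer = shlex.shlex(text, posix=True, punctuation_chars=True)
   posix_lexer.whitespace_split = True
   posix_tokens = list(posix_lexer)
   print(posix_tokens)  # ['ls', 'my dir', '>', 'out.txt']

Only the POSIX variant yields the argument ``my dir`` exactly as a shell would
pass it to ``ls``.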