:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.

.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.
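
   As a rough illustration of how the two flags change the result (the sample
   command line is invented for this sketch):

   .. code-block:: python

      import shlex

      line = 'cp "my file.txt" backup/  # nightly copy'

      # Default: POSIX mode with comments disabled, so '#' is an ordinary token.
      plain = shlex.split(line)
      # ['cp', 'my file.txt', 'backup/', '#', 'nightly', 'copy']

      # comments=True drops everything from '#' to the end of the line.
      no_comment = shlex.split(line, comments=True)
      # ['cp', 'my file.txt', 'backup/']

      # Non-POSIX mode keeps the quote characters as part of the token.
      legacy = shlex.split(line, comments=True, posix=False)
      # ['cp', '"my file.txt"', 'backup/']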

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

   .. deprecated:: 3.9
      Passing ``None`` for *s* will raise an exception in future Python
      versions.

.. function:: join(split_command)

   Concatenate the tokens of the list *split_command* and return a string.
   This function is the inverse of :func:`split`.

      >>> from shlex import join
      >>> print(join(['echo', '-n', 'Multiple words']))
      echo -n 'Multiple words'

   The returned value is shell-escaped to protect against injection
   vulnerabilities (see :func:`quote`).
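
   A small sketch of the round trip, using an argument list made up for the
   example.  Splitting the joined string gives back the original list, even
   when the arguments contain spaces, quotes or ``$``:

   .. code-block:: python

      import shlex

      args = ['grep', '-r', 'two words', "it's", '$HOME']

      # join() quotes each argument as needed and separates them with spaces.
      command = shlex.join(args)

      # split() undoes the quoting and recovers the original arguments.
      round_trip = shlex.split(command)

   The reverse direction is not guaranteed to reproduce the input text:
   ``join(split('a   b'))`` returns ``'a b'``, because the original spacing
   is not recorded in the token list.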

   .. versionadded:: 3.8


.. function:: quote(s)

   Return a shell-escaped version of the string *s*.  The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   .. _shlex-quote-warning:

   .. warning::

      The ``shlex`` module is **only designed for Unix shells**.

      The :func:`quote` function is not guaranteed to be correct on non-POSIX
      compliant shells or shells from other operating systems such as Windows.
      Executing commands quoted by this module on such shells can open up the
      possibility of a command injection vulnerability.

      Consider using functions that pass command arguments with lists such as
      :func:`subprocess.run` with ``shell=False``.

   This idiom would be unsafe:

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole:

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']
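
   Where an argument list can be used instead, no quoting is needed at all.
   The following is only a sketch: it assumes the running interpreter
   (``sys.executable``) can be started as a child process, and uses it as a
   stand-in for a real command so that the example does not depend on ``ls``:

   .. code-block:: python

      import subprocess
      import sys

      filename = 'somefile; rm -rf ~'

      # Each list item reaches the child as exactly one argument; no shell
      # ever parses the string, so the ';' has no special meaning.
      result = subprocess.run(
          [sys.executable, '-c', 'import sys; print(sys.argv[1])', filename],
          capture_output=True,
          text=True,
          check=True,
      )
      received = result.stdout.rstrip('\n')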

   .. versionadded:: 3.3

The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False, punctuation_chars=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The initialization argument, if present, specifies where to read
   characters from.  It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string.  If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument is a filename string, which sets the initial
   value of the :attr:`~shlex.infile` attribute.  If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin".

   The *posix* argument defines the operational mode:
   when *posix* is not true (default), the :class:`~shlex.shlex` instance will
   operate in compatibility mode.  When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.

   The *punctuation_chars* argument provides a way to make the
   behaviour even closer to how real shells parse.  This can take a number of
   values: the default value, ``False``, preserves the behaviour seen under
   Python 3.5 and earlier.  If set to ``True``, then parsing of the characters
   ``();<>|&`` is changed: any run of these characters (considered punctuation
   characters) is returned as a single token.  If set to a non-empty string of
   characters, those characters will be used as the punctuation characters.  Any
   characters in the :attr:`wordchars` attribute that appear in
   *punctuation_chars* will be removed from :attr:`wordchars`.  See
   :ref:`improved-shell-compatibility` for more information. *punctuation_chars*
   can be set only upon :class:`~shlex.shlex` instance creation and can't be
   modified later.
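
   A minimal sketch of direct use, reading from a string.  The input line and
   the file name ``example.conf`` are invented for the illustration; the file
   name is only recorded, not opened:

   .. code-block:: python

      import shlex

      lexer = shlex.shlex('name = "Ann Lee"  # trailing comment',
                          infile='example.conf', posix=True)

      # A shlex instance is iterable; iteration stops at end of input.
      tokens = list(lexer)
      # ['name', '=', 'Ann Lee']

      # The second argument is available later, e.g. for error messages.
      source_name = lexer.infile
      # 'example.conf'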

   .. versionchanged:: 3.6
      The *punctuation_chars* parameter was added.

.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.
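
   Together with :meth:`get_token`, this gives a one-token lookahead.  A
   short sketch with a made-up input string:

   .. code-block:: python

      import shlex

      lexer = shlex.shlex('alpha beta', posix=True)

      first = lexer.get_token()    # 'alpha'
      lexer.push_token(first)      # put it back on the token stack

      again = lexer.get_token()    # 'alpha' again, popped from the stack
      second = lexer.get_token()   # 'beta', read from the input
      end = lexer.get_token()      # None: end of input in POSIX mode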


.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below) this method is given the following token as argument, and expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.
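
   One possible way to override the hook is sketched below.  The class
   ``MemorySourceLexer`` and its ``snippets`` mapping are hypothetical names
   introduced for this example; they stand in for whatever lookup scheme an
   application needs, and serve included text from memory instead of the
   file system:

   .. code-block:: python

      import io
      import shlex

      class MemorySourceLexer(shlex.shlex):
          """Hypothetical lexer that resolves source requests from a dict."""

          snippets = {'extra.conf': 'b c'}

          def sourcehook(self, newfile):
              # Strip quotes (needed in non-POSIX mode, harmless otherwise).
              name = newfile.strip('"')
              # Return (filename, stream), the order this hook must use.
              return name, io.StringIO(self.snippets[name])

      lexer = MemorySourceLexer('a source "extra.conf" d', posix=True)
      lexer.source = 'source'   # enable inclusion on the word 'source'

      tokens = list(lexer)
      # ['a', 'b', 'c', 'd']

   The file name is quoted in the input because ``.`` is not a word character
   by default and would otherwise split the name into several tokens.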


.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack.  If the filename argument is
   specified it will later be available for use in error messages.  This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.
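
   For instance, with an invented file name and error text:

   .. code-block:: python

      import shlex

      lexer = shlex.shlex('set colour', infile='settings.rc')

      # Uses the lexer's current file name and line number.
      leader = lexer.error_leader()
      # '"settings.rc", line 1: '

      # Both values can be overridden explicitly.
      custom = lexer.error_leader('other.rc', 7)
      # '"other.rc", line 7: '

      message = leader + 'unknown option'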

Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.  In POSIX mode, the
   accented characters in the Latin-1 set are also included.  If
   :attr:`punctuation_chars` is not empty, the characters ``~-./*?=``, which can
   appear in filename specifications and command line parameters, will also be
   included in this attribute, and any characters which appear in
   ``punctuation_chars`` will be removed from ``wordchars`` if they are present
   there. If :attr:`whitespace_split` is set to ``True``, this will have no
   effect.
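
   A brief sketch of the effect of extending this attribute, using a made-up
   input string:

   .. code-block:: python

      import shlex

      text = 'pkg-name v1.2'

      # With the default word characters, '-' and '.' become separate tokens.
      default_tokens = list(shlex.shlex(text, posix=True))
      # ['pkg', '-', 'name', 'v1', '.', '2']

      # Adding them to wordchars lets them accumulate into longer tokens.
      lexer = shlex.shlex(text, posix=True)
      lexer.wordchars += '-.'
      extended_tokens = list(lexer)
      # ['pkg-name', 'v1.2']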


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered as escape. This is only used in POSIX
   mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell.)  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace.  This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.  When used in combination with
   :attr:`punctuation_chars`, tokens will be split on whitespace in addition to
   those characters.
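
   The difference can be seen on a short command line (chosen only for
   illustration):

   .. code-block:: python

      import shlex

      text = 'ls -l "my file.txt"'

      # Default: '-' is not a word character, so '-l' is split in two.
      default_tokens = list(shlex.shlex(text, posix=True))
      # ['ls', '-', 'l', 'my file.txt']

      # With whitespace_split, only whitespace separates tokens.
      lexer = shlex.shlex(text, posix=True)
      lexer.whitespace_split = True
      args = list(lexer)
      # ['ls', '-l', 'my file.txt']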

   .. versionchanged:: 3.8
      The :attr:`punctuation_chars` attribute was made compatible with the
      :attr:`whitespace_split` attribute.


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream.  Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.
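
   A sketch of a manual token loop that compares against this attribute
   rather than testing for truthiness.  In POSIX mode a quoted empty string
   is a valid token, and a truthiness test would end the loop early:

   .. code-block:: python

      import shlex

      # End-of-file markers for the two modes.
      legacy_end = shlex.shlex('').get_token()              # ''
      posix_end = shlex.shlex('', posix=True).get_token()   # None

      lexer = shlex.shlex("a '' b", posix=True)
      tokens = []
      token = lexer.get_token()
      while token != lexer.eof:      # '' is a real token here, not EOF
          tokens.append(token)
          token = lexer.get_token()
      # tokens == ['a', '', 'b']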


.. attribute:: shlex.punctuation_chars

   A read-only property. Characters that will be considered punctuation. Runs of
   punctuation characters will be returned as a single token. However, note that no
   semantic validity checking will be performed: for example, '>>>' could be
   returned as a token, even though it may not be recognised as such by shells.
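
   Both points can be checked with a short sketch (the input string is
   arbitrary):

   .. code-block:: python

      import shlex

      lexer = shlex.shlex('a >>> b', punctuation_chars=True)

      chars = lexer.punctuation_chars
      # '();<>|&'

      # The run '>>>' comes back as one token, valid shell syntax or not.
      tokens = list(lexer)
      # ['a', '>>>', 'b']

      # The property has no setter, so assignment fails.
      try:
          lexer.punctuation_chars = '|'
      except AttributeError:
          read_only = True
      else:
          read_only = False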

   .. versionadded:: 3.6


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.  The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
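
The two rule sets can be compared side by side.  The sketch below uses
:func:`split` for brevity, which also sets :attr:`~shlex.whitespace_split` to
``True``; the input strings are made up, apart from those taken from the lists
above:

.. code-block:: python

   import shlex

   # Quotes inside a word: kept in non-POSIX mode, stripped in POSIX mode.
   legacy_word = shlex.split('Do"Not"Separate', posix=False)
   # ['Do"Not"Separate']
   posix_word = shlex.split('"Do"Not"Separate"')
   # ['DoNotSeparate']

   # A closing quote ends the token in non-POSIX mode.
   legacy_quote = shlex.split('"Do"Separate', posix=False)
   # ['"Do"', 'Separate']

   # Backslash escapes are interpreted only in POSIX mode.
   legacy_escape = shlex.split(r'a\ b', posix=False)
   # ['a\\', 'b']
   posix_escape = shlex.split(r'a\ b')
   # ['a b']

   # A quoted empty string is a real (empty) token only in POSIX mode.
   legacy_empty = shlex.split("a '' b", posix=False)
   # ['a', "''", 'b']
   posix_empty = shlex.split("a '' b")
   # ['a', '', 'b']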

.. _improved-shell-compatibility:

Improved Compatibility with Shells
----------------------------------

.. versionadded:: 3.6

The :class:`~shlex.shlex` class provides compatibility with the parsing
performed by common Unix shells like ``bash``, ``dash``, and ``sh``.  To take
advantage of this compatibility, specify the ``punctuation_chars`` argument in
the constructor.  This defaults to ``False``, which preserves pre-3.6 behaviour.
However, if it is set to ``True``, then parsing of the characters ``();<>|&``
is changed: any run of these characters is returned as a single token.  While
this is short of a full parser for shells (which would be out of scope for the
standard library, given the multiplicity of shells out there), it does allow
you to perform processing of command lines more easily than you could
otherwise.  To illustrate, you can see the difference in the following snippet:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import shlex
    >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
    >>> s = shlex.shlex(text, posix=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
    >>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
    '(', 'def', 'ghi', ')']

Of course, tokens will be returned which are not valid for shells, and you'll
need to implement your own error checks on the returned tokens.

Instead of passing ``True`` as the value for the *punctuation_chars* parameter,
you can pass a string with specific characters, which will be used to determine
which characters constitute punctuation. For example::

    >>> import shlex
    >>> s = shlex.shlex("a && b || c", punctuation_chars="|")
    >>> list(s)
    ['a', '&', '&', 'b', '||', 'c']

.. note:: When ``punctuation_chars`` is specified, the :attr:`~shlex.wordchars`
   attribute is augmented with the characters ``~-./*?=``.  That is because these
   characters can appear in file names (including wildcards) and command-line
   arguments (e.g. ``--color=auto``). Hence::

      >>> import shlex
      >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?',
      ...                 punctuation_chars=True)
      >>> list(s)
      ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']

   However, to match the shell as closely as possible, it is recommended to
   always use ``posix`` and :attr:`~shlex.whitespace_split` when using
   :attr:`~shlex.punctuation_chars`, which will negate
   :attr:`~shlex.wordchars` entirely.

For best effect, ``punctuation_chars`` should be set in conjunction with
``posix=True``. (Note that ``posix=False`` is the default for
:class:`~shlex.shlex`.)