PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)



NAME
       pcre2test - a program for testing Perl-compatible regular expressions.

SYNOPSIS

       pcre2test [options] [input file [output file]]

       pcre2test is a test program for the PCRE2 regular expression libraries,
       but it can also be used for experimenting with regular expressions.
       This document describes the features of the test program; for details
       of the regular expressions themselves, see the pcre2pattern
       documentation. For details of the PCRE2 library function calls and
       their options, see the pcre2api documentation.

       The input for pcre2test is a sequence of regular expression patterns
       and subject strings to be matched. There are also command lines for
       setting defaults and controlling some special actions. The output shows
       the result of each match attempt. Modifiers on external or internal
       command lines, the patterns, and the subject lines specify PCRE2
       function options and control how the subject is processed and what
       output is produced.

       There are many obscure modifiers, some of which are specifically
       designed for use in conjunction with the test script and data files
       that are distributed as part of PCRE2.
       All the modifiers are documented here, some without much
       justification, but many of them are unlikely to be of use except when
       testing the libraries.


PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES

       Different versions of the PCRE2 library can be built to support
       character strings that are encoded in 8-bit, 16-bit, or 32-bit code
       units. One, two, or all three of these libraries may be simultaneously
       installed. The pcre2test program can be used to test all the libraries.
       However, its own input and output are always in 8-bit format. When
       testing the 16-bit or 32-bit libraries, patterns and subject strings
       are converted to 16-bit or 32-bit format before being passed to the
       library functions. Results are converted back to 8-bit code units for
       output.

       In the rest of this document, the names of library functions and
       structures are given in generic form, for example, pcre2_compile(). The
       actual names used in the libraries have a suffix _8, _16, or _32, as
       appropriate.


INPUT ENCODING

       Input to pcre2test is processed line by line, either by calling the C
       library's fgets() function, or via the libreadline or libedit library.
       In some Windows environments character 26 (hex 1A) causes an immediate
       end of file, and no further data is read, so this character should be
       avoided unless you really want that action.

       The input is processed using C's string functions, so it must not
       contain binary zeros, even though in Unix-like environments, fgets()
       treats any bytes other than newline as data characters. An error is
       generated if a binary zero is encountered. By default subject lines are
       processed for backslash escapes, which makes it possible to include any
       data value in strings that are passed to the library for matching. For
       patterns, there is a facility for specifying some or all of the 8-bit
       input characters as hexadecimal pairs, which makes it possible to
       include binary zeros.

   Input for the 16-bit and 32-bit libraries

       When testing the 16-bit or 32-bit libraries, there is a need to be able
       to generate character code points greater than 255 in the strings that
       are passed to the library. For subject lines, backslash escapes can be
       used. In addition, when the utf modifier (see "Setting compilation
       options" below) is set, the pattern and any following subject lines are
       interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
       appropriate.

       For non-UTF testing of wide characters, the utf8_input modifier can be
       used.
       This is mutually exclusive with utf, and is allowed only in 16-bit or
       32-bit mode. It causes the pattern and following subject lines to be
       treated as UTF-8 according to the original definition (RFC 2279),
       which allows for character values up to 0x7fffffff. Each character is
       placed in one 16-bit or 32-bit code unit (in the 16-bit case, values
       greater than 0xffff cause an error to occur).

       UTF-8 (in its original definition) is not capable of encoding values
       greater than 0x7fffffff, but such values can be handled by the 32-bit
       library. When testing this library in non-UTF mode with utf8_input set,
       if any character is preceded by the byte 0xff (which is an invalid byte
       in UTF-8), 0x80000000 is added to the character's value. This is the
       only way of passing such code points in a pattern string. For subject
       strings, using an escape sequence is preferable.


COMMAND LINE OPTIONS

       -8        If the 8-bit library has been built, this option causes it to
                 be used (this is the default). If the 8-bit library has not
                 been built, this option causes an error.

       -16       If the 16-bit library has been built, this option causes it
                 to be used. If only the 16-bit library has been built, this
                 is the default. If the 16-bit library has not been built,
                 this option causes an error.

       -32       If the 32-bit library has been built, this option causes it
                 to be used. If only the 32-bit library has been built, this
                 is the default. If the 32-bit library has not been built,
                 this option causes an error.

       -ac       Behave as if each pattern has the auto_callout modifier, that
                 is, insert automatic callouts into every pattern that is
                 compiled.

       -AC       As for -ac, but in addition behave as if each subject line
                 has the callout_extra modifier, that is, show additional
                 information from callouts.

       -b        Behave as if each pattern has the fullbincode modifier; the
                 full internal binary form of the pattern is output after
                 compilation.

       -C        Output the version number of the PCRE2 library, and all
                 available information about the optional features that are
                 included, and then exit with zero exit code. All other
                 options are ignored. If both -C and -LM are present,
                 whichever is first is recognized.

       -C option Output information about a specific build-time option, then
                 exit. This functionality is intended for use in scripts such
                 as RunTest.
                 The following options output the value and set the exit
                 code as indicated:

                   ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
                                0x15 or 0x25
                                0 if used in an ASCII environment
                                exit code is always 0
                   linksize   the configured internal link size (2, 3, or 4)
                                exit code is set to the link size
                   newline    the default newline setting:
                                CR, LF, CRLF, ANYCRLF, ANY, or NUL
                                exit code is always 0
                   bsr        the default setting for what \R matches:
                                ANYCRLF or ANY
                                exit code is always 0

                 The following options output 1 for true or 0 for false, and
                 set the exit code to the same value:

                   backslash-C  \C is supported (not locked out)
                   ebcdic       compiled for an EBCDIC environment
                   jit          just-in-time support is available
                   pcre2-16     the 16-bit library was built
                   pcre2-32     the 32-bit library was built
                   pcre2-8      the 8-bit library was built
                   unicode      Unicode support is available

                 If an unknown option is given, an error message is output;
                 the exit code is 0.

       -d        Behave as if each pattern has the debug modifier; the
                 internal form and information about the compiled pattern are
                 output after compilation; -d is equivalent to -b -i.

       -dfa      Behave as if each subject line has the dfa modifier; matching
                 is done using the pcre2_dfa_match() function instead of the
                 default pcre2_match().

       -error number[,number,...]
                 Call pcre2_get_error_message() for each of the error numbers
                 in the comma-separated list, display the resulting messages
                 on the standard output, then exit with zero exit code. The
                 numbers may be positive or negative. This is a convenience
                 facility for PCRE2 maintainers.

       -help     Output a brief summary of these options and then exit.

       -i        Behave as if each pattern has the info modifier; information
                 about the compiled pattern is given after compilation.

       -jit      Behave as if each pattern line has the jit modifier; after
                 successful compilation, each pattern is passed to the
                 just-in-time compiler, if available.

       -jitfast  Behave as if each pattern line has the jitfast modifier;
                 after successful compilation, each pattern is passed to the
                 just-in-time compiler, if available, and each subject line is
                 passed directly to the JIT matcher via its "fast path".

       -jitverify
                 Behave as if each pattern line has the jitverify modifier;
                 after successful compilation, each pattern is passed to the
                 just-in-time compiler, if available, and the use of JIT for
                 matching is verified.

       -LM       List modifiers: write a list of available pattern and subject
                 modifiers to the standard output, then exit with zero exit
                 code. All other options are ignored. If both -C and any -Lx
                 options are present, whichever is first is recognized.

       -LP       List properties: write a list of recognized Unicode
                 properties to the standard output, then exit with zero exit
                 code. All other options are ignored. If both -C and any -Lx
                 options are present, whichever is first is recognized.

       -LS       List scripts: write a list of recognized Unicode script names
                 to the standard output, then exit with zero exit code. All
                 other options are ignored. If both -C and any -Lx options are
                 present, whichever is first is recognized.

       -pattern modifier-list
                 Behave as if each pattern line contains the given modifiers.

       -q        Do not output the version number of pcre2test at the start of
                 execution.

       -S size   On Unix-like systems, set the size of the run-time stack to
                 size mebibytes (units of 1024*1024 bytes).

       -subject modifier-list
                 Behave as if each subject line contains the given modifiers.

       -t        Run each compile and match many times with a timer, and
                 output the resulting times per compile or match. When JIT is
                 used, separate times are given for the initial compile and
                 the JIT compile. You can control the number of iterations
                 that are used for timing by following -t with a number (as a
                 separate item on the command line). For example, "-t 1000"
                 iterates 1000 times. The default is to iterate 500,000 times.

       -tm       This is like -t except that it times only the matching phase,
                 not the compile phase.

       -T -TM    These behave like -t and -tm, but in addition, at the end of
                 a run, the total times for all compiles and matches are
                 output.

       -version  Output the PCRE2 version number and then exit.


DESCRIPTION

       If pcre2test is given two filename arguments, it reads from the first
       and writes to the second. If the first name is "-", input is taken from
       the standard input. If pcre2test is given only one argument, it reads
       from that file and writes to stdout. Otherwise, it reads from stdin and
       writes to stdout.
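
       The stream-selection rule in the previous paragraph can be sketched in
       C as follows. This is an illustration only, not pcre2test's source:
       the helper name open_streams() is hypothetical, option parsing is
       assumed to have already removed everything except the zero, one, or
       two file names, and error reporting is reduced to a return code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of pcre2test): select the input and
   output streams from the 0, 1, or 2 file names left after option
   processing. Returns 0 on success, or -1 if a file cannot be opened. */
int open_streams(int nargs, char **names, FILE **in, FILE **out)
{
  assert(nargs >= 0 && nargs <= 2);
  *in = stdin;                      /* default: read from stdin */
  *out = stdout;                    /* default: write to stdout */

  /* The first name is the input file; "-" means standard input. */
  if (nargs >= 1 && strcmp(names[0], "-") != 0)
    {
    *in = fopen(names[0], "r");
    if (*in == NULL) return -1;
    }

  /* The second name, if present, is the output file. */
  if (nargs == 2)
    {
    *out = fopen(names[1], "w");
    if (*out == NULL)
      {
      if (*in != stdin) fclose(*in);
      *in = stdin;
      return -1;
      }
    }
  return 0;
}
```

       A caller would pass only the remaining non-option arguments; with no
       names at all, both defaults (stdin and stdout) are kept.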

       When pcre2test is built, a configuration option can specify that it
       should be linked with the libreadline or libedit library. When this is
       done, if the input is from a terminal, it is read using the readline()
       function. This provides line-editing and history facilities. The output
       from the -help option states whether or not readline() will be used.

       The program handles any number of tests, each of which consists of a
       set of input lines. Each set starts with a regular expression pattern,
       followed by any number of subject lines to be matched against that
       pattern. In between sets of test data, command lines that begin with #
       may appear. This file format, with some restrictions, can also be
       processed by the perltest.sh script that is distributed with PCRE2 as a
       means of checking that the behaviour of PCRE2 and Perl is the same. For
       a specification of perltest.sh, see the comments near its beginning.
       See also the #perltest command below.

       When the input is a terminal, pcre2test prompts for each line of input,
       using "re>" to prompt for regular expression patterns, and "data>" to
       prompt for subject lines. Command lines starting with # can be entered
       only in response to the "re>" prompt.

       Each subject line is matched separately and independently.
       If you want to do multi-line matches, you have to use the \n escape
       sequence (or \r or \r\n, etc., depending on the newline setting) in a
       single line of input to encode the newline sequences. There is no limit
       on the length of subject lines; the input buffer is automatically
       extended if it is too small. There are replication features that make
       it possible to generate long repetitive pattern or subject lines
       without having to supply them explicitly.

       An empty line or the end of the file signals the end of the subject
       lines for a test, at which point a new pattern or command line is
       expected if there is still input to be read.


COMMAND LINES

       In between sets of test data, a line that begins with # is interpreted
       as a command line. If the first character is followed by white space or
       an exclamation mark, the line is treated as a comment, and ignored.
       Otherwise, the following commands are recognized:

         #forbid_utf

       Subsequent patterns automatically have the PCRE2_NEVER_UTF and
       PCRE2_NEVER_UCP options set, which locks out the use of the PCRE2_UTF
       and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of
       patterns.
       This command also forces an error if a subsequent pattern contains any
       occurrences of \P, \p, or \X, which are still supported when PCRE2_UTF
       is not set, but which require Unicode property support to be included
       in the library.

       This is a trigger guard that is used in test files to ensure that UTF
       or Unicode property tests are not accidentally added to files that are
       used when Unicode support is not included in the library. Setting
       PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also be obtained
       by the use of #pattern; the difference is that #forbid_utf cannot be
       unset, and the automatic options are not displayed in pattern
       information, to avoid cluttering up test output.

         #load <filename>

       This command is used to load a set of precompiled patterns from a file,
       as described in the section entitled "Saving and restoring compiled
       patterns" below.

         #loadtables <filename>

       This command is used to load a set of binary character tables that can
       be accessed by the tables=3 qualifier. Such tables can be created by
       the pcre2_dftables program with the -b option.

         #newline_default [<newline-list>]

       When PCRE2 is built, a default newline convention can be specified.
       This determines which characters and/or character pairs are recognized
       as indicating a newline in a pattern or subject string. The default can
       be overridden when a pattern is compiled. The standard test files
       contain tests of various newline conventions, but the majority of the
       tests expect a single linefeed to be recognized as a newline by
       default. Without special action the tests would fail when PCRE2 is
       compiled with either CR or CRLF as the default newline.

       The #newline_default command specifies a list of newline types that are
       acceptable as the default. The types must be one of CR, LF, CRLF,
       ANYCRLF, ANY, or NUL (in upper or lower case), for example:

         #newline_default LF Any anyCRLF

       If the default newline is in the list, this command has no effect.
       Otherwise, except when testing the POSIX API, a newline modifier that
       specifies the first newline convention in the list (LF in the above
       example) is added to any pattern that does not already have a newline
       modifier. If the newline list is empty, the feature is turned off. This
       command is present in a number of the standard test input files.

       When the POSIX API is being tested there is no way to override the
       default newline convention, though it is possible to set the newline
       convention from within the pattern.
       A warning is given if the posix or posix_nosub modifier is used when
       #newline_default would set a default for the non-POSIX API.

         #pattern <modifier-list>

       This command sets a default modifier list that applies to all
       subsequent patterns. Modifiers on a pattern can change these settings.

         #perltest

       This line is used in test files that can also be processed by
       perltest.sh to confirm that Perl gives the same results as PCRE2.
       Subsequent tests are checked for the use of pcre2test features that are
       incompatible with the perltest.sh script.

       Patterns must use '/' as their delimiter, and only certain modifiers
       are supported. Comment lines, #pattern commands, and #subject commands
       that set or unset "mark" are recognized and acted on. The #perltest,
       #forbid_utf, and #newline_default commands, which are needed in the
       relevant pcre2test files, are silently ignored. All other command lines
       are ignored, but give a warning message. The #perltest command helps
       detect tests that are accidentally put in the wrong file or use the
       wrong delimiter. For more details of the perltest.sh script see the
       comments it contains.

         #pop [<modifiers>]
         #popcopy [<modifiers>]

       These commands are used to manipulate the stack of compiled patterns,
       as described in the section entitled "Saving and restoring compiled
       patterns" below.

         #save <filename>

       This command is used to save a set of compiled patterns to a file, as
       described in the section entitled "Saving and restoring compiled
       patterns" below.

         #subject <modifier-list>

       This command sets a default modifier list that applies to all
       subsequent subject lines. Modifiers on a subject line can change these
       settings.


MODIFIER SYNTAX

       Modifier lists are used with both pattern and subject lines. Items in a
       list are separated by commas followed by optional white space. Trailing
       white space in a modifier list is ignored. Some modifiers may be given
       for both patterns and subject lines, whereas others are valid only for
       one or the other. Each modifier has a long name, for example
       "anchored", and some of them must be followed by an equals sign and a
       value, for example, "offset=12". Values cannot contain comma
       characters, but may contain spaces.
       Modifiers that do not take values may be preceded by a minus sign to
       turn off a previous setting.

       A few of the more common modifiers can also be specified as single
       letters, for example "i" for "caseless". In documentation, following
       the Perl convention, these are written with a slash ("the /i modifier")
       for clarity. Abbreviated modifiers must all be concatenated in the
       first item of a modifier list. If the first item is not recognized as a
       long modifier name, it is interpreted as a sequence of these
       abbreviations. For example:

         /abc/ig,newline=cr,jit=3

       This is a pattern line whose modifier list starts with two one-letter
       modifiers (/i and /g). The lower-case abbreviated modifiers are the
       same as used in Perl.


PATTERN SYNTAX

       A pattern line must start with one of the following characters (common
       symbols, excluding pattern meta-characters):

         / ! " ' ` - = _ : ; , % & @ ~

       This is interpreted as the pattern's delimiter. A regular expression
       may be continued over several input lines, in which case the newline
       characters are included within it.
       It is possible to include the delimiter as a literal within the pattern
       by escaping it with a backslash, for example

         /abc\/def/

       If you do this, the escape and the delimiter form part of the pattern,
       but since the delimiters are all non-alphanumeric, the inclusion of the
       backslash does not affect the pattern's interpretation. Note, however,
       that this trick does not work within \Q...\E literal bracketing because
       the backslash will itself be interpreted as a literal. If the
       terminating delimiter is immediately followed by a backslash, for
       example,

         /abc/\

       then a backslash is added to the end of the pattern. This is done to
       provide a way of testing the error condition that arises if a pattern
       finishes with a backslash, because

         /abc\/

       is interpreted as the first line of a pattern that starts with "abc/",
       causing pcre2test to read the next line as a continuation of the
       regular expression.

       A pattern can be followed by a modifier list (details below).
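
       The delimiter rules above can be illustrated with a short C sketch. It
       is a simplification, not pcre2test's own code: the function names
       pattern_length() and wants_trailing_backslash() are hypothetical, and
       only a single input line is examined, so a result of -1 stands for
       either an invalid first character or a pattern that continues on the
       next line.

```c
#include <assert.h>
#include <string.h>

/* Characters accepted as pattern delimiters, as listed above. */
static const char delimiters[] = "/!\"'`-=_:;,%&@~";

/* Hypothetical helper: length of the pattern body on one input line, or
   -1 if the first character is not a valid delimiter or no terminating
   delimiter occurs on this line (pcre2test would then read a
   continuation line). A backslash always takes the next character with
   it, so an escaped delimiter stays inside the pattern. */
long pattern_length(const char *line)
{
  char delim;
  assert(line != NULL);
  delim = line[0];
  if (delim == '\0' || strchr(delimiters, delim) == NULL) return -1;

  for (const char *p = line + 1; *p != '\0'; p++)
    {
    if (*p == '\\' && p[1] != '\0') p++;        /* skip the escaped char */
    else if (*p == delim) return (long)(p - (line + 1));
    }
  return -1;
}

/* Nonzero if the terminating delimiter is immediately followed by a
   backslash, as in "/abc/\", in which case a backslash is appended to
   the pattern. */
int wants_trailing_backslash(const char *line)
{
  long len = pattern_length(line);
  return len >= 0 && line[len + 2] == '\\';
}
```

       With these helpers, "/abc\/def/" yields an eight-character body that
       still contains the backslash, while "/abc\/" yields -1 because its
       final delimiter is escaped.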


SUBJECT LINE SYNTAX

       Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
       or pcre2_jit_match(), leading and trailing white space is removed, and
       the line is scanned for backslash escapes, unless the subject_literal
       modifier was set for the pattern. The following provide a means of
       encoding non-printing characters in a visible way:

         \a         alarm (BEL, \x07)
         \b         backspace (\x08)
         \e         escape (\x1b)
         \f         form feed (\x0c)
         \n         newline (\x0a)
         \r         carriage return (\x0d)
         \t         tab (\x09)
         \v         vertical tab (\x0b)
         \nnn       octal character (up to 3 octal digits); always
                      a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
         \o{dd...}  octal character (any number of octal digits)
         \xhh       hexadecimal byte (up to 2 hex digits)
         \x{hh...}  hexadecimal character (any number of hex digits)

       The use of \x{hh...} is not dependent on the use of the utf modifier on
       the pattern; it is always recognized. There may be any number of
       hexadecimal digits inside the braces; invalid values provoke error
       messages.
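
       To show how the escapes in the table above map to byte values, here is
       a minimal C sketch for the 8-bit library in non-UTF mode. It is not
       pcre2test's implementation: the function name decode_escapes() is
       hypothetical; white-space trimming, the braced forms \o{...} and
       \x{...}, replication, and \= modifier lists are omitted; and treating
       an octal value above 255 as an error is an assumption of the sketch.
       It also follows the rules, given below, for other backslash sequences
       and for a final backslash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: decode backslash escapes in a subject line for the
   8-bit library in non-UTF mode. Handles \a \b \e \f \n \r \t \v, \nnn
   (1-3 octal digits) and \xhh (0-2 hex digits). Returns the number of
   bytes stored in out, or -1 on error. */
long decode_escapes(const char *in, unsigned char *out, size_t outsize)
{
  size_t n = 0;
  assert(out != NULL);

  while (*in != '\0')
    {
    int c = (unsigned char)*in++;
    if (c == '\\')
      {
      if (*in == '\0') break;              /* a final backslash is ignored */
      c = (unsigned char)*in++;
      switch (c)
        {
        case 'a': c = 0x07; break;
        case 'b': c = 0x08; break;
        case 'e': c = 0x1b; break;         /* ESC is 0x1b (decimal 27) */
        case 'f': c = 0x0c; break;
        case 'n': c = 0x0a; break;
        case 'r': c = 0x0d; break;
        case 't': c = 0x09; break;
        case 'v': c = 0x0b; break;

        case 'x':                          /* \xhh: up to two hex digits */
          c = 0;
          for (int i = 0; i < 2 && isxdigit((unsigned char)*in); i++, in++)
            c = c * 16 + (isdigit((unsigned char)*in) ?
              *in - '0' : tolower((unsigned char)*in) - 'a' + 10);
          break;

        default:
          if (c >= '0' && c <= '7')        /* \nnn: up to three octal digits */
            {
            c -= '0';
            for (int i = 1; i < 3 && *in >= '0' && *in <= '7'; i++, in++)
              c = c * 8 + (*in - '0');
            if (c > 255) return -1;        /* does not fit in one byte */
            }
          else if (isalnum(c)) return -1;  /* unrecognized letter/digit */
          break;                           /* other punctuation: literal */
        }
      }
    if (n >= outsize) return -1;           /* output buffer too small */
    out[n++] = (unsigned char)c;
    }
  return (long)n;
}
```

       For example, the subject text "\x41\101\e" decodes to the three bytes
       0x41, 0x41, and 0x1b.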

       Note that \xhh specifies one byte rather than one character in UTF-8
       mode; this makes it possible to construct invalid UTF-8 sequences for
       testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8
       character in UTF-8 mode, generating more than one byte if the value is
       greater than 127. When testing the 8-bit library in non-UTF mode,
       \x{hh} generates one byte for values less than 256, and causes an error
       for greater values.

       In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
       possible to construct invalid UTF-16 sequences for testing purposes.

       In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This
       makes it possible to construct invalid UTF-32 sequences for testing
       purposes.

       There is a special backslash sequence that specifies replication of one
       or more characters:

         \[<characters>]{<count>}

       This makes it possible to test long strings without having to provide
       them as part of the file. For example:

         \[abc]{4}

       is converted to "abcabcabcabc". This feature does not support nesting.
       To include a closing square bracket in the characters, code it as \x5D.
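
       The difference between \xhh and \x{hh} in UTF-8 mode comes down to
       UTF-8 encoding. A \x{...} value is turned into one to four bytes,
       whereas \xhh always inserts a single raw byte. The minimal C sketch
       below is not taken from pcre2test and shows the standard encoding. It
       rejects values above 0x10FFFF but, as a simplification, does not check
       for surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch of UTF-8 encoding, the transformation a \x{hh...} value
   undergoes in UTF-8 mode. Writes 1 to 4 bytes to out and returns the
   count, or -1 for values above 0x10FFFF. Surrogates are not rejected. */
static int utf8_encode(uint32_t cp, unsigned char *out)
{
  assert(out != NULL);
  if (cp < 0x80) {                        /* 0xxxxxxx */
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                   /* 11110xxx + three continuation bytes */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return -1;                              /* beyond the Unicode range */
}
```

       For example, \x{e9} becomes the two bytes C3 A9, whereas \xe9 on its
       own is a single byte and therefore an invalid UTF-8 sequence.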

       A backslash followed by an equals sign marks the end of the subject
       string and the start of a modifier list. For example:

         abc\=notbol,notempty

       If the subject string is empty and \= is followed by white space, the
       line is treated as a comment line, and is not used for matching. For
       example:

         \= This is a comment.
         abc\= This is an invalid modifier list.

       A backslash followed by any other non-alphanumeric character just
       escapes that character. A backslash followed by anything else causes an
       error. However, if the very last character in the line is a backslash
       (and there is no modifier list), it is ignored. This gives a way of
       passing an empty line as data, since a real empty line terminates the
       data input.

       If the subject_literal modifier is set for a pattern, all subject lines
       that follow are treated as literals, with no special treatment of
       backslashes. No replication is possible, and any subject modifiers must
       be set as defaults by a #subject command.


PATTERN MODIFIERS

       There are several types of modifier that can appear in pattern lines.
       Except where noted below, they may also be used in #pattern commands.
       A pattern's modifier list can add to or override default modifiers that
       were set by a previous #pattern command.

   Setting compilation options

       The following modifiers set options for pcre2_compile(). Most of them
       set bits in the options argument of that function, but those that set
       options whose names start with PCRE2_EXTRA set additional options in
       the compile context. For the main options, there are some single-letter
       abbreviations that are the same as Perl options. There is special
       handling for /x: if a second x is present, PCRE2_EXTENDED is converted
       into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
       PCRE2_EXTENDED as well, though this makes no difference to the way
       pcre2_compile() behaves. See pcre2api for a description of the effects
       of these options.
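
       The special handling of a repeated x can be sketched as follows. The
       OPT_* values are illustrative placeholders, not the real PCRE2_*
       constants from pcre2.h. The sketch covers only the option letters i,
       m, s, n, and x from the table below, not control letters such as B, I,
       or g.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative option bits only; the real values are the PCRE2_* macros
   defined in pcre2.h. */
#define OPT_CASELESS       0x01u   /* /i */
#define OPT_MULTILINE      0x02u   /* /m */
#define OPT_DOTALL         0x04u   /* /s */
#define OPT_NO_AUTO_CAP    0x08u   /* /n */
#define OPT_EXTENDED       0x10u   /* /x */
#define OPT_EXTENDED_MORE  0x20u   /* /xx */

/* Hypothetical sketch of the single-letter rule: the first x sets
   EXTENDED, the second converts it into EXTENDED_MORE, and a third adds
   EXTENDED back alongside EXTENDED_MORE. Returns UINT32_MAX for a letter
   that is not one of these abbreviations. */
static uint32_t letters_to_options(const char *letters)
{
  uint32_t opts = 0;
  int xcount = 0;

  assert(letters != NULL);
  for (; *letters != '\0'; letters++) {
    switch (*letters) {
      case 'i': opts |= OPT_CASELESS; break;
      case 'm': opts |= OPT_MULTILINE; break;
      case 's': opts |= OPT_DOTALL; break;
      case 'n': opts |= OPT_NO_AUTO_CAP; break;
      case 'x':
        if (++xcount == 2)                /* second x: EXTENDED -> EXTENDED_MORE */
          opts = (opts & ~OPT_EXTENDED) | OPT_EXTENDED_MORE;
        else                              /* first or third x: set EXTENDED */
          opts |= OPT_EXTENDED;
        break;
      default:
        return UINT32_MAX;
    }
  }
  return opts;
}
```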

         allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
         allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
         allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
         alt_bsux                  set PCRE2_ALT_BSUX
         alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
         alt_verbnames             set PCRE2_ALT_VERBNAMES
         anchored                  set PCRE2_ANCHORED
         auto_callout              set PCRE2_AUTO_CALLOUT
         bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
     /i  caseless                  set PCRE2_CASELESS
         dollar_endonly            set PCRE2_DOLLAR_ENDONLY
     /s  dotall                    set PCRE2_DOTALL
         dupnames                  set PCRE2_DUPNAMES
         endanchored               set PCRE2_ENDANCHORED
         escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
     /x  extended                  set PCRE2_EXTENDED
     /xx extended_more             set PCRE2_EXTENDED_MORE
         extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
         firstline                 set PCRE2_FIRSTLINE
         literal                   set PCRE2_LITERAL
         match_line                set PCRE2_EXTRA_MATCH_LINE
         match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
         match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
         match_word                set PCRE2_EXTRA_MATCH_WORD
     /m  multiline                 set PCRE2_MULTILINE
         never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
         never_ucp                 set PCRE2_NEVER_UCP
         never_utf                 set PCRE2_NEVER_UTF
     /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
         no_auto_possess           set PCRE2_NO_AUTO_POSSESS
         no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
         no_start_optimize         set PCRE2_NO_START_OPTIMIZE
         no_utf_check              set PCRE2_NO_UTF_CHECK
         ucp                       set PCRE2_UCP
         ungreedy                  set PCRE2_UNGREEDY
         use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
         utf                       set PCRE2_UTF

       As well as turning on the PCRE2_UTF option, the utf modifier causes all
       non-printing characters in output strings to be printed using the
       \x{hh...} notation. Without utf, those less than 0x100 are output in
       hex without the curly brackets. Setting utf in 16-bit or 32-bit mode
       also causes pattern and subject strings to be translated to UTF-16 or
       UTF-32, respectively, before being passed to library functions.

   Setting compilation controls

       The following modifiers affect the compilation process or request
       information about the pattern. There are single-letter abbreviations
       for some that are heavily used in the test files.

         bsr=[anycrlf|unicode]     specify \R handling
     /B  bincode                   show binary code without lengths
         callout_info              show callout information
         convert=<options>         request foreign pattern conversion
         convert_glob_escape=c     set glob escape character
         convert_glob_separator=c  set glob separator character
         convert_length            set convert buffer length
         debug                     same as info,fullbincode
         framesize                 show matching frame size
         fullbincode               show binary code with lengths
     /I  info                      show info about compiled pattern
         hex                       unquoted characters are hexadecimal
         jit[=<number>]            use JIT
         jitfast                   use JIT fast path
         jitverify                 verify JIT use
         locale=<name>             use this locale
         max_pattern_length=<n>    set the maximum pattern length
         memory                    show memory used
         newline=<type>            set newline type
         null_context              compile with a NULL context
         parens_nest_limit=<n>     set maximum parentheses depth
         posix                     use the POSIX API
         posix_nosub               use the POSIX API with REG_NOSUB
         push                      push compiled pattern onto the stack
         pushcopy                  push a copy onto the stack
         stackguard=<number>       test the stackguard feature
         subject_literal           treat all subject lines as literal
         tables=[0|1|2|3]          select internal tables
         use_length                do not zero-terminate the pattern
         utf8_input                treat input as UTF-8

       The effects of these modifiers are described in the following sections.

   Newline and \R handling

       The bsr modifier specifies what \R in a pattern should match. If it is
       set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to
       "unicode", \R matches any Unicode newline sequence. The default can be
       specified when PCRE2 is built; if it is not, the default is set to
       Unicode.

       The newline modifier specifies which characters are to be interpreted
       as newlines, both in the pattern and in subject lines. The type must be
       one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).

   Information about a pattern

       The debug modifier is a shorthand for info,fullbincode, requesting all
       available information.

       The bincode modifier causes a representation of the compiled code to be
       output after compilation. This information does not contain length and
       offset values, which ensures that the same output is generated for
       different internal link sizes and different code unit widths. By using
       bincode, the same regression tests can be used in different
       environments.

       The fullbincode modifier, by contrast, does include length and offset
       values.
       This is used in a few special tests that run only for specific code unit
       widths and link sizes, and is also useful for one-off tests.

       The info modifier requests information about the compiled pattern
       (whether it is anchored, has a fixed first character, and so on). The
       information is obtained from the pcre2_pattern_info() function. Here
       are some typical examples:

           re> /(?i)(^a|^b)/m,info
         Capture group count = 1
         Compile options: multiline
         Overall options: caseless multiline
         First code unit at start or follows newline
         Subject length lower bound = 1

           re> /(?i)abc/info
         Capture group count = 0
         Compile options: <none>
         Overall options: caseless
         First code unit = 'a' (caseless)
         Last code unit = 'c' (caseless)
         Subject length lower bound = 3

       "Compile options" are those specified by modifiers; "overall options"
       have added options that are taken or deduced from the pattern. If both
       sets of options are the same, just a single "options" line is output;
       if there are no options, the line is omitted. "First code unit" is
       where any match must start; if there is more than one, they are listed
       as "starting code units". "Last code unit" is the last literal code
       unit that must be present in any match.
       This is not necessarily the last character. These lines are omitted if
       no starting or ending code units are recorded. The subject length line
       is omitted when no_start_optimize is set because the minimum length is
       not calculated when it can never be used.

       The framesize modifier shows the size, in bytes, of the storage frames
       used by pcre2_match() for handling backtracking. The size depends on
       the number of capturing parentheses in the pattern.

       The callout_info modifier requests information about all the callouts
       in the pattern. A list of them is output at the end of any other
       information that is requested. For each callout, either its number or
       string is given, followed by the item that follows it in the pattern.

   Passing a NULL context

       Normally, pcre2test passes a context block to pcre2_compile(). If the
       null_context modifier is set, however, NULL is passed. This is for
       testing that pcre2_compile() behaves correctly in this case (it uses
       default values).

   Specifying pattern characters in hexadecimal

       The hex modifier specifies that the characters of the pattern, except
       for substrings enclosed in single or double quotes, are to be
       interpreted as pairs of hexadecimal digits.
       This feature is provided as a way of creating patterns that contain
       binary zeros and other non-printing characters. White space is
       permitted between pairs of digits. For example, this pattern contains
       three characters:

         /ab 32 59/hex

       Parts of such a pattern are taken literally if quoted. This pattern
       contains nine characters, only two of which are specified in
       hexadecimal:

         /ab "literal" 32/hex

       Either single or double quotes may be used. There is no way of
       including the delimiting quote character within a substring. The hex
       and expand modifiers are mutually exclusive.

   Specifying the pattern's length

       By default, patterns are passed to the compiling functions as
       zero-terminated strings. The use_length modifier causes them to be
       passed by length instead. A length is used automatically (whether or
       not use_length is set) when hex is set, because patterns specified in
       hexadecimal may contain binary zeros.

       If hex or use_length is used with the POSIX wrapper API (see "Using the
       POSIX wrapper API" below), the REG_PEND extension is used to pass the
       pattern's length.
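
       A hypothetical C sketch of the hex decoding rule follows. It is not
       pcre2test's parser. It returns a length rather than relying on a
       terminating zero, for the same reason that hex patterns are passed by
       length.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hexadecimal digit (caller guarantees isxdigit(c)). */
static int hex_digit(int c)
{
  return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Hypothetical sketch of the hex modifier's rule (not pcre2test's parser):
   outside quotes, input is read as pairs of hex digits, with white space
   allowed between pairs; text inside '...' or "..." is copied literally and
   cannot contain its own quote character. Returns the decoded length, or -1
   for malformed input. The result may contain binary zeros. */
static long decode_hex_pattern(const char *s, unsigned char *out, size_t outsize)
{
  size_t n = 0;

  while (*s != '\0') {
    unsigned char c = (unsigned char)*s;

    if (isspace(c)) {                     /* white space between pairs */
      s++;
    } else if (c == '\'' || c == '"') {   /* quoted literal substring */
      const char *end = strchr(s + 1, c);
      if (end == NULL) return -1;         /* unterminated substring */
      for (s++; s < end; s++) {
        assert(n < outsize);
        out[n++] = (unsigned char)*s;
      }
      s++;                                /* skip the closing quote */
    } else if (isxdigit(c) && isxdigit((unsigned char)s[1])) {
      assert(n < outsize);
      out[n++] = (unsigned char)(hex_digit(c) * 16 + hex_digit((unsigned char)s[1]));
      s += 2;
    } else {
      return -1;                          /* lone digit or non-hex character */
    }
  }
  return (long)n;
}
```

       Applied to the body of /ab "literal" 32/hex, the sketch produces nine
       bytes: 0xab, the seven characters of "literal", and 0x32.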

   Specifying wide characters in 16-bit and 32-bit modes

       In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
       and translated to UTF-16 or UTF-32 when the utf modifier is set. For
       testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
       modifier can be used. It is mutually exclusive with utf. When it is
       set, input lines are interpreted as UTF-8 as a means of specifying wide
       characters. More details are given in "Input encoding" above.

   Generating long repetitive patterns

       Some tests use long patterns that are very repetitive. Instead of
       creating a very long input line for such a pattern, you can use a
       special repetition feature, similar to the one described for subject
       lines above. If the expand modifier is present on a pattern, parts of
       the pattern that have the form

         \[<characters>]{<count>}

       are expanded before the pattern is passed to pcre2_compile(). For
       example, \[AB]{6000} is expanded to "AB" repeated 6000 times. This
       construction cannot be nested. An initial "\[" sequence is recognized
       only if "]{" followed by decimal digits and "}" is found later in the
       pattern. If not, the characters remain in the pattern unaltered. The
       expand and hex modifiers are mutually exclusive.
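
       The recognition rule for \[<characters>]{<count>} can be sketched in C
       as below. This is a hypothetical illustration, not pcre2test's code. It
       uses the first "]{", digits, "}" sequence that follows "\[", and it
       does not guard against very large counts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes of s to the growable, NUL-terminated buffer *buf.
   Returns 0 on success, -1 if memory runs out. */
static int append(char **buf, size_t *len, size_t *cap, const char *s, size_t n)
{
  if (*len + n + 1 > *cap) {
    size_t newcap = (*len + n + 1) * 2;
    char *p = realloc(*buf, newcap);
    if (p == NULL) return -1;
    *buf = p;
    *cap = newcap;
  }
  memcpy(*buf + *len, s, n);
  *len += n;
  (*buf)[*len] = '\0';
  return 0;
}

/* Hypothetical sketch of \[<characters>]{<count>} expansion. "\[" starts
   an item only if "]{", one or more decimal digits and "}" occur later;
   otherwise the text is copied unchanged, so a two-value quantifier such
   as {6000,6000} is left alone. Nesting is not supported. Returns a
   malloc'd string (caller frees) or NULL if memory runs out. */
static char *expand_repeats(const char *in)
{
  char *out = NULL;
  size_t len = 0, cap = 0;

  assert(in != NULL);
  if (append(&out, &len, &cap, "", 0) != 0) return NULL;  /* start with "" */

  while (*in != '\0') {
    const char *close = NULL;
    unsigned long count = 0;

    if (in[0] == '\\' && in[1] == '[') {
      /* Look for the first "]{digits}" that follows. */
      for (const char *p = strstr(in + 2, "]{"); p != NULL; p = strstr(p + 1, "]{")) {
        const char *d = p + 2;
        unsigned long v = 0;
        while (isdigit((unsigned char)*d)) v = v * 10 + (unsigned long)(*d++ - '0');
        if (d > p + 2 && *d == '}') { close = p; count = v; break; }
      }
    }

    if (close == NULL) {                    /* ordinary text: copy one byte */
      if (append(&out, &len, &cap, in, 1) != 0) { free(out); return NULL; }
      in++;
      continue;
    }

    for (unsigned long i = 0; i < count; i++) {   /* repeat the characters */
      if (append(&out, &len, &cap, in + 2, (size_t)(close - (in + 2))) != 0) {
        free(out);
        return NULL;
      }
    }
    in = strchr(close, '}') + 1;            /* resume after the closing brace */
  }
  return out;
}
```

       With this sketch, \[AB]{3} becomes ABABAB, while x\[AB]{6000,6000}y is
       returned unchanged.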

       If part of an expanded pattern looks like an expansion, but is really
       part of the actual pattern, unwanted expansion can be avoided by giving
       two values in the quantifier. For example, \[AB]{6000,6000} is not
       recognized as an expansion item.

       If the info modifier is set on an expanded pattern, the result of the
       expansion is included in the information that is output.

   JIT compilation

       Just-in-time (JIT) compiling is a heavyweight optimization that can
       greatly speed up pattern matching. See the pcre2jit documentation for
       details. JIT compiling happens, optionally, after a pattern has been
       successfully compiled into an internal form. The JIT compiler converts
       this to optimized machine code. It needs to know whether the match-time
       options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
       because different code is generated for the different cases. See the
       partial modifier in "Subject Modifiers" below for details of how these
       options are specified for each match attempt.

       JIT compilation is requested by the jit pattern modifier, which may
       optionally be followed by an equals sign and a number in the range 0 to
       7.
       The three bits that make up the number specify which of the three JIT
       operating modes are to be compiled:

         1  compile JIT code for non-partial matching
         2  compile JIT code for soft partial matching
         4  compile JIT code for hard partial matching

       The possible values for the jit modifier are therefore:

         0  disable JIT
         1  normal matching only
         2  soft partial matching only
         3  normal and soft partial matching
         4  hard partial matching only
         5  normal and hard partial matching
         6  soft and hard partial matching only
         7  all three modes

       If no number is given, 7 is assumed. The phrase "partial matching"
       means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
       PCRE2_PARTIAL_HARD option set. Note that such a call may return a
       complete match; the options enable the possibility of a partial match,
       but do not require it. Note also that if you request JIT compilation
       only for partial matching (for example, jit=2) but do not set the
       partial modifier on a subject line, that match will not use JIT code
       because none was compiled for non-partial matching.

       If JIT compilation is successful, the compiled JIT code will
       automatically be used when an appropriate type of match is run, except
       when incompatible run-time options are specified.
       For more details, see the pcre2jit documentation. See also the jitstack
       modifier below for a way of setting the size of the JIT stack.

       If the jitfast modifier is specified, matching is done using the JIT
       "fast path" interface, pcre2_jit_match(), which skips some of the
       sanity checks that are done by pcre2_match(), and of course does not
       work when JIT is not supported. If jitfast is specified without jit,
       jit=7 is assumed.

       If the jitverify modifier is specified, information about the compiled
       pattern shows whether JIT compilation was or was not successful. If
       jitverify is specified without jit, jit=7 is assumed. If JIT
       compilation is successful when jitverify is set, the text "(JIT)" is
       added to the first output line after a match or non-match when
       JIT-compiled code was actually used in the match.

   Setting a locale

       The locale modifier must specify the name of a locale, for example:

         /pattern/locale=fr_FR

       The given locale is set, pcre2_maketables() is called to build a set of
       character tables for the locale, and this is then passed to
       pcre2_compile() when compiling the regular expression. The same tables
       are used when matching the following subject lines.
       The locale modifier applies only to the pattern on which it appears,
       but can be given in a #pattern command if a default is needed. The
       locale and tables modifiers are mutually exclusive.

   Showing pattern memory

       The memory modifier causes the size in bytes of the memory used to hold
       the compiled pattern to be output. This does not include the size of
       the pcre2_code block; it is just the actual compiled data. If the
       pattern is subsequently passed to the JIT compiler, the size of the JIT
       compiled code is also output. Here is an example:

           re> /a(b)c/jit,memory
         Memory allocation (code space): 21
         Memory allocation (JIT code): 1910

   Limiting nested parentheses

       The parens_nest_limit modifier sets a limit on the depth of nested
       parentheses in a pattern. Breaching the limit causes a compilation
       error. The default for the library is set when PCRE2 is built, but
       pcre2test sets its own default of 220, which is required for running
       the standard test suite.

   Limiting the pattern length

       The max_pattern_length modifier sets a limit, in code units, to the
       length of pattern that pcre2_compile() will accept. Breaching the limit
       causes a compilation error.
       The default is the largest number a PCRE2_SIZE variable can hold
       (essentially unlimited).

   Using the POSIX wrapper API

       The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
       the POSIX wrapper API rather than its native API. When posix_nosub is
       used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX
       wrapper supports only the 8-bit library. Note that it does not imply
       POSIX matching semantics; for more detail see the pcre2posix
       documentation. The following pattern modifiers set options for the
       regcomp() function:

         caseless           REG_ICASE
         multiline          REG_NEWLINE
         dotall             REG_DOTALL     )
         ungreedy           REG_UNGREEDY   ) These options are not part of
         ucp                REG_UCP        )   the POSIX standard
         utf                REG_UTF8       )

       The regerror_buffsize modifier specifies a size for the error buffer
       that is passed to regerror() in the event of a compilation error. For
       example:

         /abc/posix,regerror_buffsize=20

       This provides a means of testing the behaviour of regerror() when the
       buffer is too small for the error message. If this modifier has not
       been set, a large buffer is used.

       The aftertext and allaftertext subject modifiers work as described
       below.
       All other modifiers are either ignored, with a warning message, or
       cause an error.

       The pattern is passed to regcomp() as a zero-terminated string by
       default, but if the use_length or hex modifiers are set, the REG_PEND
       extension is used to pass it by length.

   Testing the stack guard feature

       The stackguard modifier is used to test the use of
       pcre2_set_compile_recursion_guard(), a function that is provided to
       enable stack availability to be checked during compilation (see the
       pcre2api documentation for details). If the number specified by the
       modifier is greater than zero, pcre2_set_compile_recursion_guard() is
       called to set up a callback from pcre2_compile() to a local function.
       The argument it receives is the current nesting parenthesis depth; if
       this is greater than the value given by the modifier, non-zero is
       returned, causing the compilation to be aborted.

   Using alternative character tables

       The value specified for the tables modifier must be one of the digits
       0, 1, 2, or 3. It causes a specific set of character tables to be
       passed to pcre2_compile(). This is used in the PCRE2 tests to check
       behaviour with different character tables.
       The digit specifies the tables as follows:

         0   do not pass any special character tables
         1   the default ASCII tables, as distributed in
               pcre2_chartables.c.dist
         2   a set of tables defining ISO 8859 characters
         3   a set of tables loaded by the #loadtables command

       In table set 2, some characters whose codes are greater than 128 are
       identified as letters, digits, spaces, etc. Table set 3 can be used
       only after a #loadtables command has loaded it from a binary file. The
       tables and locale modifiers are mutually exclusive.

   Setting certain match controls

       The following modifiers are really subject modifiers, and are described
       under "Subject Modifiers" below. However, they may be included in a
       pattern's modifier list, in which case they are applied to every
       subject line that is processed with that pattern. These modifiers do
       not affect the compilation process.

             aftertext                    show text after match
             allaftertext                 show text after captures
             allcaptures                  show all captures
             allvector                    show the entire ovector
             allusedtext                  show all consulted text
             altglobal                    alternative global matching
         /g  global                       global matching
             jitstack=<n>                 set size of JIT stack
             mark                         show mark values
             replace=<string>             specify a replacement string
             startchar                    show starting character when relevant
             substitute_callout           use substitution callouts
             substitute_extended          use PCRE2_SUBSTITUTE_EXTENDED
             substitute_literal           use PCRE2_SUBSTITUTE_LITERAL
             substitute_matched           use PCRE2_SUBSTITUTE_MATCHED
             substitute_overflow_length   use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
             substitute_replacement_only  use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
             substitute_skip=<n>          skip substitution <n>
             substitute_stop=<n>          skip substitution <n> and following
             substitute_unknown_unset     use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
             substitute_unset_empty       use PCRE2_SUBSTITUTE_UNSET_EMPTY

       These modifiers may not appear in a #pattern command. If you want them
       as defaults, set them in a #subject command.

   Specifying literal subject lines

       If the subject_literal modifier is present on a pattern, all the
       subject lines that are matched against it are taken as literal
       strings, with no interpretation of backslashes. It is not possible to
       set subject modifiers on such lines, but any that are set as defaults
       by a #subject command are recognized.

   Saving a compiled pattern

       When a pattern with the push modifier is successfully compiled, it is
       pushed onto a stack of compiled patterns, and pcre2test expects the
       next line to contain a new pattern (or a command) instead of a subject
       line. This facility is used when saving compiled patterns to a file,
       as described in the section entitled "Saving and restoring compiled
       patterns" below. If pushcopy is used instead of push, a copy of the
       compiled pattern is stacked, leaving the original as current, ready to
       match the following input lines. This provides a way of testing the
       pcre2_code_copy() function. The push and pushcopy modifiers are
       incompatible with pattern modifiers, such as global, that act at match
       time. Any such modifiers that are specified are ignored (for the
       stacked copy), with a warning message, except for replace, which
       causes an error. Note that jitverify, which is allowed, does not carry
       through to any subsequent matching that uses a stacked pattern.
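
       The difference between push and pushcopy can be modelled in a few
       lines of C. This is a toy sketch, not pcre2test's code: a heap-
       allocated string stands in for a compiled pattern (pcre2_code *),
       copying the string plays the role of pcre2_code_copy(), and the names
       pattern_state, do_push() and do_pushcopy() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 8

/* Toy model: a heap-allocated string stands in for a compiled pattern. */
struct pattern_state {
  char *current;            /* pattern applied to following subject lines */
  char *stack[STACK_MAX];   /* patterns saved by push or pushcopy */
  int depth;                /* number of stacked patterns */
};

/* push: the pattern itself is stacked and nothing remains current, so the
   next input line must be another pattern or a command. */
static int do_push(struct pattern_state *st)
{
  assert(st != NULL && st->current != NULL);
  if (st->depth >= STACK_MAX) return -1;
  st->stack[st->depth++] = st->current;
  st->current = NULL;
  return 0;
}

/* pushcopy: a copy is stacked (pcre2_code_copy() in the real program) and
   the original stays current, ready to match subject lines. */
static int do_pushcopy(struct pattern_state *st)
{
  char *copy;
  assert(st != NULL && st->current != NULL);
  if (st->depth >= STACK_MAX) return -1;
  copy = malloc(strlen(st->current) + 1);
  if (copy == NULL) return -1;
  strcpy(copy, st->current);
  st->stack[st->depth++] = copy;
  return 0;
}
```

       After do_pushcopy() the current pattern is still available for
       matching; after do_push() it is not, which is why pcre2test expects a
       new pattern line next.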

   Testing foreign pattern conversion

       The experimental foreign pattern conversion functions in PCRE2 can be
       tested by setting the convert modifier. Its argument is a
       colon-separated list of options, which set the equivalent option for
       the pcre2_pattern_convert() function:

             glob                    PCRE2_CONVERT_GLOB
             glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
             glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
             posix_basic             PCRE2_CONVERT_POSIX_BASIC
             posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
             unset                   Unset all options

       The "unset" value is useful for turning off a default that has been
       set by a #pattern command. When one of these options is set, the input
       pattern is passed to pcre2_pattern_convert(). If the conversion is
       successful, the result is reflected in the output and then passed to
       pcre2_compile(). The normal utf and no_utf_check options, if set,
       cause the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to
       be passed to pcre2_pattern_convert().

       By default, the conversion function is allowed to allocate a buffer
       for its output. However, if the convert_length modifier is set to a
       value greater than zero, pcre2test passes a buffer of the given
       length. This makes it possible to test the length check.
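
       As an illustration of the colon-separated syntax, here is a sketch
       of how such a list might be turned into an option word. The CV_* bit
       values are placeholders invented for this example (real code would
       use the PCRE2_CONVERT_* macros from pcre2.h, whose values differ).
       parse_convert_list() is a hypothetical name. Treating "unset" as
       clearing every option accumulated so far is an assumption about the
       intended semantics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bits for illustration only, not the pcre2.h values. */
#define CV_GLOB                   0x01u
#define CV_GLOB_NO_STARSTAR       0x02u
#define CV_GLOB_NO_WILD_SEPARATOR 0x04u
#define CV_POSIX_BASIC            0x08u
#define CV_POSIX_EXTENDED         0x10u

static const struct { const char *name; uint32_t bit; } convert_names[] = {
  { "glob",                   CV_GLOB },
  { "glob_no_starstar",       CV_GLOB_NO_STARSTAR },
  { "glob_no_wild_separator", CV_GLOB_NO_WILD_SEPARATOR },
  { "posix_basic",            CV_POSIX_BASIC },
  { "posix_extended",         CV_POSIX_EXTENDED },
};

/* Parse a list such as "glob:glob_no_starstar" into *options. "unset"
   clears everything accumulated so far. Returns 0 on success, or -1 for
   an unrecognized name (leaving *options untouched). */
static int parse_convert_list(const char *list, uint32_t *options)
{
  uint32_t opts = 0;
  const char *p = list;
  assert(list != NULL && options != NULL);
  while (*p != '\0') {
    const char *end = strchr(p, ':');
    size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
    if (len == 5 && strncmp(p, "unset", 5) == 0) {
      opts = 0;
    } else {
      size_t i, n = sizeof(convert_names) / sizeof(convert_names[0]);
      for (i = 0; i < n; i++) {
        if (strlen(convert_names[i].name) == len &&
            strncmp(convert_names[i].name, p, len) == 0) break;
      }
      if (i == n) return -1;
      opts |= convert_names[i].bit;
    }
    p += len;
    if (*p == ':') p++;   /* step over the separator */
  }
  *options = opts;
  return 0;
}
```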

       The convert_glob_escape and convert_glob_separator modifiers can be
       used to specify the escape and separator characters for glob
       processing, overriding the defaults, which are operating-system
       dependent.


SUBJECT MODIFIERS

       The modifiers that can appear in subject lines and the #subject
       command are of two types.

   Setting match options

       The following modifiers set options for pcre2_match() or
       pcre2_dfa_match(). See pcre2api for a description of their effects.

             anchored               set PCRE2_ANCHORED
             endanchored            set PCRE2_ENDANCHORED
             dfa_restart            set PCRE2_DFA_RESTART
             dfa_shortest           set PCRE2_DFA_SHORTEST
             no_jit                 set PCRE2_NO_JIT
             no_utf_check           set PCRE2_NO_UTF_CHECK
             notbol                 set PCRE2_NOTBOL
             notempty               set PCRE2_NOTEMPTY
             notempty_atstart       set PCRE2_NOTEMPTY_ATSTART
             noteol                 set PCRE2_NOTEOL
             partial_hard (or ph)   set PCRE2_PARTIAL_HARD
             partial_soft (or ps)   set PCRE2_PARTIAL_SOFT

       The partial matching modifiers are provided with abbreviations because
       they appear frequently in tests.

       If the posix or posix_nosub modifier was present on the pattern,
       causing the POSIX wrapper API to be used, the only option-setting
       modifiers that have any effect are notbol, notempty, and noteol,
       causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be
       passed to regexec(). The other modifiers are ignored, with a warning
       message.

       There is one additional modifier that can be used with the POSIX
       wrapper. It is ignored (with a warning) if used for non-POSIX
       matching.

             posix_startend=<n>[:<m>]

       This causes the subject string to be passed to regexec() using the
       REG_STARTEND option, which uses offsets to specify which part of the
       string is searched. If only one number is given, the end offset is
       passed as the end of the subject string. For more detail of
       REG_STARTEND, see the pcre2posix documentation. If the subject string
       contains binary zeros (coded as escapes such as \x{00} because
       pcre2test does not support actual binary zeros in its input), you
       must use posix_startend to specify its length.

   Setting match controls

       The following modifiers affect the matching process or request
       additional information.
       Some of them may also be specified on a pattern line (see above), in
       which case they apply to every subject line that is matched against
       that pattern, but can be overridden by modifiers on the subject.

             aftertext                    show text after match
             allaftertext                 show text after captures
             allcaptures                  show all captures
             allvector                    show the entire ovector
             allusedtext                  show all consulted text (non-JIT only)
             altglobal                    alternative global matching
             callout_capture              show captures at callout time
             callout_data=<n>             set a value to pass via callouts
             callout_error=<n>[:<m>]      control callout error
             callout_extra                show extra callout information
             callout_fail=<n>[:<m>]       control callout failure
             callout_no_where             do not show position of a callout
             callout_none                 do not supply a callout function
             copy=<number or name>        copy captured substring
             depth_limit=<n>              set a depth limit
             dfa                          use pcre2_dfa_match()
             find_limits                  find heap, match and depth limits
             find_limits_noheap           find match and depth limits
             get=<number or name>         extract captured substring
             getall                       extract all captured substrings
         /g  global                       global matching
             heap_limit=<n>               set a limit on heap memory (Kbytes)
             jitstack=<n>                 set size of JIT stack
             mark                         show mark values
             match_limit=<n>              set a match limit
             memory                       show heap memory usage
             null_context                 match with a NULL context
             null_replacement             substitute with NULL replacement
             null_subject                 match with NULL subject
             offset=<n>                   set starting offset
             offset_limit=<n>             set offset limit
             ovector=<n>                  set size of output vector
             recursion_limit=<n>          obsolete synonym for depth_limit
             replace=<string>             specify a replacement string
             startchar                    show startchar when relevant
             startoffset=<n>              same as offset=<n>
             substitute_callout           use substitution callouts
             substitute_extended          use PCRE2_SUBSTITUTE_EXTENDED
             substitute_literal           use PCRE2_SUBSTITUTE_LITERAL
             substitute_matched           use PCRE2_SUBSTITUTE_MATCHED
             substitute_overflow_length   use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
             substitute_replacement_only  use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
             substitute_skip=<n>          skip substitution number n
             substitute_stop=<n>          skip substitution number n and greater
             substitute_unknown_unset     use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
             substitute_unset_empty       use PCRE2_SUBSTITUTE_UNSET_EMPTY
             zero_terminate               pass the subject as zero-terminated

       The effects of these modifiers are described in the following
       sections. When matching via the POSIX wrapper API, the aftertext,
       allaftertext, and ovector subject modifiers work as described below.
       All other modifiers are either ignored, with a warning message, or
       cause an error.

   Showing more text

       The aftertext modifier requests that, in addition to the part of the
       subject string that matched the entire pattern, pcre2test should
       output the remainder of the subject string. This is useful for tests
       where the subject contains multiple copies of the same substring. The
       allaftertext modifier requests the same action for captured substrings
       as well as the main matched substring. In each case the remainder is
       output on the following line with a plus character following the
       capture number.

       The allusedtext modifier requests that all the text that was consulted
       during a successful pattern match by the interpreter should be shown,
       for both full and partial matches. This feature is not supported for
       JIT matching, and if requested with JIT it is ignored (with a warning
       message). Setting this modifier affects the output if there is a
       lookbehind at the start of a match, or, for a complete match, a
       lookahead at the end, or if \K is used in the pattern. Characters that
       precede or follow the start and end of the actual match are indicated
       in the output by '<' or '>' characters underneath them.
       Here is an example:

           re> /(?<=pqr)abc(?=xyz)/
         data> 123pqrabcxyz456\=allusedtext
          0: pqrabcxyz
             <<<   >>>
         data> 123pqrabcxy\=ph,allusedtext
         Partial match: pqrabcxy
                        <<<

       The first, complete match shows that the matched string is "abc",
       with the preceding and following strings "pqr" and "xyz" having been
       consulted during the match (when processing the assertions). The
       partial match can indicate only the preceding string.

       The startchar modifier requests that the starting character for the
       match be indicated, if it is different from the start of the matched
       string. The only time when this occurs is when \K has been processed
       as part of the match. In this situation, the output for the matched
       string is displayed from the starting character instead of from the
       match point, with circumflex characters under the earlier characters.
       For example:

           re> /abc\Kxyz/
         data> abcxyz\=startchar
          0: abcxyz
             ^^^

       Unlike allusedtext, the startchar modifier can be used with JIT.
       However, these two modifiers are mutually exclusive.
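
       The marker lines in these examples follow a simple layout rule. Each
       consulted character before the match gets a marker, each character of
       the match itself gets a space, and each consulted character after the
       match gets '>'. Nothing is printed beyond the last marker. The
       function below is a hypothetical sketch of that rule, not code taken
       from pcre2test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the line printed under a match. `lead` characters before the
   match get `lead_mark` ('<' for allusedtext, '^' for startchar); `trail`
   characters after the match get '>'. The match_len characters of the
   match are left blank, but only if something follows them. Returns 0,
   or -1 if out (of size outsize) is too small. */
static int marker_line(char *out, size_t outsize, size_t lead,
                       char lead_mark, size_t match_len, size_t trail)
{
  size_t pos, need;
  assert(out != NULL);
  need = lead + (trail > 0 ? match_len + trail : 0) + 1;
  if (need > outsize) return -1;
  memset(out, lead_mark, lead);
  pos = lead;
  if (trail > 0) {
    memset(out + pos, ' ', match_len);
    pos += match_len;
    memset(out + pos, '>', trail);
    pos += trail;
  }
  out[pos] = '\0';
  return 0;
}
```

       For the complete match above, marker_line(buf, sizeof buf, 3, '<', 3,
       3) yields "<<<   >>>". For the partial match, nothing was consulted
       after the match, so only "<<<" is produced.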

   Showing the value of all capture groups

       The allcaptures modifier requests that the values of all potential
       captured parentheses be output after a match. By default, only those
       up to the highest one actually used in the match are output
       (corresponding to the return code from pcre2_match()). Groups that did
       not take part in the match are output as "<unset>". This modifier is
       not relevant for DFA matching (which does no capturing) and does not
       apply when replace is specified; it is ignored, with a warning
       message, if present.

   Showing the entire ovector, for all outcomes

       The allvector modifier requests that the entire ovector be shown,
       whatever the outcome of the match. Compare allcaptures, which shows
       only up to the maximum number of capture groups for the pattern, and
       then only for a successful complete non-DFA match. This modifier,
       which acts after any match result, and also for DFA matching, provides
       a means of checking that there are no unexpected modifications to
       ovector fields. Before each match attempt, the ovector is filled with
       a special value, and if this is found in both elements of a capturing
       pair, "<unchanged>" is output. After a successful match, this applies
       to all groups after the maximum capture group for the pattern. In
       other cases it applies to the entire ovector.
       After a partial match, the first two elements are the only ones that
       should be set. After a DFA match, the amount of ovector that is used
       depends on the number of matches that were found.

   Testing pattern callouts

       A callout function is supplied when pcre2test calls the library
       matching functions, unless callout_none is specified. Its behaviour
       can be controlled by various modifiers listed above whose names begin
       with callout_. Details are given in the section entitled "Callouts"
       below. Testing callouts from pcre2_substitute() is described
       separately in "Testing the substitution function" below.

   Finding all matches in a string

       Searching for all possible matches within a subject can be requested
       by the global or altglobal modifier. After finding a match, the
       matching function is called again to search the remainder of the
       subject. The difference between global and altglobal is that the
       former uses the start_offset argument to pcre2_match() or
       pcre2_dfa_match() to start searching at a new point within the entire
       string (which is what Perl does), whereas the latter passes a
       shortened subject to the matching function. This makes a difference
       to the matching process if the pattern begins with a lookbehind
       assertion (including \b or \B).
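
       The effect of a lookbehind such as \b can be seen in a toy model that
       searches for \b followed by a literal word. This is not PCRE2: the
       functions below are hypothetical. The variable base marks where the
       subject handed to the matcher begins, so with the altglobal
       behaviour the characters before it are invisible to the \b test.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_word_char(char c)
{
  return isalnum((unsigned char)c) || c == '_';
}

/* Does \b followed by `word` match at offset pos? Characters before
   `base` are not part of the subject the matcher sees. Assumes word starts
   with a word character, so \b needs a non-word character (or the start of
   the subject) immediately before pos. */
static int word_at(const char *s, size_t base, size_t pos, const char *word)
{
  int prev_is_word = pos > base && is_word_char(s[pos - 1]);
  return !prev_is_word && strncmp(s + pos, word, strlen(word)) == 0;
}

/* Count matches of \b<word> in s. With altglobal == 0 each new search
   uses a start offset into the whole string (like global); otherwise it
   searches a shortened subject that begins at the new start (like
   altglobal). */
static int count_matches(const char *s, const char *word, int altglobal)
{
  size_t n = strlen(s), wlen = strlen(word), start = 0;
  int count = 0;
  assert(wlen > 0);   /* non-empty word: no empty-match handling needed */
  while (start + wlen <= n) {
    size_t base = altglobal ? start : 0;
    size_t pos = start;
    while (pos + wlen <= n && !word_at(s, base, pos, word)) pos++;
    if (pos + wlen > n) break;   /* no further match */
    count++;
    start = pos + wlen;
  }
  return count;
}
```

       With the subject "abab", the global-style search finds one match,
       because the second "ab" is preceded by the word character "b". The
       altglobal-style search finds two, because the shortened subject "ab"
       starts at a word boundary.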

       If an empty string is matched, the next match is done with the
       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to
       search for another, non-empty, match at the same point in the subject.
       If this match fails, the start offset is advanced, and the normal
       match is retried. This imitates the way Perl handles such cases when
       using the /g modifier or the split() function. Normally, the start
       offset is advanced by one character, but if the newline convention
       recognizes CRLF as a newline, and the current character is CR followed
       by LF, an advance of two characters occurs.

   Testing substring extraction functions

       The copy and get modifiers can be used to test the
       pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions.
       They can be given more than once, and each can specify a capture group
       name or number, for example:

            abcd\=copy=1,copy=3,get=G1

       If the #subject command is used to set default copy and/or get lists,
       these can be unset by specifying a negative number to cancel all
       numbered groups and an empty name to cancel all named groups.

       The getall modifier tests pcre2_substring_list_get(), which extracts
       all captured substrings.
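
       The cancellation rule for default copy and get lists can be sketched
       as follows. This sketch models only the rule stated above, that a
       negative number cancels all numbered groups and an empty name cancels
       all named groups. The structure and function names are hypothetical,
       and the fixed capacity is an arbitrary choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_ITEMS 8

struct extract_list {
  long numbers[MAX_ITEMS];        /* group numbers, e.g. from copy=1 */
  int n_numbers;
  const char *names[MAX_ITEMS];   /* group names, e.g. from get=G1 */
  int n_names;
};

/* Apply one copy= or get= argument to a list. Returns 0 on success, or -1
   if the list is full. A stored name must outlive the list. */
static int add_extract_item(struct extract_list *l, const char *arg)
{
  assert(l != NULL && arg != NULL);
  if (arg[0] == '-' || isdigit((unsigned char)arg[0])) {
    long n = strtol(arg, NULL, 10);
    if (n < 0) {                  /* negative number: cancel numbered groups */
      l->n_numbers = 0;
      return 0;
    }
    if (l->n_numbers >= MAX_ITEMS) return -1;
    l->numbers[l->n_numbers++] = n;
  } else if (arg[0] == '\0') {
    l->n_names = 0;               /* empty name: cancel named groups */
  } else {
    if (l->n_names >= MAX_ITEMS) return -1;
    l->names[l->n_names++] = arg;
  }
  return 0;
}
```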

       If the subject line is successfully matched, the substrings extracted
       by the convenience functions are output with C, G, or L after the
       string number instead of a colon. This is in addition to the normal
       full list. The string length (that is, the return from the extraction
       function) is given in parentheses after each substring, followed by
       the name when the extraction was by name.

   Testing the substitution function

       If the replace modifier is set, the pcre2_substitute() function is
       called instead of one of the matching functions (or after one call of
       pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
       replacement strings cannot contain commas, because a comma signifies
       the end of a modifier. This is not thought to be an issue in a test
       program.

       Specifying a completely empty replacement string disables this
       modifier. However, it is possible to specify an empty replacement by
       providing a buffer length, as described below, for an otherwise empty
       replacement.

       Unlike subject strings, pcre2test does not process replacement strings
       for escape sequences. In UTF mode, a replacement string is checked to
       see if it is a valid UTF-8 string. If so, it is correctly converted to
       a UTF string of the appropriate code unit width.
       If it is not a valid UTF-8 string, the individual code units are
       copied directly. This provides a means of passing an invalid UTF-8
       string for testing purposes.

       The following modifiers set options (in addition to the normal match
       options) for pcre2_substitute():

             global                       PCRE2_SUBSTITUTE_GLOBAL
             substitute_extended          PCRE2_SUBSTITUTE_EXTENDED
             substitute_literal           PCRE2_SUBSTITUTE_LITERAL
             substitute_matched           PCRE2_SUBSTITUTE_MATCHED
             substitute_overflow_length   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
             substitute_replacement_only  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
             substitute_unknown_unset     PCRE2_SUBSTITUTE_UNKNOWN_UNSET
             substitute_unset_empty       PCRE2_SUBSTITUTE_UNSET_EMPTY

       See the pcre2api documentation for details of these options.

       After a successful substitution, the modified string is output,
       preceded by the number of replacements. This may be zero if there
       were no matches. Here is a simple example of a substitution test:

         /abc/replace=xxx
             =abc=abc=
          1: =xxx=abc=
             =abc=abc=\=global
          2: =xxx=xxx=

       Subject and replacement strings should be kept relatively short (fewer
       than 256 characters) for substitution tests, as fixed-size buffers are
       used.
       To make it easy to test for buffer overflow, if the replacement string
       starts with a number in square brackets, that number is passed to
       pcre2_substitute() as the size of the output buffer, with the
       replacement string starting at the next character. Here is an example
       that tests the edge case:

         /abc/
             123abc123\=replace=[10]XYZ
          1: 123XYZ123
             123abc123\=replace=[9]XYZ
         Failed: error -47: no more memory

       The default action of pcre2_substitute() is to return
       PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
       the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
       substitute_overflow_length modifier), pcre2_substitute() continues to
       go through the motions of matching and substituting (but not doing any
       callouts), in order to compute the size of buffer that is required.
       When this happens, pcre2test shows the required buffer length (which
       includes space for the trailing zero) as part of the error message.
       For example:

         /abc/substitute_overflow_length
             123abc123\=replace=[9]XYZ
         Failed: error -47: no more memory: 10 code units are needed

       A replacement string is ignored with POSIX and DFA matching.
       Specifying partial matching provokes an error return ("bad option
       value") from pcre2_substitute().

   Testing substitute callouts

       If the substitute_callout modifier is set, a substitution callout
       function is set up. The null_context modifier must not be set, because
       the address of the callout function is passed in a match context.
       When the callout function is called (after each substitution), details
       of the input and output strings are output. For example:

         /abc/g,replace=<$0>,substitute_callout
             abcdefabcpqr
          1(1) Old 0 3 "abc" New 0 5 "<abc>"
          2(1) Old 6 9 "abc" New 8 13 "<abc>"
          2: <abc>def<abc>pqr

       The first number on each callout line is the count of matches. The
       parenthesized number is the number of pairs that are set in the
       ovector (that is, one more than the number of capturing groups that
       were set). This is followed by the offsets of the old substring and
       its contents, and then the same for the replacement.

       By default, the substitution callout function returns zero, which
       accepts the replacement and causes matching to continue if /g was
       used. Two further modifiers can be used to test other return values.
       If substitute_skip is set to a value greater than zero, the callout
       function returns +1 for the match of that number, and similarly
       substitute_stop returns -1. These cause the replacement to be
       rejected, and -1 causes no further matching to take place. If either
       of them is set, substitute_callout is assumed. For example:

         /abc/g,replace=<$0>,substitute_skip=1
             abcdefabcpqr
          1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
          2(1) Old 6 9 "abc" New 6 11 "<abc>"
          2: abcdef<abc>pqr
             abcdefabcpqr\=substitute_stop=1
          1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
          1: abcdefabcpqr

       If both are set for the same number, stop takes precedence. Only a
       single skip or stop is supported, which is sufficient for testing that
       the feature works.

   Setting the JIT stack size

       The jitstack modifier provides a way of setting the maximum stack size
       that is used by the just-in-time optimization code. It is ignored if
       JIT optimization is not being used. The value is a number of kibibytes
       (units of 1024 bytes). Setting zero reverts to the default of 32KiB.
       Providing a stack that is larger than the default is necessary only
       for very complicated patterns.
       If jitstack is set non-zero on a subject line it overrides any value
       that was set on the pattern.

   Setting heap, match, and depth limits

       The heap_limit, match_limit, and depth_limit modifiers set the
       appropriate limits in the match context. These values are ignored when
       the find_limits or find_limits_noheap modifier is specified.

   Finding minimum limits

       If the find_limits modifier is present on a subject line, pcre2test
       calls the relevant matching function several times, setting different
       values in the match context via pcre2_set_heap_limit(),
       pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
       smallest value for each parameter that allows the match to complete
       without a "limit exceeded" error. The match itself may succeed or
       fail. An alternative modifier, find_limits_noheap, omits the heap
       limit. This is used in the standard tests, because the minimum heap
       limit varies between systems. If JIT is being used, only the match
       limit is relevant, and the other two are automatically omitted.

       When using this modifier, the pattern should not contain any limit
       settings such as (*LIMIT_MATCH=...) within it. If such a setting is
       present and is lower than the minimum matching value, the minimum
       value cannot be found because pcre2_set_match_limit() etc.
       are only able to reduce the value of an in-pattern limit; they cannot
       increase it.

       For non-DFA matching, the minimum depth_limit number is a measure of
       how much nested backtracking happens (that is, how deeply the
       pattern's tree is searched). In the case of DFA matching, depth_limit
       controls the depth of recursive calls of the internal function that is
       used for handling pattern recursion, lookaround assertions, and atomic
       groups.

       For non-DFA matching, the match_limit number is a measure of the
       amount of backtracking that takes place, and learning the minimum
       value can be instructive. For most simple matches, the number is quite
       small, but for patterns with very large numbers of matching
       possibilities, it can become large very quickly with increasing length
       of subject string. In the case of DFA matching, match_limit controls
       the total number of calls, both recursive and non-recursive, to the
       internal matching function, thus controlling the overall amount of
       computing resource that is used.

       For both kinds of matching, the heap_limit number, which is in
       kibibytes (units of 1024 bytes), limits the amount of heap memory used
       for matching.

   Showing MARK names

       The mark modifier causes the names from backtracking control verbs
       that are returned from calls to pcre2_match() to be displayed. If a
       mark is returned for a match, non-match, or partial match, pcre2test
       shows it. For a match, it is on a line by itself, tagged with "MK:".
       Otherwise, it is added to the non-match message.

   Showing memory usage

       The memory modifier causes pcre2test to log the sizes of all heap
       memory allocation and freeing calls that occur during a call to
       pcre2_match() or pcre2_dfa_match(). In the latter case, heap memory is
       used only when a match requires more internal workspace than the
       default allocation on the stack, so in many cases there will be no
       output. No heap memory is allocated during matching with JIT. For this
       modifier to work, the null_context modifier must not be set on both
       the pattern and the subject, though it can be set on one or the other.

   Setting a starting offset

       The offset modifier sets an offset in the subject string at which
       matching starts. Its value is a number of code units, not characters.

   Setting an offset limit

       The offset_limit modifier sets a limit for unanchored matches.
       If a match cannot be found starting at or before this offset in the
       subject, a "no match" return is given. The data value is a number of
       code units, not characters. When this modifier is used, the
       use_offset_limit modifier must have been set for the pattern; if not,
       an error is generated.

   Setting the size of the output vector

       The ovector modifier applies only to the subject line in which it
       appears, though of course it can also be used to set a default in a
       #subject command. It specifies the number of pairs of offsets that are
       available for storing matching information. The default is 15.

       A value of zero is useful when testing the POSIX API because it causes
       regexec() to be called with a NULL capture vector. When not testing
       the POSIX API, a value of zero is used to cause
       pcre2_match_data_create_from_pattern() to be called, in order to
       create a match block of exactly the right size for the pattern. (It is
       not possible to create a match block with a zero-length ovector; there
       is always at least one pair of offsets.)

   Passing the subject as zero-terminated

       By default, the subject string is passed to a native API matching
       function with its correct length. In order to test the facility for
       passing a zero-terminated string, the zero_terminate modifier is
       provided.
       It causes the length to be passed as PCRE2_ZERO_TERMINATED. When
       matching via the POSIX interface, this modifier is ignored, with a
       warning.

       When testing pcre2_substitute(), this modifier also has the effect of
       passing the replacement string as zero-terminated.

   Passing a NULL context, subject, or replacement

       Normally, pcre2test passes a context block to pcre2_match(),
       pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the
       null_context modifier is set, however, NULL is passed. This is for
       testing that the matching and substitution functions behave correctly
       in this case (they use default values). This modifier cannot be used
       with the find_limits, find_limits_noheap, or substitute_callout
       modifiers.

       Similarly, for testing purposes, if the null_subject or
       null_replacement modifier is set, the subject or replacement string
       pointers are passed as NULL, respectively, to the relevant functions.


THE ALTERNATIVE MATCHING FUNCTION

       By default, pcre2test uses the standard PCRE2 matching function,
       pcre2_match(), to match each subject line. PCRE2 also supports an
       alternative matching function, pcre2_dfa_match(), which operates in a
       different way, and has some restrictions.
       The differences between the two functions are described in the
       pcre2matching documentation.

       If the dfa modifier is set, the alternative matching function is used.
       This function finds all possible matches at a given point in the
       subject. If, however, the dfa_shortest modifier is set, processing
       stops after the first match is found. This is always the shortest
       possible match.


DEFAULT OUTPUT FROM pcre2test

       This section describes the output when the normal matching function,
       pcre2_match(), is being used.

       When a match succeeds, pcre2test outputs the list of captured
       substrings, starting with number 0 for the string that matched the
       whole pattern. Otherwise, it outputs "No match" when the return is
       PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
       matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
       this is the entire substring that was inspected during the partial
       match; it may include characters before the actual match start if a
       lookbehind assertion, \K, \b, or \B was involved.)

       For any other return, pcre2test outputs the PCRE2 negative error
       number and a short descriptive phrase.
       If the error is a failed UTF string check, the code unit offset of the
       start of the failing character is also output. Here is an example of
       an interactive pcre2test run.

         $ pcre2test
         PCRE2 version 10.22 2016-07-29

           re> /^abc(\d+)/
         data> abc123
          0: abc123
          1: 123
         data> xyz
         No match

       Unset capturing substrings that are not followed by one that is set
       are not shown by pcre2test unless the allcaptures modifier is
       specified. In the following example, there are two capturing
       substrings, but when the first data line is matched, the second, unset
       substring is not shown. An "internal" unset substring is shown as
       "<unset>", as for the second data line.

           re> /(a)|(b)/
         data> a
          0: a
          1: a
         data> b
          0: b
          1: <unset>
          2: b

       If the strings contain any non-printing characters, they are output as
       \xhh escapes if the value is less than 256 and UTF mode is not set.
       Otherwise they are output as \x{hh...} escapes. See below for the
       definition of non-printing characters.
       If the aftertext modifier is set, the output for substring 0 is
       followed by the rest of the subject string, identified by "0+" like
       this:

           re> /cat/aftertext
         data> cataract
          0: cat
          0+ aract

       If global matching is requested, the results of successive matching
       attempts are output in sequence, like this:

           re> /\Bi(\w\w)/g
         data> Mississippi
          0: iss
          1: ss
          0: iss
          1: ss
          0: ipp
          1: pp

       "No match" is output only if the first match attempt fails. Here is an
       example of a failure message (the offset 4 that is specified by the
       offset modifier is past the end of the subject string):

           re> /xyz/
         data> xyz\=offset=4
         Error -24 (bad offset value)

       Note that whereas patterns can be continued over several lines (a
       plain ">" prompt is used for continuations), subject lines may not.
       However, newlines can be included in a subject by means of the \n
       escape (or \r, \r\n, etc., depending on the newline sequence setting).


OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

       When the alternative matching function, pcre2_dfa_match(), is used,
       the output consists of a list of all the matches that start at the
       first point in the subject where there is at least one match. For
       example:

           re> /(tang|tangerine|tan)/
         data> yellow tangerine\=dfa
          0: tangerine
          1: tang
          2: tan

       Using the normal matching function on this data finds only "tang". The
       longest matching string is always given first (and numbered zero).
       After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
       followed by the partially matching substring. Note that this is the
       entire substring that was inspected during the partial match; it may
       include characters before the actual match start if a lookbehind
       assertion, \b, or \B was involved. (\K is not supported for DFA
       matching.)

       If global matching is requested, the search for further matches
       resumes at the end of the longest match.
       For example:

           re> /(tang|tangerine|tan)/g
         data> yellow tangerine and tangy sultana\=dfa
          0: tangerine
          1: tang
          2: tan
          0: tang
          1: tan
          0: tan

       The alternative matching function does not support substring capture,
       so the modifiers that are concerned with captured substrings are not
       relevant.


RESTARTING AFTER A PARTIAL MATCH

       When the alternative matching function has given the
       PCRE2_ERROR_PARTIAL return, indicating that the subject partially
       matched the pattern, you can restart the match with additional subject
       data by means of the dfa_restart modifier. For example:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\=ps,dfa
         Partial match: 23ja
         data> n05\=dfa,dfa_restart
          0: n05

       For further information about partial matching, see the pcre2partial
       documentation.


CALLOUTS

       If the pattern contains any callout requests, pcre2test's callout
       function is called during matching unless callout_none is specified.
       This works with both matching functions, and with JIT, though there
       are some differences in behaviour. The output for callouts with
       numerical arguments and those with string arguments is slightly
       different.

   Callouts with numerical arguments

       By default, the callout function displays the callout number, the
       start and current positions in the subject text at the callout time,
       and the next pattern item to be tested. For example:

         --->pqrabcdef
           0    ^  ^     \d

       This output indicates that callout number 0 occurred for a match
       attempt starting at the fourth character of the subject string, when
       the pointer was at the seventh character, and when the next pattern
       item was \d. Just one circumflex is output if the start and current
       positions are the same, or if the current position precedes the start
       position, which can happen if the callout is in a lookbehind
       assertion.

       Callouts numbered 255 are assumed to be automatic callouts, inserted
       as a result of the auto_callout pattern modifier. In this case,
       instead of showing the callout number, the offset in the pattern,
       preceded by a plus, is output. For example:

           re> /\d?[A-E]\*/auto_callout
         data> E*
         --->E*
          +0 ^      \d?
          +3 ^      [A-E]
          +8 ^^     \*
         +10 ^ ^
          0: E*

       If a pattern contains (*MARK) items, an additional line is output
       whenever a change of latest mark is passed to the callout function.
       For example:

           re> /a(*MARK:X)bc/auto_callout
         data> abc
         --->abc
          +0 ^       a
          +1 ^^      (*MARK:X)
         +10 ^^      b
         Latest Mark: X
         +11 ^ ^     c
         +12 ^  ^
          0: abc

       The mark changes between matching "a" and "b", but stays the same for
       the rest of the match, so nothing more is output. If, as a result of
       backtracking, the mark reverts to being unset, the text "<unset>" is
       output.

   Callouts with string arguments

       The output for a callout with a string argument is similar, except
       that instead of outputting a callout number before the position
       indicators, the callout string and its offset in the pattern string
       are output before the reflection of the subject string, and the
       subject string is reflected for each callout.
       For example:

           re> /^ab(?C'first')cd(?C"second")ef/
         data> abcdefg
         Callout (7): 'first'
         --->abcdefg
             ^ ^         c
         Callout (20): "second"
         --->abcdefg
             ^   ^       e
          0: abcdef

   Callout modifiers

       The callout function in pcre2test returns zero (carry on matching) by
       default, but you can use a callout_fail modifier in a subject line to
       change this and other parameters of the callout (see below).

       If the callout_capture modifier is set, the current captured groups
       are output when a callout occurs. This is useful only for non-DFA
       matching, as pcre2_dfa_match() does not support capturing, so no
       captures are ever shown.

       The normal callout output, showing the callout number or pattern
       offset (as described above), is suppressed if the callout_no_where
       modifier is set.

       When using the interpretive matching function pcre2_match() without
       JIT, setting the callout_extra modifier causes additional output from
       pcre2test's callout function to be generated. For the first callout in
       a match attempt at a new starting position in the subject, "New match
       attempt" is output.
       If there has been a backtrack since the last callout (or start of
       matching if this is the first callout), "Backtrack" is output,
       followed by "No other matching paths" if the backtrack ended the
       previous match attempt. For example:

           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
         data> aac\=callout_extra
         New match attempt
         --->aac
          +0 ^       (
          +1 ^       a+
          +3 ^ ^     )
          +4 ^ ^     b
         Backtrack
         --->aac
          +3 ^^      )
          +4 ^^      b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0  ^      (
          +1  ^      a+
          +3  ^^     )
          +4  ^^     b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0   ^     (
          +1   ^     a+
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0    ^    (
          +1    ^    a+
         No match

       Notice that various optimizations must be turned off if you want all
       possible matching paths to be scanned.
       If no_start_optimize is not used, there is an immediate "no match",
       without any callouts, because the starting optimization fails to find
       "b" in the subject, which it knows must be present for any match. If
       no_auto_possess is not used, the "a+" item is turned into "a++", which
       reduces the number of backtracks.

       The callout_extra modifier has no effect if used with the DFA matching
       function, or with JIT.

   Return values from callouts

       The default return from the callout function is zero, which allows
       matching to continue. The callout_fail modifier can be given one or
       two numbers. If there is only one number, 1 is returned instead of 0
       (causing matching to backtrack) when a callout of that number is
       reached. If two numbers (<n>:<m>) are given, 1 is returned when
       callout <n> is reached and there have been at least <m> callouts. The
       callout_error modifier is similar, except that PCRE2_ERROR_CALLOUT is
       returned, causing the entire matching process to be aborted. If both
       these modifiers are set for the same callout number, callout_error
       takes precedence. Note that callouts with string arguments are always
       given the number zero.

       The callout_data modifier can be given an unsigned or a negative
       number.
       This is set as the "user data" that is passed to the matching
       function, and passed back when the callout function is invoked. Any
       value other than zero is used as a return from pcre2test's callout
       function.

       Inserting callouts can be helpful when using pcre2test to check
       complicated regular expressions. For further information about
       callouts, see the pcre2callout documentation.


NON-PRINTING CHARACTERS

       When pcre2test is outputting text in the compiled version of a
       pattern, bytes other than 32-126 are always treated as non-printing
       characters and are therefore shown as hex escapes.

       When pcre2test is outputting text that is a matched part of a subject
       string, it behaves in the same way, unless a different locale has been
       set for the pattern (using the locale modifier). In this case, the
       isprint() function is used to distinguish printing and non-printing
       characters.


SAVING AND RESTORING COMPILED PATTERNS

       It is possible to save compiled patterns on disc or elsewhere, and
       reload them later, subject to a number of restrictions. JIT data
       cannot be saved.
       The host on which the patterns are reloaded must be running the same
       version of PCRE2, with the same code unit width, and must also have
       the same endianness, pointer width and PCRE2_SIZE type. Before
       compiled patterns can be saved they must be serialized, that is,
       converted to a stream of bytes. A single byte stream may contain any
       number of compiled patterns, but they must all use the same character
       tables. A single copy of the tables is included in the byte stream
       (its size is 1088 bytes).

       The functions whose names begin with pcre2_serialize_ are used for
       serializing and de-serializing. They are described in the
       pcre2serialize documentation. In this section we describe the features
       of pcre2test that can be used to test these functions.

       Note that "serialization" in PCRE2 does not convert compiled patterns
       to an abstract format like Java or .NET. It just makes a reloadable
       byte code stream. Hence the restrictions on reloading mentioned above.

       In pcre2test, when a pattern with the push modifier is successfully
       compiled, it is pushed onto a stack of compiled patterns, and
       pcre2test expects the next line to contain a new pattern (or command)
       instead of a subject line. By contrast, the pushcopy modifier causes a
       copy of the compiled pattern to be stacked, leaving the original
       available for immediate matching.
       By using push and/or pushcopy, a number of patterns can be compiled
       and retained. These modifiers are incompatible with posix, and control
       modifiers that act at match time are ignored (with a message) for the
       stacked patterns. The jitverify modifier applies only at compile time.

       The command

         #save <filename>

       causes all the stacked patterns to be serialized and the result
       written to the named file. Afterwards, all the stacked patterns are
       freed. The command

         #load <filename>

       reads the data in the file, and then arranges for it to be
       de-serialized, with the resulting compiled patterns added to the
       pattern stack. The pattern on the top of the stack can be retrieved by
       the #pop command, which must be followed by lines of subjects that are
       to be matched with the pattern, terminated as usual by an empty line or
       end of file. This command may be followed by a modifier list
       containing only control modifiers that act after a pattern has been
       compiled. In particular, hex, posix, posix_nosub, push, and pushcopy
       are not allowed, nor are any option-setting modifiers. The JIT
       modifiers are, however, permitted. Here is an example that saves and
       reloads two patterns.

         /abc/push
         /xyz/push
         #save tempfile
         #load tempfile
         #pop info
         xyz

         #pop jit,bincode
         abc

       If jitverify is used with #pop, it does not automatically imply jit,
       which is different behaviour from when it is used on a pattern.

       The #popcopy command is analogous to the pushcopy modifier in that it
       makes current a copy of the topmost stack pattern, leaving the original
       still on the stack.


SEE ALSO

       pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
       pcre2partial(3), pcre2pattern(3), pcre2serialize(3).


AUTHOR

       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.


REVISION

       Last updated: 27 July 2022
       Copyright (c) 1997-2022 University of Cambridge.