PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)



NAME
       pcre2test - a program for testing Perl-compatible regular expressions.

SYNOPSIS

       pcre2test [options] [input file [output file]]

       pcre2test is a test program for the PCRE2 regular expression libraries,
       but it can also be used for experimenting with regular expressions.
       This document describes the features of the test program; for details
       of the regular expressions themselves, see the pcre2pattern
       documentation. For details of the PCRE2 library function calls and
       their options, see the pcre2api documentation.

       The input for pcre2test is a sequence of regular expression patterns
       and subject strings to be matched. There are also command lines for
       setting defaults and controlling some special actions. The output shows
       the result of each match attempt. Modifiers on external or internal
       command lines, the patterns, and the subject lines specify PCRE2
       function options, control how the subject is processed, and determine
       what output is produced.

       There are many obscure modifiers, some of which are specifically
       designed for use in conjunction with the test script and data files
       that are distributed as part of PCRE2. All the modifiers are documented
       here, some without much justification, but many of them are unlikely to
       be of use except when testing the libraries.


PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES

       Different versions of the PCRE2 library can be built to support
       character strings that are encoded in 8-bit, 16-bit, or 32-bit code
       units. One, two, or all three of these libraries may be installed
       simultaneously. The pcre2test program can be used to test all the
       libraries. However, its own input and output are always in 8-bit
       format. When testing the 16-bit or 32-bit libraries, patterns and
       subject strings are converted to 16-bit or 32-bit format before being
       passed to the library functions. Results are converted back to 8-bit
       code units for output.

       In the rest of this document, the names of library functions and
       structures are given in generic form, for example, pcre2_compile(). The
       actual names used in the libraries have a suffix _8, _16, or _32, as
       appropriate.


INPUT ENCODING

       Input to pcre2test is processed line by line, either by calling the C
       library's fgets() function, or via the libreadline or libedit library.
       In some Windows environments character 26 (hex 1A) causes an immediate
       end of file, and no further data is read, so this character should be
       avoided unless you really want that action.

       The input is processed using C's string functions, so it must not
       contain binary zeros, even though in Unix-like environments, fgets()
       treats any bytes other than newline as data characters. An error is
       generated if a binary zero is encountered. By default, subject lines
       are processed for backslash escapes, which makes it possible to include
       any data value in strings that are passed to the library for matching.
       For patterns, there is a facility for specifying some or all of the
       8-bit input characters as hexadecimal pairs, which makes it possible to
       include binary zeros.

   Input for the 16-bit and 32-bit libraries

       When testing the 16-bit or 32-bit libraries, there is a need to be able
       to generate character code points greater than 255 in the strings that
       are passed to the library. For subject lines, backslash escapes can be
       used. In addition, when the utf modifier (see "Setting compilation
       options" below) is set, the pattern and any following subject lines are
       interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
       appropriate.

       For non-UTF testing of wide characters, the utf8_input modifier can be
       used. This is mutually exclusive with utf, and is allowed only in
       16-bit or 32-bit mode. It causes the pattern and following subject
       lines to be treated as UTF-8 according to the original definition (RFC
       2279), which allows for character values up to 0x7fffffff. Each
       character is placed in one 16-bit or 32-bit code unit (in the 16-bit
       case, values greater than 0xffff cause an error to occur).

       UTF-8 (in its original definition) is not capable of encoding values
       greater than 0x7fffffff, but such values can be handled by the 32-bit
       library. When testing this library in non-UTF mode with utf8_input set,
       if any character is preceded by the byte 0xff (which is an invalid byte
       in UTF-8), 0x80000000 is added to the character's value. This is the
       only way of passing such code points in a pattern string. For subject
       strings, using an escape sequence is preferable.
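
       The utf8_input rules above can be sketched in Python. This is only an
       illustration of the decoding described in the text, not the code used
       by pcre2test: the function name decode_utf8_input() is hypothetical,
       the sketch does not reject overlong encodings, and it assumes that the
       0xff prefix is accepted only for the 32-bit library.

```python
# (exclusive upper bound of lead byte, continuation bytes, payload mask)
# for the multi-byte forms of the original RFC 2279 definition of UTF-8.
_LEAD_FORMS = [
    (0xE0, 1, 0x1F),
    (0xF0, 2, 0x0F),
    (0xF8, 3, 0x07),
    (0xFC, 4, 0x03),
    (0xFE, 5, 0x01),
]


def decode_utf8_input(data, width=32):
    """Decode bytes as RFC 2279 UTF-8, one character value per code unit."""
    if width not in (16, 32):
        raise ValueError("utf8_input is allowed only in 16-bit or 32-bit mode")
    units = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:
            # 0xff is never valid in UTF-8; here it means "add 0x80000000".
            if width != 32:
                raise ValueError("assumed: 0xff prefix is 32-bit only")
            extra = 0x80000000
            i += 1
            if i == len(data):
                raise ValueError("0xff prefix with no following character")
        lead = data[i]
        if lead < 0x80:
            count, value = 0, lead
        elif 0xC0 <= lead <= 0xFD:
            count, mask = next(
                (c, m) for bound, c, m in _LEAD_FORMS if lead < bound
            )
            value = lead & mask
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x}")
        tail = data[i + 1 : i + 1 + count]
        if len(tail) != count or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        value += extra
        if width == 16 and value > 0xFFFF:
            raise ValueError("value does not fit in one 16-bit code unit")
        units.append(value)
        i += 1 + count
    return units
```

       For example, the six bytes FD BF BF BF BF BF decode to the single value
       0x7fffffff, and the two bytes FF 41 decode to 0x80000041 when the width
       is 32.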


COMMAND LINE OPTIONS

       -8        If the 8-bit library has been built, this option causes it to
                 be used (this is the default). If the 8-bit library has not
                 been built, this option causes an error.

       -16       If the 16-bit library has been built, this option causes it
                 to be used. If only the 16-bit library has been built, this
                 is the default. If the 16-bit library has not been built,
                 this option causes an error.

       -32       If the 32-bit library has been built, this option causes it
                 to be used. If only the 32-bit library has been built, this
                 is the default. If the 32-bit library has not been built,
                 this option causes an error.

       -ac       Behave as if each pattern has the auto_callout modifier, that
                 is, insert automatic callouts into every pattern that is
                 compiled.

       -AC       As for -ac, but in addition behave as if each subject line
                 has the callout_extra modifier, that is, show additional
                 information from callouts.

       -b        Behave as if each pattern has the fullbincode modifier; the
                 full internal binary form of the pattern is output after
                 compilation.

       -C        Output the version number of the PCRE2 library, and all
                 available information about the optional features that are
                 included, and then exit with zero exit code. All other
                 options are ignored. If both -C and any -Lx options are
                 present, whichever is first is recognized.

       -C option Output information about a specific build-time option, then
                 exit. This functionality is intended for use in scripts such
                 as RunTest. The following options output the value and set
                 the exit code as indicated:

                   ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
                                0x15 or 0x25
                                0 if used in an ASCII environment
                                exit code is always 0
                   linksize   the configured internal link size (2, 3, or 4)
                                exit code is set to the link size
                   newline    the default newline setting:
                                CR, LF, CRLF, ANYCRLF, ANY, or NUL
                                exit code is always 0
                   bsr        the default setting for what \R matches:
                                ANYCRLF or ANY
                                exit code is always 0

                 The following options output 1 for true or 0 for false, and
                 set the exit code to the same value:

                   backslash-C  \C is supported (not locked out)
                   ebcdic       compiled for an EBCDIC environment
                   jit          just-in-time support is available
                   pcre2-16     the 16-bit library was built
                   pcre2-32     the 32-bit library was built
                   pcre2-8      the 8-bit library was built
                   unicode      Unicode support is available

                 If an unknown option is given, an error message is output;
                 the exit code is 0.

       -d        Behave as if each pattern has the debug modifier; the
                 internal form and information about the compiled pattern are
                 output after compilation; -d is equivalent to -b -i.

       -dfa      Behave as if each subject line has the dfa modifier; matching
                 is done using the pcre2_dfa_match() function instead of the
                 default pcre2_match().

       -error number[,number,...]
                 Call pcre2_get_error_message() for each of the error numbers
                 in the comma-separated list, display the resulting messages
                 on the standard output, then exit with zero exit code. The
                 numbers may be positive or negative. This is a convenience
                 facility for PCRE2 maintainers.

       -help     Output a brief summary of these options and then exit.

       -i        Behave as if each pattern has the info modifier; information
                 about the compiled pattern is given after compilation.

       -jit      Behave as if each pattern line has the jit modifier; after
                 successful compilation, each pattern is passed to the
                 just-in-time compiler, if available.

       -jitfast  Behave as if each pattern line has the jitfast modifier;
                 after successful compilation, each pattern is passed to the
                 just-in-time compiler, if available, and each subject line is
                 passed directly to the JIT matcher via its "fast path".

       -jitverify
                 Behave as if each pattern line has the jitverify modifier;
                 after successful compilation, each pattern is passed to the
                 just-in-time compiler, if available, and the use of JIT for
                 matching is verified.

       -LM       List modifiers: write a list of available pattern and subject
                 modifiers to the standard output, then exit with zero exit
                 code. All other options are ignored. If both -C and any -Lx
                 options are present, whichever is first is recognized.

       -LP       List properties: write a list of recognized Unicode
                 properties to the standard output, then exit with zero exit
                 code. All other options are ignored. If both -C and any -Lx
                 options are present, whichever is first is recognized.

       -LS       List scripts: write a list of recognized Unicode script names
                 to the standard output, then exit with zero exit code. All
                 other options are ignored. If both -C and any -Lx options are
                 present, whichever is first is recognized.

       -pattern modifier-list
                 Behave as if each pattern line contains the given modifiers.

       -q        Do not output the version number of pcre2test at the start of
                 execution.

       -S size   On Unix-like systems, set the size of the run-time stack to
                 size mebibytes (units of 1024*1024 bytes).

       -subject modifier-list
                 Behave as if each subject line contains the given modifiers.

       -t        Run each compile and match many times with a timer, and
                 output the resulting times per compile or match. When JIT is
                 used, separate times are given for the initial compile and
                 the JIT compile. You can control the number of iterations
                 that are used for timing by following -t with a number (as a
                 separate item on the command line). For example, "-t 1000"
                 iterates 1000 times. The default is to iterate 500,000 times.

       -tm       This is like -t except that it times only the matching phase,
                 not the compile phase.

       -T -TM    These behave like -t and -tm, but in addition, at the end of
                 a run, the total times for all compiles and matches are
                 output.

       -version  Output the PCRE2 version number and then exit.
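
       A script can use the true/false forms of "-C option" described above to
       decide which tests to run. The following Python sketch shows one way to
       do this. The helper has_feature() and its run parameter are
       hypothetical; only the option names and the exit code convention come
       from the table above, and the executable is assumed to be on the PATH
       under the name pcre2test.

```python
import subprocess

# Option names from the table above: pcre2test outputs 1 or 0 for these and
# sets the exit code to the same value.
BOOLEAN_OPTIONS = frozenset(
    {"backslash-C", "ebcdic", "jit", "pcre2-16", "pcre2-32", "pcre2-8", "unicode"}
)


def has_feature(option, executable="pcre2test", run=subprocess.run):
    """Return True if `pcre2test -C option` reports the feature as present."""
    if option not in BOOLEAN_OPTIONS:
        # An unknown option also exits with code 0, so passing it through
        # would make it indistinguishable from a "false" answer.
        raise ValueError(f"not a true/false -C option: {option}")
    completed = run([executable, "-C", option], capture_output=True, text=True)
    return completed.returncode == 1


if __name__ == "__main__":
    # Requires a pcre2test executable to be installed.
    print("JIT available:", has_feature("jit"))
```

       The name is checked before pcre2test is run because an unknown option
       yields exit code 0, which would otherwise look like "false".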


DESCRIPTION

       If pcre2test is given two filename arguments, it reads from the first
       and writes to the second. If the first name is "-", input is taken from
       the standard input. If pcre2test is given only one argument, it reads
       from that file and writes to stdout. Otherwise, it reads from stdin and
       writes to stdout.
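
       The rules for the filename arguments can be summarized by the following
       Python sketch. The function resolve_streams() is hypothetical and only
       restates the paragraph above; it assumes that "-" also means standard
       input when it is the only filename given.

```python
def resolve_streams(filenames):
    """Map the filename arguments to (input, output).

    None stands for the standard input or the standard output.
    """
    if len(filenames) > 2:
        raise ValueError("at most an input file and an output file are allowed")
    source = filenames[0] if filenames else None
    if source == "-":
        source = None  # "-" as the first name selects the standard input
    destination = filenames[1] if len(filenames) == 2 else None
    return source, destination


if __name__ == "__main__":
    for args in ([], ["in.txt"], ["in.txt", "out.txt"], ["-", "out.txt"]):
        print(args, "->", resolve_streams(args))
```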

       When pcre2test is built, a configuration option can specify that it
       should be linked with the libreadline or libedit library. When this is
       done, if the input is from a terminal, it is read using the readline()
       function. This provides line-editing and history facilities. The output
       from the -help option states whether or not readline() will be used.

       The program handles any number of tests, each of which consists of a
       set of input lines. Each set starts with a regular expression pattern,
       followed by any number of subject lines to be matched against that
       pattern. In between sets of test data, command lines that begin with #
       may appear. This file format, with some restrictions, can also be
       processed by the perltest.sh script that is distributed with PCRE2 as a
       means of checking that the behaviour of PCRE2 and Perl is the same. For
       a specification of perltest.sh, see the comments near its beginning.
       See also the #perltest command below.

       When the input is a terminal, pcre2test prompts for each line of input,
       using "re>" to prompt for regular expression patterns, and "data>" to
       prompt for subject lines. Command lines starting with # can be entered
       only in response to the "re>" prompt.

       Each subject line is matched separately and independently. If you want
       to do multi-line matches, you have to use the \n escape sequence (or \r
       or \r\n, etc., depending on the newline setting) in a single line of
       input to encode the newline sequences. There is no limit on the length
       of subject lines; the input buffer is automatically extended if it is
       too small. There are replication features that make it possible to
       generate long repetitive pattern or subject lines without having to
       supply them explicitly.

       An empty line or the end of the file signals the end of the subject
       lines for a test, at which point a new pattern or command line is
       expected if there is still input to be read.
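
       The layout of an input file described above can be illustrated with a
       small Python sketch that groups lines into tests. It is a deliberate
       simplification and not how pcre2test reads its input: the function
       split_tests() is hypothetical, every pattern is assumed to fit on one
       line, and empty lines between tests are assumed to be skipped.

```python
def split_tests(text):
    """Group input lines into command lines and (pattern, subjects) tests."""
    commands = []
    tests = []
    current = None  # the test whose subject lines are being collected
    for line in text.splitlines():
        if current is None:
            # Equivalent to the "re>" prompt: a command line or a pattern.
            if line.startswith("#"):
                commands.append(line)
            elif line.strip():
                current = (line, [])
                tests.append(current)
        elif line == "":
            current = None  # an empty line ends the subject lines
        else:
            # Equivalent to the "data>" prompt: a subject line.
            current[1].append(line)
    return commands, tests


SAMPLE = "\n".join(
    ["# a comment line", "/abc/i", "ABC", "xabcx", "", r"/^\d+$/", "123"]
)

if __name__ == "__main__":
    commands, tests = split_tests(SAMPLE)
    for pattern, subjects in tests:
        print(pattern, subjects)
```

       In the sample, the first test has the pattern /abc/i with two subject
       lines, and the empty line separates it from the second test.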


COMMAND LINES

       In between sets of test data, a line that begins with # is interpreted
       as a command line. If the first character is followed by white space or
       an exclamation mark, the line is treated as a comment, and ignored.
       Otherwise, the following commands are recognized:

         #forbid_utf

       Subsequent patterns automatically have the PCRE2_NEVER_UTF and
       PCRE2_NEVER_UCP options set, which locks out the use of the PCRE2_UTF
       and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of
       patterns. This command also forces an error if a subsequent pattern
       contains any occurrences of \P, \p, or \X, which are still supported
       when PCRE2_UTF is not set, but which require Unicode property support
       to be included in the library.

       This is a trigger guard that is used in test files to ensure that UTF
       or Unicode property tests are not accidentally added to files that are
       used when Unicode support is not included in the library. Setting
       PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also be obtained
       by the use of #pattern; the difference is that #forbid_utf cannot be
       unset, and the automatic options are not displayed in pattern
       information, to avoid cluttering up test output.

         #load <filename>

       This command is used to load a set of precompiled patterns from a file,
       as described in the section entitled "Saving and restoring compiled
       patterns" below.

         #loadtables <filename>

       This command is used to load a set of binary character tables that can
       be accessed by the tables=3 qualifier. Such tables can be created by
       the pcre2_dftables program with the -b option.

         #newline_default [<newline-list>]

       When PCRE2 is built, a default newline convention can be specified.
       This determines which characters and/or character pairs are recognized
       as indicating a newline in a pattern or subject string. The default can
       be overridden when a pattern is compiled. The standard test files
       contain tests of various newline conventions, but the majority of the
       tests expect a single linefeed to be recognized as a newline by
       default. Without special action the tests would fail when PCRE2 is
       compiled with either CR or CRLF as the default newline.

       The #newline_default command specifies a list of newline types that are
       acceptable as the default. Each type must be one of CR, LF, CRLF,
       ANYCRLF, ANY, or NUL (in upper or lower case), for example:

         #newline_default LF Any anyCRLF

       If the default newline is in the list, this command has no effect.
       Otherwise, except when testing the POSIX API, a newline modifier that
       specifies the first newline convention in the list (LF in the above
       example) is added to any pattern that does not already have a newline
       modifier. If the newline list is empty, the feature is turned off. This
       command is present in a number of the standard test input files.

       When the POSIX API is being tested there is no way to override the
       default newline convention, though it is possible to set the newline
       convention from within the pattern. A warning is given if the posix or
       posix_nosub modifier is used when #newline_default would set a default
       for the non-POSIX API.

         #pattern <modifier-list>

       This command sets a default modifier list that applies to all
       subsequent patterns. Modifiers on a pattern can change these settings.

         #perltest

       This line is used in test files that can also be processed by
       perltest.sh to confirm that Perl gives the same results as PCRE2.
       Subsequent tests are checked for the use of pcre2test features that are
       incompatible with the perltest.sh script.

       Patterns must use '/' as their delimiter, and only certain modifiers
       are supported. Comment lines, #pattern commands, and #subject commands
       that set or unset "mark" are recognized and acted on. The #perltest,
       #forbid_utf, and #newline_default commands, which are needed in the
       relevant pcre2test files, are silently ignored. All other command lines
       are ignored, but give a warning message. The #perltest command helps
       detect tests that are accidentally put in the wrong file or use the
       wrong delimiter. For more details of the perltest.sh script see the
       comments it contains.

         #pop [<modifiers>]
         #popcopy [<modifiers>]

       These commands are used to manipulate the stack of compiled patterns,
       as described in the section entitled "Saving and restoring compiled
       patterns" below.

         #save <filename>

       This command is used to save a set of compiled patterns to a file, as
       described in the section entitled "Saving and restoring compiled
       patterns" below.

         #subject <modifier-list>

       This command sets a default modifier list that applies to all
       subsequent subject lines. Modifiers on a subject line can change these
       settings.
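
       The decision made by the #newline_default command, as described
       earlier in this section, can be restated as a short Python sketch. The
       function newline_modifier_to_add() and its parameters are hypothetical
       names chosen for illustration; the logic follows only what the text
       states.

```python
NEWLINE_TYPES = ("CR", "LF", "CRLF", "ANYCRLF", "ANY", "NUL")


def newline_modifier_to_add(build_default, acceptable, has_newline_modifier,
                            posix=False):
    """Return the newline type to add to a pattern, or None for no change.

    build_default: the newline convention PCRE2 was built with.
    acceptable: the list of types given to #newline_default.
    has_newline_modifier: True if the pattern already sets a newline type.
    posix: True when the POSIX API is being tested.
    """
    wanted = [name.upper() for name in acceptable]  # upper or lower case
    for name in wanted:
        if name not in NEWLINE_TYPES:
            raise ValueError(f"unknown newline type: {name}")
    if not wanted:
        return None  # an empty list turns the feature off
    if build_default.upper() in wanted:
        return None  # the build default is acceptable: no effect
    if posix or has_newline_modifier:
        return None  # nothing is added in these cases
    return wanted[0]  # the first convention in the list


if __name__ == "__main__":
    # With a CR default, "#newline_default LF Any anyCRLF" selects LF.
    print(newline_modifier_to_add("CR", ["LF", "Any", "anyCRLF"], False))
```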
393e18e3516Sopenharmony_ci
394e18e3516Sopenharmony_ci
395e18e3516Sopenharmony_ciMODIFIER SYNTAX
396e18e3516Sopenharmony_ci
397e18e3516Sopenharmony_ci       Modifier lists are used with both pattern and subject lines. Items in a
398e18e3516Sopenharmony_ci       list are separated by commas followed by optional white space. Trailing
399e18e3516Sopenharmony_ci       whitespace in a modifier list is ignored. Some modifiers may  be  given
400e18e3516Sopenharmony_ci       for  both patterns and subject lines, whereas others are valid only for
401e18e3516Sopenharmony_ci       one or the other. Each modifier has  a  long  name,  for  example  "an-
402e18e3516Sopenharmony_ci       chored",  and  some  of  them  must be followed by an equals sign and a
403e18e3516Sopenharmony_ci       value, for example, "offset=12". Values cannot  contain  comma  charac-
404e18e3516Sopenharmony_ci       ters,  but may contain spaces. Modifiers that do not take values may be
405e18e3516Sopenharmony_ci       preceded by a minus sign to turn off a previous setting.
406e18e3516Sopenharmony_ci
407e18e3516Sopenharmony_ci       A few of the more common modifiers can also be specified as single let-
408e18e3516Sopenharmony_ci       ters,  for  example "i" for "caseless". In documentation, following the
409e18e3516Sopenharmony_ci       Perl convention, these are written with a slash ("the /i modifier") for
410e18e3516Sopenharmony_ci       clarity.  Abbreviated  modifiers  must all be concatenated in the first
411e18e3516Sopenharmony_ci       item of a modifier list. If the first item is not recognized as a  long
412e18e3516Sopenharmony_ci       modifier  name, it is interpreted as a sequence of these abbreviations.
413e18e3516Sopenharmony_ci       For example:
414e18e3516Sopenharmony_ci
415e18e3516Sopenharmony_ci         /abc/ig,newline=cr,jit=3
416e18e3516Sopenharmony_ci
417e18e3516Sopenharmony_ci       This is a pattern line whose modifier list starts with  two  one-letter
418e18e3516Sopenharmony_ci       modifiers  (/i  and  /g).  The lower-case abbreviated modifiers are the
419e18e3516Sopenharmony_ci       same as used in Perl.
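
       The rules above can be sketched in a few lines of Python. This is  only
       an  illustration, not the code pcre2test uses: LONG_NAMES is a small
       hypothetical subset of the real modifier names, and the mapping of  "g"
       to "global" is an assumption that is not stated in this section.

```python
# Hypothetical subset of long modifier names; the real list is much longer.
LONG_NAMES = {"anchored", "caseless", "newline", "jit", "offset", "info"}

# Single-letter abbreviations; "g" -> "global" is an assumption.
ABBREVIATIONS = {
    "i": "caseless",
    "s": "dotall",
    "x": "extended",
    "m": "multiline",
    "n": "no_auto_capture",
    "g": "global",
}


def parse_modifier_list(text):
    """Return a list of (name, value) pairs for a modifier list.

    value is True for a plain modifier, False for one negated with a
    leading minus sign, or a string for a name=value modifier.
    """
    parsed = []
    # Trailing white space in the list is ignored.
    for index, item in enumerate(text.rstrip().split(",")):
        item = item.lstrip()  # white space after a comma is optional
        name, has_value, value = item.partition("=")
        bare = name[1:] if name.startswith("-") else name
        if index == 0 and bare not in LONG_NAMES:
            # First item is not a long name: treat it as abbreviations.
            for letter in item:
                if letter not in ABBREVIATIONS:
                    raise ValueError("unknown abbreviation: " + letter)
                parsed.append((ABBREVIATIONS[letter], True))
        elif has_value:
            parsed.append((name, value))
        elif name.startswith("-"):
            parsed.append((bare, False))
        else:
            parsed.append((name, True))
    return parsed
```

       For the example pattern line, parse_modifier_list("ig,newline=cr,jit=3")
       yields caseless and global as plain modifiers, followed by newline  and
       jit with the values "cr" and "3".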
420e18e3516Sopenharmony_ci
421e18e3516Sopenharmony_ci
422e18e3516Sopenharmony_ciPATTERN SYNTAX
423e18e3516Sopenharmony_ci
424e18e3516Sopenharmony_ci       A pattern line must start with one of the following characters  (common
425e18e3516Sopenharmony_ci       symbols, excluding pattern meta-characters):
426e18e3516Sopenharmony_ci
427e18e3516Sopenharmony_ci         / ! " ' ` - = _ : ; , % & @ ~
428e18e3516Sopenharmony_ci
429e18e3516Sopenharmony_ci       This  is  interpreted  as the pattern's delimiter. A regular expression
430e18e3516Sopenharmony_ci       may be continued over several input lines, in which  case  the  newline
431e18e3516Sopenharmony_ci       characters are included within it. It is possible to include the delim-
432e18e3516Sopenharmony_ci       iter as a literal within the pattern by escaping it with  a  backslash,
433e18e3516Sopenharmony_ci       for example
434e18e3516Sopenharmony_ci
435e18e3516Sopenharmony_ci         /abc\/def/
436e18e3516Sopenharmony_ci
437e18e3516Sopenharmony_ci       If  you do this, the escape and the delimiter form part of the pattern,
438e18e3516Sopenharmony_ci       but since the delimiters are all non-alphanumeric, the inclusion of the
439e18e3516Sopenharmony_ci       backslash  does not affect the pattern's interpretation. Note, however,
440e18e3516Sopenharmony_ci       that this trick does not work within \Q...\E literal bracketing because
441e18e3516Sopenharmony_ci       the backslash will itself be interpreted as a literal. If the terminat-
442e18e3516Sopenharmony_ci       ing delimiter is immediately followed by a backslash, for example,
443e18e3516Sopenharmony_ci
444e18e3516Sopenharmony_ci         /abc/\
445e18e3516Sopenharmony_ci
446e18e3516Sopenharmony_ci       then a backslash is added to the end of the pattern. This  is  done  to
447e18e3516Sopenharmony_ci       provide  a  way of testing the error condition that arises if a pattern
448e18e3516Sopenharmony_ci       finishes with a backslash, because
449e18e3516Sopenharmony_ci
450e18e3516Sopenharmony_ci         /abc\/
451e18e3516Sopenharmony_ci
452e18e3516Sopenharmony_ci       is interpreted as the first line of a pattern that starts with  "abc/",
453e18e3516Sopenharmony_ci       causing  pcre2test to read the next line as a continuation of the regu-
454e18e3516Sopenharmony_ci       lar expression.
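
       The delimiter handling described above can  be  sketched  as  follows.
       This  is  a simplified model written for illustration, not pcre2test's
       own parser: the function name is hypothetical, the line is assumed  to
       be  non-empty, and \Q...\E bracketing is deliberately not handled.

```python
# Characters that may start a pattern line, as listed above.
DELIMITERS = set("/!\"'`-=_:;,%&@~")


def split_pattern_line(line):
    """Split one input line into (pattern, modifier_text).

    Returns None when no closing delimiter is found, which models the
    case where the pattern continues on the next input line.
    """
    delimiter = line[0]
    if delimiter not in DELIMITERS:
        raise ValueError("not a pattern line")
    index = 1
    while index < len(line):
        char = line[index]
        if char == "\\":
            # The escape and the escaped character both stay in the pattern.
            index += 2
            continue
        if char == delimiter:
            pattern, rest = line[1:index], line[index + 1:]
            if rest.startswith("\\"):
                # A backslash right after the closing delimiter is added
                # to the end of the pattern.
                pattern += "\\"
                rest = rest[1:]
            return pattern, rest
        index += 1
    return None
```

       With this model, the line /abc\/def/ gives the pattern abc\/def,  the
       line  /abc/\  gives  a  pattern that ends with a backslash, and /abc\/
       returns None because the escaped delimiter does not  end  the  pattern.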
455e18e3516Sopenharmony_ci
456e18e3516Sopenharmony_ci       A pattern can be followed by a modifier list (details below).
457e18e3516Sopenharmony_ci
458e18e3516Sopenharmony_ci
459e18e3516Sopenharmony_ciSUBJECT LINE SYNTAX
460e18e3516Sopenharmony_ci
461e18e3516Sopenharmony_ci       Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
462e18e3516Sopenharmony_ci       or  pcre2_jit_match(), leading and trailing white space is removed, and
463e18e3516Sopenharmony_ci       the line is scanned for backslash escapes, unless  the  subject_literal
464e18e3516Sopenharmony_ci       modifier  was set for the pattern. The following provide a means of en-
465e18e3516Sopenharmony_ci       coding non-printing characters in a visible way:
466e18e3516Sopenharmony_ci
467e18e3516Sopenharmony_ci         \a         alarm (BEL, \x07)
468e18e3516Sopenharmony_ci         \b         backspace (\x08)
         \e         escape (\x1b)
470e18e3516Sopenharmony_ci         \f         form feed (\x0c)
471e18e3516Sopenharmony_ci         \n         newline (\x0a)
472e18e3516Sopenharmony_ci         \r         carriage return (\x0d)
473e18e3516Sopenharmony_ci         \t         tab (\x09)
474e18e3516Sopenharmony_ci         \v         vertical tab (\x0b)
475e18e3516Sopenharmony_ci         \nnn       octal character (up to 3 octal digits); always
476e18e3516Sopenharmony_ci                      a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
         \o{dd...}  octal character (any number of octal digits)
478e18e3516Sopenharmony_ci         \xhh       hexadecimal byte (up to 2 hex digits)
479e18e3516Sopenharmony_ci         \x{hh...}  hexadecimal character (any number of hex digits)
480e18e3516Sopenharmony_ci
       The use of \x{hh...} is not dependent on the use of the utf modifier on
       the pattern; it is always recognized. There  may  be  any  number  of
       hexadecimal  digits  inside  the braces; invalid values provoke error
       messages.
485e18e3516Sopenharmony_ci
486e18e3516Sopenharmony_ci       Note  that  \xhh  specifies one byte rather than one character in UTF-8
487e18e3516Sopenharmony_ci       mode; this makes it possible to construct invalid UTF-8  sequences  for
488e18e3516Sopenharmony_ci       testing  purposes.  On the other hand, \x{hh} is interpreted as a UTF-8
489e18e3516Sopenharmony_ci       character in UTF-8 mode, generating more than one byte if the value  is
490e18e3516Sopenharmony_ci       greater  than  127.   When testing the 8-bit library not in UTF-8 mode,
491e18e3516Sopenharmony_ci       \x{hh} generates one byte for values less than 256, and causes an error
492e18e3516Sopenharmony_ci       for greater values.
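
       The difference between \xhh and \x{hh} for the  8-bit  library  can  be
       illustrated  with  the  following Python sketch. The function names are
       hypothetical, and Python's own UTF-8 encoder stands in  for  whatever
       pcre2test does internally, so values that Python refuses to encode (for
       example, surrogates) behave differently here.

```python
def encode_x_byte(hex_digits):
    # \xhh: always exactly one byte, even in UTF-8 mode.
    return bytes([int(hex_digits, 16)])


def encode_x_braced(hex_digits, utf8_mode):
    # \x{hh...}: a character, encoded according to the mode.
    value = int(hex_digits, 16)
    if utf8_mode:
        # More than one byte is generated for values greater than 127.
        return chr(value).encode("utf-8")
    if value > 255:
        raise ValueError("value greater than 255 when not in UTF-8 mode")
    return bytes([value])
```

       For example, encode_x_byte("e9") is the single byte  0xe9,  which  is
       not  valid  UTF-8  on  its own, whereas encode_x_braced("e9", True) is
       the two-byte sequence 0xc3 0xa9.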
493e18e3516Sopenharmony_ci
494e18e3516Sopenharmony_ci       In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
495e18e3516Sopenharmony_ci       possible to construct invalid UTF-16 sequences for testing purposes.
496e18e3516Sopenharmony_ci
497e18e3516Sopenharmony_ci       In UTF-32 mode, all 4- to 8-digit \x{...}  values  are  accepted.  This
498e18e3516Sopenharmony_ci       makes  it  possible  to  construct invalid UTF-32 sequences for testing
499e18e3516Sopenharmony_ci       purposes.
500e18e3516Sopenharmony_ci
501e18e3516Sopenharmony_ci       There is a special backslash sequence that specifies replication of one
502e18e3516Sopenharmony_ci       or more characters:
503e18e3516Sopenharmony_ci
504e18e3516Sopenharmony_ci         \[<characters>]{<count>}
505e18e3516Sopenharmony_ci
506e18e3516Sopenharmony_ci       This  makes  it possible to test long strings without having to provide
507e18e3516Sopenharmony_ci       them as part of the file. For example:
508e18e3516Sopenharmony_ci
509e18e3516Sopenharmony_ci         \[abc]{4}
510e18e3516Sopenharmony_ci
511e18e3516Sopenharmony_ci       is converted to "abcabcabcabc". This feature does not support  nesting.
512e18e3516Sopenharmony_ci       To include a closing square bracket in the characters, code it as \x5D.
513e18e3516Sopenharmony_ci
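
       As an illustration only, the replication  sequence  can  be  modelled
       with  a  regular  expression in Python. The names below are hypotheti-
       cal, and this sketch does not decode other escapes such as \x5D  inside
       the brackets.

```python
import re

# A backslash, "[", one or more characters other than "]", then "]{count}".
REPLICATION = re.compile(r"\\\[([^\]]+)\]\{(\d+)\}")


def expand_replication(text):
    """Replace each \\[<characters>]{<count>} item by its repeated text."""
    return REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), text)
```

       Because the bracketed part cannot contain "]", nested  items  are  not
       expanded, which matches the restriction stated above.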
514e18e3516Sopenharmony_ci       A  backslash  followed  by  an equals sign marks the end of the subject
515e18e3516Sopenharmony_ci       string and the start of a modifier list. For example:
516e18e3516Sopenharmony_ci
517e18e3516Sopenharmony_ci         abc\=notbol,notempty
518e18e3516Sopenharmony_ci
519e18e3516Sopenharmony_ci       If the subject string is empty and \= is followed  by  whitespace,  the
520e18e3516Sopenharmony_ci       line  is  treated  as a comment line, and is not used for matching. For
521e18e3516Sopenharmony_ci       example:
522e18e3516Sopenharmony_ci
523e18e3516Sopenharmony_ci         \= This is a comment.
524e18e3516Sopenharmony_ci         abc\= This is an invalid modifier list.
525e18e3516Sopenharmony_ci
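
       The way a subject line is split at \= can be sketched in  Python  as
       shown  here.  This  is a simplified model with a hypothetical function
       name; it returns the modifier text unparsed and does  not  decode  any
       other escapes.

```python
def split_subject_line(line):
    """Return (kind, subject, modifier_text) for one subject line."""
    line = line.strip()  # leading and trailing white space is removed
    index = 0
    while index < len(line) - 1:
        if line[index] == "\\":
            if line[index + 1] == "=":
                subject, modifiers = line[:index], line[index + 2:]
                if subject == "" and modifiers[:1].isspace():
                    # Empty subject and \= followed by white space.
                    return ("comment", None, None)
                return ("subject", subject, modifiers)
            index += 2  # skip the escaped character
        else:
            index += 1
    return ("subject", line, "")
```

       In this model the first example line above is classed as  a  comment,
       while  the  second  yields  the  subject "abc" and a modifier text that
       starts with a space, which a real modifier parser would then reject.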
526e18e3516Sopenharmony_ci       A backslash followed by any other non-alphanumeric character  just  es-
527e18e3516Sopenharmony_ci       capes  that  character. A backslash followed by anything else causes an
528e18e3516Sopenharmony_ci       error. However, if the very last character in the line is  a  backslash
529e18e3516Sopenharmony_ci       (and  there  is  no  modifier list), it is ignored. This gives a way of
530e18e3516Sopenharmony_ci       passing an empty line as data, since a real empty line  terminates  the
531e18e3516Sopenharmony_ci       data input.
532e18e3516Sopenharmony_ci
533e18e3516Sopenharmony_ci       If the subject_literal modifier is set for a pattern, all subject lines
534e18e3516Sopenharmony_ci       that follow are treated as literals, with no special treatment of back-
535e18e3516Sopenharmony_ci       slashes.  No replication is possible, and any subject modifiers must be
536e18e3516Sopenharmony_ci       set as defaults by a #subject command.
537e18e3516Sopenharmony_ci
538e18e3516Sopenharmony_ci
539e18e3516Sopenharmony_ciPATTERN MODIFIERS
540e18e3516Sopenharmony_ci
541e18e3516Sopenharmony_ci       There are several types of modifier that can appear in  pattern  lines.
542e18e3516Sopenharmony_ci       Except where noted below, they may also be used in #pattern commands. A
543e18e3516Sopenharmony_ci       pattern's modifier list can add to or override default  modifiers  that
544e18e3516Sopenharmony_ci       were set by a previous #pattern command.
545e18e3516Sopenharmony_ci
546e18e3516Sopenharmony_ci   Setting compilation options
547e18e3516Sopenharmony_ci
548e18e3516Sopenharmony_ci       The  following  modifiers set options for pcre2_compile(). Most of them
549e18e3516Sopenharmony_ci       set bits in the options argument of  that  function,  but  those  whose
550e18e3516Sopenharmony_ci       names start with PCRE2_EXTRA are additional options that are set in the
551e18e3516Sopenharmony_ci       compile context. For the main options, there are some single-letter ab-
552e18e3516Sopenharmony_ci       breviations  that  are  the same as Perl options. There is special han-
553e18e3516Sopenharmony_ci       dling for /x: if a second x is  present,  PCRE2_EXTENDED  is  converted
554e18e3516Sopenharmony_ci       into  PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
555e18e3516Sopenharmony_ci       TENDED as well, though this makes no difference to the  way  pcre2_com-
556e18e3516Sopenharmony_ci       pile()  behaves. See pcre2api for a description of the effects of these
557e18e3516Sopenharmony_ci       options.
558e18e3516Sopenharmony_ci
559e18e3516Sopenharmony_ci             allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
560e18e3516Sopenharmony_ci             allow_lookaround_bsk      set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
561e18e3516Sopenharmony_ci             allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
562e18e3516Sopenharmony_ci             alt_bsux                  set PCRE2_ALT_BSUX
563e18e3516Sopenharmony_ci             alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
564e18e3516Sopenharmony_ci             alt_verbnames             set PCRE2_ALT_VERBNAMES
565e18e3516Sopenharmony_ci             anchored                  set PCRE2_ANCHORED
566e18e3516Sopenharmony_ci             auto_callout              set PCRE2_AUTO_CALLOUT
567e18e3516Sopenharmony_ci             bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
568e18e3516Sopenharmony_ci         /i  caseless                  set PCRE2_CASELESS
569e18e3516Sopenharmony_ci             dollar_endonly            set PCRE2_DOLLAR_ENDONLY
570e18e3516Sopenharmony_ci         /s  dotall                    set PCRE2_DOTALL
571e18e3516Sopenharmony_ci             dupnames                  set PCRE2_DUPNAMES
572e18e3516Sopenharmony_ci             endanchored               set PCRE2_ENDANCHORED
573e18e3516Sopenharmony_ci             escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
574e18e3516Sopenharmony_ci         /x  extended                  set PCRE2_EXTENDED
575e18e3516Sopenharmony_ci         /xx extended_more             set PCRE2_EXTENDED_MORE
576e18e3516Sopenharmony_ci             extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
577e18e3516Sopenharmony_ci             firstline                 set PCRE2_FIRSTLINE
578e18e3516Sopenharmony_ci             literal                   set PCRE2_LITERAL
579e18e3516Sopenharmony_ci             match_line                set PCRE2_EXTRA_MATCH_LINE
580e18e3516Sopenharmony_ci             match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
581e18e3516Sopenharmony_ci             match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
582e18e3516Sopenharmony_ci             match_word                set PCRE2_EXTRA_MATCH_WORD
583e18e3516Sopenharmony_ci         /m  multiline                 set PCRE2_MULTILINE
584e18e3516Sopenharmony_ci             never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
585e18e3516Sopenharmony_ci             never_ucp                 set PCRE2_NEVER_UCP
586e18e3516Sopenharmony_ci             never_utf                 set PCRE2_NEVER_UTF
587e18e3516Sopenharmony_ci         /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
588e18e3516Sopenharmony_ci             no_auto_possess           set PCRE2_NO_AUTO_POSSESS
589e18e3516Sopenharmony_ci             no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
590e18e3516Sopenharmony_ci             no_start_optimize         set PCRE2_NO_START_OPTIMIZE
591e18e3516Sopenharmony_ci             no_utf_check              set PCRE2_NO_UTF_CHECK
592e18e3516Sopenharmony_ci             ucp                       set PCRE2_UCP
593e18e3516Sopenharmony_ci             ungreedy                  set PCRE2_UNGREEDY
594e18e3516Sopenharmony_ci             use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
595e18e3516Sopenharmony_ci             utf                       set PCRE2_UTF
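
       The special handling of repeated /x abbreviations described  before
       the  table  can  be  summarized by the following Python sketch. The
       function name is hypothetical, and rejecting more than three  appear-
       ances  is  an assumption made here because that case is not described.

```python
def extended_options(abbreviations):
    """Map the number of "x" letters to the option bits that get set."""
    count = abbreviations.count("x")
    if count == 0:
        return set()
    if count == 1:
        return {"PCRE2_EXTENDED"}
    if count == 2:
        # The second x converts PCRE2_EXTENDED into PCRE2_EXTENDED_MORE.
        return {"PCRE2_EXTENDED_MORE"}
    if count == 3:
        # The third x adds PCRE2_EXTENDED back as well.
        return {"PCRE2_EXTENDED", "PCRE2_EXTENDED_MORE"}
    raise ValueError("more than three x letters is not covered by this sketch")
```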
596e18e3516Sopenharmony_ci
597e18e3516Sopenharmony_ci       As well as turning on the PCRE2_UTF option, the utf modifier causes all
598e18e3516Sopenharmony_ci       non-printing  characters  in  output  strings  to  be printed using the
599e18e3516Sopenharmony_ci       \x{hh...} notation. Otherwise, those less than 0x100 are output in  hex
600e18e3516Sopenharmony_ci       without  the  curly brackets. Setting utf in 16-bit or 32-bit mode also
601e18e3516Sopenharmony_ci       causes pattern and subject  strings  to  be  translated  to  UTF-16  or
602e18e3516Sopenharmony_ci       UTF-32, respectively, before being passed to library functions.
603e18e3516Sopenharmony_ci
604e18e3516Sopenharmony_ci   Setting compilation controls
605e18e3516Sopenharmony_ci
606e18e3516Sopenharmony_ci       The  following  modifiers affect the compilation process or request in-
607e18e3516Sopenharmony_ci       formation about the pattern. There are single-letter abbreviations  for
608e18e3516Sopenharmony_ci       some that are heavily used in the test files.
609e18e3516Sopenharmony_ci
610e18e3516Sopenharmony_ci             bsr=[anycrlf|unicode]     specify \R handling
611e18e3516Sopenharmony_ci         /B  bincode                   show binary code without lengths
612e18e3516Sopenharmony_ci             callout_info              show callout information
613e18e3516Sopenharmony_ci             convert=<options>         request foreign pattern conversion
614e18e3516Sopenharmony_ci             convert_glob_escape=c     set glob escape character
615e18e3516Sopenharmony_ci             convert_glob_separator=c  set glob separator character
616e18e3516Sopenharmony_ci             convert_length            set convert buffer length
617e18e3516Sopenharmony_ci             debug                     same as info,fullbincode
618e18e3516Sopenharmony_ci             framesize                 show matching frame size
619e18e3516Sopenharmony_ci             fullbincode               show binary code with lengths
620e18e3516Sopenharmony_ci         /I  info                      show info about compiled pattern
621e18e3516Sopenharmony_ci             hex                       unquoted characters are hexadecimal
622e18e3516Sopenharmony_ci             jit[=<number>]            use JIT
623e18e3516Sopenharmony_ci             jitfast                   use JIT fast path
624e18e3516Sopenharmony_ci             jitverify                 verify JIT use
625e18e3516Sopenharmony_ci             locale=<name>             use this locale
626e18e3516Sopenharmony_ci             max_pattern_length=<n>    set the maximum pattern length
627e18e3516Sopenharmony_ci             memory                    show memory used
628e18e3516Sopenharmony_ci             newline=<type>            set newline type
629e18e3516Sopenharmony_ci             null_context              compile with a NULL context
630e18e3516Sopenharmony_ci             parens_nest_limit=<n>     set maximum parentheses depth
631e18e3516Sopenharmony_ci             posix                     use the POSIX API
632e18e3516Sopenharmony_ci             posix_nosub               use the POSIX API with REG_NOSUB
633e18e3516Sopenharmony_ci             push                      push compiled pattern onto the stack
634e18e3516Sopenharmony_ci             pushcopy                  push a copy onto the stack
635e18e3516Sopenharmony_ci             stackguard=<number>       test the stackguard feature
636e18e3516Sopenharmony_ci             subject_literal           treat all subject lines as literal
637e18e3516Sopenharmony_ci             tables=[0|1|2|3]          select internal tables
638e18e3516Sopenharmony_ci             use_length                do not zero-terminate the pattern
639e18e3516Sopenharmony_ci             utf8_input                treat input as UTF-8
640e18e3516Sopenharmony_ci
641e18e3516Sopenharmony_ci       The effects of these modifiers are described in the following sections.
642e18e3516Sopenharmony_ci
643e18e3516Sopenharmony_ci   Newline and \R handling
644e18e3516Sopenharmony_ci
645e18e3516Sopenharmony_ci       The  bsr modifier specifies what \R in a pattern should match. If it is
646e18e3516Sopenharmony_ci       set to "anycrlf", \R matches CR, LF, or CRLF only.  If  it  is  set  to
647e18e3516Sopenharmony_ci       "unicode",  \R matches any Unicode newline sequence. The default can be
648e18e3516Sopenharmony_ci       specified when PCRE2 is built; if it is not, the default is set to Uni-
649e18e3516Sopenharmony_ci       code.
650e18e3516Sopenharmony_ci
651e18e3516Sopenharmony_ci       The  newline  modifier specifies which characters are to be interpreted
652e18e3516Sopenharmony_ci       as newlines, both in the pattern and in subject lines. The type must be
653e18e3516Sopenharmony_ci       one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
654e18e3516Sopenharmony_ci
655e18e3516Sopenharmony_ci   Information about a pattern
656e18e3516Sopenharmony_ci
657e18e3516Sopenharmony_ci       The  debug modifier is a shorthand for info,fullbincode, requesting all
658e18e3516Sopenharmony_ci       available information.
659e18e3516Sopenharmony_ci
660e18e3516Sopenharmony_ci       The bincode modifier causes a representation of the compiled code to be
661e18e3516Sopenharmony_ci       output  after compilation. This information does not contain length and
662e18e3516Sopenharmony_ci       offset values, which ensures that the same output is generated for dif-
663e18e3516Sopenharmony_ci       ferent  internal  link  sizes  and different code unit widths. By using
664e18e3516Sopenharmony_ci       bincode, the same regression tests can be used  in  different  environ-
665e18e3516Sopenharmony_ci       ments.
666e18e3516Sopenharmony_ci
667e18e3516Sopenharmony_ci       The  fullbincode  modifier, by contrast, does include length and offset
668e18e3516Sopenharmony_ci       values. This is used in a few special tests that run only for  specific
669e18e3516Sopenharmony_ci       code unit widths and link sizes, and is also useful for one-off tests.
670e18e3516Sopenharmony_ci
671e18e3516Sopenharmony_ci       The  info  modifier  requests  information  about  the compiled pattern
672e18e3516Sopenharmony_ci       (whether it is anchored, has a fixed first character, and so  on).  The
673e18e3516Sopenharmony_ci       information  is  obtained  from the pcre2_pattern_info() function. Here
674e18e3516Sopenharmony_ci       are some typical examples:
675e18e3516Sopenharmony_ci
676e18e3516Sopenharmony_ci           re> /(?i)(^a|^b)/m,info
677e18e3516Sopenharmony_ci         Capture group count = 1
678e18e3516Sopenharmony_ci         Compile options: multiline
679e18e3516Sopenharmony_ci         Overall options: caseless multiline
680e18e3516Sopenharmony_ci         First code unit at start or follows newline
681e18e3516Sopenharmony_ci         Subject length lower bound = 1
682e18e3516Sopenharmony_ci
683e18e3516Sopenharmony_ci           re> /(?i)abc/info
684e18e3516Sopenharmony_ci         Capture group count = 0
685e18e3516Sopenharmony_ci         Compile options: <none>
686e18e3516Sopenharmony_ci         Overall options: caseless
687e18e3516Sopenharmony_ci         First code unit = 'a' (caseless)
688e18e3516Sopenharmony_ci         Last code unit = 'c' (caseless)
689e18e3516Sopenharmony_ci         Subject length lower bound = 3
690e18e3516Sopenharmony_ci
691e18e3516Sopenharmony_ci       "Compile options" are those specified by modifiers;  "overall  options"
692e18e3516Sopenharmony_ci       have  added options that are taken or deduced from the pattern. If both
693e18e3516Sopenharmony_ci       sets of options are the same, just a single "options" line  is  output;
694e18e3516Sopenharmony_ci       if  there  are  no  options,  the line is omitted. "First code unit" is
695e18e3516Sopenharmony_ci       where any match must start; if there is more than one they  are  listed
696e18e3516Sopenharmony_ci       as  "starting  code  units".  "Last code unit" is the last literal code
697e18e3516Sopenharmony_ci       unit that must be present in any match. This  is  not  necessarily  the
698e18e3516Sopenharmony_ci       last  character.  These lines are omitted if no starting or ending code
699e18e3516Sopenharmony_ci       units  are  recorded.  The  subject  length  line   is   omitted   when
700e18e3516Sopenharmony_ci       no_start_optimize  is  set because the minimum length is not calculated
701e18e3516Sopenharmony_ci       when it can never be used.
702e18e3516Sopenharmony_ci
703e18e3516Sopenharmony_ci       The framesize modifier shows the size, in bytes, of the storage  frames
704e18e3516Sopenharmony_ci       used  by  pcre2_match()  for handling backtracking. The size depends on
705e18e3516Sopenharmony_ci       the number of capturing parentheses in the pattern.
706e18e3516Sopenharmony_ci
707e18e3516Sopenharmony_ci       The callout_info modifier requests information about all  the  callouts
708e18e3516Sopenharmony_ci       in the pattern. A list of them is output at the end of any other infor-
709e18e3516Sopenharmony_ci       mation that is requested. For each callout, either its number or string
710e18e3516Sopenharmony_ci       is given, followed by the item that follows it in the pattern.
711e18e3516Sopenharmony_ci
712e18e3516Sopenharmony_ci   Passing a NULL context
713e18e3516Sopenharmony_ci
714e18e3516Sopenharmony_ci       Normally,  pcre2test  passes a context block to pcre2_compile(). If the
715e18e3516Sopenharmony_ci       null_context modifier is set, however, NULL  is  passed.  This  is  for
716e18e3516Sopenharmony_ci       testing  that  pcre2_compile()  behaves correctly in this case (it uses
717e18e3516Sopenharmony_ci       default values).
718e18e3516Sopenharmony_ci
719e18e3516Sopenharmony_ci   Specifying pattern characters in hexadecimal
720e18e3516Sopenharmony_ci
721e18e3516Sopenharmony_ci       The hex modifier specifies that the characters of the  pattern,  except
722e18e3516Sopenharmony_ci       for  substrings  enclosed  in single or double quotes, are to be inter-
723e18e3516Sopenharmony_ci       preted as pairs of hexadecimal digits. This feature is  provided  as  a
724e18e3516Sopenharmony_ci       way of creating patterns that contain binary zeros and other non-print-
725e18e3516Sopenharmony_ci       ing characters. White space is permitted between pairs of  digits.  For
726e18e3516Sopenharmony_ci       example, this pattern contains three characters:
727e18e3516Sopenharmony_ci
728e18e3516Sopenharmony_ci         /ab 32 59/hex
729e18e3516Sopenharmony_ci
730e18e3516Sopenharmony_ci       Parts  of  such  a  pattern are taken literally if quoted. This pattern
731e18e3516Sopenharmony_ci       contains nine characters, only two of which are specified in  hexadeci-
732e18e3516Sopenharmony_ci       mal:
733e18e3516Sopenharmony_ci
734e18e3516Sopenharmony_ci         /ab "literal" 32/hex
735e18e3516Sopenharmony_ci
736e18e3516Sopenharmony_ci       Either  single or double quotes may be used. There is no way of includ-
737e18e3516Sopenharmony_ci       ing the delimiter within a substring. The hex and expand modifiers  are
738e18e3516Sopenharmony_ci       mutually exclusive.
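
       The following Python sketch shows one way to model  how  a  hex  pat-
       tern  is  decoded.  It is an illustration under stated assumptions, not
       pcre2test's code: the function name is hypothetical, and quoted  text
       is assumed to consist of single-byte characters.

```python
import string


def decode_hex_pattern(text):
    """Decode the body of a pattern that has the hex modifier set."""
    out = bytearray()
    index = 0
    while index < len(text):
        char = text[index]
        if char.isspace():
            index += 1  # white space between pairs is permitted
        elif char in "'\"":
            # Quoted substrings are taken literally.
            end = text.index(char, index + 1)  # ValueError if unclosed
            out += text[index + 1:end].encode("latin-1")
            index = end + 1
        else:
            pair = text[index:index + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("expected a pair of hexadecimal digits")
            out.append(int(pair, 16))
            index += 2
    return bytes(out)
```

       Applied to the two examples above, the sketch returns three bytes  for
       "ab 32 59" and nine bytes for 'ab "literal" 32'.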
739e18e3516Sopenharmony_ci
740e18e3516Sopenharmony_ci   Specifying the pattern's length
741e18e3516Sopenharmony_ci
       By  default,  patterns  are  passed  to  the  compiling  functions  as
       zero-terminated strings. The use_length modifier causes a pattern to be
       passed  by  length  instead.  Passing  by length happens automatically
       (whether or not use_length is set) when hex is set,  because  patterns
       specified in hexadecimal may contain binary zeros.
748e18e3516Sopenharmony_ci
749e18e3516Sopenharmony_ci       If hex or use_length is used with the POSIX wrapper API (see "Using the
750e18e3516Sopenharmony_ci       POSIX  wrapper  API" below), the REG_PEND extension is used to pass the
751e18e3516Sopenharmony_ci       pattern's length.
752e18e3516Sopenharmony_ci
753e18e3516Sopenharmony_ci   Specifying wide characters in 16-bit and 32-bit modes
754e18e3516Sopenharmony_ci
755e18e3516Sopenharmony_ci       In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
756e18e3516Sopenharmony_ci       and  translated  to  UTF-16 or UTF-32 when the utf modifier is set. For
757e18e3516Sopenharmony_ci       testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
758e18e3516Sopenharmony_ci       modifier  can  be  used. It is mutually exclusive with utf. Input lines
759e18e3516Sopenharmony_ci       are interpreted as UTF-8 as a means of specifying wide characters. More
760e18e3516Sopenharmony_ci       details are given in "Input encoding" above.
761e18e3516Sopenharmony_ci
762e18e3516Sopenharmony_ci   Generating long repetitive patterns
763e18e3516Sopenharmony_ci
764e18e3516Sopenharmony_ci       Some  tests use long patterns that are very repetitive. Instead of cre-
765e18e3516Sopenharmony_ci       ating a very long input line for such a pattern, you can use a  special
766e18e3516Sopenharmony_ci       repetition  feature,  similar  to  the  one described for subject lines
767e18e3516Sopenharmony_ci       above. If the expand modifier is present on a  pattern,  parts  of  the
768e18e3516Sopenharmony_ci       pattern that have the form
769e18e3516Sopenharmony_ci
770e18e3516Sopenharmony_ci         \[<characters>]{<count>}
771e18e3516Sopenharmony_ci
772e18e3516Sopenharmony_ci       are expanded before the pattern is passed to pcre2_compile(). For exam-
773e18e3516Sopenharmony_ci       ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
774e18e3516Sopenharmony_ci       cannot  be  nested. An initial "\[" sequence is recognized only if "]{"
775e18e3516Sopenharmony_ci       followed by decimal digits and "}" is found later in  the  pattern.  If
776e18e3516Sopenharmony_ci       not, the characters remain in the pattern unaltered. The expand and hex
777e18e3516Sopenharmony_ci       modifiers are mutually exclusive.
778e18e3516Sopenharmony_ci
779e18e3516Sopenharmony_ci       If part of an expanded pattern looks like an expansion, but  is  really
780e18e3516Sopenharmony_ci       part of the actual pattern, unwanted expansion can be avoided by giving
781e18e3516Sopenharmony_ci       two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
782e18e3516Sopenharmony_ci       ognized as an expansion item.
783e18e3516Sopenharmony_ci
784e18e3516Sopenharmony_ci       If  the  info modifier is set on an expanded pattern, the result of the
785e18e3516Sopenharmony_ci       expansion is included in the information that is output.
786e18e3516Sopenharmony_ci
787e18e3516Sopenharmony_ci   JIT compilation
788e18e3516Sopenharmony_ci
789e18e3516Sopenharmony_ci       Just-in-time (JIT) compiling is a  heavyweight  optimization  that  can
790e18e3516Sopenharmony_ci       greatly  speed  up pattern matching. See the pcre2jit documentation for
791e18e3516Sopenharmony_ci       details. JIT compiling happens, optionally, after a  pattern  has  been
792e18e3516Sopenharmony_ci       successfully  compiled into an internal form. The JIT compiler converts
793e18e3516Sopenharmony_ci       this to optimized machine code. It needs to know whether the match-time
794e18e3516Sopenharmony_ci       options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
795e18e3516Sopenharmony_ci       because different code is generated for the different  cases.  See  the
796e18e3516Sopenharmony_ci       partial  modifier in "Subject Modifiers" below for details of how these
797e18e3516Sopenharmony_ci       options are specified for each match attempt.
798e18e3516Sopenharmony_ci
799e18e3516Sopenharmony_ci       JIT compilation is requested by the jit pattern modifier, which may op-
800e18e3516Sopenharmony_ci       tionally  be  followed by an equals sign and a number in the range 0 to
801e18e3516Sopenharmony_ci       7.  The three bits that make up the number specify which of  the  three
802e18e3516Sopenharmony_ci       JIT operating modes are to be compiled:
803e18e3516Sopenharmony_ci
804e18e3516Sopenharmony_ci         1  compile JIT code for non-partial matching
805e18e3516Sopenharmony_ci         2  compile JIT code for soft partial matching
806e18e3516Sopenharmony_ci         4  compile JIT code for hard partial matching
807e18e3516Sopenharmony_ci
808e18e3516Sopenharmony_ci       The possible values for the jit modifier are therefore:
809e18e3516Sopenharmony_ci
810e18e3516Sopenharmony_ci         0  disable JIT
811e18e3516Sopenharmony_ci         1  normal matching only
812e18e3516Sopenharmony_ci         2  soft partial matching only
813e18e3516Sopenharmony_ci         3  normal and soft partial matching
         4  hard partial matching only
         5  normal and hard partial matching
815e18e3516Sopenharmony_ci         6  soft and hard partial matching only
816e18e3516Sopenharmony_ci         7  all three modes
817e18e3516Sopenharmony_ci
818e18e3516Sopenharmony_ci       If  no  number  is  given,  7 is assumed. The phrase "partial matching"
819e18e3516Sopenharmony_ci       means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
820e18e3516Sopenharmony_ci       PCRE2_PARTIAL_HARD  option set. Note that such a call may return a com-
821e18e3516Sopenharmony_ci       plete match; the options enable the possibility of a partial match, but
822e18e3516Sopenharmony_ci       do  not  require it. Note also that if you request JIT compilation only
823e18e3516Sopenharmony_ci       for partial matching (for example, jit=2) but do not  set  the  partial
824e18e3516Sopenharmony_ci       modifier  on  a  subject line, that match will not use JIT code because
825e18e3516Sopenharmony_ci       none was compiled for non-partial matching.
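
       The bit arithmetic behind the jit number can be  sketched  in  Python.
       The  function  names  and  the  "soft"/"hard"  strings are hypothetical
       stand-ins for the partial modifier settings; the bit values are  those
       listed above.

```python
JIT_MODES = {1: "non-partial", 2: "soft partial", 4: "hard partial"}


def jit_modes(number=7):
    """List the JIT operating modes selected by a jit=<number> value."""
    if not 0 <= number <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return [name for bit, name in JIT_MODES.items() if number & bit]


def match_can_use_jit(number, partial=None):
    """Report whether JIT code was compiled for this kind of match.

    partial is None for a normal match, or "soft" or "hard".
    """
    needed = {None: 1, "soft": 2, "hard": 4}[partial]
    return bool(number & needed)
```

       For example, match_can_use_jit(2) is false, which  corresponds  to  the
       jit=2 case described above where a subject line without the partial
       modifier does not use JIT code.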
826e18e3516Sopenharmony_ci
827e18e3516Sopenharmony_ci       If JIT compilation is successful, the compiled JIT code will  automati-
828e18e3516Sopenharmony_ci       cally be used when an appropriate type of match is run, except when in-
829e18e3516Sopenharmony_ci       compatible run-time options are specified. For more  details,  see  the
830e18e3516Sopenharmony_ci       pcre2jit  documentation. See also the jitstack modifier below for a way
831e18e3516Sopenharmony_ci       of setting the size of the JIT stack.
832e18e3516Sopenharmony_ci
833e18e3516Sopenharmony_ci       If the jitfast modifier is specified, matching is done  using  the  JIT
834e18e3516Sopenharmony_ci       "fast  path" interface, pcre2_jit_match(), which skips some of the san-
835e18e3516Sopenharmony_ci       ity checks that are done by pcre2_match(), and of course does not  work
836e18e3516Sopenharmony_ci       when  JIT  is not supported. If jitfast is specified without jit, jit=7
837e18e3516Sopenharmony_ci       is assumed.
838e18e3516Sopenharmony_ci
       If the jitverify modifier is specified, information about the compiled
       pattern shows whether JIT compilation was or was not successful. If
       jitverify is specified without jit, jit=7 is assumed. If JIT compilation
       is successful when jitverify is set, the text "(JIT)" is added to the
       first output line after a match or non-match when JIT-compiled code was
       actually used in the match.
845e18e3516Sopenharmony_ci
846e18e3516Sopenharmony_ci   Setting a locale
847e18e3516Sopenharmony_ci
848e18e3516Sopenharmony_ci       The locale modifier must specify the name of a locale, for example:
849e18e3516Sopenharmony_ci
850e18e3516Sopenharmony_ci         /pattern/locale=fr_FR
851e18e3516Sopenharmony_ci
852e18e3516Sopenharmony_ci       The given locale is set, pcre2_maketables() is called to build a set of
853e18e3516Sopenharmony_ci       character tables for the locale, and this is then passed to  pcre2_com-
854e18e3516Sopenharmony_ci       pile()  when compiling the regular expression. The same tables are used
855e18e3516Sopenharmony_ci       when matching the following subject lines. The locale modifier  applies
856e18e3516Sopenharmony_ci       only to the pattern on which it appears, but can be given in a #pattern
       command if a default is needed. Setting a locale and setting alternate
       character tables are mutually exclusive.
859e18e3516Sopenharmony_ci
860e18e3516Sopenharmony_ci   Showing pattern memory
861e18e3516Sopenharmony_ci
862e18e3516Sopenharmony_ci       The memory modifier causes the size in bytes of the memory used to hold
863e18e3516Sopenharmony_ci       the compiled pattern to be output. This does not include  the  size  of
864e18e3516Sopenharmony_ci       the  pcre2_code block; it is just the actual compiled data. If the pat-
865e18e3516Sopenharmony_ci       tern is subsequently passed to the JIT compiler, the size  of  the  JIT
866e18e3516Sopenharmony_ci       compiled code is also output. Here is an example:
867e18e3516Sopenharmony_ci
868e18e3516Sopenharmony_ci           re> /a(b)c/jit,memory
869e18e3516Sopenharmony_ci         Memory allocation (code space): 21
870e18e3516Sopenharmony_ci         Memory allocation (JIT code): 1910
871e18e3516Sopenharmony_ci
872e18e3516Sopenharmony_ci
873e18e3516Sopenharmony_ci   Limiting nested parentheses
874e18e3516Sopenharmony_ci
875e18e3516Sopenharmony_ci       The  parens_nest_limit  modifier  sets  a  limit on the depth of nested
876e18e3516Sopenharmony_ci       parentheses in a pattern. Breaching the limit causes a compilation  er-
877e18e3516Sopenharmony_ci       ror.   The  default  for  the  library  is set when PCRE2 is built, but
878e18e3516Sopenharmony_ci       pcre2test sets its own default of 220, which is  required  for  running
879e18e3516Sopenharmony_ci       the standard test suite.
880e18e3516Sopenharmony_ci
881e18e3516Sopenharmony_ci   Limiting the pattern length
882e18e3516Sopenharmony_ci
883e18e3516Sopenharmony_ci       The  max_pattern_length  modifier  sets  a limit, in code units, to the
884e18e3516Sopenharmony_ci       length of pattern that pcre2_compile() will accept. Breaching the limit
885e18e3516Sopenharmony_ci       causes  a  compilation  error.  The  default  is  the  largest number a
886e18e3516Sopenharmony_ci       PCRE2_SIZE variable can hold (essentially unlimited).
887e18e3516Sopenharmony_ci
888e18e3516Sopenharmony_ci   Using the POSIX wrapper API
889e18e3516Sopenharmony_ci
890e18e3516Sopenharmony_ci       The posix and posix_nosub modifiers cause pcre2test to call  PCRE2  via
891e18e3516Sopenharmony_ci       the  POSIX  wrapper API rather than its native API. When posix_nosub is
892e18e3516Sopenharmony_ci       used, the POSIX option REG_NOSUB is  passed  to  regcomp().  The  POSIX
893e18e3516Sopenharmony_ci       wrapper  supports  only  the 8-bit library. Note that it does not imply
894e18e3516Sopenharmony_ci       POSIX matching semantics; for more detail see the pcre2posix documenta-
895e18e3516Sopenharmony_ci       tion.  The  following  pattern  modifiers set options for the regcomp()
896e18e3516Sopenharmony_ci       function:
897e18e3516Sopenharmony_ci
898e18e3516Sopenharmony_ci         caseless           REG_ICASE
899e18e3516Sopenharmony_ci         multiline          REG_NEWLINE
900e18e3516Sopenharmony_ci         dotall             REG_DOTALL     )
901e18e3516Sopenharmony_ci         ungreedy           REG_UNGREEDY   ) These options are not part of
902e18e3516Sopenharmony_ci         ucp                REG_UCP        )   the POSIX standard
903e18e3516Sopenharmony_ci         utf                REG_UTF8       )
904e18e3516Sopenharmony_ci
905e18e3516Sopenharmony_ci       The regerror_buffsize modifier specifies a size for  the  error  buffer
906e18e3516Sopenharmony_ci       that  is  passed to regerror() in the event of a compilation error. For
907e18e3516Sopenharmony_ci       example:
908e18e3516Sopenharmony_ci
909e18e3516Sopenharmony_ci         /abc/posix,regerror_buffsize=20
910e18e3516Sopenharmony_ci
911e18e3516Sopenharmony_ci       This provides a means of testing the behaviour of regerror()  when  the
912e18e3516Sopenharmony_ci       buffer  is  too  small  for the error message. If this modifier has not
913e18e3516Sopenharmony_ci       been set, a large buffer is used.
914e18e3516Sopenharmony_ci
915e18e3516Sopenharmony_ci       The aftertext and allaftertext subject modifiers work as described  be-
916e18e3516Sopenharmony_ci       low. All other modifiers are either ignored, with a warning message, or
917e18e3516Sopenharmony_ci       cause an error.
918e18e3516Sopenharmony_ci
       The pattern is passed to regcomp() as a zero-terminated string by
       default, but if either the use_length or the hex modifier is set, the
       REG_PEND extension is used to pass it by length.
922e18e3516Sopenharmony_ci
923e18e3516Sopenharmony_ci   Testing the stack guard feature
924e18e3516Sopenharmony_ci
925e18e3516Sopenharmony_ci       The stackguard modifier is used  to  test  the  use  of  pcre2_set_com-
926e18e3516Sopenharmony_ci       pile_recursion_guard(),  a  function  that  is provided to enable stack
927e18e3516Sopenharmony_ci       availability to be checked during compilation (see the  pcre2api  docu-
928e18e3516Sopenharmony_ci       mentation  for  details).  If  the  number specified by the modifier is
       greater than zero, pcre2_set_compile_recursion_guard() is called to set
       up a callback from pcre2_compile() to a local function. The argument it
       receives is the current parenthesis nesting depth; if this is greater
       than the value given by the modifier, non-zero is returned, causing the
       compilation to be aborted.
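
       The guard logic can be sketched as follows. This is a simplified model
       under stated assumptions, not the library's implementation: the
       function names are hypothetical, and the toy "compiler" only counts
       parentheses, ignoring escapes and character classes.

```python
def make_guard(limit):
    """Build a guard like the local function described above (hypothetical)."""
    def guard(depth):
        # Non-zero means "abort compilation".
        return 1 if depth > limit else 0
    return guard


def compile_with_guard(pattern, guard):
    """Toy stand-in for compilation: call the guard at each nesting level.

    Returns True if the walk finished, False if the guard aborted it.
    Escapes and character classes are deliberately not handled.
    """
    depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
            if guard(depth) != 0:
                return False
        elif ch == ")":
            depth -= 1
    return True


print(compile_with_guard("((a)(b(c)))", make_guard(2)))  # False
print(compile_with_guard("((a)(b(c)))", make_guard(3)))  # True
```

       The example pattern nests three levels deep, so a guard value of 2
       aborts the walk and a value of 3 lets it complete.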
934e18e3516Sopenharmony_ci
935e18e3516Sopenharmony_ci   Using alternative character tables
936e18e3516Sopenharmony_ci
937e18e3516Sopenharmony_ci       The value specified for the tables modifier must be one of  the  digits
938e18e3516Sopenharmony_ci       0, 1, 2, or 3. It causes a specific set of built-in character tables to
939e18e3516Sopenharmony_ci       be passed to pcre2_compile(). This is used in the PCRE2 tests to  check
940e18e3516Sopenharmony_ci       behaviour  with different character tables. The digit specifies the ta-
941e18e3516Sopenharmony_ci       bles as follows:
942e18e3516Sopenharmony_ci
943e18e3516Sopenharmony_ci         0   do not pass any special character tables
944e18e3516Sopenharmony_ci         1   the default ASCII tables, as distributed in
945e18e3516Sopenharmony_ci               pcre2_chartables.c.dist
946e18e3516Sopenharmony_ci         2   a set of tables defining ISO 8859 characters
947e18e3516Sopenharmony_ci         3   a set of tables loaded by the #loadtables command
948e18e3516Sopenharmony_ci
949e18e3516Sopenharmony_ci       In tables 2, some characters whose codes are greater than 128 are iden-
950e18e3516Sopenharmony_ci       tified as letters, digits, spaces, etc. Tables 3 can be used only after
       a #loadtables command has loaded them from a binary file. Setting
       alternate character tables and setting a locale are mutually exclusive.
953e18e3516Sopenharmony_ci
954e18e3516Sopenharmony_ci   Setting certain match controls
955e18e3516Sopenharmony_ci
956e18e3516Sopenharmony_ci       The following modifiers are really subject modifiers, and are described
957e18e3516Sopenharmony_ci       under "Subject Modifiers" below. However, they may  be  included  in  a
958e18e3516Sopenharmony_ci       pattern's  modifier  list, in which case they are applied to every sub-
959e18e3516Sopenharmony_ci       ject line that is processed with that pattern. These modifiers  do  not
960e18e3516Sopenharmony_ci       affect the compilation process.
961e18e3516Sopenharmony_ci
962e18e3516Sopenharmony_ci             aftertext                   show text after match
963e18e3516Sopenharmony_ci             allaftertext                show text after captures
964e18e3516Sopenharmony_ci             allcaptures                 show all captures
965e18e3516Sopenharmony_ci             allvector                   show the entire ovector
966e18e3516Sopenharmony_ci             allusedtext                 show all consulted text
967e18e3516Sopenharmony_ci             altglobal                   alternative global matching
968e18e3516Sopenharmony_ci         /g  global                      global matching
969e18e3516Sopenharmony_ci             jitstack=<n>                set size of JIT stack
970e18e3516Sopenharmony_ci             mark                        show mark values
971e18e3516Sopenharmony_ci             replace=<string>            specify a replacement string
972e18e3516Sopenharmony_ci             startchar                   show starting character when relevant
973e18e3516Sopenharmony_ci             substitute_callout          use substitution callouts
974e18e3516Sopenharmony_ci             substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
975e18e3516Sopenharmony_ci             substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
976e18e3516Sopenharmony_ci             substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
977e18e3516Sopenharmony_ci             substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
978e18e3516Sopenharmony_ci             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
979e18e3516Sopenharmony_ci             substitute_skip=<n>         skip substitution <n>
980e18e3516Sopenharmony_ci             substitute_stop=<n>         skip substitution <n> and following
981e18e3516Sopenharmony_ci             substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
982e18e3516Sopenharmony_ci             substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
983e18e3516Sopenharmony_ci
984e18e3516Sopenharmony_ci       These  modifiers may not appear in a #pattern command. If you want them
985e18e3516Sopenharmony_ci       as defaults, set them in a #subject command.
986e18e3516Sopenharmony_ci
987e18e3516Sopenharmony_ci   Specifying literal subject lines
988e18e3516Sopenharmony_ci
989e18e3516Sopenharmony_ci       If the subject_literal modifier is present on a pattern, all  the  sub-
990e18e3516Sopenharmony_ci       ject lines that it matches are taken as literal strings, with no inter-
991e18e3516Sopenharmony_ci       pretation of backslashes. It is not possible to set  subject  modifiers
992e18e3516Sopenharmony_ci       on  such  lines, but any that are set as defaults by a #subject command
993e18e3516Sopenharmony_ci       are recognized.
994e18e3516Sopenharmony_ci
995e18e3516Sopenharmony_ci   Saving a compiled pattern
996e18e3516Sopenharmony_ci
997e18e3516Sopenharmony_ci       When a pattern with the push modifier is successfully compiled,  it  is
998e18e3516Sopenharmony_ci       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
999e18e3516Sopenharmony_ci       next line to contain a new pattern (or a command) instead of a  subject
1000e18e3516Sopenharmony_ci       line. This facility is used when saving compiled patterns to a file, as
1001e18e3516Sopenharmony_ci       described in the section entitled "Saving and restoring  compiled  pat-
1002e18e3516Sopenharmony_ci       terns"  below.  If pushcopy is used instead of push, a copy of the com-
1003e18e3516Sopenharmony_ci       piled pattern is stacked, leaving the original  as  current,  ready  to
1004e18e3516Sopenharmony_ci       match  the  following  input  lines. This provides a way of testing the
       pcre2_code_copy() function. The push and pushcopy modifiers are
       incompatible with pattern modifiers such as global that act at match
       time. Any that are specified are ignored (for the stacked copy), with a
       warning message, except for replace, which causes an error. Note that
1009e18e3516Sopenharmony_ci       jitverify, which is allowed, does not carry through to  any  subsequent
1010e18e3516Sopenharmony_ci       matching that uses a stacked pattern.
1011e18e3516Sopenharmony_ci
1012e18e3516Sopenharmony_ci   Testing foreign pattern conversion
1013e18e3516Sopenharmony_ci
1014e18e3516Sopenharmony_ci       The  experimental  foreign pattern conversion functions in PCRE2 can be
1015e18e3516Sopenharmony_ci       tested by setting the convert modifier. Its argument is  a  colon-sepa-
1016e18e3516Sopenharmony_ci       rated  list  of  options,  which  set  the  equivalent  option  for the
1017e18e3516Sopenharmony_ci       pcre2_pattern_convert() function:
1018e18e3516Sopenharmony_ci
1019e18e3516Sopenharmony_ci         glob                    PCRE2_CONVERT_GLOB
1020e18e3516Sopenharmony_ci         glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
1021e18e3516Sopenharmony_ci         glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
1022e18e3516Sopenharmony_ci         posix_basic             PCRE2_CONVERT_POSIX_BASIC
1023e18e3516Sopenharmony_ci         posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
1024e18e3516Sopenharmony_ci         unset                   Unset all options
1025e18e3516Sopenharmony_ci
1026e18e3516Sopenharmony_ci       The "unset" value is useful for turning off a default that has been set
1027e18e3516Sopenharmony_ci       by a #pattern command. When one of these options is set, the input pat-
1028e18e3516Sopenharmony_ci       tern is passed to pcre2_pattern_convert(). If the  conversion  is  suc-
1029e18e3516Sopenharmony_ci       cessful,  the  result  is  reflected  in  the output and then passed to
1030e18e3516Sopenharmony_ci       pcre2_compile(). The normal utf and no_utf_check options, if set, cause
1031e18e3516Sopenharmony_ci       the  PCRE2_CONVERT_UTF  and  PCRE2_CONVERT_NO_UTF_CHECK  options  to be
1032e18e3516Sopenharmony_ci       passed to pcre2_pattern_convert().
1033e18e3516Sopenharmony_ci
1034e18e3516Sopenharmony_ci       By default, the conversion function is allowed to allocate a buffer for
1035e18e3516Sopenharmony_ci       its  output.  However, if the convert_length modifier is set to a value
1036e18e3516Sopenharmony_ci       greater than zero, pcre2test passes a buffer of the given length.  This
1037e18e3516Sopenharmony_ci       makes it possible to test the length check.
1038e18e3516Sopenharmony_ci
1039e18e3516Sopenharmony_ci       The  convert_glob_escape  and  convert_glob_separator  modifiers can be
1040e18e3516Sopenharmony_ci       used to specify the escape and separator characters for  glob  process-
1041e18e3516Sopenharmony_ci       ing, overriding the defaults, which are operating-system dependent.
1042e18e3516Sopenharmony_ci
1043e18e3516Sopenharmony_ci
1044e18e3516Sopenharmony_ciSUBJECT MODIFIERS
1045e18e3516Sopenharmony_ci
1046e18e3516Sopenharmony_ci       The modifiers that can appear in subject lines and the #subject command
1047e18e3516Sopenharmony_ci       are of two types.
1048e18e3516Sopenharmony_ci
1049e18e3516Sopenharmony_ci   Setting match options
1050e18e3516Sopenharmony_ci
       The following modifiers set options for pcre2_match() or
       pcre2_dfa_match(). See pcre2api for a description of their effects.
1053e18e3516Sopenharmony_ci
1054e18e3516Sopenharmony_ci             anchored                  set PCRE2_ANCHORED
1055e18e3516Sopenharmony_ci             endanchored               set PCRE2_ENDANCHORED
1056e18e3516Sopenharmony_ci             dfa_restart               set PCRE2_DFA_RESTART
1057e18e3516Sopenharmony_ci             dfa_shortest              set PCRE2_DFA_SHORTEST
1058e18e3516Sopenharmony_ci             no_jit                    set PCRE2_NO_JIT
1059e18e3516Sopenharmony_ci             no_utf_check              set PCRE2_NO_UTF_CHECK
1060e18e3516Sopenharmony_ci             notbol                    set PCRE2_NOTBOL
1061e18e3516Sopenharmony_ci             notempty                  set PCRE2_NOTEMPTY
1062e18e3516Sopenharmony_ci             notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
1063e18e3516Sopenharmony_ci             noteol                    set PCRE2_NOTEOL
1064e18e3516Sopenharmony_ci             partial_hard (or ph)      set PCRE2_PARTIAL_HARD
1065e18e3516Sopenharmony_ci             partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
1066e18e3516Sopenharmony_ci
1067e18e3516Sopenharmony_ci       The  partial matching modifiers are provided with abbreviations because
1068e18e3516Sopenharmony_ci       they appear frequently in tests.
1069e18e3516Sopenharmony_ci
1070e18e3516Sopenharmony_ci       If the posix or posix_nosub modifier was present on the pattern,  caus-
1071e18e3516Sopenharmony_ci       ing the POSIX wrapper API to be used, the only option-setting modifiers
1072e18e3516Sopenharmony_ci       that have any effect are notbol, notempty, and noteol, causing REG_NOT-
1073e18e3516Sopenharmony_ci       BOL,  REG_NOTEMPTY,  and  REG_NOTEOL,  respectively,  to  be  passed to
1074e18e3516Sopenharmony_ci       regexec(). The other modifiers are ignored, with a warning message.
1075e18e3516Sopenharmony_ci
1076e18e3516Sopenharmony_ci       There is one additional modifier that can be used with the POSIX  wrap-
1077e18e3516Sopenharmony_ci       per. It is ignored (with a warning) if used for non-POSIX matching.
1078e18e3516Sopenharmony_ci
1079e18e3516Sopenharmony_ci             posix_startend=<n>[:<m>]
1080e18e3516Sopenharmony_ci
1081e18e3516Sopenharmony_ci       This  causes  the  subject  string  to be passed to regexec() using the
1082e18e3516Sopenharmony_ci       REG_STARTEND option, which uses offsets to specify which  part  of  the
1083e18e3516Sopenharmony_ci       string  is  searched.  If  only  one number is given, the end offset is
1084e18e3516Sopenharmony_ci       passed as the end of the subject string. For more detail  of  REG_STAR-
1085e18e3516Sopenharmony_ci       TEND,  see the pcre2posix documentation. If the subject string contains
1086e18e3516Sopenharmony_ci       binary zeros (coded as escapes such as \x{00}  because  pcre2test  does
1087e18e3516Sopenharmony_ci       not support actual binary zeros in its input), you must use posix_star-
1088e18e3516Sopenharmony_ci       tend to specify its length.
1089e18e3516Sopenharmony_ci
1090e18e3516Sopenharmony_ci   Setting match controls
1091e18e3516Sopenharmony_ci
1092e18e3516Sopenharmony_ci       The following modifiers affect the matching process  or  request  addi-
1093e18e3516Sopenharmony_ci       tional  information.  Some  of  them may also be specified on a pattern
1094e18e3516Sopenharmony_ci       line (see above), in which case they apply to every subject  line  that
1095e18e3516Sopenharmony_ci       is  matched against that pattern, but can be overridden by modifiers on
1096e18e3516Sopenharmony_ci       the subject.
1097e18e3516Sopenharmony_ci
1098e18e3516Sopenharmony_ci             aftertext                  show text after match
1099e18e3516Sopenharmony_ci             allaftertext               show text after captures
1100e18e3516Sopenharmony_ci             allcaptures                show all captures
1101e18e3516Sopenharmony_ci             allvector                  show the entire ovector
1102e18e3516Sopenharmony_ci             allusedtext                show all consulted text (non-JIT only)
1103e18e3516Sopenharmony_ci             altglobal                  alternative global matching
1104e18e3516Sopenharmony_ci             callout_capture            show captures at callout time
1105e18e3516Sopenharmony_ci             callout_data=<n>           set a value to pass via callouts
1106e18e3516Sopenharmony_ci             callout_error=<n>[:<m>]    control callout error
1107e18e3516Sopenharmony_ci             callout_extra              show extra callout information
1108e18e3516Sopenharmony_ci             callout_fail=<n>[:<m>]     control callout failure
1109e18e3516Sopenharmony_ci             callout_no_where           do not show position of a callout
1110e18e3516Sopenharmony_ci             callout_none               do not supply a callout function
1111e18e3516Sopenharmony_ci             copy=<number or name>      copy captured substring
1112e18e3516Sopenharmony_ci             depth_limit=<n>            set a depth limit
1113e18e3516Sopenharmony_ci             dfa                        use pcre2_dfa_match()
1114e18e3516Sopenharmony_ci             find_limits                find heap, match and depth limits
1115e18e3516Sopenharmony_ci             find_limits_noheap         find match and depth limits
1116e18e3516Sopenharmony_ci             get=<number or name>       extract captured substring
1117e18e3516Sopenharmony_ci             getall                     extract all captured substrings
1118e18e3516Sopenharmony_ci         /g  global                     global matching
1119e18e3516Sopenharmony_ci             heap_limit=<n>             set a limit on heap memory (Kbytes)
1120e18e3516Sopenharmony_ci             jitstack=<n>               set size of JIT stack
1121e18e3516Sopenharmony_ci             mark                       show mark values
1122e18e3516Sopenharmony_ci             match_limit=<n>            set a match limit
1123e18e3516Sopenharmony_ci             memory                     show heap memory usage
1124e18e3516Sopenharmony_ci             null_context               match with a NULL context
1125e18e3516Sopenharmony_ci             null_replacement           substitute with NULL replacement
1126e18e3516Sopenharmony_ci             null_subject               match with NULL subject
1127e18e3516Sopenharmony_ci             offset=<n>                 set starting offset
1128e18e3516Sopenharmony_ci             offset_limit=<n>           set offset limit
1129e18e3516Sopenharmony_ci             ovector=<n>                set size of output vector
1130e18e3516Sopenharmony_ci             recursion_limit=<n>        obsolete synonym for depth_limit
1131e18e3516Sopenharmony_ci             replace=<string>           specify a replacement string
1132e18e3516Sopenharmony_ci             startchar                  show startchar when relevant
1133e18e3516Sopenharmony_ci             startoffset=<n>            same as offset=<n>
1134e18e3516Sopenharmony_ci             substitute_callout         use substitution callouts
             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
1136e18e3516Sopenharmony_ci             substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
1137e18e3516Sopenharmony_ci             substitute_matched         use PCRE2_SUBSTITUTE_MATCHED
1138e18e3516Sopenharmony_ci             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1139e18e3516Sopenharmony_ci             substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
1140e18e3516Sopenharmony_ci             substitute_skip=<n>        skip substitution number n
1141e18e3516Sopenharmony_ci             substitute_stop=<n>        skip substitution number n and greater
1142e18e3516Sopenharmony_ci             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1143e18e3516Sopenharmony_ci             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
1144e18e3516Sopenharmony_ci             zero_terminate             pass the subject as zero-terminated
1145e18e3516Sopenharmony_ci
1146e18e3516Sopenharmony_ci       The effects of these modifiers are described in the following sections.
1147e18e3516Sopenharmony_ci       When  matching  via the POSIX wrapper API, the aftertext, allaftertext,
1148e18e3516Sopenharmony_ci       and ovector subject modifiers work as described below. All other  modi-
1149e18e3516Sopenharmony_ci       fiers are either ignored, with a warning message, or cause an error.
1150e18e3516Sopenharmony_ci
1151e18e3516Sopenharmony_ci   Showing more text
1152e18e3516Sopenharmony_ci
1153e18e3516Sopenharmony_ci       The  aftertext modifier requests that as well as outputting the part of
1154e18e3516Sopenharmony_ci       the subject string that matched the entire pattern, pcre2test should in
1155e18e3516Sopenharmony_ci       addition output the remainder of the subject string. This is useful for
1156e18e3516Sopenharmony_ci       tests where the subject contains multiple copies of the same substring.
1157e18e3516Sopenharmony_ci       The  allaftertext  modifier  requests the same action for captured sub-
1158e18e3516Sopenharmony_ci       strings as well as the main matched substring. In each case the remain-
1159e18e3516Sopenharmony_ci       der is output on the following line with a plus character following the
1160e18e3516Sopenharmony_ci       capture number.
1161e18e3516Sopenharmony_ci
1162e18e3516Sopenharmony_ci       The allusedtext modifier requests that all the text that was  consulted
1163e18e3516Sopenharmony_ci       during  a  successful pattern match by the interpreter should be shown,
1164e18e3516Sopenharmony_ci       for both full and partial matches. This feature is  not  supported  for
1165e18e3516Sopenharmony_ci       JIT  matching,  and if requested with JIT it is ignored (with a warning
1166e18e3516Sopenharmony_ci       message). Setting this modifier affects the output if there is a  look-
1167e18e3516Sopenharmony_ci       behind  at  the start of a match, or, for a complete match, a lookahead
1168e18e3516Sopenharmony_ci       at the end, or if \K is used in the pattern. Characters that precede or
1169e18e3516Sopenharmony_ci       follow  the start and end of the actual match are indicated in the out-
1170e18e3516Sopenharmony_ci       put by '<' or '>' characters underneath them.  Here is an example:
1171e18e3516Sopenharmony_ci
1172e18e3516Sopenharmony_ci           re> /(?<=pqr)abc(?=xyz)/
1173e18e3516Sopenharmony_ci         data> 123pqrabcxyz456\=allusedtext
1174e18e3516Sopenharmony_ci          0: pqrabcxyz
1175e18e3516Sopenharmony_ci             <<<   >>>
1176e18e3516Sopenharmony_ci         data> 123pqrabcxy\=ph,allusedtext
1177e18e3516Sopenharmony_ci         Partial match: pqrabcxy
1178e18e3516Sopenharmony_ci                        <<<
1179e18e3516Sopenharmony_ci
1180e18e3516Sopenharmony_ci       The first, complete match shows that the matched string is "abc",  with
1181e18e3516Sopenharmony_ci       the  preceding  and  following strings "pqr" and "xyz" having been con-
1182e18e3516Sopenharmony_ci       sulted during the match (when processing the assertions).  The  partial
1183e18e3516Sopenharmony_ci       match can indicate only the preceding string.
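
       The layout of the marker line can be reproduced with a short Python
       sketch. It assumes the four offsets are already known and supplies
       them by hand; the function and parameter names are invented for the
       illustration and do not correspond to pcre2test internals.

```python
def used_text_lines(subject, used_start, match_start, match_end, used_end):
    """Return (text, markers) for an allusedtext-style display.

    used_start..used_end is the span of text that was consulted;
    match_start..match_end is the span that actually matched.
    """
    text = subject[used_start:used_end]
    markers = (
        "<" * (match_start - used_start)   # consulted before the match
        + " " * (match_end - match_start)  # the match itself
        + ">" * (used_end - match_end)     # consulted after the match
    )
    return text, markers.rstrip()


# Complete match from the example above: "abc" with "pqr" and "xyz" consulted.
print(used_text_lines("123pqrabcxyz456", 3, 6, 9, 12))
# Partial match: only the preceding "pqr" can be marked.
print(used_text_lines("123pqrabcxy", 3, 6, 11, 11))
```

       The first call yields the text "pqrabcxyz" with the markers
       "<<<   >>>", and the second yields "pqrabcxy" with "<<<".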
1184e18e3516Sopenharmony_ci
1185e18e3516Sopenharmony_ci       The  startchar  modifier  requests  that the starting character for the
1186e18e3516Sopenharmony_ci       match be indicated, if it is different to  the  start  of  the  matched
1187e18e3516Sopenharmony_ci       string. The only time when this occurs is when \K has been processed as
1188e18e3516Sopenharmony_ci       part of the match. In this situation, the output for the matched string
1189e18e3516Sopenharmony_ci       is  displayed  from  the  starting  character instead of from the match
1190e18e3516Sopenharmony_ci       point, with circumflex characters under the earlier characters. For ex-
1191e18e3516Sopenharmony_ci       ample:
1192e18e3516Sopenharmony_ci
1193e18e3516Sopenharmony_ci           re> /abc\Kxyz/
1194e18e3516Sopenharmony_ci         data> abcxyz\=startchar
1195e18e3516Sopenharmony_ci          0: abcxyz
1196e18e3516Sopenharmony_ci             ^^^
1197e18e3516Sopenharmony_ci
1198e18e3516Sopenharmony_ci       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
1199e18e3516Sopenharmony_ci       ever, these two modifiers are mutually exclusive.
1200e18e3516Sopenharmony_ci
1201e18e3516Sopenharmony_ci   Showing the value of all capture groups
1202e18e3516Sopenharmony_ci
       The allcaptures modifier requests that the values of all potential
       capture groups be output after a match. By default, only those up to
1205e18e3516Sopenharmony_ci       the highest one actually used in the match are output (corresponding to
1206e18e3516Sopenharmony_ci       the  return  code from pcre2_match()). Groups that did not take part in
1207e18e3516Sopenharmony_ci       the match are output as "<unset>". This modifier is  not  relevant  for
1208e18e3516Sopenharmony_ci       DFA  matching (which does no capturing) and does not apply when replace
1209e18e3516Sopenharmony_ci       is specified; it is ignored, with a warning message, if present.
1210e18e3516Sopenharmony_ci
1211e18e3516Sopenharmony_ci   Showing the entire ovector, for all outcomes
1212e18e3516Sopenharmony_ci
1213e18e3516Sopenharmony_ci       The allvector modifier requests that the entire ovector be shown, what-
1214e18e3516Sopenharmony_ci       ever the outcome of the match. Compare allcaptures, which shows only up
1215e18e3516Sopenharmony_ci       to the maximum number of capture groups for the pattern, and then  only
1216e18e3516Sopenharmony_ci       for  a successful complete non-DFA match. This modifier, which acts af-
1217e18e3516Sopenharmony_ci       ter any match result, and also for DFA matching, provides  a  means  of
1218e18e3516Sopenharmony_ci       checking  that there are no unexpected modifications to ovector fields.
1219e18e3516Sopenharmony_ci       Before each match attempt, the ovector is filled with a special  value,
1220e18e3516Sopenharmony_ci       and  if  this  is  found  in  both  elements of a capturing pair, "<un-
1221e18e3516Sopenharmony_ci       changed>" is output. After a successful  match,  this  applies  to  all
1222e18e3516Sopenharmony_ci       groups  after the maximum capture group for the pattern. In other cases
1223e18e3516Sopenharmony_ci       it applies to the entire ovector. After a partial match, the first  two
1224e18e3516Sopenharmony_ci       elements  are  the only ones that should be set. After a DFA match, the
1225e18e3516Sopenharmony_ci       amount of ovector that is used depends on the number  of  matches  that
1226e18e3516Sopenharmony_ci       were found.
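
       The sentinel check described above can be sketched as follows. This
       is only an illustration in Python, not pcre2test's own code; the
       names UNCHANGED and show_ovector are hypothetical, and the matcher is
       simulated by a list of the offset pairs it would have stored.

```python
UNCHANGED = object()  # hypothetical stand-in for the special fill value

def show_ovector(pairs_written, total_pairs):
    """Return one entry per ovector pair: the offsets, or "<unchanged>"."""
    # Fill the whole vector with the sentinel before the match attempt.
    ovector = [UNCHANGED] * (2 * total_pairs)
    # Simulate a matcher storing the pairs it actually set.
    for i, (start, end) in enumerate(pairs_written):
        ovector[2 * i], ovector[2 * i + 1] = start, end
    shown = []
    for i in range(total_pairs):
        start, end = ovector[2 * i], ovector[2 * i + 1]
        if start is UNCHANGED and end is UNCHANGED:
            shown.append("<unchanged>")
        else:
            shown.append((start, end))
    return shown

# After a partial match only the first pair should have been written.
print(show_ovector([(0, 3)], 3))
```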
1227e18e3516Sopenharmony_ci
1228e18e3516Sopenharmony_ci   Testing pattern callouts
1229e18e3516Sopenharmony_ci
1230e18e3516Sopenharmony_ci       A  callout function is supplied when pcre2test calls the library match-
1231e18e3516Sopenharmony_ci       ing functions, unless callout_none is specified. Its behaviour  can  be
1232e18e3516Sopenharmony_ci       controlled  by  various  modifiers  listed above whose names begin with
1233e18e3516Sopenharmony_ci       callout_. Details are given in the section entitled  "Callouts"  below.
1234e18e3516Sopenharmony_ci       Testing  callouts  from  pcre2_substitute()  is described separately in
1235e18e3516Sopenharmony_ci       "Testing the substitution function" below.
1236e18e3516Sopenharmony_ci
1237e18e3516Sopenharmony_ci   Finding all matches in a string
1238e18e3516Sopenharmony_ci
1239e18e3516Sopenharmony_ci       Searching for all possible matches within a subject can be requested by
1240e18e3516Sopenharmony_ci       the  global  or altglobal modifier. After finding a match, the matching
1241e18e3516Sopenharmony_ci       function is called again to search the remainder of  the  subject.  The
1242e18e3516Sopenharmony_ci       difference  between  global  and  altglobal is that the former uses the
1243e18e3516Sopenharmony_ci       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
1244e18e3516Sopenharmony_ci       searching  at  a new point within the entire string (which is what Perl
1245e18e3516Sopenharmony_ci       does), whereas the latter passes over a shortened subject. This makes a
1246e18e3516Sopenharmony_ci       difference to the matching process if the pattern begins with a lookbe-
1247e18e3516Sopenharmony_ci       hind assertion (including \b or \B).
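
       The effect of the two strategies can be imitated with Python's re
       module, whose search() method accepts a start position that behaves
       much like start_offset (text before it is still visible to \B and
       lookbehinds). This is a sketch under that assumption, not a
       reproduction of pcre2test; both function names are hypothetical.

```python
import re

def find_all_with_offset(pattern, subject):
    """Like global: keep the whole subject and move the start offset."""
    found, pos = [], 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        found.append(m.group(0))
        # Step past an empty match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found

def find_all_with_shortened_subject(pattern, subject):
    """Like altglobal: pass only the unsearched remainder each time."""
    found = []
    while True:
        m = pattern.search(subject)
        if m is None:
            break
        found.append(m.group(0))
        cut = m.end() if m.end() > m.start() else m.end() + 1
        if cut > len(subject):
            break
        subject = subject[cut:]  # earlier text is no longer visible to \B
    return found

pattern = re.compile(r"\Bi(\w\w)")
print(find_all_with_offset(pattern, "Mississippi"))
print(find_all_with_shortened_subject(pattern, "Mississippi"))
```

       With the shortened subject, the second "iss" is lost because \B
       cannot match at what now looks like the start of the string.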
1248e18e3516Sopenharmony_ci
1249e18e3516Sopenharmony_ci       If an empty string  is  matched,  the  next  match  is  done  with  the
1250e18e3516Sopenharmony_ci       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
1251e18e3516Sopenharmony_ci       for another, non-empty, match at the same point in the subject. If this
1252e18e3516Sopenharmony_ci       match  fails, the start offset is advanced, and the normal match is re-
1253e18e3516Sopenharmony_ci       tried. This imitates the way Perl handles such cases when using the  /g
1254e18e3516Sopenharmony_ci       modifier  or  the  split()  function. Normally, the start offset is ad-
1255e18e3516Sopenharmony_ci       vanced by one character, but if the newline convention recognizes  CRLF
1256e18e3516Sopenharmony_ci       as  a  newline,  and the current character is CR followed by LF, an ad-
1257e18e3516Sopenharmony_ci       vance of two characters occurs.
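
       The advance rule in the last sentence can be written out as a small
       Python sketch. The function name is hypothetical, and the offsets
       here count Python characters rather than code units, which is a
       simplification.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Offset at which the normal match is retried after the anchored,
    non-empty retry at `offset` has failed."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2  # step over CR and LF together
    return offset + 1      # otherwise advance by one character

print(next_start_offset("a\r\nb", 1, crlf_is_newline=True))
print(next_start_offset("a\r\nb", 1, crlf_is_newline=False))
```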
1258e18e3516Sopenharmony_ci
1259e18e3516Sopenharmony_ci   Testing substring extraction functions
1260e18e3516Sopenharmony_ci
1261e18e3516Sopenharmony_ci       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
1262e18e3516Sopenharmony_ci       string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
1263e18e3516Sopenharmony_ci       given more than once, and each can specify a capture group name or num-
1264e18e3516Sopenharmony_ci       ber, for example:
1265e18e3516Sopenharmony_ci
1266e18e3516Sopenharmony_ci          abcd\=copy=1,copy=3,get=G1
1267e18e3516Sopenharmony_ci
1268e18e3516Sopenharmony_ci       If  the  #subject command is used to set default copy and/or get lists,
1269e18e3516Sopenharmony_ci       these can be unset by specifying a negative number to cancel  all  num-
1270e18e3516Sopenharmony_ci       bered groups and an empty name to cancel all named groups.
1271e18e3516Sopenharmony_ci
1272e18e3516Sopenharmony_ci       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
1273e18e3516Sopenharmony_ci       all captured substrings.
1274e18e3516Sopenharmony_ci
1275e18e3516Sopenharmony_ci       If the subject line is successfully matched, the  substrings  extracted
1276e18e3516Sopenharmony_ci       by  the  convenience  functions  are  output  with C, G, or L after the
1277e18e3516Sopenharmony_ci       string number instead of a colon. This is in  addition  to  the  normal
1278e18e3516Sopenharmony_ci       full  list.  The string length (that is, the return from the extraction
1279e18e3516Sopenharmony_ci       function) is given in parentheses after each substring, followed by the
1280e18e3516Sopenharmony_ci       name when the extraction was by name.
1281e18e3516Sopenharmony_ci
1282e18e3516Sopenharmony_ci   Testing the substitution function
1283e18e3516Sopenharmony_ci
1284e18e3516Sopenharmony_ci       If  the  replace  modifier  is  set, the pcre2_substitute() function is
1285e18e3516Sopenharmony_ci       called instead of one of the matching functions (or after one  call  of
1286e18e3516Sopenharmony_ci       pcre2_match()  in  the case of PCRE2_SUBSTITUTE_MATCHED). Note that re-
1287e18e3516Sopenharmony_ci       placement strings cannot contain commas, because a comma signifies  the
1288e18e3516Sopenharmony_ci       end  of  a  modifier. This is not thought to be an issue in a test pro-
1289e18e3516Sopenharmony_ci       gram.
1290e18e3516Sopenharmony_ci
1291e18e3516Sopenharmony_ci       Specifying a completely empty replacement string  disables  this  modi-
1292e18e3516Sopenharmony_ci       fier.   However, it is possible to specify an empty replacement by pro-
1293e18e3516Sopenharmony_ci       viding a buffer length, as described below, for an otherwise empty  re-
1294e18e3516Sopenharmony_ci       placement.
1295e18e3516Sopenharmony_ci
1296e18e3516Sopenharmony_ci       Unlike  subject strings, pcre2test does not process replacement strings
1297e18e3516Sopenharmony_ci       for escape sequences. In UTF mode, a replacement string is  checked  to
1298e18e3516Sopenharmony_ci       see  if it is a valid UTF-8 string. If so, it is correctly converted to
1299e18e3516Sopenharmony_ci       a UTF string of the appropriate code unit width. If it is not  a  valid
1300e18e3516Sopenharmony_ci       UTF-8  string, the individual code units are copied directly. This pro-
1301e18e3516Sopenharmony_ci       vides a means of passing an invalid UTF-8 string for testing purposes.
1302e18e3516Sopenharmony_ci
       The following modifiers set options (in addition to the normal  match
       options) for pcre2_substitute():
1305e18e3516Sopenharmony_ci
1306e18e3516Sopenharmony_ci         global                      PCRE2_SUBSTITUTE_GLOBAL
1307e18e3516Sopenharmony_ci         substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
1308e18e3516Sopenharmony_ci         substitute_literal          PCRE2_SUBSTITUTE_LITERAL
1309e18e3516Sopenharmony_ci         substitute_matched          PCRE2_SUBSTITUTE_MATCHED
1310e18e3516Sopenharmony_ci         substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1311e18e3516Sopenharmony_ci         substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
1312e18e3516Sopenharmony_ci         substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1313e18e3516Sopenharmony_ci         substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
1314e18e3516Sopenharmony_ci
1315e18e3516Sopenharmony_ci       See the pcre2api documentation for details of these options.
1316e18e3516Sopenharmony_ci
1317e18e3516Sopenharmony_ci       After  a  successful  substitution, the modified string is output, pre-
1318e18e3516Sopenharmony_ci       ceded by the number of replacements. This may be zero if there were  no
1319e18e3516Sopenharmony_ci       matches. Here is a simple example of a substitution test:
1320e18e3516Sopenharmony_ci
1321e18e3516Sopenharmony_ci         /abc/replace=xxx
1322e18e3516Sopenharmony_ci             =abc=abc=
1323e18e3516Sopenharmony_ci          1: =xxx=abc=
1324e18e3516Sopenharmony_ci             =abc=abc=\=global
1325e18e3516Sopenharmony_ci          2: =xxx=xxx=
1326e18e3516Sopenharmony_ci
1327e18e3516Sopenharmony_ci       Subject  and replacement strings should be kept relatively short (fewer
1328e18e3516Sopenharmony_ci       than 256 characters) for substitution tests, as fixed-size buffers  are
1329e18e3516Sopenharmony_ci       used.  To  make it easy to test for buffer overflow, if the replacement
1330e18e3516Sopenharmony_ci       string starts with a number in square brackets, that number  is  passed
1331e18e3516Sopenharmony_ci       to  pcre2_substitute()  as  the size of the output buffer, with the re-
1332e18e3516Sopenharmony_ci       placement string starting at the next character.  Here  is  an  example
1333e18e3516Sopenharmony_ci       that tests the edge case:
1334e18e3516Sopenharmony_ci
1335e18e3516Sopenharmony_ci         /abc/
1336e18e3516Sopenharmony_ci             123abc123\=replace=[10]XYZ
1337e18e3516Sopenharmony_ci          1: 123XYZ123
1338e18e3516Sopenharmony_ci             123abc123\=replace=[9]XYZ
1339e18e3516Sopenharmony_ci         Failed: error -47: no more memory
1340e18e3516Sopenharmony_ci
1341e18e3516Sopenharmony_ci       The  default  action  of  pcre2_substitute()  is  to  return  PCRE2_ER-
1342e18e3516Sopenharmony_ci       ROR_NOMEMORY when the output buffer  is  too  small.  However,  if  the
1343e18e3516Sopenharmony_ci       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  option  is  set (by using the substi-
1344e18e3516Sopenharmony_ci       tute_overflow_length  modifier),  pcre2_substitute()  continues  to  go
1345e18e3516Sopenharmony_ci       through  the  motions  of  matching and substituting (but not doing any
1346e18e3516Sopenharmony_ci       callouts), in order to compute the size of  buffer  that  is  required.
1347e18e3516Sopenharmony_ci       When  this  happens,  pcre2test shows the required buffer length (which
1348e18e3516Sopenharmony_ci       includes space for the trailing zero) as part of the error message. For
1349e18e3516Sopenharmony_ci       example:
1350e18e3516Sopenharmony_ci
1351e18e3516Sopenharmony_ci         /abc/substitute_overflow_length
1352e18e3516Sopenharmony_ci             123abc123\=replace=[9]XYZ
1353e18e3516Sopenharmony_ci         Failed: error -47: no more memory: 10 code units are needed
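
       The arithmetic behind the two buffer examples can be checked with a
       short Python sketch. It assumes one code unit per character (8-bit
       library, ASCII data) and a single replacement; the function names
       are hypothetical and MemoryError merely stands in for
       PCRE2_ERROR_NOMEMORY.

```python
def required_buffer_length(subject, start, end, replacement):
    """Code units needed for the result, including the trailing zero."""
    result = subject[:start] + replacement + subject[end:]
    return len(result) + 1

def substitute_into(subject, start, end, replacement, buffer_length):
    """Replace subject[start:end], failing if the buffer is too small."""
    needed = required_buffer_length(subject, start, end, replacement)
    if needed > buffer_length:
        raise MemoryError(f"{needed} code units are needed")
    return subject[:start] + replacement + subject[end:]

# "abc" occupies offsets 3..6 of the subject used in the examples.
print(required_buffer_length("123abc123", 3, 6, "XYZ"))
print(substitute_into("123abc123", 3, 6, "XYZ", 10))
```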
1354e18e3516Sopenharmony_ci
1355e18e3516Sopenharmony_ci       A replacement string is ignored with POSIX and DFA matching. Specifying
1356e18e3516Sopenharmony_ci       partial matching provokes an error return  ("bad  option  value")  from
1357e18e3516Sopenharmony_ci       pcre2_substitute().
1358e18e3516Sopenharmony_ci
1359e18e3516Sopenharmony_ci   Testing substitute callouts
1360e18e3516Sopenharmony_ci
1361e18e3516Sopenharmony_ci       If the substitute_callout modifier is set, a substitution callout func-
1362e18e3516Sopenharmony_ci       tion is set up. The null_context modifier must not be set, because  the
1363e18e3516Sopenharmony_ci       address  of the callout function is passed in a match context. When the
       callout function is called (after each substitution),  details  of  the
       input and output strings are output. For example:
1366e18e3516Sopenharmony_ci
1367e18e3516Sopenharmony_ci         /abc/g,replace=<$0>,substitute_callout
1368e18e3516Sopenharmony_ci             abcdefabcpqr
1369e18e3516Sopenharmony_ci          1(1) Old 0 3 "abc" New 0 5 "<abc>"
1370e18e3516Sopenharmony_ci          2(1) Old 6 9 "abc" New 8 13 "<abc>"
1371e18e3516Sopenharmony_ci          2: <abc>def<abc>pqr
1372e18e3516Sopenharmony_ci
1373e18e3516Sopenharmony_ci       The  first  number  on  each  callout line is the count of matches. The
1374e18e3516Sopenharmony_ci       parenthesized number is the number of pairs that are set in the ovector
1375e18e3516Sopenharmony_ci       (that  is, one more than the number of capturing groups that were set).
       The offsets of the old substring and its contents are listed next,
       followed by the same for the replacement.
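
       How the old and new offsets relate can be seen by replaying the
       example in Python. This sketch uses the re module in place of PCRE2,
       omits the parenthesized pair count, and passes the replacement as a
       function instead of the "<$0>" template; all names are hypothetical.

```python
import re

def trace_substitutions(pattern, subject, make_replacement):
    """Return (result, trace); each trace entry holds the match count,
    the old span and text, and the new span and text in the output."""
    out, last, trace = "", 0, []
    for count, m in enumerate(re.finditer(pattern, subject), start=1):
        out += subject[last:m.start()]   # copy the text between matches
        new_start = len(out)             # replacement starts here in the output
        new_text = make_replacement(m)
        out += new_text
        trace.append((count, m.span(), m.group(0),
                      (new_start, len(out)), new_text))
        last = m.end()
    return out + subject[last:], trace

result, trace = trace_substitutions(
    "abc", "abcdefabcpqr", lambda m: "<" + m.group(0) + ">")
for entry in trace:
    print(entry)
print(result)
```

       The second new offset is 8 rather than 6 because the first
       replacement made the output two code units longer than the input.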
1378e18e3516Sopenharmony_ci
1379e18e3516Sopenharmony_ci       By  default,  the substitution callout function returns zero, which ac-
1380e18e3516Sopenharmony_ci       cepts the replacement and causes matching to continue if /g  was  used.
       Two further modifiers can be used to test other return values. If
       substitute_skip is set to a value greater than zero, the callout
       function returns +1 for the match of that number; similarly,
       substitute_stop causes it to return -1. Both values cause the
       replacement to be rejected, and -1 also stops any further matching.
       If either modifier is set, substitute_callout is assumed. For example:
1387e18e3516Sopenharmony_ci
1388e18e3516Sopenharmony_ci         /abc/g,replace=<$0>,substitute_skip=1
1389e18e3516Sopenharmony_ci             abcdefabcpqr
1390e18e3516Sopenharmony_ci          1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
1391e18e3516Sopenharmony_ci          2(1) Old 6 9 "abc" New 6 11 "<abc>"
1392e18e3516Sopenharmony_ci          2: abcdef<abc>pqr
1393e18e3516Sopenharmony_ci             abcdefabcpqr\=substitute_stop=1
1394e18e3516Sopenharmony_ci          1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1395e18e3516Sopenharmony_ci          1: abcdefabcpqr
1396e18e3516Sopenharmony_ci
1397e18e3516Sopenharmony_ci       If both are set for the same number, stop takes precedence. Only a sin-
1398e18e3516Sopenharmony_ci       gle skip or stop is supported, which is sufficient for testing that the
1399e18e3516Sopenharmony_ci       feature works.
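
       The three return values can be modelled in a few lines of Python.
       This is a sketch of the semantics described above, not of
       pcre2_substitute() itself; the helper names are hypothetical and the
       re module stands in for PCRE2.

```python
import re

def skip_or_stop(skip=0, stop=0):
    """Build a callout that rejects one numbered match."""
    def callout(count):
        if count == stop:
            return -1   # reject and stop; checked first, so stop wins
        if count == skip:
            return 1    # reject this replacement only
        return 0        # accept the replacement
    return callout

def substitute_with_callout(pattern, subject, make_replacement, callout):
    out, last = "", 0
    for count, m in enumerate(re.finditer(pattern, subject), start=1):
        out += subject[last:m.start()]
        verdict = callout(count)
        # A non-zero verdict keeps the original text for this match.
        out += make_replacement(m) if verdict == 0 else m.group(0)
        last = m.end()
        if verdict < 0:
            break       # no further matching takes place
    return out + subject[last:]

def angle(m):
    return "<" + m.group(0) + ">"

print(substitute_with_callout("abc", "abcdefabcpqr", angle, skip_or_stop(skip=1)))
print(substitute_with_callout("abc", "abcdefabcpqr", angle, skip_or_stop(stop=1)))
```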
1400e18e3516Sopenharmony_ci
1401e18e3516Sopenharmony_ci   Setting the JIT stack size
1402e18e3516Sopenharmony_ci
1403e18e3516Sopenharmony_ci       The jitstack modifier provides a way of setting the maximum stack  size
1404e18e3516Sopenharmony_ci       that  is  used  by the just-in-time optimization code. It is ignored if
1405e18e3516Sopenharmony_ci       JIT optimization is not being used. The value is a number of  kibibytes
1406e18e3516Sopenharmony_ci       (units  of  1024  bytes). Setting zero reverts to the default of 32KiB.
1407e18e3516Sopenharmony_ci       Providing a stack that is larger than the default is necessary only for
1408e18e3516Sopenharmony_ci       very  complicated  patterns.  If  jitstack is set non-zero on a subject
1409e18e3516Sopenharmony_ci       line it overrides any value that was set on the pattern.
1410e18e3516Sopenharmony_ci
1411e18e3516Sopenharmony_ci   Setting heap, match, and depth limits
1412e18e3516Sopenharmony_ci
1413e18e3516Sopenharmony_ci       The heap_limit, match_limit, and depth_limit modifiers set  the  appro-
1414e18e3516Sopenharmony_ci       priate  limits  in the match context. These values are ignored when the
1415e18e3516Sopenharmony_ci       find_limits or find_limits_noheap modifier is specified.
1416e18e3516Sopenharmony_ci
1417e18e3516Sopenharmony_ci   Finding minimum limits
1418e18e3516Sopenharmony_ci
1419e18e3516Sopenharmony_ci       If the find_limits modifier is present on  a  subject  line,  pcre2test
1420e18e3516Sopenharmony_ci       calls  the  relevant matching function several times, setting different
1421e18e3516Sopenharmony_ci       values   in   the    match    context    via    pcre2_set_heap_limit(),
1422e18e3516Sopenharmony_ci       pcre2_set_match_limit(),  or pcre2_set_depth_limit() until it finds the
1423e18e3516Sopenharmony_ci       smallest value for each parameter that allows  the  match  to  complete
1424e18e3516Sopenharmony_ci       without a "limit exceeded" error. The match itself may succeed or fail.
1425e18e3516Sopenharmony_ci       An alternative modifier, find_limits_noheap, omits the heap limit. This
1426e18e3516Sopenharmony_ci       is  used  in  the standard tests, because the minimum heap limit varies
1427e18e3516Sopenharmony_ci       between systems. If JIT is being used, only the match  limit  is  rele-
1428e18e3516Sopenharmony_ci       vant, and the other two are automatically omitted.
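
       One way to find such a minimum is to double the limit until the
       match completes and then bisect. The Python sketch below shows that
       idea only; it is not claimed to be the search that pcre2test
       performs. It assumes that a match which completes under some limit
       also completes under every larger one, and the completes() callback
       is a hypothetical stand-in for running the matching function.

```python
def find_minimum_limit(completes, upper=1 << 32):
    """Smallest limit in [1, upper] for which completes(limit) is true."""
    low, high = 1, 1
    # Grow the limit geometrically until the match completes.
    while not completes(high):
        if high >= upper:
            raise ValueError("match does not complete within the upper bound")
        low = high + 1
        high = min(high * 2, upper)
    # Bisect between the last failure and the first success.
    while low < high:
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid + 1
    return low

# Pretend the match needs a limit of at least 37 to complete.
print(find_minimum_limit(lambda limit: limit >= 37))
```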
1429e18e3516Sopenharmony_ci
1430e18e3516Sopenharmony_ci       When using this modifier, the pattern should not contain any limit set-
1431e18e3516Sopenharmony_ci       tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
1432e18e3516Sopenharmony_ci       present and is lower than the minimum matching value, the minimum value
1433e18e3516Sopenharmony_ci       cannot be found because pcre2_set_match_limit() etc. are only  able  to
1434e18e3516Sopenharmony_ci       reduce the value of an in-pattern limit; they cannot increase it.
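
       Put another way, the limit in force is the smaller of the two
       values. A minimal Python sketch of that rule follows; the function
       name is hypothetical.

```python
def effective_limit(context_limit, pattern_limit=None):
    """Limit in force when the match context and the pattern both set one."""
    if pattern_limit is None:
        return context_limit
    return min(context_limit, pattern_limit)

# If a match needs 37 and the pattern says 10, no context value can help.
print([effective_limit(c, pattern_limit=10) for c in (5, 37, 1000000)])
```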
1435e18e3516Sopenharmony_ci
1436e18e3516Sopenharmony_ci       For  non-DFA  matching,  the minimum depth_limit number is a measure of
1437e18e3516Sopenharmony_ci       how much nested backtracking happens (that is, how deeply the pattern's
1438e18e3516Sopenharmony_ci       tree  is  searched).  In the case of DFA matching, depth_limit controls
1439e18e3516Sopenharmony_ci       the depth of recursive calls of the internal function that is used  for
1440e18e3516Sopenharmony_ci       handling pattern recursion, lookaround assertions, and atomic groups.
1441e18e3516Sopenharmony_ci
1442e18e3516Sopenharmony_ci       For non-DFA matching, the match_limit number is a measure of the amount
1443e18e3516Sopenharmony_ci       of backtracking that takes place, and learning the minimum value can be
1444e18e3516Sopenharmony_ci       instructive.  For  most  simple matches, the number is quite small, but
1445e18e3516Sopenharmony_ci       for patterns with very large numbers of matching possibilities, it  can
1446e18e3516Sopenharmony_ci       become  large very quickly with increasing length of subject string. In
1447e18e3516Sopenharmony_ci       the case of DFA matching, match_limit  controls  the  total  number  of
1448e18e3516Sopenharmony_ci       calls, both recursive and non-recursive, to the internal matching func-
1449e18e3516Sopenharmony_ci       tion, thus controlling the overall amount of computing resource that is
1450e18e3516Sopenharmony_ci       used.
1451e18e3516Sopenharmony_ci
1452e18e3516Sopenharmony_ci       For  both  kinds  of  matching,  the  heap_limit  number,  which  is in
1453e18e3516Sopenharmony_ci       kibibytes (units of 1024 bytes), limits the amount of heap memory  used
1454e18e3516Sopenharmony_ci       for matching.
1455e18e3516Sopenharmony_ci
1456e18e3516Sopenharmony_ci   Showing MARK names
1457e18e3516Sopenharmony_ci
1459e18e3516Sopenharmony_ci       The mark modifier causes the names from backtracking control verbs that
1460e18e3516Sopenharmony_ci       are returned from calls to pcre2_match() to be displayed. If a mark  is
1461e18e3516Sopenharmony_ci       returned  for a match, non-match, or partial match, pcre2test shows it.
1462e18e3516Sopenharmony_ci       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
1463e18e3516Sopenharmony_ci       it is added to the non-match message.
1464e18e3516Sopenharmony_ci
1465e18e3516Sopenharmony_ci   Showing memory usage
1466e18e3516Sopenharmony_ci
1467e18e3516Sopenharmony_ci       The  memory modifier causes pcre2test to log the sizes of all heap mem-
1468e18e3516Sopenharmony_ci       ory  allocation  and  freeing  calls  that  occur  during  a  call   to
1469e18e3516Sopenharmony_ci       pcre2_match()  or pcre2_dfa_match(). In the latter case, heap memory is
       used only when a match requires more internal workspace than the
       default allocation on the stack, so in many cases there will be no
       output. No heap memory is allocated during matching with JIT. For this
1473e18e3516Sopenharmony_ci       modifier to work, the null_context modifier must not be set on both the
1474e18e3516Sopenharmony_ci       pattern and the subject, though it can be set on one or the other.
1475e18e3516Sopenharmony_ci
1476e18e3516Sopenharmony_ci   Setting a starting offset
1477e18e3516Sopenharmony_ci
1478e18e3516Sopenharmony_ci       The offset modifier sets an offset  in  the  subject  string  at  which
1479e18e3516Sopenharmony_ci       matching starts. Its value is a number of code units, not characters.
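
       In UTF mode the two counts differ whenever a character needs more
       than one code unit. The Python sketch below converts a character
       index to a code unit offset for each library width; the function
       name is hypothetical.

```python
def code_unit_offset(text, char_index, width):
    """Code units (8-, 16- or 32-bit) that precede text[char_index]."""
    prefix = text[:char_index]
    if width == 8:
        return len(prefix.encode("utf-8"))
    if width == 16:
        return len(prefix.encode("utf-16-le")) // 2
    if width == 32:
        return len(prefix.encode("utf-32-le")) // 4
    raise ValueError("width must be 8, 16 or 32")

subject = "a\U0001F600b"  # the middle character lies outside the BMP
for width in (8, 16, 32):
    print(width, code_unit_offset(subject, 2, width))
```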
1480e18e3516Sopenharmony_ci
1481e18e3516Sopenharmony_ci   Setting an offset limit
1482e18e3516Sopenharmony_ci
1483e18e3516Sopenharmony_ci       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
1484e18e3516Sopenharmony_ci       match cannot be found starting at or before this offset in the subject,
1485e18e3516Sopenharmony_ci       a "no match" return is given. The data value is a number of code units,
1486e18e3516Sopenharmony_ci       not characters. When this modifier is used, the use_offset_limit  modi-
1487e18e3516Sopenharmony_ci       fier must have been set for the pattern; if not, an error is generated.
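
       The effect of an offset limit can be imitated in Python. Because an
       unanchored search returns the leftmost match, it is enough to check
       where that match starts. This is a sketch using the re module; the
       function name is hypothetical, and offsets here are Python character
       indexes rather than code units.

```python
import re

def search_with_offset_limit(pattern, subject, offset_limit):
    """Return the leftmost match only if it starts at or before the limit."""
    m = pattern.search(subject)
    if m is not None and m.start() > offset_limit:
        return None  # the earliest possible match starts too late
    return m

pattern = re.compile("abc")
print(search_with_offset_limit(pattern, "xxabc", 2))
print(search_with_offset_limit(pattern, "xxabc", 1))
```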
1488e18e3516Sopenharmony_ci
1489e18e3516Sopenharmony_ci   Setting the size of the output vector
1490e18e3516Sopenharmony_ci
1491e18e3516Sopenharmony_ci       The  ovector  modifier applies only to the subject line in which it ap-
1492e18e3516Sopenharmony_ci       pears, though of course it can also be used to set a default in a #sub-
1493e18e3516Sopenharmony_ci       ject  command.  It  specifies  the  number of pairs of offsets that are
1494e18e3516Sopenharmony_ci       available for storing matching information. The default is 15.
1495e18e3516Sopenharmony_ci
1496e18e3516Sopenharmony_ci       A value of zero is useful when testing the POSIX API because it  causes
1497e18e3516Sopenharmony_ci       regexec() to be called with a NULL capture vector. When not testing the
1498e18e3516Sopenharmony_ci       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
1499e18e3516Sopenharmony_ci       ate_from_pattern()  to  be  called, in order to create a match block of
1500e18e3516Sopenharmony_ci       exactly the right size for the pattern. (It is not possible to create a
1501e18e3516Sopenharmony_ci       match  block  with  a zero-length ovector; there is always at least one
1502e18e3516Sopenharmony_ci       pair of offsets.)
1503e18e3516Sopenharmony_ci
1504e18e3516Sopenharmony_ci   Passing the subject as zero-terminated
1505e18e3516Sopenharmony_ci
1506e18e3516Sopenharmony_ci       By default, the subject string is passed to a native API matching func-
1507e18e3516Sopenharmony_ci       tion with its correct length. In order to test the facility for passing
1508e18e3516Sopenharmony_ci       a zero-terminated string, the zero_terminate modifier is  provided.  It
1509e18e3516Sopenharmony_ci       causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
1510e18e3516Sopenharmony_ci       via the POSIX interface, this modifier is ignored, with a warning.
1511e18e3516Sopenharmony_ci
1512e18e3516Sopenharmony_ci       When testing pcre2_substitute(), this modifier also has the  effect  of
1513e18e3516Sopenharmony_ci       passing the replacement string as zero-terminated.
1514e18e3516Sopenharmony_ci
1515e18e3516Sopenharmony_ci   Passing a NULL context, subject, or replacement
1516e18e3516Sopenharmony_ci
1517e18e3516Sopenharmony_ci       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
1518e18e3516Sopenharmony_ci       pcre2_dfa_match(), pcre2_jit_match()  or  pcre2_substitute().   If  the
1519e18e3516Sopenharmony_ci       null_context  modifier  is  set,  however,  NULL is passed. This is for
1520e18e3516Sopenharmony_ci       testing that the matching and substitution functions  behave  correctly
1521e18e3516Sopenharmony_ci       in  this  case  (they use default values). This modifier cannot be used
1522e18e3516Sopenharmony_ci       with the find_limits, find_limits_noheap, or  substitute_callout  modi-
1523e18e3516Sopenharmony_ci       fiers.
1524e18e3516Sopenharmony_ci
1525e18e3516Sopenharmony_ci       Similarly,  for  testing purposes, if the null_subject or null_replace-
1526e18e3516Sopenharmony_ci       ment modifier is set, the subject or replacement  string  pointers  are
1527e18e3516Sopenharmony_ci       passed as NULL, respectively, to the relevant functions.
1528e18e3516Sopenharmony_ci
1529e18e3516Sopenharmony_ci
1530e18e3516Sopenharmony_ciTHE ALTERNATIVE MATCHING FUNCTION
1531e18e3516Sopenharmony_ci
1532e18e3516Sopenharmony_ci       By  default,  pcre2test  uses  the  standard  PCRE2  matching function,
       pcre2_match(), to match each subject line. PCRE2 also supports an
       alternative matching function, pcre2_dfa_match(), which operates in a
       different way, and has some restrictions. The differences between the two
1536e18e3516Sopenharmony_ci       functions are described in the pcre2matching documentation.
1537e18e3516Sopenharmony_ci
1538e18e3516Sopenharmony_ci       If  the dfa modifier is set, the alternative matching function is used.
1539e18e3516Sopenharmony_ci       This function finds all possible matches at a given point in  the  sub-
1540e18e3516Sopenharmony_ci       ject.  If,  however, the dfa_shortest modifier is set, processing stops
1541e18e3516Sopenharmony_ci       after the first match is found. This is always  the  shortest  possible
1542e18e3516Sopenharmony_ci       match.
1543e18e3516Sopenharmony_ci
1544e18e3516Sopenharmony_ci
1545e18e3516Sopenharmony_ciDEFAULT OUTPUT FROM pcre2test
1546e18e3516Sopenharmony_ci
1547e18e3516Sopenharmony_ci       This  section  describes  the output when the normal matching function,
1548e18e3516Sopenharmony_ci       pcre2_match(), is being used.
1549e18e3516Sopenharmony_ci
1550e18e3516Sopenharmony_ci       When a match succeeds, pcre2test outputs  the  list  of  captured  sub-
1551e18e3516Sopenharmony_ci       strings,  starting  with number 0 for the string that matched the whole
1552e18e3516Sopenharmony_ci       pattern.  Otherwise, it outputs "No match" when the return is PCRE2_ER-
1553e18e3516Sopenharmony_ci       ROR_NOMATCH,  or  "Partial  match:"  followed by the partially matching
1554e18e3516Sopenharmony_ci       substring when the return is PCRE2_ERROR_PARTIAL. (Note  that  this  is
1555e18e3516Sopenharmony_ci       the  entire  substring  that was inspected during the partial match; it
1556e18e3516Sopenharmony_ci       may include characters before the actual match start  if  a  lookbehind
1557e18e3516Sopenharmony_ci       assertion, \K, \b, or \B was involved.)
1558e18e3516Sopenharmony_ci
1559e18e3516Sopenharmony_ci       For any other return, pcre2test outputs the PCRE2 negative error number
1560e18e3516Sopenharmony_ci       and a short descriptive phrase. If the error is  a  failed  UTF  string
1561e18e3516Sopenharmony_ci       check,  the  code  unit offset of the start of the failing character is
1562e18e3516Sopenharmony_ci       also output. Here is an example of an interactive pcre2test run.
1563e18e3516Sopenharmony_ci
1564e18e3516Sopenharmony_ci         $ pcre2test
1565e18e3516Sopenharmony_ci         PCRE2 version 10.22 2016-07-29
1566e18e3516Sopenharmony_ci
1567e18e3516Sopenharmony_ci           re> /^abc(\d+)/
1568e18e3516Sopenharmony_ci         data> abc123
1569e18e3516Sopenharmony_ci          0: abc123
1570e18e3516Sopenharmony_ci          1: 123
1571e18e3516Sopenharmony_ci         data> xyz
1572e18e3516Sopenharmony_ci         No match
1573e18e3516Sopenharmony_ci
1574e18e3516Sopenharmony_ci       Unset capturing substrings that are not followed by one that is set are
1575e18e3516Sopenharmony_ci       not shown by pcre2test unless the allcaptures modifier is specified. In
1576e18e3516Sopenharmony_ci       the following example, there are two capturing substrings, but when the
1577e18e3516Sopenharmony_ci       first  data  line is matched, the second, unset substring is not shown.
1578e18e3516Sopenharmony_ci       An "internal" unset substring is shown as "<unset>", as for the  second
1579e18e3516Sopenharmony_ci       data line.
1580e18e3516Sopenharmony_ci
1581e18e3516Sopenharmony_ci           re> /(a)|(b)/
1582e18e3516Sopenharmony_ci         data> a
1583e18e3516Sopenharmony_ci          0: a
1584e18e3516Sopenharmony_ci          1: a
1585e18e3516Sopenharmony_ci         data> b
1586e18e3516Sopenharmony_ci          0: b
1587e18e3516Sopenharmony_ci          1: <unset>
1588e18e3516Sopenharmony_ci          2: b
1589e18e3516Sopenharmony_ci
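
       The display rule for unset groups can be expressed compactly. The
       following Python sketch uses the re module, which reports unset
       groups as None, to imitate the listing; the function name is
       hypothetical.

```python
import re

def capture_lines(match, allcaptures=False):
    """Format captured substrings, dropping trailing unset groups
    unless allcaptures is requested."""
    groups = (match.group(0),) + match.groups()
    if allcaptures:
        last = len(groups) - 1
    else:
        # Highest-numbered group that is set; group 0 always is.
        last = max(i for i, g in enumerate(groups) if g is not None)
    return [f"{i:2d}: {'<unset>' if groups[i] is None else groups[i]}"
            for i in range(last + 1)]

pattern = re.compile(r"(a)|(b)")
print("\n".join(capture_lines(pattern.search("a"))))
print("\n".join(capture_lines(pattern.search("b"))))
print("\n".join(capture_lines(pattern.search("a"), allcaptures=True)))
```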
1590e18e3516Sopenharmony_ci       If  the strings contain any non-printing characters, they are output as
1591e18e3516Sopenharmony_ci       \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.
1592e18e3516Sopenharmony_ci       Otherwise they are output as \x{hh...} escapes. See below for the defi-
1593e18e3516Sopenharmony_ci       nition of non-printing characters. If the aftertext  modifier  is  set,
       the output for substring 0 is followed by the rest of the subject
       string, identified by "0+" like this:
1596e18e3516Sopenharmony_ci
1597e18e3516Sopenharmony_ci           re> /cat/aftertext
1598e18e3516Sopenharmony_ci         data> cataract
1599e18e3516Sopenharmony_ci          0: cat
1600e18e3516Sopenharmony_ci          0+ aract
1601e18e3516Sopenharmony_ci
1602e18e3516Sopenharmony_ci       If global matching is requested, the results of successive matching at-
1603e18e3516Sopenharmony_ci       tempts are output in sequence, like this:
1604e18e3516Sopenharmony_ci
1605e18e3516Sopenharmony_ci           re> /\Bi(\w\w)/g
1606e18e3516Sopenharmony_ci         data> Mississippi
1607e18e3516Sopenharmony_ci          0: iss
1608e18e3516Sopenharmony_ci          1: ss
1609e18e3516Sopenharmony_ci          0: iss
1610e18e3516Sopenharmony_ci          1: ss
1611e18e3516Sopenharmony_ci          0: ipp
1612e18e3516Sopenharmony_ci          1: pp
1613e18e3516Sopenharmony_ci
1614e18e3516Sopenharmony_ci       "No  match" is output only if the first match attempt fails. Here is an
1615e18e3516Sopenharmony_ci       example of a failure message (the offset 4 that  is  specified  by  the
1616e18e3516Sopenharmony_ci       offset modifier is past the end of the subject string):
1617e18e3516Sopenharmony_ci
1618e18e3516Sopenharmony_ci           re> /xyz/
1619e18e3516Sopenharmony_ci         data> xyz\=offset=4
1620e18e3516Sopenharmony_ci         Error -24 (bad offset value)
1621e18e3516Sopenharmony_ci
1622e18e3516Sopenharmony_ci       Note that whereas patterns can be continued over several lines (a plain
       ">" prompt is used for continuations), subject lines may not. However,
       newlines can be included in a subject by means of the \n escape (or \r,
1625e18e3516Sopenharmony_ci       \r\n, etc., depending on the newline sequence setting).
1626e18e3516Sopenharmony_ci
1627e18e3516Sopenharmony_ci
1628e18e3516Sopenharmony_ciOUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1629e18e3516Sopenharmony_ci
1630e18e3516Sopenharmony_ci       When the alternative matching function, pcre2_dfa_match(), is used, the
1631e18e3516Sopenharmony_ci       output  consists  of  a list of all the matches that start at the first
1632e18e3516Sopenharmony_ci       point in the subject where there is at least one match. For example:
1633e18e3516Sopenharmony_ci
1634e18e3516Sopenharmony_ci           re> /(tang|tangerine|tan)/
1635e18e3516Sopenharmony_ci         data> yellow tangerine\=dfa
1636e18e3516Sopenharmony_ci          0: tangerine
1637e18e3516Sopenharmony_ci          1: tang
1638e18e3516Sopenharmony_ci          2: tan
1639e18e3516Sopenharmony_ci
1640e18e3516Sopenharmony_ci       Using the normal matching function on this data finds only "tang".  The
1641e18e3516Sopenharmony_ci       longest  matching string is always given first (and numbered zero). Af-
1642e18e3516Sopenharmony_ci       ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",  fol-
1643e18e3516Sopenharmony_ci       lowed by the partially matching substring. Note that this is the entire
1644e18e3516Sopenharmony_ci       substring that was inspected during the partial match; it  may  include
1645e18e3516Sopenharmony_ci       characters before the actual match start if a lookbehind assertion, \b,
1646e18e3516Sopenharmony_ci       or \B was involved. (\K is not supported for DFA matching.)
1647e18e3516Sopenharmony_ci
1648e18e3516Sopenharmony_ci       If global matching is requested, the search for further matches resumes
1649e18e3516Sopenharmony_ci       at the end of the longest match. For example:
1650e18e3516Sopenharmony_ci
1651e18e3516Sopenharmony_ci           re> /(tang|tangerine|tan)/g
1652e18e3516Sopenharmony_ci         data> yellow tangerine and tangy sultana\=dfa
1653e18e3516Sopenharmony_ci          0: tangerine
1654e18e3516Sopenharmony_ci          1: tang
1655e18e3516Sopenharmony_ci          2: tan
1656e18e3516Sopenharmony_ci          0: tang
1657e18e3516Sopenharmony_ci          1: tan
1658e18e3516Sopenharmony_ci          0: tan
1659e18e3516Sopenharmony_ci
1660e18e3516Sopenharmony_ci       The  alternative  matching function does not support substring capture,
1661e18e3516Sopenharmony_ci       so the modifiers that are concerned with captured  substrings  are  not
1662e18e3516Sopenharmony_ci       relevant.
1663e18e3516Sopenharmony_ci
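       The listing order and the global resume rule described in this section
       can be illustrated with a small Python simulation. This is only a
       sketch: it handles a pattern that is a plain alternation of literal
       strings, it does not call PCRE2, and the name dfa_style_matches is
       invented for the illustration.

```python
def dfa_style_matches(alternatives, subject, global_search=False):
    """Simulate the listing order for a plain alternation of literals.

    Returns one list per match point. Each list holds every alternative
    that matches at that point, longest first.
    """
    groups = []
    pos = 0
    while pos <= len(subject):
        found = None
        # Find the first position at or after pos where anything matches.
        for start in range(pos, len(subject) + 1):
            hits = [alt for alt in alternatives if subject.startswith(alt, start)]
            if hits:
                unique = list(dict.fromkeys(hits))
                found = (start, sorted(unique, key=len, reverse=True))
                break
        if found is None:
            break
        start, hits = found
        groups.append(hits)
        if not global_search:
            break
        # Resume at the end of the longest match; the max() guard only
        # stops this toy loop from stalling on an empty alternative.
        pos = start + max(len(hits[0]), 1)
    return groups


alternatives = ["tang", "tangerine", "tan"]
for group in dfa_style_matches(alternatives,
                               "yellow tangerine and tangy sultana",
                               global_search=True):
    for number, text in enumerate(group):
        print(f"{number:2}: {text}")
```

       Because the search resumes after "tangerine", the shorter alternatives
       that start at the same point are not reported again as separate
       matches.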

RESTARTING AFTER A PARTIAL MATCH

       When the alternative matching function has given the
       PCRE2_ERROR_PARTIAL return, indicating that the subject partially
       matched the pattern, you can restart the match with additional subject
       data by means of the dfa_restart modifier. For example:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\=ps,dfa
         Partial match: 23ja
         data> n05\=dfa,dfa_restart
          0: n05

       For further information about partial matching,  see  the  pcre2partial
       documentation.


CALLOUTS

       If the pattern contains any callout requests, pcre2test's callout func-
       tion is called during matching unless callout_none is  specified.  This
       works with both matching functions, and with JIT, though there are some
       differences in behaviour. The output for callouts with numerical  argu-
       ments and those with string arguments is slightly different.

   Callouts with numerical arguments

       By default, the callout function displays the callout number, the start
       and current positions in the subject text at the callout time, and  the
       next pattern item to be tested. For example:

         --->pqrabcdef
           0    ^  ^     \d

       This  output  indicates  that callout number 0 occurred for a match at-
       tempt starting at the fourth character of the subject string, when  the
       pointer  was  at  the seventh character, and when the next pattern item
       was \d. Just one circumflex is output if the start  and  current  posi-
       tions are the same, or if the current position precedes the start posi-
       tion, which can happen if the callout is in a lookbehind assertion.

       Callouts numbered 255 are assumed to be automatic callouts, inserted as
       a result of the auto_callout pattern modifier. In this case, instead of
       showing the callout number, the offset in the pattern,  preceded  by  a
       plus, is output. For example:

           re> /\d?[A-E]\*/auto_callout
         data> E*
         --->E*
          +0 ^      \d?
          +3 ^      [A-E]
          +8 ^^     \*
         +10 ^ ^
          0: E*

       If a pattern contains (*MARK) items, an additional line is output when-
       ever a change of latest mark is passed to the callout function. For ex-
       ample:

           re> /a(*MARK:X)bc/auto_callout
         data> abc
         --->abc
          +0 ^       a
          +1 ^^      (*MARK:X)
         +10 ^^      b
         Latest Mark: X
         +11 ^ ^     c
         +12 ^  ^
          0: abc

       The  mark  changes between matching "a" and "b", but stays the same for
       the rest of the match, so nothing more is output. If, as  a  result  of
       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
       output.

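       The rule that a mark line appears only when the latest mark changes can
       be sketched in Python as follows. The function name mark_lines is
       invented for this illustration, None stands for "no mark set", and the
       exact wording of the line for an unset mark is an assumption modelled
       on the "Latest Mark: X" line in the example above.

```python
def mark_lines(marks):
    """Return, for each callout, the extra mark line or None.

    marks holds the latest mark name passed to each successive callout,
    with None meaning that no mark is set.
    """
    lines = []
    previous = None                    # no mark is set when matching starts
    for mark in marks:
        if mark != previous:
            shown = "<unset>" if mark is None else mark
            lines.append(f"Latest Mark: {shown}")   # assumed wording
            previous = mark
        else:
            lines.append(None)         # unchanged mark: nothing extra
    return lines


# The five callouts of /a(*MARK:X)bc/auto_callout matched against "abc".
print(mark_lines([None, None, "X", "X", "X"]))
```
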
   Callouts with string arguments

       The output for a callout with a string argument is similar, except that
       instead  of outputting a callout number before the position indicators,
       the callout string and its offset in the pattern string are output  be-
       fore  the  reflection  of the subject string, and the subject string is
       reflected for each callout. For example:

           re> /^ab(?C'first')cd(?C"second")ef/
         data> abcdefg
         Callout (7): 'first'
         --->abcdefg
             ^ ^         c
         Callout (20): "second"
         --->abcdefg
             ^   ^       e
          0: abcdef

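       The layout of the position indicator lines can be sketched in Python.
       The column arithmetic below is inferred from the "E*" and "abcdefg"
       examples in this section, not taken from the pcre2test source, and the
       name callout_line is invented for the illustration. The sketch assumes
       a subject made of printable ASCII characters, and it places the single
       circumflex at the start position when the current position precedes
       it, which is also an assumption.

```python
def callout_line(label, start, current, next_item, subject):
    """Build the indicator line shown beneath '--->' plus the subject.

    label is '' for a string callout, a callout number such as '0', or a
    pattern offset such as '+8' for an automatic callout. start and
    current are offsets into subject.
    """
    prefix = f"{label:>3} "            # four columns, the width of '--->'
    span = max(current - start, 0)     # zero gives a single circumflex
    carets = "^" if span == 0 else "^" + " " * (span - 1) + "^"
    # Gap before the next pattern item, inferred from the examples.
    gap = " " * (len(subject) - start - span + 4)
    return (prefix + " " * start + carets + gap + next_item).rstrip()


print("--->E*")
for args in [("+0", 0, 0, r"\d?"), ("+3", 0, 0, "[A-E]"),
             ("+8", 0, 1, r"\*"), ("+10", 0, 2, "")]:
    print(callout_line(*args, subject="E*"))
```

       Passing an empty label gives the four leading spaces used for callouts
       with string arguments.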

   Callout modifiers

       The callout function in pcre2test returns zero (carry on  matching)  by
       default,  but  you can use a callout_fail modifier in a subject line to
       change this and other parameters of the callout (see below).

       If the callout_capture modifier is set, the current captured groups are
       output when a callout occurs. This is useful only for non-DFA matching,
       as pcre2_dfa_match() does not support capturing,  so  no  captures  are
       ever shown.

       The normal callout output, showing the callout number or pattern offset
       (as described above), is suppressed if the callout_no_where modifier is
       set.

       When  using  the  interpretive  matching function pcre2_match() without
       JIT, setting the callout_extra modifier causes additional  output  from
       pcre2test's  callout function to be generated. For the first callout in
       a match attempt at a new starting position in the subject,  "New  match
       attempt"  is output. If there has been a backtrack since the last call-
       out (or start of matching if this is the first callout), "Backtrack" is
       output,  followed  by  "No other matching paths" if the backtrack ended
       the previous match attempt. For example:

           re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
         data> aac\=callout_extra
         New match attempt
         --->aac
          +0 ^       (
          +1 ^       a+
          +3 ^ ^     )
          +4 ^ ^     b
         Backtrack
         --->aac
          +3 ^^      )
          +4 ^^      b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0  ^      (
          +1  ^      a+
          +3  ^^     )
          +4  ^^     b
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0   ^     (
          +1   ^     a+
         Backtrack
         No other matching paths
         New match attempt
         --->aac
          +0    ^    (
          +1    ^    a+
         No match

       Notice that various optimizations must be turned off if  you  want  all
       possible  matching  paths  to  be  scanned. If no_start_optimize is not
       used, there is an immediate "no match", without any  callouts,  because
       the  starting  optimization  fails to find "b" in the subject, which it
       knows must be present for any match. If no_auto_possess  is  not  used,
       the  "a+"  item is turned into "a++", which reduces the number of back-
       tracks.

       The callout_extra modifier has no effect if used with the DFA  matching
       function, or with JIT.

   Return values from callouts

       The  default  return  from  the  callout function is zero, which allows
       matching to continue. The callout_fail modifier can be given one or two
       numbers. If there is only one number, 1 is returned instead of 0 (caus-
       ing matching to backtrack) when a callout of that number is reached. If
       two  numbers  (<n>:<m>)  are  given,  1 is returned when callout <n> is
       reached and there have been at least <m>  callouts.  The  callout_error
       modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
       ing the entire matching process to be aborted. If both these  modifiers
       are  set  for  the same callout number, callout_error takes precedence.
       Note that callouts with string arguments are always  given  the  number
       zero.

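       The decision rules for callout_fail and callout_error can be summarized
       in a short Python sketch. The names parse_spec and callout_return are
       invented for this illustration, the string ABORT stands in for the
       numeric PCRE2_ERROR_CALLOUT code, and counting the current callout as
       part of "at least <m> callouts" is an assumption.

```python
CONTINUE = 0                     # carry on matching
FAIL = 1                         # force a backtrack at this point
ABORT = "PCRE2_ERROR_CALLOUT"    # placeholder for the real error code


def parse_spec(text):
    """Parse '<n>' or '<n>:<m>' into (n, m); m defaults to 0."""
    n, _, m = text.partition(":")
    return int(n), int(m or 0)


def callout_return(number, count, fail=None, error=None):
    """Choose the callout return value.

    number is the callout number reached (zero for string callouts) and
    count is the number of callouts so far, this one included (assumed).
    fail and error are the modifier values, or None when not set.
    """
    def triggered(spec):
        if spec is None:
            return False
        n, m = parse_spec(spec)
        return number == n and count >= m

    if triggered(error):         # callout_error takes precedence
        return ABORT
    if triggered(fail):
        return FAIL
    return CONTINUE


print(callout_return(2, 1, fail="2"))       # callout 2 reached
print(callout_return(2, 2, fail="2:3"))     # fewer than 3 callouts so far
print(callout_return(2, 3, fail="2:3"))     # third callout
```
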
       The  callout_data  modifier can be given an unsigned or a negative num-
       ber.  This is set as the "user data" that is  passed  to  the  matching
       function,  and  passed  back  when the callout function is invoked. Any
       value other than zero is used as  a  return  from  pcre2test's  callout
       function.

       Inserting callouts can be helpful when using pcre2test to check compli-
       cated regular expressions. For further information about callouts,  see
       the pcre2callout documentation.


NON-PRINTING CHARACTERS

       When pcre2test is outputting text in the compiled version of a pattern,
       bytes other than 32-126 are always treated as  non-printing  characters
       and are therefore shown as hex escapes.

       When  pcre2test  is outputting text that is a matched part of a subject
       string, it behaves in the same way, unless a different locale has  been
       set for the pattern (using the locale modifier). In this case, the
       isprint() function is used to distinguish printing and non-printing
       characters.

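       The default rule can be sketched in Python as follows. The function
       name show_bytes and the "\x" followed by two hex digits spelling of the
       escape are assumptions made for this illustration. The optional
       is_printable argument stands in for the locale's isprint() test, and
       rendering an accepted byte with chr() treats it as Latin-1, which is
       also an assumption.

```python
def show_bytes(data, is_printable=None):
    """Render bytes, replacing non-printing ones with hex escapes."""
    if is_printable is None:
        # Default rule: only bytes 32-126 count as printing characters.
        is_printable = lambda b: 32 <= b <= 126
    return "".join(chr(b) if is_printable(b) else f"\\x{b:02x}"
                   for b in data)


print(show_bytes(b"tab\there"))
print(show_bytes(b"caf\xe9"))
# A locale-like predicate that also accepts byte 0xe9.
print(show_bytes(b"caf\xe9", lambda b: b == 0xE9 or 32 <= b <= 126))
```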

SAVING AND RESTORING COMPILED PATTERNS

       It  is  possible  to  save  compiled patterns on disc or elsewhere, and
       reload them later, subject to a number of restrictions. JIT data cannot
       be  saved.  The host on which the patterns are reloaded must be running
       the same version of PCRE2, with the same code unit width, and must also
       have  the  same  endianness,  pointer width and PCRE2_SIZE type. Before
       compiled patterns can be saved they must be serialized, that  is,  con-
       verted  to a stream of bytes. A single byte stream may contain any num-
       ber of compiled patterns, but they must all use the same character  ta-
       bles.  A  single copy of the tables is included in the byte stream (its
       size is 1088 bytes).

       The functions whose names begin with pcre2_serialize_ are used for  se-
       rializing  and de-serializing. They are described in the pcre2serialize
       documentation. In this section we describe the  features  of  pcre2test
       that can be used to test these functions.

       Note that "serialization" in PCRE2 does not convert  compiled  patterns
       to an abstract format in the way that Java or .NET serialization does.
       It just makes a reloadable byte code stream. Hence the restrictions on
       reloading mentioned above.

       In pcre2test, when a pattern with the push modifier is successfully com-
       piled, it is pushed onto a stack of compiled  patterns,  and  pcre2test
       expects  the next line to contain a new pattern (or command) instead of
       a subject line. By contrast, the pushcopy modifier causes a copy of the
       compiled  pattern to be stacked, leaving the original available for im-
       mediate matching. By using push and/or pushcopy, a number  of  patterns
       can  be  compiled  and  retained. These modifiers are incompatible with
       posix, and control modifiers that act at match time are ignored (with a
       message)  for the stacked patterns. The jitverify modifier applies only
       at compile time.

       The command

         #save <filename>

       causes all the stacked patterns to be serialized and the result written
       to  the named file. Afterwards, all the stacked patterns are freed. The
       command

         #load <filename>

       reads the data in the file, and then arranges for it to  be  de-serial-
       ized,  with the resulting compiled patterns added to the pattern stack.
       The pattern on the top of the stack can be retrieved by the  #pop  com-
       mand,  which  must  be  followed  by  lines  of subjects that are to be
       matched with the pattern, terminated as usual by an empty line  or  end
       of  file.  This  command  may be followed by a modifier list containing
       only control modifiers that act after a pattern has been  compiled.  In
       particular,  hex,  posix,  posix_nosub,  push, and pushcopy are not al-
       lowed, nor are any option-setting modifiers.  The  JIT  modifiers  are,
       however, permitted.  Here is an example that saves and reloads two pat-
       terns.

         /abc/push
         /xyz/push
         #save tempfile
         #load tempfile
         #pop info
         xyz

         #pop jit,bincode
         abc

       If jitverify is used with #pop, it does not  automatically  imply  jit,
       which is different behaviour from when it is used on a pattern.

       The  #popcopy  command is analogous to the pushcopy modifier in that it
       makes current a copy of the topmost stack pattern, leaving the original
       still on the stack.

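       The stack behaviour of push, pushcopy, #save, #load, #pop, and #popcopy
       can be modelled with a toy Python class. This is only a sketch: the
       class PatternStack is invented for the illustration, plain strings
       stand in for compiled patterns, Python's pickle module stands in for
       the pcre2_serialize_ functions, and the serialized data is kept in
       memory instead of being written to a named file.

```python
import copy
import pickle


class PatternStack:
    """Toy model of pcre2test's stack of compiled patterns."""

    def __init__(self):
        self.items = []

    def push(self, pattern):
        # push: the pattern goes on the stack and is no longer current.
        self.items.append(pattern)

    def pushcopy(self, pattern):
        # pushcopy: a copy is stacked, the original stays current.
        self.items.append(copy.copy(pattern))
        return pattern

    def save(self):
        # #save: serialize every stacked pattern, then free the stack.
        blob = pickle.dumps(self.items)
        self.items = []
        return blob

    def load(self, blob):
        # #load: de-serialize and add the patterns to the stack.
        self.items.extend(pickle.loads(blob))

    def pop(self):
        # #pop: the top pattern becomes current and leaves the stack.
        return self.items.pop()

    def popcopy(self):
        # #popcopy: a copy of the top pattern becomes current.
        return copy.copy(self.items[-1])


stack = PatternStack()
stack.push("/abc/")
stack.push("/xyz/")
blob = stack.save()
stack.load(blob)
print(stack.pop(), stack.pop())
```

       The last pattern pushed is the first one popped, which is why the
       example above matches "xyz" after the first #pop and "abc" after the
       second.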

SEE ALSO

       pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
       pcre2partial(3), pcre2pattern(3), pcre2serialize(3).


AUTHOR

       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.


REVISION

       Last updated: 27 July 2022
       Copyright (c) 1997-2022 University of Cambridge.