regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![Crates.io](https://img.shields.io/crates/v/regex.svg)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.41.1%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

To bring this crate into your project, either add `regex` to your
`Cargo.toml`, or run `cargo add regex`.

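For the `Cargo.toml` route, a minimal dependency entry might look like the following. The version requirement `"1"` is an assumption that any 1.x release is acceptable; adjust it to your needs:

```toml
[dependencies]
# Assumed requirement: any semver-compatible 1.x release.
regex = "1"
```
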
Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internal to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        // Compiled on first use, then shared by every later call.
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. It also permits disabling Unicode mode even when the
pattern could then match invalid UTF-8. With Unicode mode disabled, as in
`(?-u:.)`, `.` will match any *byte* except for `\n` using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* except for `\n` using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode so that the class matches raw bytes.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled, the `[^\x00]+` will match any
*byte* except for `NUL`, including bytes such as `\xFF` that are not valid
UTF-8. When using the main API, `[^\x00]+` would instead match any valid UTF-8
sequence except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax, and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or disable individual
optimizations performed by this crate.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


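Features can also be re-enabled selectively. The sketch below assumes the `unicode-perl` feature name as listed in the crate documentation; it keeps the Unicode-aware `\d`, `\s` and `\w` classes available while leaving the other optional features off:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# `std` is required. `unicode-perl` (assumed from the crate documentation)
# keeps Unicode-aware \d, \s and \w available.
features = ["std", "unicode-perl"]
```
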
### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


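Under this policy, a project that must stay on a fixed compiler could restrict itself to patch releases of a single minor version. A minimal sketch using Cargo's tilde requirement follows; the choice of `1.3` is only an example:

```toml
[dependencies]
# Tilde requirement: accepts 1.3.z patch releases, but not 1.4 or newer.
regex = "~1.3"
```
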
### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).