regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

To bring this crate into your repository, either add `regex` to your
`Cargo.toml`, or run `cargo add regex`.
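
For reference, the manual `Cargo.toml` route might look like the fragment
below. The version requirement `"1"` is an assumption for illustration; pin
whichever release your project actually targets.

```toml
[dependencies]
# Assumed version requirement; any compatible 1.x release satisfies it.
regex = "1"
```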

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
// The `lazy_static!` macro must be in scope for this example to compile.
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. By default, `.` will match any *byte* using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

let re = Regex::new(r"(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that the `[^\x00]+` will match any *byte* except for `NUL`. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance.
Users of this crate can selectively disable Unicode tables, or choose which of
the optimizations performed by this crate to disable.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates.
For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).