regex-syntax
============
This crate provides a robust regular expression parser.

[Build status](https://github.com/rust-lang/regex/actions)
[crates.io](https://crates.io/crates/regex-syntax)
[Repository](https://github.com/rust-lang/regex)


### Documentation

https://docs.rs/regex-syntax


### Overview

There are two primary types exported by this crate: `Ast` and `Hir`. The former
is a faithful abstract syntax of a regular expression, and can convert regular
expressions back to their concrete syntax while mostly preserving their original
form. The latter type is a high level intermediate representation of a regular
expression that is amenable to analysis and compilation into byte codes or
automata. An `Hir` achieves this by drastically simplifying the syntactic
structure of the regular expression. While an `Hir` can be converted back to
its equivalent concrete syntax, the result is unlikely to resemble the original
concrete syntax that produced the `Hir`.
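
To make the distinction concrete, here is a minimal, self-contained sketch of the idea. It does not use this crate's actual types: `ToyAst`, `ToyHir` and `translate` are hypothetical names invented for illustration, and the toy only handles ASCII alphanumeric literals, non-capturing groups and alternations. The toy AST remembers how each literal was written, while the toy HIR keeps only what is matched.

```rust
use std::fmt;

/// Hypothetical AST: records *how* each construct was written.
#[derive(Debug, Clone, PartialEq)]
enum ToyAst {
    /// A literal written verbatim, e.g. `a`.
    Verbatim(char),
    /// A literal written as a two-digit hex escape, e.g. `\x61`.
    HexEscape(char),
    /// A non-capturing group, e.g. `(?:a)`.
    Group(Box<ToyAst>),
    /// An alternation, e.g. `a|b`.
    Alternation(Vec<ToyAst>),
}

/// Hypothetical HIR: records only *what* is matched.
#[derive(Debug, Clone, PartialEq)]
enum ToyHir {
    Literal(char),
    Alternation(Vec<ToyHir>),
}

impl fmt::Display for ToyAst {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyAst::Verbatim(c) => write!(f, "{}", c),
            // Assumes an ASCII character, so two hex digits suffice.
            ToyAst::HexEscape(c) => write!(f, "\\x{:02x}", *c as u32),
            ToyAst::Group(inner) => write!(f, "(?:{})", inner),
            ToyAst::Alternation(alts) => {
                for (i, alt) in alts.iter().enumerate() {
                    if i > 0 {
                        write!(f, "|")?;
                    }
                    write!(f, "{}", alt)?;
                }
                Ok(())
            }
        }
    }
}

impl fmt::Display for ToyHir {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Assumes the literal is not a regex metacharacter.
            ToyHir::Literal(c) => write!(f, "{}", c),
            ToyHir::Alternation(alts) => {
                for (i, alt) in alts.iter().enumerate() {
                    if i > 0 {
                        write!(f, "|")?;
                    }
                    write!(f, "{}", alt)?;
                }
                Ok(())
            }
        }
    }
}

/// Lowers the toy AST to the toy HIR, discarding syntactic detail.
/// Dropping groups is only safe here because the toy has no
/// concatenation or repetition.
fn translate(ast: &ToyAst) -> ToyHir {
    match ast {
        ToyAst::Verbatim(c) | ToyAst::HexEscape(c) => ToyHir::Literal(*c),
        ToyAst::Group(inner) => translate(inner),
        ToyAst::Alternation(alts) => {
            ToyHir::Alternation(alts.iter().map(translate).collect())
        }
    }
}

fn main() {
    // Models the pattern `(?:\x61)|b`.
    let ast = ToyAst::Alternation(vec![
        ToyAst::Group(Box::new(ToyAst::HexEscape('a'))),
        ToyAst::Verbatim('b'),
    ]);
    let hir = translate(&ast);
    println!("{}", ast); // prints (?:\x61)|b
    println!("{}", hir); // prints a|b
}
```

Printing the toy AST reproduces the original spelling, whereas printing the toy HIR yields an equivalent but simpler pattern. The real `Ast` and `Hir` types are far richer than this sketch.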


### Example

This example shows how to parse a pattern string into its HIR:

```rust
use regex_syntax::Parser;
use regex_syntax::hir::{self, Hir};

let hir = Parser::new().parse("a|b").unwrap();
assert_eq!(hir, Hir::alternation(vec![
    Hir::literal(hir::Literal::Unicode('a')),
    Hir::literal(hir::Literal::Unicode('b')),
]));
```


### Safety

This crate has no `unsafe` code and sets `forbid(unsafe_code)`. While it's
possible this crate could use `unsafe` code in the future, the standard
for doing so is extremely high. In general, most code in this crate is not
performance critical, since its running time tends to be dwarfed by the time
it takes to compile a regular expression into an automaton. Therefore, there
is little need for extreme optimization, and thus little need for `unsafe`.

The standard for using `unsafe` in this crate is extremely high because this
crate is intended to be reasonably safe to use with user supplied regular
expressions. Therefore, while there may be bugs in the regex parser itself,
they should _never_ result in memory unsafety unless there is a bug in either
the compiler or the standard library, since `regex-syntax` has zero
dependencies.


### Crate features

By default, this crate bundles a fairly large amount of Unicode data tables
(a source size of ~750KB). Because of their large size, one can disable some
or all of these data tables. If a regular expression attempts to use Unicode
data that is not available, then an error will occur when translating the `Ast`
to the `Hir`.

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex-syntax/*/#crate-features).


### Testing

Simply running `cargo test` will give you very good coverage. However, because
of the large number of features exposed by this crate, a `test` script is
included in this directory which will test several feature combinations. This
is the same script that is run in CI.


### Motivation

The primary purpose of this crate is to provide the parser used by `regex`.
Specifically, this crate is treated as an implementation detail of `regex`,
and is primarily developed for the needs of `regex`.

Since this crate is an implementation detail of `regex`, it may experience
breaking change releases at a different cadence from `regex`.
This is only possible because this crate is _not_ a public dependency of
`regex`.

Another consequence of this de-coupling is that there is no direct way to
compile a `regex::Regex` from a `regex_syntax::hir::Hir`. Instead, one must
first convert the `Hir` to a string (via its `std::fmt::Display`) and then
compile that via `Regex::new`. While this does repeat some work, compilation
typically takes much longer than parsing.

Stated differently, the coupling between `regex` and `regex-syntax` exists only
at the level of the concrete syntax.