Your friendly guide to hacking and navigating the regex library.

This guide assumes familiarity with Rust and Cargo, and at least a perusal of
the user-facing documentation for this crate.

If you're looking for background on the implementation in this library, then
you can do no better than Russ Cox's article series on implementing regular
expressions using finite automata: https://swtch.com/~rsc/regexp/


## Architecture overview

As you probably already know, this library executes regular expressions using
finite automata. In particular, a design goal is to make searching linear
with respect to both the regular expression and the text being searched.
Meeting that design goal on its own is not so hard and can be done with an
implementation of the Pike VM (similar to Thompson's construction, but supports
capturing groups), as described in: https://swtch.com/~rsc/regexp/regexp2.html
This library contains such an implementation in src/pikevm.rs.

Making it fast is harder. One of the key problems with the Pike VM is that it
can be in more than one state at any point in time, and must shuffle capture
positions between them. The Pike VM also spends a lot of time following the
same epsilon transitions over and over again.
We can employ one trick to
speed up the Pike VM: extract one or more literal prefixes from the regular
expression and execute specialized code to quickly find matches of those
prefixes in the search text. The Pike VM can then be avoided for most of the
search, and instead only executed when a prefix is found. The code to find
prefixes is in the regex-syntax crate (in this repository). The code to search
for literals is in src/literals.rs. When more than one literal prefix is found,
we fall back to an Aho-Corasick DFA using the aho-corasick crate. For one
literal, we use a variant of the Boyer-Moore algorithm. Both Aho-Corasick and
Boyer-Moore use `memchr` when appropriate. The Boyer-Moore variant in this
library also uses elementary frequency analysis to choose the right byte to run
`memchr` with.

Of course, detecting prefix literals can only take us so far. Not all regular
expressions have literal prefixes. To remedy this, we try another approach
to executing the Pike VM: backtracking, whose implementation can be found in
src/backtrack.rs. One reason why backtracking can be faster is that it avoids
excessive shuffling of capture groups. Of course, backtracking is susceptible
to exponential runtimes, so we keep track of every state we've visited to make
sure we never visit it again. This guarantees linear time execution, but we
pay for it with the memory required to track visited states.
Because of the
memory requirement, we only use this engine on small search strings *and* small
regular expressions.

Lastly, the real workhorse of this library is the "lazy" DFA in src/dfa.rs.
It is distinct from the Pike VM in that the DFA is explicitly represented in
memory and is only ever in one state at a time. It is said to be "lazy" because
the DFA is computed as text is searched, where each byte in the search text
results in at most one new DFA state. It is made fast by caching states. DFAs
are susceptible to exponential state blow up (where the worst case is computing
a new state for every input byte, regardless of what's in the state cache). To
avoid using a lot of memory, the lazy DFA uses a bounded cache. Once the cache
is full, it is wiped and state computation starts over again. If the cache is
wiped too frequently, then the DFA gives up and searching falls back to one of
the aforementioned algorithms.

All of the above matching engines expose precisely the same matching semantics.
This is indeed tested. (See the section below about testing.)

The following sub-sections describe the rest of the library and how each of the
matching engines is actually used.

### Parsing

Regular expressions are parsed using the regex-syntax crate, which is
maintained in this repository.
The regex-syntax crate defines an abstract
syntax and provides very detailed error messages when a parse error is
encountered. Parsing is done in a separate crate so that others may benefit
from its existence, and because it is relatively divorced from the rest of the
regex library.

The regex-syntax crate also provides sophisticated support for extracting
prefix and suffix literals from regular expressions.

### Compilation

The compiler is in src/compile.rs. The input to the compiler is some abstract
syntax for a regular expression and the output is a sequence of opcodes that
matching engines use to execute a search. (One can think of matching engines as
mini virtual machines.) The sequence of opcodes is a particular encoding of a
non-deterministic finite automaton. In particular, the opcodes explicitly rely
on epsilon transitions.

Consider a simple regular expression like `a|b`. Its compiled form looks like
this:

    000 Save(0)
    001 Split(2, 3)
    002 'a' (goto: 4)
    003 'b'
    004 Save(1)
    005 Match

The first column is the instruction pointer and the second column is the
instruction. Save instructions indicate that the current position in the input
should be stored in a capture location.
Split instructions represent a binary
branch in the program (i.e., epsilon transitions). The instructions `'a'` and
`'b'` indicate that the literal bytes `'a'` or `'b'` should match.

In older versions of this library, the compilation looked like this:

    000 Save(0)
    001 Split(2, 3)
    002 'a'
    003 Jump(5)
    004 'b'
    005 Save(1)
    006 Match

In particular, empty instructions that merely served to move execution from one
point in the program to another were removed. Instead, every instruction has a
`goto` pointer embedded into it. This resulted in a small performance boost for
the Pike VM, because it was one fewer epsilon transition that it had to follow.

There exist more instructions and they are defined and documented in
src/prog.rs.

Compilation has several knobs and a few unfortunately complicated invariants.
Namely, the output of compilation can be one of two types of programs: a
program that executes on Unicode scalar values or a program that executes
on raw bytes. In the former case, the matching engine is responsible for
performing UTF-8 decoding and executing instructions using Unicode codepoints.
In the latter case, the program handles UTF-8 decoding implicitly, so that the
matching engine can execute on raw bytes.
All matching engines can execute
either Unicode or byte based programs except for the lazy DFA, which requires
byte based programs. In general, both representations were kept because (1) the
lazy DFA requires byte based programs so that states can be encoded in a memory
efficient manner and (2) the Pike VM benefits greatly from inlining Unicode
character classes into fewer instructions as it results in fewer epsilon
transitions.

N.B. UTF-8 decoding is built into the compiled program by making use of the
utf8-ranges crate. The compiler in this library factors out common suffixes to
reduce the size of huge character classes (e.g., `\pL`).

A regrettable consequence of this split in instruction sets is we generally
need to compile two programs; one for NFA execution and one for the lazy DFA.

In fact, it is worse than that: the lazy DFA is not capable of finding the
starting location of a match in a single scan, and must instead execute a
backwards search after finding the end location. To execute a backwards search,
we must have compiled the regular expression *in reverse*.

This means that every compilation of a regular expression generally results in
three distinct programs. It would be possible to lazily compile the Unicode
program, since it is never needed if (1) the regular expression uses no word
boundary assertions and (2) the caller never asks for sub-capture locations.
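
To make the opcode encoding from this section concrete, here is a minimal
sketch of the `a|b` program and a tiny thread-list simulation over it. The
`Inst` enum, `program_a_or_b`, `add_thread` and `is_full_match` are all
hypothetical names invented for illustration; the real instruction set is the
one documented in src/prog.rs, and this sketch ignores capture positions
entirely and only answers whether the whole input matches.

```rust
/// Hypothetical, simplified instruction set (not the one in src/prog.rs).
/// Every instruction except `Match` embeds its own `goto` pointer.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Inst {
    /// Record the input position in capture slot `slot` (ignored in this sketch).
    Save { slot: usize, goto: usize },
    /// Epsilon transition to both `goto1` and `goto2`.
    Split { goto1: usize, goto2: usize },
    /// Consume one byte equal to `byte`, then continue at `goto`.
    Byte { byte: u8, goto: usize },
    /// The program has matched.
    Match,
}

/// The six-instruction program for `a|b` listed earlier in this section.
fn program_a_or_b() -> Vec<Inst> {
    vec![
        Inst::Save { slot: 0, goto: 1 },
        Inst::Split { goto1: 2, goto2: 3 },
        Inst::Byte { byte: b'a', goto: 4 },
        Inst::Byte { byte: b'b', goto: 4 },
        Inst::Save { slot: 1, goto: 5 },
        Inst::Match,
    ]
}

/// Follow epsilon transitions from `pc` and push every reachable
/// `Byte` or `Match` instruction onto `set`, visiting each pc once.
fn add_thread(prog: &[Inst], pc: usize, seen: &mut [bool], set: &mut Vec<usize>) {
    if seen[pc] {
        return;
    }
    seen[pc] = true;
    match prog[pc] {
        Inst::Save { goto, .. } => add_thread(prog, goto, seen, set),
        Inst::Split { goto1, goto2 } => {
            add_thread(prog, goto1, seen, set);
            add_thread(prog, goto2, seen, set);
        }
        Inst::Byte { .. } | Inst::Match => set.push(pc),
    }
}

/// Anchored simulation: returns true if `prog` matches all of `input`.
fn is_full_match(prog: &[Inst], input: &[u8]) -> bool {
    let mut current = Vec::new();
    add_thread(prog, 0, &mut vec![false; prog.len()], &mut current);
    for &b in input {
        let mut next = Vec::new();
        let mut seen = vec![false; prog.len()];
        for &pc in &current {
            // Only `Byte` instructions consume input; `Match` threads die here.
            if let Inst::Byte { byte, goto } = prog[pc] {
                if byte == b {
                    add_thread(prog, goto, &mut seen, &mut next);
                }
            }
        }
        current = next;
    }
    current.iter().any(|&pc| matches!(prog[pc], Inst::Match))
}
```

Calling `is_full_match(&program_a_or_b(), b"a")` walks `Save(0)` and the
`Split` as epsilon transitions before any input is consumed, which is the
repeated work that the embedded `goto` pointers reduce.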

### Execution

At the time of writing, there are four matching engines in this library:

1. The Pike VM (supports captures).
2. Bounded backtracking (supports captures).
3. Literal substring or multi-substring search.
4. Lazy DFA (no support for Unicode word boundary assertions).

Only the first two matching engines are capable of executing every regular
expression program. They also happen to be the slowest, which means we need
some logic that (1) knows various facts about the regular expression and (2)
knows what the caller wants. Using this information, we can determine which
engine (or engines) to use.

The logic for choosing which engine to execute is in src/exec.rs and is
documented on the Exec type. Exec values contain regular expression Programs
(defined in src/prog.rs), which contain all the necessary tidbits for actually
executing a regular expression on search text.

For the most part, the execution logic is straightforward and follows the
limitations of each engine described above pretty faithfully. The hairiest
part of src/exec.rs by far is the execution of the lazy DFA, since it requires
a forwards and backwards search, and then falls back to either the Pike VM or
backtracking if the caller requested capture locations.
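
As a rough illustration of the kind of decision described above, the sketch
below picks engines from a few facts about the regex and the caller's request.
Everything here is an assumption made for illustration: the `Engine`, `Facts`
and `Request` types, the `choose_engines` function and the
`MAX_VISITED_BITS` budget are hypothetical and do not mirror the actual logic
or names in src/exec.rs. It also does not model the lazy DFA giving up when
its cache is wiped too often.

```rust
/// Hypothetical labels for the four matching engines.
#[derive(Debug, PartialEq, Eq)]
enum Engine {
    Literal,
    LazyDfa,
    Backtrack,
    PikeVm,
}

/// Hypothetical facts known about a compiled regex.
struct Facts {
    is_pure_literal: bool,
    has_unicode_word_boundary: bool,
    num_insts: usize,
}

/// Hypothetical description of what the caller asked for.
struct Request {
    wants_captures: bool,
    input_len: usize,
}

/// Assumed memory budget for the backtracker's visited set, in bits.
const MAX_VISITED_BITS: usize = 256 * 1024 * 8;

/// The bounded backtracker tracks one bit per (instruction, position) pair,
/// so it is only viable when both the regex and the input are small.
fn fits_backtracker(facts: &Facts, req: &Request) -> bool {
    let positions = req.input_len.saturating_add(1);
    facts.num_insts.saturating_mul(positions) <= MAX_VISITED_BITS
}

/// Returns the engines to run, in order.
fn choose_engines(facts: &Facts, req: &Request) -> Vec<Engine> {
    // Either NFA engine can run any program; prefer backtracking when it fits.
    let nfa = if fits_backtracker(facts, req) {
        Engine::Backtrack
    } else {
        Engine::PikeVm
    };
    if facts.is_pure_literal && !req.wants_captures {
        return vec![Engine::Literal];
    }
    if facts.has_unicode_word_boundary {
        // The lazy DFA cannot handle Unicode word boundary assertions.
        return vec![nfa];
    }
    if req.wants_captures {
        // The DFA finds the match bounds, then an NFA engine fills in captures.
        vec![Engine::LazyDfa, nfa]
    } else {
        vec![Engine::LazyDfa]
    }
}
```

The two-element result stands for the "lazy DFA first, then Pike VM or
backtracking" sequence; a real implementation would also need a path for the
case where the lazy DFA bails out mid-search.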

The Exec type also contains mutable scratch space for each type of matching
engine. This scratch space is used during search (for example, for the lazy
DFA, it contains compiled states that are reused on subsequent searches).

### Programs

A regular expression program is essentially a sequence of opcodes produced by
the compiler plus various facts about the regular expression (such as whether
it is anchored, its capture names, etc.).

### The regex! macro

The `regex!` macro no longer exists. It was developed in a bygone era as a
compiler plugin during the infancy of the regex crate. Back then, the only
matching engine in the crate was the Pike VM. The `regex!` macro was, itself,
also a Pike VM. The only advantages it offered over the dynamic Pike VM that
was built at runtime were the following:

  1. Syntax checking was done at compile time. Your Rust program wouldn't
     compile if your regex didn't compile.
  2. Reduction of overhead that was proportional to the size of the regex.
     For the most part, this overhead consisted of heap allocation, which
     was nearly eliminated in the compiler plugin.

The main takeaway here is that the compiler plugin was a marginally faster
version of a slow regex engine.
As the regex crate evolved, it grew other regex
engines (DFA, bounded backtracker) and sophisticated literal optimizations.
The regex macro didn't keep pace, and it therefore became (dramatically) slower
than the dynamic engines. The only reason left to use it was for the compile
time guarantee that your regex is correct. Fortunately, Clippy (the Rust lint
tool) has a lint that checks your regular expression validity, which mostly
replaces that use case.

Additionally, the regex compiler plugin stopped receiving maintenance. Nobody
complained. At that point, it seemed prudent to just remove it.

Will a compiler plugin be brought back? The future is murky, but there is
definitely an opportunity there to build something that is faster than the
dynamic engines in some cases. But it will be challenging! As of now, there
are no plans to work on this.


## Testing

A key aspect of any mature regex library is its test suite. A subset of the
tests in this library come from Glenn Fowler's AT&T test suite (its online
presence seems gone at the time of writing). The source of the test suite is
located in src/testdata. The script scripts/regex-match-tests.py takes the
test suite in src/testdata and generates tests/matches.rs.

There are also many other manually crafted tests and regression tests in
tests/tests.rs.
Some of these tests were taken from RE2.

The biggest source of complexity in the tests is related to answering this
question: how can we reuse the tests to check all of our matching engines? One
approach would have been to encode every test into some kind of format (like
the AT&T test suite) and code generate tests for each matching engine. The
approach we use in this library is to create a Cargo.toml entry point for each
matching engine we want to test. The entry points are:

* `tests/test_default.rs` - tests `Regex::new`
* `tests/test_default_bytes.rs` - tests `bytes::Regex::new`
* `tests/test_nfa.rs` - tests `Regex::new`, forced to use the NFA
  algorithm on every regex.
* `tests/test_nfa_bytes.rs` - tests `Regex::new`, forced to use the NFA
  algorithm on every regex and use *arbitrary* byte based programs.
* `tests/test_nfa_utf8bytes.rs` - tests `Regex::new`, forced to use the NFA
  algorithm on every regex and use *UTF-8* byte based programs.
* `tests/test_backtrack.rs` - tests `Regex::new`, forced to use
  backtracking on every regex.
* `tests/test_backtrack_bytes.rs` - tests `Regex::new`, forced to use
  backtracking on every regex and use *arbitrary* byte based programs.
* `tests/test_backtrack_utf8bytes.rs` - tests `Regex::new`, forced to use
  backtracking on every regex and use *UTF-8* byte based programs.
* `tests/test_crates_regex.rs` - tests to make sure that all of the
  backends behave in the same way against a number of quickcheck
  generated random inputs. These tests need to be enabled through
  the `RUST_REGEX_RANDOM_TEST` environment variable (see
  below).

The lazy DFA and pure literal engines are absent from this list because
they cannot be used on every regular expression. Instead, we rely on
`tests/test_dynamic.rs` to test the lazy DFA and literal engines when possible.

Since the tests are repeated several times, and because `cargo test` runs all
entry points, it can take a while to compile everything. To reduce compile
times slightly, try using `cargo test --test default`, which will only use the
`tests/test_default.rs` entry point.

The random testing takes quite a while, so it is not enabled by default.
In order to run the random testing you can set the
`RUST_REGEX_RANDOM_TEST` environment variable to anything before
invoking `cargo test`. Note that this variable is inspected at compile
time, so if the tests don't seem to be running, you may need to run
`cargo clean`.

## Benchmarking

The benchmarking in this crate is made up of many micro-benchmarks.
Currently,
there are two primary sets of benchmarks: the benchmarks that were adopted
at this library's inception (in `bench/src/misc.rs`) and a newer set of
benchmarks meant to test various optimizations. Specifically, the latter set
contain some analysis and are in `bench/src/sherlock.rs`. Also, the latter
set are all executed on the same lengthy input whereas the former benchmarks
are executed on strings of varying length.

There is also a smattering of benchmarks for parsing and compilation.

Benchmarks are in a separate crate so that its dependencies can be managed
separately from the main regex crate.

Benchmarking follows a similarly wonky setup as tests. There are multiple entry
points:

* `bench_rust.rs` - benchmarks `Regex::new`
* `bench_rust_bytes.rs` - benchmarks `bytes::Regex::new`
* `bench_pcre.rs` - benchmarks PCRE
* `bench_onig.rs` - benchmarks Oniguruma

The PCRE and Oniguruma benchmarks exist as a comparison point to a mature
regular expression library. In general, this regex library compares favorably
(there are even a few benchmarks that PCRE simply runs too slowly on or
outright can't execute at all). I would love to add other regular expression
library benchmarks (especially RE2).

If you're hacking on one of the matching engines and just want to see
benchmarks, then all you need to run is:

    $ (cd bench && ./run rust)

If you want to compare your results with older benchmarks, then try:

    $ (cd bench && ./run rust | tee old)
    $ ... make it faster
    $ (cd bench && ./run rust | tee new)
    $ cargo benchcmp old new --improvements

The `cargo-benchcmp` utility is available here:
https://github.com/BurntSushi/cargo-benchcmp

The `./bench/run` utility can run benchmarks for PCRE and Oniguruma too. See
`./bench/bench --help`.

## Dev Docs

When sinking your teeth into the codebase for the first time, the
crate documentation can be a great resource. By default `rustdoc`
will strip out all documentation of private crate members in an
effort to help consumers of the crate focus on the *interface*
without having to concern themselves with the *implementation*.
Normally this is a great thing, but if you want to start hacking
on regex internals it is not what you want. Many of the private members
of this crate are well documented with rustdoc style comments, and
it would be a shame to miss out on the opportunity that presents.
You can generate the private docs with:

```
$ rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
```

Then just point your browser at `target/doc/regex/index.html`.

See https://github.com/rust-lang/rust/issues/15347 for more info
about generating developer docs for internal use.