1fb6c1f39Sopenharmony_cimemchr 2fb6c1f39Sopenharmony_ci====== 3fb6c1f39Sopenharmony_ciThis library provides heavily optimized routines for string search primitives. 4fb6c1f39Sopenharmony_ci 5fb6c1f39Sopenharmony_ci[](https://github.com/BurntSushi/memchr/actions) 6fb6c1f39Sopenharmony_ci[](https://crates.io/crates/memchr) 7fb6c1f39Sopenharmony_ci 8fb6c1f39Sopenharmony_ciDual-licensed under MIT or the [UNLICENSE](https://unlicense.org/). 9fb6c1f39Sopenharmony_ci 10fb6c1f39Sopenharmony_ci 11fb6c1f39Sopenharmony_ci### Documentation 12fb6c1f39Sopenharmony_ci 13fb6c1f39Sopenharmony_ci[https://docs.rs/memchr](https://docs.rs/memchr) 14fb6c1f39Sopenharmony_ci 15fb6c1f39Sopenharmony_ci 16fb6c1f39Sopenharmony_ci### Overview 17fb6c1f39Sopenharmony_ci 18fb6c1f39Sopenharmony_ci* The top-level module provides routines for searching for 1, 2 or 3 bytes 19fb6c1f39Sopenharmony_ci in the forward or reverse direction. When searching for more than one byte, 20fb6c1f39Sopenharmony_ci positions are considered a match if the byte at that position matches any 21fb6c1f39Sopenharmony_ci of the bytes. 22fb6c1f39Sopenharmony_ci* The `memmem` sub-module provides forward and reverse substring search 23fb6c1f39Sopenharmony_ci routines. 24fb6c1f39Sopenharmony_ci 25fb6c1f39Sopenharmony_ciIn all such cases, routines operate on `&[u8]` without regard to encoding. This 26fb6c1f39Sopenharmony_ciis exactly what you want when searching either UTF-8 or arbitrary bytes. 27fb6c1f39Sopenharmony_ci 28fb6c1f39Sopenharmony_ci### Compiling without the standard library 29fb6c1f39Sopenharmony_ci 30fb6c1f39Sopenharmony_cimemchr links to the standard library by default, but you can disable the 31fb6c1f39Sopenharmony_ci`std` feature if you want to use it in a `#![no_std]` crate: 32fb6c1f39Sopenharmony_ci 33fb6c1f39Sopenharmony_ci```toml 34fb6c1f39Sopenharmony_ci[dependencies] 35fb6c1f39Sopenharmony_cimemchr = { version = "2", default-features = false } 36fb6c1f39Sopenharmony_ci``` 37fb6c1f39Sopenharmony_ci 38fb6c1f39Sopenharmony_ciOn x86 platforms, when the `std` feature is disabled, the SSE2 accelerated 39fb6c1f39Sopenharmony_ciimplementations will be used. When `std` is enabled, AVX accelerated 40fb6c1f39Sopenharmony_ciimplementations will be used if the CPU is determined to support it at runtime. 41fb6c1f39Sopenharmony_ci 42fb6c1f39Sopenharmony_ci### Using libc 43fb6c1f39Sopenharmony_ci 44fb6c1f39Sopenharmony_ci`memchr` is a routine that is part of libc, although this crate does not use 45fb6c1f39Sopenharmony_cilibc by default. Instead, it uses its own routines, which are either vectorized 46fb6c1f39Sopenharmony_cior generic fallback routines. In general, these should be competitive with 47fb6c1f39Sopenharmony_ciwhat's in libc, although this has not been tested for all architectures. If 48fb6c1f39Sopenharmony_ciusing `memchr` from libc is desirable and a vectorized routine is not otherwise 49fb6c1f39Sopenharmony_ciavailable in this crate, then enabling the `libc` feature will use libc's 50fb6c1f39Sopenharmony_civersion of `memchr`. 51fb6c1f39Sopenharmony_ci 52fb6c1f39Sopenharmony_ciThe rest of the functions in this crate, e.g., `memchr2` or `memrchr3` and the 53fb6c1f39Sopenharmony_cisubstring search routines, will always use the implementations in this crate. 54fb6c1f39Sopenharmony_ciOne exception to this is `memrchr`, which is an extension in `libc` found on 55fb6c1f39Sopenharmony_ciLinux. On Linux, `memrchr` is used in precisely the same scenario as `memchr`, 56fb6c1f39Sopenharmony_cias described above. 57fb6c1f39Sopenharmony_ci 58fb6c1f39Sopenharmony_ci 59fb6c1f39Sopenharmony_ci### Minimum Rust version policy 60fb6c1f39Sopenharmony_ci 61fb6c1f39Sopenharmony_ciThis crate's minimum supported `rustc` version is `1.41.1`. 62fb6c1f39Sopenharmony_ci 63fb6c1f39Sopenharmony_ciThe current policy is that the minimum Rust version required to use this crate 64fb6c1f39Sopenharmony_cican be increased in minor version updates. For example, if `crate 1.0` requires 65fb6c1f39Sopenharmony_ciRust 1.20.0, then `crate 1.0.z` for all values of `z` will also require Rust 66fb6c1f39Sopenharmony_ci1.20.0 or newer. However, `crate 1.y` for `y > 0` may require a newer minimum 67fb6c1f39Sopenharmony_civersion of Rust. 68fb6c1f39Sopenharmony_ci 69fb6c1f39Sopenharmony_ciIn general, this crate will be conservative with respect to the minimum 70fb6c1f39Sopenharmony_cisupported version of Rust. 71fb6c1f39Sopenharmony_ci 72fb6c1f39Sopenharmony_ci 73fb6c1f39Sopenharmony_ci### Testing strategy 74fb6c1f39Sopenharmony_ci 75fb6c1f39Sopenharmony_ciGiven the complexity of the code in this crate, along with the pervasive use 76fb6c1f39Sopenharmony_ciof `unsafe`, this crate has an extensive testing strategy. It combines multiple 77fb6c1f39Sopenharmony_ciapproaches: 78fb6c1f39Sopenharmony_ci 79fb6c1f39Sopenharmony_ci* Hand-written tests. 80fb6c1f39Sopenharmony_ci* Exhaustive-style testing meant to exercise all possible branching and offset 81fb6c1f39Sopenharmony_ci calculations. 82fb6c1f39Sopenharmony_ci* Property based testing through [`quickcheck`](https://github.com/BurntSushi/quickcheck). 83fb6c1f39Sopenharmony_ci* Fuzz testing through [`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz). 84fb6c1f39Sopenharmony_ci* A huge suite of benchmarks that are also run as tests. Benchmarks always 85fb6c1f39Sopenharmony_ci confirm that the expected result occurs. 86fb6c1f39Sopenharmony_ci 87fb6c1f39Sopenharmony_ciImprovements to the testing infrastructure are very welcome. 88fb6c1f39Sopenharmony_ci 89fb6c1f39Sopenharmony_ci 90fb6c1f39Sopenharmony_ci### Algorithms used 91fb6c1f39Sopenharmony_ci 92fb6c1f39Sopenharmony_ciAt time of writing, this crate's implementation of substring search actually 93fb6c1f39Sopenharmony_cihas a few different algorithms to choose from depending on the situation. 94fb6c1f39Sopenharmony_ci 95fb6c1f39Sopenharmony_ci* For very small haystacks, 96fb6c1f39Sopenharmony_ci [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm) 97fb6c1f39Sopenharmony_ci is used to reduce latency. Rabin-Karp has very small overhead and can often 98fb6c1f39Sopenharmony_ci complete before other searchers have even been constructed. 99fb6c1f39Sopenharmony_ci* For small needles, a variant of the 100fb6c1f39Sopenharmony_ci ["Generic SIMD"](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd) 101fb6c1f39Sopenharmony_ci algorithm is used. Instead of using the first and last bytes, a heuristic is 102fb6c1f39Sopenharmony_ci used to select bytes based on a background distribution of byte frequencies. 103fb6c1f39Sopenharmony_ci* In all other cases, 104fb6c1f39Sopenharmony_ci [Two-Way](https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm) 105fb6c1f39Sopenharmony_ci is used. If possible, a prefilter based on the "Generic SIMD" algorithm 106fb6c1f39Sopenharmony_ci linked above is used to find candidates quickly. A dynamic heuristic is used 107fb6c1f39Sopenharmony_ci to detect if the prefilter is ineffective, and if so, disables it. 108