1fb6c1f39Sopenharmony_cimemchr
2fb6c1f39Sopenharmony_ci======
3fb6c1f39Sopenharmony_ciThis library provides heavily optimized routines for string search primitives.
4fb6c1f39Sopenharmony_ci
5fb6c1f39Sopenharmony_ci[![Build status](https://github.com/BurntSushi/memchr/workflows/ci/badge.svg)](https://github.com/BurntSushi/memchr/actions)
6fb6c1f39Sopenharmony_ci[![Crates.io](https://img.shields.io/crates/v/memchr.svg)](https://crates.io/crates/memchr)
7fb6c1f39Sopenharmony_ci
8fb6c1f39Sopenharmony_ciDual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
9fb6c1f39Sopenharmony_ci
10fb6c1f39Sopenharmony_ci
11fb6c1f39Sopenharmony_ci### Documentation
12fb6c1f39Sopenharmony_ci
13fb6c1f39Sopenharmony_ci[https://docs.rs/memchr](https://docs.rs/memchr)
14fb6c1f39Sopenharmony_ci
15fb6c1f39Sopenharmony_ci
16fb6c1f39Sopenharmony_ci### Overview
17fb6c1f39Sopenharmony_ci
18fb6c1f39Sopenharmony_ci* The top-level module provides routines for searching for 1, 2 or 3 bytes
19fb6c1f39Sopenharmony_ci  in the forward or reverse direction. When searching for more than one byte,
20fb6c1f39Sopenharmony_ci  positions are considered a match if the byte at that position matches any
21fb6c1f39Sopenharmony_ci  of the bytes.
22fb6c1f39Sopenharmony_ci* The `memmem` sub-module provides forward and reverse substring search
23fb6c1f39Sopenharmony_ci  routines.
24fb6c1f39Sopenharmony_ci
25fb6c1f39Sopenharmony_ciIn all such cases, routines operate on `&[u8]` without regard to encoding. This
26fb6c1f39Sopenharmony_ciis exactly what you want when searching either UTF-8 or arbitrary bytes.
27fb6c1f39Sopenharmony_ci
28fb6c1f39Sopenharmony_ci### Compiling without the standard library
29fb6c1f39Sopenharmony_ci
30fb6c1f39Sopenharmony_cimemchr links to the standard library by default, but you can disable the
31fb6c1f39Sopenharmony_ci`std` feature if you want to use it in a `#![no_std]` crate:
32fb6c1f39Sopenharmony_ci
33fb6c1f39Sopenharmony_ci```toml
34fb6c1f39Sopenharmony_ci[dependencies]
35fb6c1f39Sopenharmony_cimemchr = { version = "2", default-features = false }
36fb6c1f39Sopenharmony_ci```
37fb6c1f39Sopenharmony_ci
38fb6c1f39Sopenharmony_ciOn x86 platforms, when the `std` feature is disabled, the SSE2 accelerated
39fb6c1f39Sopenharmony_ciimplementations will be used. When `std` is enabled, AVX accelerated
40fb6c1f39Sopenharmony_ciimplementations will be used if the CPU is determined to support it at runtime.
41fb6c1f39Sopenharmony_ci
42fb6c1f39Sopenharmony_ci### Using libc
43fb6c1f39Sopenharmony_ci
44fb6c1f39Sopenharmony_ci`memchr` is a routine that is part of libc, although this crate does not use
45fb6c1f39Sopenharmony_cilibc by default. Instead, it uses its own routines, which are either vectorized
46fb6c1f39Sopenharmony_cior generic fallback routines. In general, these should be competitive with
47fb6c1f39Sopenharmony_ciwhat's in libc, although this has not been tested for all architectures. If
48fb6c1f39Sopenharmony_ciusing `memchr` from libc is desirable and a vectorized routine is not otherwise
49fb6c1f39Sopenharmony_ciavailable in this crate, then enabling the `libc` feature will use libc's
50fb6c1f39Sopenharmony_civersion of `memchr`.
51fb6c1f39Sopenharmony_ci
52fb6c1f39Sopenharmony_ciThe rest of the functions in this crate, e.g., `memchr2` or `memrchr3` and the
53fb6c1f39Sopenharmony_cisubstring search routines, will always use the implementations in this crate.
54fb6c1f39Sopenharmony_ciOne exception to this is `memrchr`, which is an extension in `libc` found on
55fb6c1f39Sopenharmony_ciLinux. On Linux, `memrchr` is used in precisely the same scenario as `memchr`,
56fb6c1f39Sopenharmony_cias described above.
57fb6c1f39Sopenharmony_ci
58fb6c1f39Sopenharmony_ci
59fb6c1f39Sopenharmony_ci### Minimum Rust version policy
60fb6c1f39Sopenharmony_ci
61fb6c1f39Sopenharmony_ciThis crate's minimum supported `rustc` version is `1.41.1`.
62fb6c1f39Sopenharmony_ci
63fb6c1f39Sopenharmony_ciThe current policy is that the minimum Rust version required to use this crate
64fb6c1f39Sopenharmony_cican be increased in minor version updates. For example, if `crate 1.0` requires
65fb6c1f39Sopenharmony_ciRust 1.20.0, then `crate 1.0.z` for all values of `z` will also require Rust
66fb6c1f39Sopenharmony_ci1.20.0 or newer. However, `crate 1.y` for `y > 0` may require a newer minimum
67fb6c1f39Sopenharmony_civersion of Rust.
68fb6c1f39Sopenharmony_ci
69fb6c1f39Sopenharmony_ciIn general, this crate will be conservative with respect to the minimum
70fb6c1f39Sopenharmony_cisupported version of Rust.
71fb6c1f39Sopenharmony_ci
72fb6c1f39Sopenharmony_ci
73fb6c1f39Sopenharmony_ci### Testing strategy
74fb6c1f39Sopenharmony_ci
75fb6c1f39Sopenharmony_ciGiven the complexity of the code in this crate, along with the pervasive use
76fb6c1f39Sopenharmony_ciof `unsafe`, this crate has an extensive testing strategy. It combines multiple
77fb6c1f39Sopenharmony_ciapproaches:
78fb6c1f39Sopenharmony_ci
79fb6c1f39Sopenharmony_ci* Hand-written tests.
80fb6c1f39Sopenharmony_ci* Exhaustive-style testing meant to exercise all possible branching and offset
81fb6c1f39Sopenharmony_ci  calculations.
82fb6c1f39Sopenharmony_ci* Property based testing through [`quickcheck`](https://github.com/BurntSushi/quickcheck).
83fb6c1f39Sopenharmony_ci* Fuzz testing through [`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz).
84fb6c1f39Sopenharmony_ci* A huge suite of benchmarks that are also run as tests. Benchmarks always
85fb6c1f39Sopenharmony_ci  confirm that the expected result occurs.
86fb6c1f39Sopenharmony_ci
87fb6c1f39Sopenharmony_ciImprovements to the testing infrastructure are very welcome.
88fb6c1f39Sopenharmony_ci
89fb6c1f39Sopenharmony_ci
90fb6c1f39Sopenharmony_ci### Algorithms used
91fb6c1f39Sopenharmony_ci
92fb6c1f39Sopenharmony_ciAt time of writing, this crate's implementation of substring search actually
93fb6c1f39Sopenharmony_cihas a few different algorithms to choose from depending on the situation.
94fb6c1f39Sopenharmony_ci
95fb6c1f39Sopenharmony_ci* For very small haystacks,
96fb6c1f39Sopenharmony_ci  [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
97fb6c1f39Sopenharmony_ci  is used to reduce latency. Rabin-Karp has very small overhead and can often
98fb6c1f39Sopenharmony_ci  complete before other searchers have even been constructed.
99fb6c1f39Sopenharmony_ci* For small needles, a variant of the
100fb6c1f39Sopenharmony_ci  ["Generic SIMD"](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd)
101fb6c1f39Sopenharmony_ci  algorithm is used. Instead of using the first and last bytes, a heuristic is
102fb6c1f39Sopenharmony_ci  used to select bytes based on a background distribution of byte frequencies.
103fb6c1f39Sopenharmony_ci* In all other cases,
104fb6c1f39Sopenharmony_ci  [Two-Way](https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm)
105fb6c1f39Sopenharmony_ci  is used. If possible, a prefilter based on the "Generic SIMD" algorithm
106fb6c1f39Sopenharmony_ci  linked above is used to find candidates quickly. A dynamic heuristic is used
107fb6c1f39Sopenharmony_ci  to detect if the prefilter is ineffective, and if so, disables it.
108