# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setup-teardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the `csv` reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by CSV
parsers.

Specifying `--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmarks with the `--help` switch.
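
The precedence rule above can be modeled in a few lines. The following is a minimal sketch, not the library's own option parser: `EnvNameForFlag` and `ResolveOption` are hypothetical helper names introduced here for illustration, and a null pointer stands for "not set".

```c++
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: maps a switch name such as "--benchmark_filter" to
// the matching environment variable name, "BENCHMARK_FILTER".
// Pass the switch name only, without any "=<value>" suffix.
std::string EnvNameForFlag(const std::string& flag) {
  // Drop the leading "--" if present.
  std::string name = flag.rfind("--", 0) == 0 ? flag.substr(2) : flag;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return name;
}

// Hypothetical helper: picks the effective value of one option.
// A CLI value wins; otherwise the environment value acts as the default;
// otherwise the built-in default applies. nullptr means "not set".
std::string ResolveOption(const std::string* cli_value,
                          const std::string* env_value,
                          const std::string& built_in_default) {
  if (cli_value != nullptr) return *cli_value;
  if (env_value != nullptr) return *env_value;
  return built_in_default;
}
```

With both a CLI switch and an environment variable set for the same option, `ResolveOption` returns the CLI value, which mirrors the "CLI switches always prevail" rule.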

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to only run the benchmarks that match
the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```
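
The output above includes `BM_memcpy/32k` even though the filter was `BM_memcpy/32`, which suggests the regular expression may match anywhere in the benchmark name rather than the whole name. The sketch below illustrates that behavior with `std::regex_search`. It assumes partial matching and the default `std::regex` grammar; `FilterNames` is a hypothetical helper, and the library's own regex flavor may differ.

```c++
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the names that contain a match for `pattern`
// anywhere in the name (partial match, not a full-string match).
std::vector<std::string> FilterNames(const std::vector<std::string>& names,
                                     const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const std::string& name : names) {
    if (std::regex_search(name, re)) selected.push_back(name);
  }
  return selected;
}
```

Under these assumptions, anchoring the pattern (for example `BM_memcpy/32$`) would select only the exact name.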

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix `DISABLED_`. This will cause the benchmark to
be skipped at runtime.

<a name="result-comparison" />

## Result Comparison

Benchmark results can be compared with each other.
See the [Additional Tooling Documentation](tools.md).

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running. If you want to add more context, you can use
the `--benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions,
which is why the number of iterations is reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs for at least one and at most 1e9 iterations, and iterations are added
until the CPU time exceeds the minimum time or the wall-clock time reaches 5x
the minimum time. The minimum time is set per benchmark by calling `MinTime`
on the registered benchmark object.
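
The stopping rule can be written as a single predicate. This is a sketch of the rule as stated in the text, not the library's internal logic; `HasRunLongEnough` and `kMaxIterations` are names chosen here for illustration, and all times are assumed to be in seconds.

```c++
#include <cstdint>

// Upper bound on iterations quoted in the text (1e9).
constexpr std::int64_t kMaxIterations = 1000000000;

// Returns true once a run has gathered enough data under the stated rule:
// at least one iteration has run, and either the iteration cap is reached,
// the CPU time has reached min_time, or the wall-clock time has reached
// five times min_time.
bool HasRunLongEnough(std::int64_t iterations, double cpu_seconds,
                      double wall_seconds, double min_time_seconds) {
  if (iterations < 1) return false;
  return iterations >= kMaxIterations ||
         cpu_seconds >= min_time_seconds ||
         wall_seconds >= 5.0 * min_time_seconds;
}
```

The wall-clock condition matters for benchmarks that spend most of their time blocked or sleeping, where CPU time alone would accumulate too slowly to end the run.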

Furthermore, warming up a benchmark might be necessary in order to get
stable results because of, e.g., caching effects of the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will override the value of `--benchmark_min_warmup_time`
for the single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. By default the warmup phase is set to 0 seconds and is therefore
disabled.

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.
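
To make "statistical results across repetitions" concrete, the sketch below summarizes a list of per-repetition timings. It assumes the summary consists of the mean, the median, and the sample standard deviation; the text above does not list the exact statistics, so treat that choice, and the names `RepetitionStats` and `Summarize`, as illustrative assumptions.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RepetitionStats {
  double mean;
  double median;
  double stddev;  // Sample standard deviation (divides by n - 1).
};

// Summarizes one timing value per repetition. Takes the vector by value
// because computing the median sorts it.
RepetitionStats Summarize(std::vector<double> times) {
  assert(!times.empty());
  const double n = static_cast<double>(times.size());

  double sum = 0.0;
  for (double t : times) sum += t;
  const double mean = sum / n;

  std::sort(times.begin(), times.end());
  const std::size_t mid = times.size() / 2;
  const double median = (times.size() % 2 == 1)
                            ? times[mid]
                            : (times[mid - 1] + times[mid]) / 2.0;

  double squared_error = 0.0;
  for (double t : times) squared_error += (t - mean) * (t - mean);
  const double stddev =
      times.size() > 1 ? std::sqrt(squared_error / (n - 1.0)) : 0.0;

  return {mean, median, stddev};
}
```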

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setup-teardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to `Setup`/`Teardown`.

The setup/teardown callbacks will be invoked once for each benchmark. If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

For example:

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);
```

In this example, `DoSetup` and `DoTeardown` will each be invoked 4 times,
once for each member of this family:
 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].
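
Both lists above are reproduced by a simple rule: take the lower bound, then every power of the multiplier strictly between the bounds, then the upper bound. The sketch below implements that rule. It is a model consistent with the two lists, not the library's code; `SparseRange` is a hypothetical name, and the behavior for bounds that are not powers of the multiplier is an assumption.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a sparse range: lo, then the powers of `mult` that
// lie strictly between lo and hi, then hi.
std::vector<std::int64_t> SparseRange(std::int64_t lo, std::int64_t hi,
                                      int mult) {
  assert(lo >= 0 && lo <= hi && mult >= 2);
  std::vector<std::int64_t> values;
  values.push_back(lo);
  for (std::int64_t p = 1; p < hi; p *= mult) {
    if (p > lo) values.push_back(p);
    // Stop before `p * mult` could exceed hi (also guards against overflow).
    if (p > hi / mult) break;
  }
  if (hi != lo) values.push_back(hi);
  return values;
}
```

Calling `SparseRange(8, 8 << 10, 8)` yields the five values of the default case, and `SparseRange(8, 8 << 10, 2)` yields the eleven values listed for a multiplier of two.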

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].
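
A dense range is plain arithmetic stepping from the start value up to and including the limit. The sketch below reproduces the nine values above; `DenseRangeValues` is a hypothetical helper used only to show the arithmetic.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a dense range: start, start + step, ... while the
// value does not exceed limit.
std::vector<std::int64_t> DenseRangeValues(std::int64_t start,
                                           std::int64_t limit,
                                           std::int64_t step) {
  assert(step > 0 && start <= limit);
  std::vector<std::int64_t> values;
  for (std::int64_t v = start; v <= limit; v += step) values.push_back(v);
  return values;
}
```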

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`. In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->
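
The product of the supplied vectors is an ordinary Cartesian product. The sketch below enumerates it with an odometer-style index; `ArgList` and `ProductOf` are hypothetical names, and the ordering (first list varying fastest) is an assumption, since only the set of combinations matters for which benchmarks get generated.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

using ArgList = std::vector<std::int64_t>;

// Hypothetical model: returns every combination that takes one value from
// each list, with the first list varying fastest.
std::vector<ArgList> ProductOf(const std::vector<ArgList>& lists) {
  std::vector<ArgList> result;
  if (lists.empty()) return result;
  for (const ArgList& list : lists) {
    if (list.empty()) return result;  // An empty list gives an empty product.
  }

  std::vector<std::size_t> index(lists.size(), 0);
  while (true) {
    ArgList combo;
    for (std::size_t i = 0; i < lists.size(); ++i) {
      combo.push_back(lists[i][index[i]]);
    }
    result.push_back(combo);

    // Advance like an odometer whose first digit turns fastest.
    std::size_t pos = 0;
    while (pos < lists.size() && ++index[pos] == lists[pos].size()) {
      index[pos] = 0;
      ++pos;
    }
    if (pos == lists.size()) return result;  // Every digit wrapped: done.
  }
}
```

For the two lists in the example above, this produces 3 x 4 = 12 argument pairs.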

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    });
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    // [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers the same function under the name "BM_takes_args/int_test" and
// passes the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity can also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)
    ->Complexity([](benchmark::IterationCount n) -> double { return n; });
```
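
To show what "the coefficient for the high-order term" and "normalized root-mean-square error" mean numerically, the sketch below fits `time ~= coefficient * f(n)` by least squares and normalizes the RMS residual by the mean time. This is one standard way to compute such a fit and is offered as an illustration, not as the library's exact procedure; `ComplexityFit` and `FitComplexity` are hypothetical names.

```c++
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ComplexityFit {
  double coefficient;     // C in time ~= C * f(n).
  double normalized_rms;  // RMS residual divided by the mean time.
};

// Least-squares fit of time ~= C * f(n), where fn_values[i] holds f(n_i)
// (for O(N) that is simply n_i) and times[i] holds the measured time.
ComplexityFit FitComplexity(const std::vector<double>& fn_values,
                            const std::vector<double>& times) {
  assert(!times.empty() && fn_values.size() == times.size());
  double sum_f_squared = 0.0;
  double sum_time_f = 0.0;
  double sum_time = 0.0;
  for (std::size_t i = 0; i < times.size(); ++i) {
    sum_f_squared += fn_values[i] * fn_values[i];
    sum_time_f += times[i] * fn_values[i];
    sum_time += times[i];
  }
  assert(sum_f_squared > 0.0 && sum_time > 0.0);

  // Minimizing sum((t_i - C * f_i)^2) over C gives this closed form.
  const double coefficient = sum_time_f / sum_f_squared;

  double squared_error = 0.0;
  for (std::size_t i = 0; i < times.size(); ++i) {
    const double residual = times[i] - coefficient * fn_values[i];
    squared_error += residual * residual;
  }
  const double count = static_cast<double>(times.size());
  const double rms = std::sqrt(squared_error / count);
  return {coefficient, rms / (sum_time / count)};
}
```

A normalized RMS close to zero indicates that the chosen complexity function describes the measurements well; a large value suggests a different function would fit better.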

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

This invocation executes the benchmark as before using `BM_memcpy`, but changes
the prefix in the report to `memcpy`.
535a8c51b3fSopenharmony_ci
536a8c51b3fSopenharmony_ci<a name="templated-benchmarks" />
537a8c51b3fSopenharmony_ci
538a8c51b3fSopenharmony_ci## Templated Benchmarks
539a8c51b3fSopenharmony_ci
540a8c51b3fSopenharmony_ciThis example produces and consumes messages of size `sizeof(v)` `range_x`
541a8c51b3fSopenharmony_citimes. It also outputs throughput in the absence of multiprogramming.
542a8c51b3fSopenharmony_ci
543a8c51b3fSopenharmony_ci```c++
544a8c51b3fSopenharmony_citemplate <class Q> void BM_Sequential(benchmark::State& state) {
545a8c51b3fSopenharmony_ci  Q q;
546a8c51b3fSopenharmony_ci  typename Q::value_type v;
547a8c51b3fSopenharmony_ci  for (auto _ : state) {
548a8c51b3fSopenharmony_ci    for (int i = state.range(0); i--; )
549a8c51b3fSopenharmony_ci      q.push(v);
550a8c51b3fSopenharmony_ci    for (int e = state.range(0); e--; )
551a8c51b3fSopenharmony_ci      q.Wait(&v);
552a8c51b3fSopenharmony_ci  }
553a8c51b3fSopenharmony_ci  // actually messages, not bytes:
554a8c51b3fSopenharmony_ci  state.SetBytesProcessed(
555a8c51b3fSopenharmony_ci      static_cast<int64_t>(state.iterations())*state.range(0));
556a8c51b3fSopenharmony_ci}
557a8c51b3fSopenharmony_ci// C++03
558a8c51b3fSopenharmony_ciBENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
559a8c51b3fSopenharmony_ci
560a8c51b3fSopenharmony_ci// C++11 or newer, you can use the BENCHMARK macro with template parameters:
561a8c51b3fSopenharmony_ciBENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);
562a8c51b3fSopenharmony_ci
563a8c51b3fSopenharmony_ci```
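`WaitQueue` is not defined in this guide. The sketch below shows one hypothetical, single-threaded stand-in that provides the interface `BM_Sequential` relies on: a `value_type` alias, `push`, and `Wait`. The class body and the `RoundTrip` driver are assumptions made for illustration only.

```c++
#include <cstddef>
#include <queue>

// Hypothetical stand-in for the WaitQueue used above. A real implementation
// would presumably block in Wait() until an element becomes available.
template <class T>
class WaitQueue {
 public:
  using value_type = T;

  void push(const T& v) { items_.push(v); }

  // Pops the oldest element into *out. Returns false if the queue is empty.
  bool Wait(T* out) {
    if (items_.empty()) return false;
    *out = items_.front();
    items_.pop();
    return true;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::queue<T> items_;
};

// Mirrors the body of one benchmark iteration: push `count` messages, then
// consume them. Returns the number of messages actually consumed.
template <class Q>
int RoundTrip(Q& q, int count) {
  typename Q::value_type v{};
  for (int i = count; i--;) q.push(v);
  int consumed = 0;
  for (int e = count; e--;) {
    if (q.Wait(&v)) ++consumed;
  }
  return consumed;
}
```

Any queue type with the same three members could be substituted as the template argument in the same way.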
564a8c51b3fSopenharmony_ci
565a8c51b3fSopenharmony_ciThree macros are provided for adding benchmark templates.
566a8c51b3fSopenharmony_ci
567a8c51b3fSopenharmony_ci```c++
568a8c51b3fSopenharmony_ci#ifdef BENCHMARK_HAS_CXX11
569a8c51b3fSopenharmony_ci#define BENCHMARK(func<...>) // Takes any number of parameters.
570a8c51b3fSopenharmony_ci#else // C++ < C++11
571a8c51b3fSopenharmony_ci#define BENCHMARK_TEMPLATE(func, arg1)
572a8c51b3fSopenharmony_ci#endif
573a8c51b3fSopenharmony_ci#define BENCHMARK_TEMPLATE1(func, arg1)
574a8c51b3fSopenharmony_ci#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
575a8c51b3fSopenharmony_ci```
576a8c51b3fSopenharmony_ci
577a8c51b3fSopenharmony_ci<a name="fixtures" />
578a8c51b3fSopenharmony_ci
579a8c51b3fSopenharmony_ci## Fixtures
580a8c51b3fSopenharmony_ci
581a8c51b3fSopenharmony_ciFixture tests are created by first defining a type that derives from
582a8c51b3fSopenharmony_ci`::benchmark::Fixture` and then creating/registering the tests using the
583a8c51b3fSopenharmony_cifollowing macros:
584a8c51b3fSopenharmony_ci
585a8c51b3fSopenharmony_ci* `BENCHMARK_F(ClassName, Method)`
586a8c51b3fSopenharmony_ci* `BENCHMARK_DEFINE_F(ClassName, Method)`
587a8c51b3fSopenharmony_ci* `BENCHMARK_REGISTER_F(ClassName, Method)`
588a8c51b3fSopenharmony_ci
For example:
590a8c51b3fSopenharmony_ci
591a8c51b3fSopenharmony_ci```c++
592a8c51b3fSopenharmony_ciclass MyFixture : public benchmark::Fixture {
593a8c51b3fSopenharmony_cipublic:
594a8c51b3fSopenharmony_ci  void SetUp(const ::benchmark::State& state) {
595a8c51b3fSopenharmony_ci  }
596a8c51b3fSopenharmony_ci
597a8c51b3fSopenharmony_ci  void TearDown(const ::benchmark::State& state) {
598a8c51b3fSopenharmony_ci  }
599a8c51b3fSopenharmony_ci};
600a8c51b3fSopenharmony_ci
BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
612a8c51b3fSopenharmony_ci/* BarTest is NOT registered */
613a8c51b3fSopenharmony_ciBENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
614a8c51b3fSopenharmony_ci/* BarTest is now registered */
615a8c51b3fSopenharmony_ci```
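The sketch below models the call order a fixture benchmark goes through: `SetUp`, the benchmark body, then `TearDown`. It is a plain C++ illustration; `FixtureModel`, `VectorFixture`, and `RunOnce` are hypothetical names, and the code does not reproduce the library's internals.

```c++
#include <string>
#include <vector>

// Hypothetical base class standing in for benchmark::Fixture.
class FixtureModel {
 public:
  virtual ~FixtureModel() = default;
  virtual void SetUp() {}
  virtual void TearDown() {}
  virtual void Body() = 0;

  // Runs one benchmark case: setup, the body, then teardown.
  void RunOnce() {
    SetUp();
    Body();
    TearDown();
  }
};

// Example fixture that prepares data in SetUp and releases it in TearDown.
class VectorFixture : public FixtureModel {
 public:
  void SetUp() override {
    data.assign(4, 1);
    log.push_back("SetUp");
  }
  void Body() override {
    sum = 0;
    for (int x : data) sum += x;
    log.push_back("Body");
  }
  void TearDown() override {
    data.clear();
    log.push_back("TearDown");
  }

  std::vector<int> data;
  std::vector<std::string> log;
  int sum = 0;
};
```

Keeping data preparation in `SetUp` means the body only contains the work that should be measured.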
616a8c51b3fSopenharmony_ci
617a8c51b3fSopenharmony_ci### Templated Fixtures
618a8c51b3fSopenharmony_ci
You can also create templated fixtures by using the following macros:
620a8c51b3fSopenharmony_ci
621a8c51b3fSopenharmony_ci* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
622a8c51b3fSopenharmony_ci* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`
623a8c51b3fSopenharmony_ci
624a8c51b3fSopenharmony_ciFor example:
625a8c51b3fSopenharmony_ci
626a8c51b3fSopenharmony_ci```c++
627a8c51b3fSopenharmony_citemplate<typename T>
628a8c51b3fSopenharmony_ciclass MyFixture : public benchmark::Fixture {};
629a8c51b3fSopenharmony_ci
BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
641a8c51b3fSopenharmony_ci
642a8c51b3fSopenharmony_ciBENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
643a8c51b3fSopenharmony_ci```
644a8c51b3fSopenharmony_ci
645a8c51b3fSopenharmony_ci<a name="custom-counters" />
646a8c51b3fSopenharmony_ci
647a8c51b3fSopenharmony_ci## Custom Counters
648a8c51b3fSopenharmony_ci
649a8c51b3fSopenharmony_ciYou can add your own counters with user-defined names. The example below
650a8c51b3fSopenharmony_ciwill add columns "Foo", "Bar" and "Baz" in its output:
651a8c51b3fSopenharmony_ci
652a8c51b3fSopenharmony_ci```c++
653a8c51b3fSopenharmony_cistatic void UserCountersExample1(benchmark::State& state) {
654a8c51b3fSopenharmony_ci  double numFoos = 0, numBars = 0, numBazs = 0;
655a8c51b3fSopenharmony_ci  for (auto _ : state) {
656a8c51b3fSopenharmony_ci    // ... count Foo,Bar,Baz events
657a8c51b3fSopenharmony_ci  }
658a8c51b3fSopenharmony_ci  state.counters["Foo"] = numFoos;
659a8c51b3fSopenharmony_ci  state.counters["Bar"] = numBars;
660a8c51b3fSopenharmony_ci  state.counters["Baz"] = numBazs;
661a8c51b3fSopenharmony_ci}
662a8c51b3fSopenharmony_ci```
663a8c51b3fSopenharmony_ci
664a8c51b3fSopenharmony_ciThe `state.counters` object is a `std::map` with `std::string` keys
665a8c51b3fSopenharmony_ciand `Counter` values. The latter is a `double`-like class, via an implicit
666a8c51b3fSopenharmony_ciconversion to `double&`. Thus you can use all of the standard arithmetic
667a8c51b3fSopenharmony_ciassignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.
668a8c51b3fSopenharmony_ci
669a8c51b3fSopenharmony_ciIn multithreaded benchmarks, each counter is set on the calling thread only.
670a8c51b3fSopenharmony_ciWhen the benchmark finishes, the counters from each thread will be summed;
671a8c51b3fSopenharmony_cithe resulting sum is the value which will be shown for the benchmark.
672a8c51b3fSopenharmony_ci
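As a rough illustration of the summation described above, the sketch below merges per-thread counter maps into one. Plain `double` values stand in for `Counter`, and `MergeThreadCounters` is a hypothetical helper, not a library function.

```c++
#include <map>
#include <string>
#include <vector>

using CounterMap = std::map<std::string, double>;

// Sums counters of the same name across all per-thread maps.
CounterMap MergeThreadCounters(const std::vector<CounterMap>& per_thread) {
  CounterMap total;
  for (const CounterMap& counters : per_thread) {
    for (const auto& entry : counters) {
      total[entry.first] += entry.second;  // Missing keys start at 0.0.
    }
  }
  return total;
}
```
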
The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag that lets you show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or as iteration invariants,
and/or with the result inverted; and a flag specifying the 'unit', i.e.
whether 1k means 1000 (the default, `benchmark::Counter::OneK::kIs1000`) or
1024 (`benchmark::Counter::OneK::kIs1024`).
679a8c51b3fSopenharmony_ci
680a8c51b3fSopenharmony_ci```c++
  // Sets a simple counter.
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] =
      benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds does it take to process one 'foo'?
  state.counters["FooInvRate"] = benchmark::Counter(
      numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] =
      benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] =
      benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] =
      benchmark::Counter(state.range(0),
                         benchmark::Counter::kIsIterationInvariantRate,
                         benchmark::Counter::OneK::kIs1024);
703a8c51b3fSopenharmony_ci```
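The arithmetic behind these flags can be modeled in a few lines. The sketch below is a simplified illustration of the adjustments described in the comments above; the `CounterFlags` enum and `PresentCounter` are hypothetical and are not the library's types, and the order in which the adjustments are applied is an assumption.

```c++
#include <cstdint>

// Hypothetical flag set modeled on the options described above.
enum CounterFlags : unsigned {
  kPlain = 0,
  kRate = 1u << 0,                // divide by elapsed seconds
  kAvgThreads = 1u << 1,          // divide by the number of threads
  kIterationInvariant = 1u << 2,  // multiply by the iteration count
  kInvert = 1u << 3               // report 1 / value
};

// Applies the adjustments in a fixed order; inversion comes last.
double PresentCounter(double value, unsigned flags, double seconds,
                      int threads, std::int64_t iterations) {
  if (flags & kRate) value /= seconds;
  if (flags & kAvgThreads) value /= threads;
  if (flags & kIterationInvariant) value *= static_cast<double>(iterations);
  if (flags & kInvert) value = 1.0 / value;
  return value;
}
```

For example, a raw value of 64 measured over 4 seconds is presented as 16 per second with the rate flag, and as 0.0625 seconds per item when the result is also inverted.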
704a8c51b3fSopenharmony_ci
705a8c51b3fSopenharmony_ciWhen you're compiling in C++11 mode or later you can use `insert()` with
706a8c51b3fSopenharmony_ci`std::initializer_list`:
707a8c51b3fSopenharmony_ci
708a8c51b3fSopenharmony_ci<!-- {% raw %} -->
709a8c51b3fSopenharmony_ci```c++
710a8c51b3fSopenharmony_ci  // With C++11, this can be done:
711a8c51b3fSopenharmony_ci  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
712a8c51b3fSopenharmony_ci  // ... instead of:
713a8c51b3fSopenharmony_ci  state.counters["Foo"] = numFoos;
714a8c51b3fSopenharmony_ci  state.counters["Bar"] = numBars;
715a8c51b3fSopenharmony_ci  state.counters["Baz"] = numBazs;
716a8c51b3fSopenharmony_ci```
717a8c51b3fSopenharmony_ci<!-- {% endraw %} -->
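One detail worth keeping in mind: since `state.counters` is described above as a `std::map`, standard map semantics apply, so `insert()` leaves an existing key untouched while `operator[]` overwrites it. The sketch below demonstrates this with a plain `std::map<std::string, double>` standing in for the counter map; `FooAfterInsert` and `FooAfterAssign` are illustrative names.

<!-- {% raw %} -->
```c++
#include <map>
#include <string>

// Returns the value stored under "Foo" after insert() meets an existing key.
double FooAfterInsert() {
  std::map<std::string, double> counters;
  counters["Foo"] = 1.0;
  // "Foo" already exists, so insert() keeps the old value; "Bar" and "Baz"
  // are new and get added.
  counters.insert({{"Foo", 99.0}, {"Bar", 2.0}, {"Baz", 3.0}});
  return counters["Foo"];
}

// Returns the value stored under "Foo" after a second assignment.
double FooAfterAssign() {
  std::map<std::string, double> counters;
  counters["Foo"] = 1.0;
  counters["Foo"] = 99.0;  // operator[] overwrites the existing value.
  return counters["Foo"];
}
```
<!-- {% endraw %} -->

If a counter may already have been set earlier in the benchmark, assignment through `operator[]` is the form that replaces it.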
718a8c51b3fSopenharmony_ci
719a8c51b3fSopenharmony_ci### Counter Reporting
720a8c51b3fSopenharmony_ci
721a8c51b3fSopenharmony_ciWhen using the console reporter, by default, user counters are printed at
722a8c51b3fSopenharmony_cithe end after the table, the same way as ``bytes_processed`` and
723a8c51b3fSopenharmony_ci``items_processed``. This is best for cases in which there are few counters,
724a8c51b3fSopenharmony_cior where there are only a couple of lines per benchmark. Here's an example of
725a8c51b3fSopenharmony_cithe default output:
726a8c51b3fSopenharmony_ci
727a8c51b3fSopenharmony_ci```
728a8c51b3fSopenharmony_ci------------------------------------------------------------------------------
729a8c51b3fSopenharmony_ciBenchmark                        Time           CPU Iterations UserCounters...
730a8c51b3fSopenharmony_ci------------------------------------------------------------------------------
731a8c51b3fSopenharmony_ciBM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
732a8c51b3fSopenharmony_ciBM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
733a8c51b3fSopenharmony_ciBM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
734a8c51b3fSopenharmony_ciBM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
735a8c51b3fSopenharmony_ciBM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
736a8c51b3fSopenharmony_ciBM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
737a8c51b3fSopenharmony_ciBM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
738a8c51b3fSopenharmony_ciBM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
739a8c51b3fSopenharmony_ciBM_Factorial                    26 ns         26 ns   26608979 40320
740a8c51b3fSopenharmony_ciBM_Factorial/real_time          26 ns         26 ns   26587936 40320
741a8c51b3fSopenharmony_ciBM_CalculatePiRange/1           16 ns         16 ns   45704255 0
742a8c51b3fSopenharmony_ciBM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
743a8c51b3fSopenharmony_ciBM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
744a8c51b3fSopenharmony_ciBM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
745a8c51b3fSopenharmony_ci```
746a8c51b3fSopenharmony_ci
747a8c51b3fSopenharmony_ciIf this doesn't suit you, you can print each counter as a table column by
748a8c51b3fSopenharmony_cipassing the flag `--benchmark_counters_tabular=true` to the benchmark
749a8c51b3fSopenharmony_ciapplication. This is best for cases in which there are a lot of counters, or
750a8c51b3fSopenharmony_cia lot of lines per individual benchmark. Note that this will trigger a
751a8c51b3fSopenharmony_cireprinting of the table header any time the counter set changes between
752a8c51b3fSopenharmony_ciindividual benchmarks. Here's an example of corresponding output when
753a8c51b3fSopenharmony_ci`--benchmark_counters_tabular=true` is passed:
754a8c51b3fSopenharmony_ci
755a8c51b3fSopenharmony_ci```
756a8c51b3fSopenharmony_ci---------------------------------------------------------------------------------------
757a8c51b3fSopenharmony_ciBenchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
758a8c51b3fSopenharmony_ci---------------------------------------------------------------------------------------
759a8c51b3fSopenharmony_ciBM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
760a8c51b3fSopenharmony_ciBM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
761a8c51b3fSopenharmony_ciBM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
762a8c51b3fSopenharmony_ciBM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
763a8c51b3fSopenharmony_ciBM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
764a8c51b3fSopenharmony_ciBM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
765a8c51b3fSopenharmony_ciBM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
766a8c51b3fSopenharmony_ciBM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
767a8c51b3fSopenharmony_ci--------------------------------------------------------------
768a8c51b3fSopenharmony_ciBenchmark                        Time           CPU Iterations
769a8c51b3fSopenharmony_ci--------------------------------------------------------------
770a8c51b3fSopenharmony_ciBM_Factorial                    26 ns         26 ns   26392245 40320
771a8c51b3fSopenharmony_ciBM_Factorial/real_time          26 ns         26 ns   26494107 40320
772a8c51b3fSopenharmony_ciBM_CalculatePiRange/1           15 ns         15 ns   45571597 0
773a8c51b3fSopenharmony_ciBM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
774a8c51b3fSopenharmony_ciBM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
775a8c51b3fSopenharmony_ciBM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
776a8c51b3fSopenharmony_ciBM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
777a8c51b3fSopenharmony_ciBM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
778a8c51b3fSopenharmony_ciBM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
779a8c51b3fSopenharmony_ciBM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
780a8c51b3fSopenharmony_ciBM_CalculatePi/threads:8      2255 ns       9943 ns      70936
781a8c51b3fSopenharmony_ci```
782a8c51b3fSopenharmony_ci
783a8c51b3fSopenharmony_ciNote above the additional header printed when the benchmark changes from
784a8c51b3fSopenharmony_ci``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
785a8c51b3fSopenharmony_cinot have the same counter set as ``BM_UserCounter``.
786a8c51b3fSopenharmony_ci
787a8c51b3fSopenharmony_ci<a name="multithreaded-benchmarks"/>
788a8c51b3fSopenharmony_ci
789a8c51b3fSopenharmony_ci## Multithreaded Benchmarks
790a8c51b3fSopenharmony_ci
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:
797a8c51b3fSopenharmony_ci
798a8c51b3fSopenharmony_ci```c++
799a8c51b3fSopenharmony_cistatic void BM_MultiThreaded(benchmark::State& state) {
800a8c51b3fSopenharmony_ci  if (state.thread_index() == 0) {
801a8c51b3fSopenharmony_ci    // Setup code here.
802a8c51b3fSopenharmony_ci  }
803a8c51b3fSopenharmony_ci  for (auto _ : state) {
804a8c51b3fSopenharmony_ci    // Run the test as normal.
805a8c51b3fSopenharmony_ci  }
806a8c51b3fSopenharmony_ci  if (state.thread_index() == 0) {
807a8c51b3fSopenharmony_ci    // Teardown code here.
808a8c51b3fSopenharmony_ci  }
809a8c51b3fSopenharmony_ci}
810a8c51b3fSopenharmony_ciBENCHMARK(BM_MultiThreaded)->Threads(2);
811a8c51b3fSopenharmony_ci```
812a8c51b3fSopenharmony_ci
To run the benchmark across a range of thread counts, instead of `Threads`, use
`ThreadRange`. This takes two parameters (`min_threads` and `max_threads`) and
runs the benchmark once for each of a series of thread counts within that
inclusive range. For example:
816a8c51b3fSopenharmony_ci
817a8c51b3fSopenharmony_ci```c++
818a8c51b3fSopenharmony_ciBENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
819a8c51b3fSopenharmony_ci```
820a8c51b3fSopenharmony_ci
821a8c51b3fSopenharmony_ciwill run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.
822a8c51b3fSopenharmony_ci
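The sequence 1, 2, 4, 8 can be reproduced with a small helper that doubles the count at each step and always finishes with the upper bound. This is a sketch under that assumption; `ThreadCounts` is a hypothetical function, and the library's exact sequence for other inputs (for example a `min_threads` that is not a power of two) may differ.

```c++
#include <vector>

// Hypothetical helper: thread counts from min_threads up to max_threads,
// doubling at each step and always ending with max_threads itself.
std::vector<int> ThreadCounts(int min_threads, int max_threads) {
  std::vector<int> counts;
  if (min_threads < 1 || max_threads < min_threads) return counts;
  int t = min_threads;
  while (t < max_threads) {
    counts.push_back(t);
    if (t > max_threads / 2) break;  // Next doubling would pass max_threads.
    t *= 2;
  }
  counts.push_back(max_threads);
  return counts;
}
```
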
823a8c51b3fSopenharmony_ciIf the benchmarked code itself uses threads and you want to compare it to
824a8c51b3fSopenharmony_cisingle-threaded code, you may want to use real-time ("wallclock") measurements
825a8c51b3fSopenharmony_cifor latency comparisons:
826a8c51b3fSopenharmony_ci
827a8c51b3fSopenharmony_ci```c++
828a8c51b3fSopenharmony_ciBENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
829a8c51b3fSopenharmony_ci```
830a8c51b3fSopenharmony_ci
831a8c51b3fSopenharmony_ciWithout `UseRealTime`, CPU time is used by default.
832a8c51b3fSopenharmony_ci
833a8c51b3fSopenharmony_ci<a name="cpu-timers" />
834a8c51b3fSopenharmony_ci
835a8c51b3fSopenharmony_ci## CPU Timers
836a8c51b3fSopenharmony_ci
837a8c51b3fSopenharmony_ciBy default, the CPU timer only measures the time spent by the main thread.
838a8c51b3fSopenharmony_ciIf the benchmark itself uses threads internally, this measurement may not
839a8c51b3fSopenharmony_cibe what you are looking for. Instead, there is a way to measure the total
840a8c51b3fSopenharmony_ciCPU usage of the process, by all the threads.
841a8c51b3fSopenharmony_ci
842a8c51b3fSopenharmony_ci```c++
843a8c51b3fSopenharmony_civoid callee(int i);
844a8c51b3fSopenharmony_ci
845a8c51b3fSopenharmony_cistatic void MyMain(int size) {
846a8c51b3fSopenharmony_ci#pragma omp parallel for
847a8c51b3fSopenharmony_ci  for(int i = 0; i < size; i++)
848a8c51b3fSopenharmony_ci    callee(i);
849a8c51b3fSopenharmony_ci}
850a8c51b3fSopenharmony_ci
851a8c51b3fSopenharmony_cistatic void BM_OpenMP(benchmark::State& state) {
852a8c51b3fSopenharmony_ci  for (auto _ : state)
853a8c51b3fSopenharmony_ci    MyMain(state.range(0));
854a8c51b3fSopenharmony_ci}
855a8c51b3fSopenharmony_ci
// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on internal implementation details, this
// may measure anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
860a8c51b3fSopenharmony_ciBENCHMARK(BM_OpenMP)->Range(8, 8<<10);
861a8c51b3fSopenharmony_ci
862a8c51b3fSopenharmony_ci// Measure the user-visible time, the wall clock (literally, the time that
863a8c51b3fSopenharmony_ci// has passed on the clock on the wall), use it to decide for how long to
864a8c51b3fSopenharmony_ci// run the benchmark loop. This will always be meaningful, and will match the
865a8c51b3fSopenharmony_ci// time spent by the main thread in single-threaded case, in general decreasing
866a8c51b3fSopenharmony_ci// with the number of internal threads doing the work.
867a8c51b3fSopenharmony_ciBENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();
868a8c51b3fSopenharmony_ci
869a8c51b3fSopenharmony_ci// Measure the total CPU consumption, use it to decide for how long to
870a8c51b3fSopenharmony_ci// run the benchmark loop. This will always measure to no less than the
871a8c51b3fSopenharmony_ci// time spent by the main thread in single-threaded case.
872a8c51b3fSopenharmony_ciBENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();
873a8c51b3fSopenharmony_ci
874a8c51b3fSopenharmony_ci// A mixture of the last two. Measure the total CPU consumption, but use the
875a8c51b3fSopenharmony_ci// wall clock to decide for how long to run the benchmark loop.
876a8c51b3fSopenharmony_ciBENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
877a8c51b3fSopenharmony_ci```
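The difference between wall-clock time and CPU time can be observed with the standard library alone. The sketch below sleeps for a while and reports both values; `Timings` and `MeasureSleep` are illustrative names. It assumes `std::clock()` reports processor time used by the process, which holds on POSIX-like systems but not on every platform.

```c++
#include <chrono>
#include <ctime>
#include <thread>

struct Timings {
  double wall_seconds;
  double cpu_seconds;
};

// Sleeps for the given duration and reports both the elapsed wall-clock time
// and the processor time reported by std::clock() over the same interval.
Timings MeasureSleep(std::chrono::milliseconds duration) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  std::this_thread::sleep_for(duration);  // Uses wall time, very little CPU.

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timings t;
  t.wall_seconds =
      std::chrono::duration<double>(wall_end - wall_start).count();
  t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}
```

A sleeping thread accumulates wall time but almost no CPU time, which is the opposite of the multi-threaded case above, where total CPU time can exceed wall time.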
878a8c51b3fSopenharmony_ci
879a8c51b3fSopenharmony_ci### Controlling Timers
880a8c51b3fSopenharmony_ci
Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. Sometimes, however, it is necessary to do some work inside
that loop on every iteration without counting that time toward the benchmark
time. This is possible, although it is not recommended, since it has high
overhead.
885a8c51b3fSopenharmony_ci
886a8c51b3fSopenharmony_ci<!-- {% raw %} -->
887a8c51b3fSopenharmony_ci```c++
888a8c51b3fSopenharmony_cistatic void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
889a8c51b3fSopenharmony_ci  std::set<int> data;
890a8c51b3fSopenharmony_ci  for (auto _ : state) {
891a8c51b3fSopenharmony_ci    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
892a8c51b3fSopenharmony_ci    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
893a8c51b3fSopenharmony_ci    state.ResumeTiming(); // And resume timers. They are now counting again.
894a8c51b3fSopenharmony_ci    // The rest will be measured.
895a8c51b3fSopenharmony_ci    for (int j = 0; j < state.range(1); ++j)
896a8c51b3fSopenharmony_ci      data.insert(RandomNumber());
897a8c51b3fSopenharmony_ci  }
898a8c51b3fSopenharmony_ci}
899a8c51b3fSopenharmony_ciBENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
900a8c51b3fSopenharmony_ci```
901a8c51b3fSopenharmony_ci<!-- {% endraw %} -->
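Conceptually, pausing and resuming turns the timer into a stopwatch that only accumulates while it is running. The sketch below models that idea; `PausableTimer` is a hypothetical class written for illustration and says nothing about how the library implements `PauseTiming` and `ResumeTiming`.

```c++
#include <chrono>

// Hypothetical stopwatch that only accumulates time while it is running.
class PausableTimer {
 public:
  using Clock = std::chrono::steady_clock;

  // Starts (or restarts) counting from `now`. No effect if already running.
  void Resume(Clock::time_point now) {
    if (running_) return;
    started_ = now;
    running_ = true;
  }

  // Stops counting and adds the interval since the last Resume().
  void Pause(Clock::time_point now) {
    if (!running_) return;
    elapsed_ += now - started_;
    running_ = false;
  }

  Clock::duration Elapsed() const { return elapsed_; }

 private:
  Clock::time_point started_{};
  Clock::duration elapsed_{Clock::duration::zero()};
  bool running_ = false;
};
```

In real use the time points would come from `Clock::now()`, so every pause and resume costs at least one clock read per iteration.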
902a8c51b3fSopenharmony_ci
903a8c51b3fSopenharmony_ci<a name="manual-timing" />
904a8c51b3fSopenharmony_ci
905a8c51b3fSopenharmony_ci## Manual Timing
906a8c51b3fSopenharmony_ci
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.
910a8c51b3fSopenharmony_ci
911a8c51b3fSopenharmony_ciWhen `UseManualTime` is used, the benchmarked code must call
912a8c51b3fSopenharmony_ci`SetIterationTime` once per iteration of the benchmark loop to
913a8c51b3fSopenharmony_cireport the manually measured time.
914a8c51b3fSopenharmony_ci
915a8c51b3fSopenharmony_ciAn example use case for this is benchmarking GPU execution (e.g. OpenCL
916a8c51b3fSopenharmony_cior CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
917a8c51b3fSopenharmony_cibe accurately measured using CPU time or real-time. Instead, they can be
918a8c51b3fSopenharmony_cimeasured accurately using a dedicated API, and these measurement results
919a8c51b3fSopenharmony_cican be reported back with `SetIterationTime`.
920a8c51b3fSopenharmony_ci
921a8c51b3fSopenharmony_ci```c++
922a8c51b3fSopenharmony_cistatic void BM_ManualTiming(benchmark::State& state) {
923a8c51b3fSopenharmony_ci  int microseconds = state.range(0);
924a8c51b3fSopenharmony_ci  std::chrono::duration<double, std::micro> sleep_duration {
925a8c51b3fSopenharmony_ci    static_cast<double>(microseconds)
926a8c51b3fSopenharmony_ci  };
927a8c51b3fSopenharmony_ci
928a8c51b3fSopenharmony_ci  for (auto _ : state) {
929a8c51b3fSopenharmony_ci    auto start = std::chrono::high_resolution_clock::now();
930a8c51b3fSopenharmony_ci    // Simulate some useful workload with a sleep
931a8c51b3fSopenharmony_ci    std::this_thread::sleep_for(sleep_duration);
932a8c51b3fSopenharmony_ci    auto end = std::chrono::high_resolution_clock::now();
933a8c51b3fSopenharmony_ci
934a8c51b3fSopenharmony_ci    auto elapsed_seconds =
935a8c51b3fSopenharmony_ci      std::chrono::duration_cast<std::chrono::duration<double>>(
936a8c51b3fSopenharmony_ci        end - start);
937a8c51b3fSopenharmony_ci
938a8c51b3fSopenharmony_ci    state.SetIterationTime(elapsed_seconds.count());
939a8c51b3fSopenharmony_ci  }
940a8c51b3fSopenharmony_ci}
941a8c51b3fSopenharmony_ciBENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
942a8c51b3fSopenharmony_ci```
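The measurement part of the example above can be factored into a small helper. The sketch below is one possible shape; `TimeInSeconds` is a hypothetical name. It uses `std::chrono::steady_clock` as a design choice, because that clock is guaranteed to be monotonic, whereas `std::chrono::high_resolution_clock` may be an alias for a clock that is not.

```c++
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// seconds as a double, the unit the example above passes to SetIterationTime.
template <class Work>
double TimeInSeconds(Work&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Work>(work)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

Inside the benchmark loop, the returned value would be handed to `state.SetIterationTime(...)` once per iteration.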
943a8c51b3fSopenharmony_ci
944a8c51b3fSopenharmony_ci<a name="setting-the-time-unit" />
945a8c51b3fSopenharmony_ci
946a8c51b3fSopenharmony_ci## Setting the Time Unit
947a8c51b3fSopenharmony_ci
If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output is given in nanoseconds by default. To
change this, you can set the time unit explicitly:
951a8c51b3fSopenharmony_ci
952a8c51b3fSopenharmony_ci```c++
953a8c51b3fSopenharmony_ciBENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
954a8c51b3fSopenharmony_ci```
955a8c51b3fSopenharmony_ci
956a8c51b3fSopenharmony_ciAdditionally the default time unit can be set globally with the
957a8c51b3fSopenharmony_ci`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
958a8c51b3fSopenharmony_ciaffects benchmarks where the time unit is not set explicitly.
959a8c51b3fSopenharmony_ci
960a8c51b3fSopenharmony_ci<a name="preventing-optimization" />
961a8c51b3fSopenharmony_ci
962a8c51b3fSopenharmony_ci## Preventing Optimization
963a8c51b3fSopenharmony_ci
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.
967a8c51b3fSopenharmony_ci
968a8c51b3fSopenharmony_ci```c++
969a8c51b3fSopenharmony_cistatic void BM_test(benchmark::State& state) {
970a8c51b3fSopenharmony_ci  for (auto _ : state) {
971a8c51b3fSopenharmony_ci      int x = 0;
972a8c51b3fSopenharmony_ci      for (int i=0; i < 64; ++i) {
973a8c51b3fSopenharmony_ci        benchmark::DoNotOptimize(x += i);
974a8c51b3fSopenharmony_ci      }
975a8c51b3fSopenharmony_ci  }
976a8c51b3fSopenharmony_ci}
977a8c51b3fSopenharmony_ci```
978a8c51b3fSopenharmony_ci
`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.
983a8c51b3fSopenharmony_ci
984a8c51b3fSopenharmony_ciNote that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
985a8c51b3fSopenharmony_ciin any way. `<expr>` may even be removed entirely when the result is already
986a8c51b3fSopenharmony_ciknown. For example:
987a8c51b3fSopenharmony_ci
988a8c51b3fSopenharmony_ci```c++
989a8c51b3fSopenharmony_ci  /* Example 1: `<expr>` is removed entirely. */
990a8c51b3fSopenharmony_ci  int foo(int x) { return x + 42; }
991a8c51b3fSopenharmony_ci  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);
992a8c51b3fSopenharmony_ci
993a8c51b3fSopenharmony_ci  /*  Example 2: Result of '<expr>' is only reused */
994a8c51b3fSopenharmony_ci  int bar(int) __attribute__((const));
995a8c51b3fSopenharmony_ci  while (...) DoNotOptimize(bar(0)); // Optimized to:
996a8c51b3fSopenharmony_ci  // int __result__ = bar(0);
997a8c51b3fSopenharmony_ci  // while (...) DoNotOptimize(__result__);
998a8c51b3fSopenharmony_ci```
999a8c51b3fSopenharmony_ci
The second tool for preventing optimizations is `ClobberMemory()`. In essence,
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.
1006a8c51b3fSopenharmony_ci
1007a8c51b3fSopenharmony_ci```c++
1008a8c51b3fSopenharmony_cistatic void BM_vector_push_back(benchmark::State& state) {
1009a8c51b3fSopenharmony_ci  for (auto _ : state) {
1010a8c51b3fSopenharmony_ci    std::vector<int> v;
1011a8c51b3fSopenharmony_ci    v.reserve(1);
1012a8c51b3fSopenharmony_ci    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
1013a8c51b3fSopenharmony_ci    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations
1014a8c51b3fSopenharmony_ci    v.push_back(42);
1015a8c51b3fSopenharmony_ci    benchmark::ClobberMemory(); // Force 42 to be written to memory.
1016a8c51b3fSopenharmony_ci  }
1017a8c51b3fSopenharmony_ci}
1018a8c51b3fSopenharmony_ci```
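On GNU-compatible compilers, barriers of this kind are commonly built from an empty inline-assembly statement. The sketch below assumes GCC or Clang and falls back to no-ops elsewhere; `KeepValue` and `ClobberAllMemory` are illustrative names, and the library's own definitions may differ in detail.

```c++
// Sketch for GCC/Clang; other compilers get no-op fallbacks.
#if defined(__GNUC__)
// Tells the compiler that `value` is read by opaque code, so the computation
// producing it has to be kept.
template <class T>
inline void KeepValue(const T& value) {
  asm volatile("" : : "r,m"(value) : "memory");
}

// Compiler-level barrier: the "memory" clobber makes the compiler assume
// that memory may be read or written here, so pending writes are emitted.
inline void ClobberAllMemory() { asm volatile("" : : : "memory"); }
#else
template <class T>
inline void KeepValue(const T&) {}
inline void ClobberAllMemory() {}
#endif

// Sums 0..63 while marking every partial sum as observed.
int SumWithBarrier() {
  int x = 0;
  for (int i = 0; i < 64; ++i) {
    x += i;
    KeepValue(x);
  }
  ClobberAllMemory();
  return x;
}
```

The assembly statements emit no instructions; they only constrain what the optimizer is allowed to assume.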
1019a8c51b3fSopenharmony_ci
1020a8c51b3fSopenharmony_ciNote that `ClobberMemory()` is only available for GNU or MSVC based compilers.
1021a8c51b3fSopenharmony_ci
1022a8c51b3fSopenharmony_ci<a name="reporting-statistics" />
1023a8c51b3fSopenharmony_ci
1024a8c51b3fSopenharmony_ci## Statistics: Reporting the Mean, Median and Standard Deviation / Coefficient of variation of Repeated Benchmarks
1025a8c51b3fSopenharmony_ci
1026a8c51b3fSopenharmony_ciBy default each benchmark is run once and that single result is reported.
1027a8c51b3fSopenharmony_ciHowever benchmarks are often noisy and a single result may not be representative
1028a8c51b3fSopenharmony_ciof the overall behavior. For this reason it's possible to repeatedly rerun the
1029a8c51b3fSopenharmony_cibenchmark.
1030a8c51b3fSopenharmony_ci
1031a8c51b3fSopenharmony_ciThe number of runs of each benchmark is specified globally by the
1032a8c51b3fSopenharmony_ci`--benchmark_repetitions` flag or on a per benchmark basis by calling
1033a8c51b3fSopenharmony_ci`Repetitions` on the registered benchmark object. When a benchmark is run more
1034a8c51b3fSopenharmony_cithan once the mean, median, standard deviation and coefficient of variation
1035a8c51b3fSopenharmony_ciof the runs will be reported.
1036a8c51b3fSopenharmony_ci
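For reference, the four aggregates can be computed from a list of per-run times as sketched below. The function names are illustrative, and the use of the sample standard deviation (dividing by `n - 1`) is an assumption of this sketch rather than something stated in this guide.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

double Mean(const std::vector<double>& v) {
  if (v.empty()) return 0.0;
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum / static_cast<double>(v.size());
}

double Median(std::vector<double> v) {  // Takes a copy so it can sort.
  if (v.empty()) return 0.0;
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Sample standard deviation (divides by n - 1); an assumption of this sketch.
double StdDev(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;
  const double mean = Mean(v);
  double sum_sq = 0.0;
  for (double x : v) sum_sq += (x - mean) * (x - mean);
  return std::sqrt(sum_sq / static_cast<double>(v.size() - 1));
}

// Coefficient of variation: standard deviation relative to the mean.
double CoefficientOfVariation(const std::vector<double>& v) {
  const double mean = Mean(v);
  return mean == 0.0 ? 0.0 : StdDev(v) / mean;
}
```
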
Additionally, the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median, standard deviation and coefficient
of variation, plus complexity measurements if they were requested) of the runs
are reported, to both reporters: standard output (console) and the file.
However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.

<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, they may not be enough for everyone.
For example, you may want to know the largest observation, e.g. because
you have real-time constraints. The following code specifies a custom
statistic to be calculated, defined by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // Ratio of the first observation to the last one.
    return v.front() / v.back();
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```
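Other statistics follow the same pattern. As a hedged sketch, the function below computes a nearest-rank percentile over the observations. `Percentile` and `P90` are hypothetical names introduced here for illustration; `P90` has the same signature as the lambdas above, so it should be usable in the same place.

```c++
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the benchmark library.
// Nearest-rank percentile: the smallest observation such that at least
// `pct` percent of all observations are less than or equal to it.
// Assumes `v` is non-empty and 0 < pct <= 100.
double Percentile(std::vector<double> v, std::size_t pct) {
  std::sort(v.begin(), v.end());
  // Integer ceiling of pct * n / 100, which avoids floating-point rounding.
  const std::size_t rank = (pct * v.size() + 99) / 100;
  return v[std::max<std::size_t>(rank, 1) - 1];
}

// Same signature as the statistics lambdas shown above.
double P90(const std::vector<double>& v) { return Percentile(v, 90); }
```

A percentile is often more robust than the maximum when a single outlier run would otherwise dominate the reported value.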

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for benchmarks, alongside CPU
performance. For this reason, the library offers the `RegisterMemoryManager`
function that allows a custom `MemoryManager` to be injected.

If set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be
called at the start and end of benchmark runs to allow user code to fill out
a report on the number of allocations, bytes used, etc.

This data will then be reported alongside other performance data, currently
only when using JSON output.
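The exact `MemoryManager` interface depends on the library version, so it is not reproduced here. As a standalone sketch of the bookkeeping such a manager might perform between `Start` and `Stop`, the tracker below counts allocations and records peak usage. Every name in it (`MemoryReport`, `AllocationTracker`, `OnAlloc`, `OnFree`) is hypothetical and introduced only for illustration.

```c++
#include <algorithm>
#include <cstdint>

// Hypothetical report type; the library's own result struct may differ.
struct MemoryReport {
  std::int64_t num_allocs = 0;      // allocations seen between Start() and Stop()
  std::int64_t max_bytes_used = 0;  // peak live bytes in that window
};

// Hypothetical tracker. Allocation hooks (for example a custom allocator)
// are expected to call OnAlloc()/OnFree(); events that arrive outside a
// Start()/Stop() window are ignored.
class AllocationTracker {
 public:
  void Start() {
    report_ = MemoryReport{};
    live_bytes_ = 0;
    active_ = true;
  }

  MemoryReport Stop() {
    active_ = false;
    return report_;
  }

  void OnAlloc(std::int64_t bytes) {
    if (!active_) return;
    ++report_.num_allocs;
    live_bytes_ += bytes;
    report_.max_bytes_used = std::max(report_.max_bytes_used, live_bytes_);
  }

  void OnFree(std::int64_t bytes) {
    if (!active_) return;
    live_bytes_ -= bytes;
  }

 private:
  MemoryReport report_;
  std::int64_t live_bytes_ = 0;
  bool active_ = false;
};
```

A real manager would forward the collected numbers into the result object that the library passes to `Stop`, rather than returning its own struct.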

<a name="using-register-benchmark" />

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:

```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that
run of the benchmark and report the error. Note that only future iterations of
the `KeepRunning()` loop are skipped. For the ranged-for version of the benchmark
loop, users must explicitly exit the loop, otherwise all iterations will be
performed. Users may also explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count on every iteration,
whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.
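The difference between the two loop styles can be modelled with a toy class. This is a simplified sketch, not the library's implementation: `ToyState` and both counting functions are hypothetical, and whether a compiler actually keeps the counter in a register depends on the compiler and optimization settings.

```c++
#include <cstdint>

// Toy stand-in for benchmark::State; NOT the library's implementation.
// Assumes max_iterations >= 0.
class ToyState {
 public:
  explicit ToyState(std::int64_t max_iterations)
      : max_iterations_(max_iterations) {}

  // KeepRunning style: the counter is a data member, so every call reads
  // and writes it through `this`.
  bool KeepRunning() { return completed_++ < max_iterations_; }

  // Range-for style: the iterator copies the count once, so the loop only
  // touches a local value that the optimizer is free to keep in a register.
  struct Iterator {
    std::int64_t remaining;
    bool operator!=(const Iterator&) const { return remaining != 0; }
    Iterator& operator++() {
      --remaining;
      return *this;
    }
    int operator*() const { return 0; }  // the loop body ignores this value
  };
  Iterator begin() const { return Iterator{max_iterations_}; }
  Iterator end() const { return Iterator{0}; }

 private:
  std::int64_t max_iterations_;
  std::int64_t completed_ = 0;
};

// Both functions take the state by value so each call starts from zero.
std::int64_t CountWithKeepRunning(ToyState state) {
  std::int64_t n = 0;
  while (state.KeepRunning()) ++n;
  return n;
}

std::int64_t CountWithRangeFor(ToyState state) {
  std::int64_t n = 0;
  for (auto _ : state) {
    (void)_;  // silence unused-variable warnings
    ++n;
  }
  return n;
}
```

Both functions perform the same number of iterations; only the location of the loop counter differs, which is the effect visible in the assembly listings above.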

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this warning:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable the CPU frequency scaling while running the
benchmark, as well as consider other ways to stabilize the performance of
your system while benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.