# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setupteardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout.
Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human readable json split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example json
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`).
Note that the 'csv' reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by csv
parsers.

Specifying `--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmarks with the `--help` switch.

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to only run the benchmarks that match
the specified `<regex>`.
For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix "DISABLED_". This will cause the benchmark to
be skipped at runtime.

<a name="result-comparison" />

## Result Comparison

It is possible to compare the benchmarking results.
See [Additional Tooling Documentation](tools.md).

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running.
If you do want to add more context, you can use
the `--benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times and measuring the time taken and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.
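
The dynamic choice of iteration count can be pictured with a small, self-contained sketch. It is not the library's implementation: `ChooseIterations`, its tenfold growth step, and the `time_for` callback are hypothetical names and choices made purely for illustration. The idea is only that the harness keeps enlarging the iteration count until a run lasts long enough to measure reliably, which is why fast functions end up with more iterations than slow ones.

```c++
#include <cstdint>
#include <functional>

// Hypothetical helper, not part of the benchmark library.
// `time_for(n)` is assumed to run the benchmark body n times and return the
// elapsed time in seconds; `min_time` is the shortest run considered stable.
std::int64_t ChooseIterations(
    double min_time,
    const std::function<double(std::int64_t)>& time_for) {
  const std::int64_t kMaxIterations = 1000000000;  // illustrative cap of 1e9
  std::int64_t n = 1;                              // always run at least once
  while (n < kMaxIterations && time_for(n) < min_time) {
    n *= 10;  // assumed growth factor; the real policy may differ
  }
  return n;
}
```

With a body that costs about one millisecond per iteration and a half-second minimum, this sketch settles on 1000 iterations, while a body that takes a full second per iteration settles on a single iteration.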

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs for at least one iteration and at most 1e9 iterations, stopping once the
CPU time exceeds the minimum time or the wall-clock time reaches 5x the minimum
time. The minimum time is set per benchmark by calling `MinTime` on the
registered benchmark object.

Furthermore, warming up a benchmark might be necessary in order to get
stable results because of, e.g., caching effects of the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will override the value of `--benchmark_min_warmup_time`
for the single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. By default the warmup phase is set to 0 seconds and is therefore
disabled.

Average timings are then reported over the iterations run.
If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setup-teardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to `Setup`/`Teardown`.

The setup/teardown callbacks will be invoked once for each benchmark. If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

For example:

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);
```

In this example, `DoSetup` and `DoTeardown` will be invoked 4 times each,
specifically, once for each member of this family:
 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run.
For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`.
In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    });
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments.
The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers the same function under the name "BM_takes_args/int_test",
// passing the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.
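
One way to picture what `BENCHMARK_CAPTURE` does is as a small wrapper that calls `func` with the state followed by the listed arguments. The sketch below imitates that idea with an ordinary lambda. `FakeState`, `BM_record`, and `BindArgs` are hypothetical stand-ins so the snippet compiles without the library, and the real macro may differ in detail.

```c++
#include <string>

// Hypothetical stand-in for benchmark::State, used only in this sketch.
struct FakeState {
  std::string log;
};

// Example benchmark-style function taking extra arguments after the state.
void BM_record(FakeState& state, int count, const std::string& label) {
  state.log += label + ":" + std::to_string(count);
}

// Binds extra arguments to `func`, yielding a callable that takes only the
// state. This mirrors the idea behind BENCHMARK_CAPTURE; it is not the macro.
template <class Func, class... Args>
auto BindArgs(Func func, Args... args) {
  return [func, args...](FakeState& state) { func(state, args...); };
}
```

Calling `BindArgs(BM_record, 42, std::string("abc"))` yields a callable that needs only a `FakeState&`, much as a captured benchmark is later invoked with only its `benchmark::State&`.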

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity can also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });
```

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

The invocation will execute the benchmark as before using `BM_memcpy` but changes
the prefix in the report to `memcpy`.

<a name="templated-benchmarks" />

## Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)` `range_x`
times. It also outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
// C++03
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

// C++11 or newer, you can use the BENCHMARK macro with template parameters:
BENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);

```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK(func<...>) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="fixtures" />

## Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(const ::benchmark::State& state) {
  }

  void TearDown(const ::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated Fixtures

You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

<a name="custom-counters" />

## Custom Counters

You can add your own counters with user-defined names.
The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants,
and/or finally inverting the result; and a flag specifying the 'unit' - i.e.
whether 1k means 1000 (default, `benchmark::Counter::OneK::kIs1000`) or 1024
(`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds it takes to process one 'foo'?
  state.counters["FooInvRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = benchmark::Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

<!-- {% raw %} -->
```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```
<!-- {% endraw %} -->

### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark.
Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application.
This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

<a name="multithreaded-benchmarks"/>

## Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the thread
index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index() == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index() == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

To run the benchmark across a range of thread counts, instead of `Threads`, use
`ThreadRange`. This takes two parameters (`min_threads` and `max_threads`) and
runs the benchmark once for values in the inclusive range. For example:

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

will run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

## CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on the internal implementation detail may
// measure to anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in single-threaded case, in general decreasing
// with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```

### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes, it is necessary to do some work inside of
that loop, every iteration, without counting that time toward the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

<!-- {% raw %} -->
```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

<a name="manual-timing" />

## Manual Timing

For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

## Setting the Time Unit

If a benchmark runs a few milliseconds it may be hard to visually compare the
measured times, since the output data is given in nanoseconds by default.
To set the time unit explicitly, specify it when registering the benchmark:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

Additionally the default time unit can be set globally with the
`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
affects benchmarks where the time unit is not set explicitly.

<a name="preventing-optimization" />

## Preventing Optimization

To prevent a value or expression from being optimized away by the compiler
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i=0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.
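
To make the "read/write barrier" idea more concrete, here is a minimal sketch of how such a barrier can be written with GNU-style inline assembly. The names `do_not_optimize_sketch` and `sum_with_barrier` are hypothetical and are not part of the library; the library's actual implementation may differ, and this sketch assumes a GCC- or Clang-compatible compiler.

```c++
#include <cstdint>

// Hypothetical helper (not the library's API): an empty asm statement that
// takes `value` as an input operand and declares a "memory" clobber. The
// compiler must materialize `value` in a register or in memory and may not
// assume that memory is unchanged afterwards. GNU-style inline asm only.
template <class T>
inline void do_not_optimize_sketch(T const& value) {
  asm volatile("" : : "r,m"(value) : "memory");
}

// Sums 0..n-1 while keeping every partial sum observable, so the loop
// cannot be folded into a compile-time constant and removed.
inline std::int64_t sum_with_barrier(int n) {
  std::int64_t x = 0;
  for (int i = 0; i < n; ++i) {
    x += i;
    do_not_optimize_sketch(x);
  }
  return x;
}
```

The barrier emits no instructions of its own; it only constrains what the optimizer is allowed to assume about `value` and about memory.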

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the below example
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

<a name="reporting-statistics" />

## Statistics: Reporting the Mean, Median and Standard Deviation / Coefficient of variation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once the mean, median, standard deviation and coefficient of variation
of the runs will be reported.
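
As a rough illustration of what these four aggregates mean, the sketch below computes them from a list of per-repetition timings using only the standard library. The `Aggregates` struct and the `summarize` function are hypothetical names introduced for this example; the library computes its aggregates internally, and details such as using the sample (n - 1) standard deviation are assumptions made here.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the aggregates named in the text above.
struct Aggregates {
  double mean;
  double median;
  double stddev;  // sample standard deviation (n - 1), an assumption
  double cv;      // coefficient of variation: stddev / mean
};

// Hypothetical helper: summarizes the timings of repeated runs.
// Assumes at least two samples and a non-zero mean.
inline Aggregates summarize(std::vector<double> v) {
  const double n = static_cast<double>(v.size());
  const double mean = std::accumulate(v.begin(), v.end(), 0.0) / n;

  // Median: middle element of the sorted samples, or the average of the
  // two middle elements when the sample count is even.
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  const double median =
      (v.size() % 2 != 0) ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;

  // Sum of squared deviations from the mean.
  double sq = 0.0;
  for (double x : v) sq += (x - mean) * (x - mean);
  const double stddev = std::sqrt(sq / (n - 1.0));

  return {mean, median, stddev, stddev / mean};
}
```

A low coefficient of variation suggests the repetitions agree with each other; a high one suggests the measurement is noisy and a single run would not be representative.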

Additionally the `--benchmark_report_aggregates_only={true|false}`,
`--benchmark_display_aggregates_only={true|false}` flags or
`ReportAggregatesOnly(bool)`, `DisplayAggregatesOnly(bool)` functions can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median, standard deviation and coefficient
of variation, maybe complexity measurements if they were requested) of the runs
are reported, to both the reporters - standard output (console), and the file.
However when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.

<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, this may not be enough for everyone.
For example you may want to know what the largest observation is, e.g. because
you have some real-time constraints. This is easy. The following code will
specify a custom statistic to be calculated, defined by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // Ratio of the first measurement to the last one. Iterators cannot be
    // divided, so use the element values.
    return v.front() / v.back();
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for
benchmarks, alongside CPU 1101a8c51b3fSopenharmony_ciperformance. For this reason, benchmark offers the `RegisterMemoryManager` 1102a8c51b3fSopenharmony_cimethod that allows a custom `MemoryManager` to be injected. 1103a8c51b3fSopenharmony_ci 1104a8c51b3fSopenharmony_ciIf set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be 1105a8c51b3fSopenharmony_cicalled at the start and end of benchmark runs to allow user code to fill out 1106a8c51b3fSopenharmony_cia report on the number of allocations, bytes used, etc. 1107a8c51b3fSopenharmony_ci 1108a8c51b3fSopenharmony_ciThis data will then be reported alongside other performance data, currently 1109a8c51b3fSopenharmony_cionly when using JSON output. 1110a8c51b3fSopenharmony_ci 1111a8c51b3fSopenharmony_ci<a name="using-register-benchmark" /> 1112a8c51b3fSopenharmony_ci 1113a8c51b3fSopenharmony_ci## Using RegisterBenchmark(name, fn, args...) 1114a8c51b3fSopenharmony_ci 1115a8c51b3fSopenharmony_ciThe `RegisterBenchmark(name, func, args...)` function provides an alternative 1116a8c51b3fSopenharmony_ciway to create and register benchmarks. 1117a8c51b3fSopenharmony_ci`RegisterBenchmark(name, func, args...)` creates, registers, and returns a 1118a8c51b3fSopenharmony_cipointer to a new benchmark with the specified `name` that invokes 1119a8c51b3fSopenharmony_ci`func(st, args...)` where `st` is a `benchmark::State` object. 1120a8c51b3fSopenharmony_ci 1121a8c51b3fSopenharmony_ciUnlike the `BENCHMARK` registration macros, which can only be used at the global 1122a8c51b3fSopenharmony_ciscope, the `RegisterBenchmark` can be called anywhere. This allows for 1123a8c51b3fSopenharmony_cibenchmark tests to be registered programmatically. 1124a8c51b3fSopenharmony_ci 1125a8c51b3fSopenharmony_ciAdditionally `RegisterBenchmark` allows any callable object to be registered 1126a8c51b3fSopenharmony_cias a benchmark. Including capturing lambdas and function objects. 

For example:

```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that
run of the benchmark and report the error. Note that only future iterations of
the `KeepRunning()` loop are skipped. For the ranged-for version of the benchmark
loop, users must explicitly exit the loop; otherwise all iterations will be
performed. Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count
every iteration, whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.
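
The difference between the two loop styles can be illustrated with a
simplified model. The sketch below is not the library's implementation:
`MiniState`, `Tick`, `CountWithKeepRunning` and `CountWithRangeFor` are
hypothetical names introduced only for illustration. It assumes that the
`KeepRunning`-style counter lives in the state object, while the range-for
iterator carries its own countdown that an optimizer may keep in a register.
Whether a given compiler actually does so depends on its optimization settings.

```c++
#include <cstdint>

// Hypothetical, simplified stand-in for benchmark::State (illustration only).
// Assumes max_iterations >= 0.
class MiniState {
 public:
  explicit MiniState(std::int64_t max_iterations)
      : max_iterations_(max_iterations) {}

  // KeepRunning-style API: the counter is a data member, so every call
  // updates state that lives in the object.
  bool KeepRunning() {
    if (completed_ < max_iterations_) {
      ++completed_;
      return true;
    }
    return false;
  }

  // Empty placeholder value produced on each range-for iteration.
  struct Tick {};

  // Range-for API: the iterator owns a plain countdown, which an optimizer
  // is free to keep in a register for the whole loop.
  struct Iterator {
    std::int64_t remaining;
    bool operator!=(const Iterator&) const { return remaining != 0; }
    Iterator& operator++() {
      --remaining;
      return *this;
    }
    Tick operator*() const { return Tick{}; }
  };

  Iterator begin() const { return Iterator{max_iterations_}; }
  Iterator end() const { return Iterator{0}; }

 private:
  std::int64_t max_iterations_;
  std::int64_t completed_ = 0;
};

// Counts loop bodies executed through the KeepRunning-style API.
std::int64_t CountWithKeepRunning(std::int64_t n) {
  MiniState state(n);
  std::int64_t executed = 0;
  while (state.KeepRunning()) ++executed;
  return executed;
}

// Counts loop bodies executed through the range-for API.
std::int64_t CountWithRangeFor(std::int64_t n) {
  MiniState state(n);
  std::int64_t executed = 0;
  for (auto _ : state) {
    (void)_;  // The placeholder value is intentionally unused.
    ++executed;
  }
  return executed;
}
```

Both helpers execute the loop body the same number of times; only the place
where the iteration count is stored differs, which is the property the
assembly listings above reflect.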

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this warning:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable CPU frequency scaling while running the
benchmark, as well as consider other ways to stabilize the performance of
your system while benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.
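
On Linux, the active frequency policy of a CPU is typically exposed through the
sysfs file `/sys/devices/system/cpu/cpu<N>/cpufreq/scaling_governor`. As a
minimal sketch, a program could inspect that file before benchmarking. This is
an illustration under stated assumptions, not the check the library itself
performs: the function names are hypothetical, and treating only the
`performance` governor as "scaling disabled" is a simplification.

```c++
#include <fstream>
#include <string>

// Returns true when a governor name suggests the CPU frequency may vary.
// Assumption: only "performance" is treated as a fixed-frequency policy.
bool GovernorAllowsScaling(const std::string& governor) {
  return governor != "performance";
}

// Reads the governor for one CPU from sysfs. Returns an empty string when the
// file is missing, e.g. on non-Linux systems or without a cpufreq driver.
std::string ReadGovernor(int cpu) {
  std::ifstream file("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/cpufreq/scaling_governor");
  std::string governor;
  if (file) std::getline(file, governor);
  return governor;
}

// Reports whether CPU `cpu` appears to have frequency scaling enabled.
// An unreadable governor is treated as "unknown" and reported as false.
bool ScalingLooksEnabled(int cpu) {
  const std::string governor = ReadGovernor(cpu);
  return !governor.empty() && GovernorAllowsScaling(governor);
}
```

Calling `ScalingLooksEnabled(0)` before a run gives a quick hint for the first
CPU; a more thorough check would repeat it for every CPU the benchmark may be
scheduled on.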