# Benchmark Tools

## compare.py

`compare.py` can be used to compare the results of benchmarks.

### Dependencies
The utility relies on the [scipy](https://www.scipy.org) package, which can be installed using pip:
```bash
pip3 install -r requirements.txt
```

### Displaying aggregates only

The `-a` / `--display_aggregates_only` switch controls whether the normal
iterations or only the aggregates are displayed. When passed, it is forwarded
to the benchmark binaries being run, and is also accounted for in the tool
itself: only the aggregates are displayed, not the normal runs. It affects
only the display; the separate runs are still used to calculate the U test.

### Modes of operation

There are three modes of operation:

1. Just compare two benchmarks
The program is invoked like:

``` bash
$ compare.py benchmarks <benchmark_baseline> <benchmark_contender> [benchmark options]...
```
Where `<benchmark_baseline>` and `<benchmark_contender>` each specify either a benchmark executable file or a JSON output file. The type of the input file is detected automatically. If a benchmark executable is specified, the benchmark is run to obtain the results; otherwise the results are simply loaded from the output file.

`[benchmark options]` will be passed to the benchmark invocations. They can be anything the binary accepts, be it normal `--benchmark_*` parameters or custom parameters your binary takes.

Example output:
```
$ ./compare.py benchmarks ./a.out ./a.out
RUNNING: ./a.out --benchmark_out=/tmp/tmprBT5nW
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:44
------------------------------------------------------
Benchmark           Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8              36 ns         36 ns   19101577   211.669MB/s
BM_memcpy/64             76 ns         76 ns    9412571   800.199MB/s
BM_memcpy/512            84 ns         84 ns    8249070   5.64771GB/s
BM_memcpy/1024          116 ns        116 ns    6181763   8.19505GB/s
BM_memcpy/8192          643 ns        643 ns    1062855   11.8636GB/s
BM_copy/8               222 ns        222 ns    3137987   34.3772MB/s
BM_copy/64             1608 ns       1608 ns     432758   37.9501MB/s
BM_copy/512           12589 ns      12589 ns      54806   38.7867MB/s
BM_copy/1024          25169 ns      25169 ns      27713   38.8003MB/s
BM_copy/8192         201165 ns     201112 ns       3486   38.8466MB/s
RUNNING: ./a.out --benchmark_out=/tmp/tmpt1wwG_
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:53
------------------------------------------------------
Benchmark           Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8              36 ns         36 ns   19397903   211.255MB/s
BM_memcpy/64             73 ns         73 ns    9691174   839.635MB/s
BM_memcpy/512            85 ns         85 ns    8312329   5.60101GB/s
BM_memcpy/1024          118 ns        118 ns    6438774   8.11608GB/s
BM_memcpy/8192          656 ns        656 ns    1068644   11.6277GB/s
BM_copy/8               223 ns        223 ns    3146977   34.2338MB/s
BM_copy/64             1611 ns       1611 ns     435340   37.8751MB/s
BM_copy/512           12622 ns      12622 ns      54818   38.6844MB/s
BM_copy/1024          25257 ns      25239 ns      27779   38.6927MB/s
BM_copy/8192         205013 ns     205010 ns       3479    38.108MB/s
Comparing ./a.out to ./a.out
Benchmark                 Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------
BM_memcpy/8            +0.0020         +0.0020            36            36            36            36
BM_memcpy/64           -0.0468         -0.0470            76            73            76            73
BM_memcpy/512          +0.0081         +0.0083            84            85            84            85
BM_memcpy/1024         +0.0098         +0.0097           116           118           116           118
BM_memcpy/8192         +0.0200         +0.0203           643           656           643           656
BM_copy/8              +0.0046         +0.0042           222           223           222           223
BM_copy/64             +0.0020         +0.0020          1608          1611          1608          1611
BM_copy/512            +0.0027         +0.0026         12589         12622         12589         12622
BM_copy/1024           +0.0035         +0.0028         25169         25257         25169         25239
BM_copy/8192           +0.0191         +0.0194        201165        205013        201112        205010
```

What it does is, for every benchmark from the first run, look for the benchmark with exactly the same name in the second run and then compare the results. If the names differ, the benchmark is omitted from the diff.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.
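
To make the name matching and the formula concrete, here is a minimal, hypothetical Python sketch (not the tool's actual implementation) that pairs benchmarks by name across two JSON output files and prints the relative change; the file names `baseline.json` and `contender.json` are placeholders.

```python
import json

def relative_change(old, new):
    # Same convention as the Time/CPU columns: (new - old) / |old|.
    return (new - old) / abs(old)

def load_results(path):
    # Google Benchmark's JSON output keeps per-benchmark entries under "benchmarks".
    with open(path) as f:
        return {b["name"]: b for b in json.load(f)["benchmarks"]}

baseline = load_results("baseline.json")    # placeholder file names
contender = load_results("contender.json")

for name, old in baseline.items():
    new = contender.get(name)
    if new is None:
        continue  # benchmarks whose names differ are omitted from the diff
    print(name,
          relative_change(old["real_time"], new["real_time"]),
          relative_change(old["cpu_time"], new["cpu_time"]))
```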

2. Compare two different filters of one benchmark
The program is invoked like:

``` bash
$ compare.py filters <benchmark> <filter_baseline> <filter_contender> [benchmark options]...
```
Where `<benchmark>` specifies either a benchmark executable file or a JSON output file. The type of the input file is detected automatically. If a benchmark executable is specified, the benchmark is run to obtain the results; otherwise the results are simply loaded from the output file.

Where `<filter_baseline>` and `<filter_contender>` are the same regex filters that you would pass to the `[--benchmark_filter=<regex>]` parameter of the benchmark binary.

`[benchmark options]` will be passed to the benchmark invocations. They can be anything the binary accepts, be it normal `--benchmark_*` parameters or custom parameters your binary takes.

Example output:
```
$ ./compare.py filters ./a.out BM_memcpy BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmpBWKk0k
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:28
------------------------------------------------------
Benchmark           Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8              36 ns         36 ns   17891491   211.215MB/s
BM_memcpy/64             74 ns         74 ns    9400999   825.646MB/s
BM_memcpy/512            87 ns         87 ns    8027453   5.46126GB/s
BM_memcpy/1024          111 ns        111 ns    6116853    8.5648GB/s
BM_memcpy/8192          657 ns        656 ns    1064679   11.6247GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpAvWcOM
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:33
----------------------------------------------------
Benchmark         Time           CPU Iterations
----------------------------------------------------
BM_copy/8               227 ns        227 ns    3038700   33.6264MB/s
BM_copy/64             1640 ns       1640 ns     426893   37.2154MB/s
BM_copy/512           12804 ns      12801 ns      55417   38.1444MB/s
BM_copy/1024          25409 ns      25407 ns      27516   38.4365MB/s
BM_copy/8192         202986 ns     202990 ns       3454   38.4871MB/s
Comparing BM_memcpy to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.2829         +5.2812            36           227            36           227
[BM_memcpy vs. BM_copy]/64          +21.1719        +21.1856            74          1640            74          1640
[BM_memcpy vs. BM_copy]/512        +145.6487       +145.6097            87         12804            87         12801
[BM_memcpy vs. BM_copy]/1024       +227.1860       +227.1776           111         25409           111         25407
[BM_memcpy vs. BM_copy]/8192       +308.1664       +308.2898           657        202986           656        202990
```

As you can see, it applies the filter to the benchmarks both when running them and before doing the diff. To make the diff work, the filter matches are replaced with a common string, so you can compare two different benchmark families within one benchmark binary; the sketch below illustrates the idea.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.
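
The renaming step can be pictured with a small, hypothetical Python sketch, assuming the combined label format seen in the output above (`[<baseline> vs. <contender>]`); this is an illustration, not the tool's actual implementation.

```python
import re

# Each run's names have its own filter replaced with a shared label, so the two
# filtered families end up with identical names and can be paired for the diff.
baseline_filter, contender_filter = "BM_memcpy", "BM_copy"
common = "[%s vs. %s]" % (baseline_filter, contender_filter)

baseline_name = re.sub(baseline_filter, common, "BM_memcpy/64", count=1)
contender_name = re.sub(contender_filter, common, "BM_copy/64", count=1)

print(baseline_name)                    # [BM_memcpy vs. BM_copy]/64
print(contender_name)                   # [BM_memcpy vs. BM_copy]/64
print(baseline_name == contender_name)  # True -> the two rows can be compared
```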

3. Compare filter one from benchmark one to filter two from benchmark two:
The program is invoked like:

``` bash
$ compare.py benchmarksfiltered <benchmark_baseline> <filter_baseline> <benchmark_contender> <filter_contender> [benchmark options]...
```

Where `<benchmark_baseline>` and `<benchmark_contender>` each specify either a benchmark executable file or a JSON output file. The type of the input file is detected automatically. If a benchmark executable is specified, the benchmark is run to obtain the results; otherwise the results are simply loaded from the output file.

Where `<filter_baseline>` and `<filter_contender>` are the same regex filters that you would pass to the `[--benchmark_filter=<regex>]` parameter of the benchmark binary.

`[benchmark options]` will be passed to the benchmark invocations. They can be anything the binary accepts, be it normal `--benchmark_*` parameters or custom parameters your binary takes.

Example output:
```
$ ./compare.py benchmarksfiltered ./a.out BM_memcpy ./a.out BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmp_FvbYg
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:27
------------------------------------------------------
Benchmark           Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8              37 ns         37 ns   18953482   204.118MB/s
BM_memcpy/64             74 ns         74 ns    9206578   828.245MB/s
BM_memcpy/512            91 ns         91 ns    8086195   5.25476GB/s
BM_memcpy/1024          120 ns        120 ns    5804513   7.95662GB/s
BM_memcpy/8192          664 ns        664 ns    1028363   11.4948GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpDfL5iE
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:32
----------------------------------------------------
Benchmark         Time           CPU Iterations
----------------------------------------------------
BM_copy/8               230 ns        230 ns    2985909   33.1161MB/s
BM_copy/64             1654 ns       1653 ns     419408   36.9137MB/s
BM_copy/512           13122 ns      13120 ns      53403   37.2156MB/s
BM_copy/1024          26679 ns      26666 ns      26575   36.6218MB/s
BM_copy/8192         215068 ns     215053 ns       3221   36.3283MB/s
Comparing BM_memcpy (from ./a.out) to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.1649         +5.1637            37           230            37           230
[BM_memcpy vs. BM_copy]/64          +21.4352        +21.4374            74          1654            74          1653
[BM_memcpy vs. BM_copy]/512        +143.6022       +143.5865            91         13122            91         13120
[BM_memcpy vs. BM_copy]/1024       +221.5903       +221.4790           120         26679           120         26666
[BM_memcpy vs. BM_copy]/8192       +322.9059       +323.0096           664        215068           664        215053
```
This is a mix of the previous two modes: two (potentially different) benchmark binaries are run, and a different filter is applied to each one.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.

### Note: Interpreting the output

Performance measurements are an art, and performance comparisons are doubly so.
Results are often noisy and don't necessarily show large absolute differences,
so just by visual inspection it is not at all apparent whether two measurements
are actually showing a performance change or not. It is even more confusing
with multiple benchmark repetitions.

Thankfully, what we can do is use statistical tests on the results to determine
whether the performance has changed in a statistically significant way.
`compare.py` uses the [Mann–Whitney U
test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test), with the null
hypothesis being that there is no difference in performance.

**The below output is a summary of a benchmark comparison with statistics
provided for a multi-threaded process.**
```
Benchmark                                              Time         CPU     Time Old     Time New      CPU Old      CPU New
-----------------------------------------------------------------------------------------------------------------------------
benchmark/threads:1/process_time/real_time_pvalue    0.0000      0.0000     U Test, Repetitions: 27 vs 27
benchmark/threads:1/process_time/real_time_mean     -0.1442     -0.1442           90           77           90           77
benchmark/threads:1/process_time/real_time_median   -0.1444     -0.1444           90           77           90           77
benchmark/threads:1/process_time/real_time_stddev   +0.3974     +0.3933            0            0            0            0
benchmark/threads:1/process_time/real_time_cv       +0.6329     +0.6280            0            0            0            0
OVERALL_GEOMEAN                                     -0.1442     -0.1442            0            0            0            0
```

--------------------------------------------
Here's a breakdown of each row:

**benchmark/threads:1/process_time/real_time_pvalue**: This shows the _p-value_ for
the statistical test comparing the performance of the process running with one
thread. A value of 0.0000 suggests a statistically significant difference in
performance. The comparison was conducted using the U Test (Mann-Whitney
U Test) with 27 repetitions for each case.

**benchmark/threads:1/process_time/real_time_mean**: This shows the relative
difference in mean execution time between the two cases. The negative
value (-0.1442) implies that the new process is faster by about 14.42%. The old
time was 90 units, while the new time is 77 units.

**benchmark/threads:1/process_time/real_time_median**: Similarly, this shows the
relative difference in the median execution time. Again, the new process is
faster, by 14.44%.

**benchmark/threads:1/process_time/real_time_stddev**: This is the relative
difference in the standard deviation of the execution time, which is a measure
of how much variation or dispersion there is from the mean. A positive value
(+0.3974) implies there is more variance in the execution time of the new
process.

**benchmark/threads:1/process_time/real_time_cv**: CV stands for Coefficient of Variation.
It is the ratio of the standard deviation to the mean. It provides a
standardized measure of dispersion. An increase (+0.6329) indicates more
relative variability in the new process.

**OVERALL_GEOMEAN**: Geomean stands for geometric mean, a type of average that is
less influenced by outliers. The negative value indicates a general improvement
in the new process. However, given that the values shown are all zero for the
old and new times, this appears to be a mistake or placeholder in the output.

-----------------------------------------

Let's first try to see what the different columns represent in the above
`compare.py` benchmarking output:

 1. **Benchmark:** The name of the function being benchmarked, along with the
    size of the input (after the slash).

 2. **Time:** The average time per operation, across all iterations.

 3. **CPU:** The average CPU time per operation, across all iterations.

 4. **Iterations:** The number of iterations the benchmark was run to get a
    stable estimate.

 5. **Time Old and Time New:** These represent the average time it takes for a
    function to run in two different scenarios or versions. For example, you
    might be comparing how fast a function runs before and after you make some
    changes to it.

 6. **CPU Old and CPU New:** These show the average amount of CPU time that the
    function uses in two different scenarios or versions. This is similar to
    Time Old and Time New, but focuses on CPU usage instead of overall time.

In the comparison section, the relative differences in both time and CPU time
are displayed for each input size.

A statistically-significant difference is determined by a **p-value**, which is
a measure of the probability that the observed difference could have occurred
just by random chance. A smaller p-value indicates stronger evidence against the
null hypothesis.

**Therefore:**
 1. If the p-value is less than the chosen significance level (alpha), we
    reject the null hypothesis and conclude the benchmarks are significantly
    different.
 2. If the p-value is greater than or equal to alpha, we fail to reject the
    null hypothesis and treat the two benchmarks as similar (see the sketch
    after this list).
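
As a minimal illustration of this decision rule (the 0.05 threshold below is a conventional choice used purely for illustration, not necessarily what your comparison should use):

```python
def significantly_different(p_value, alpha=0.05):
    # Reject the null hypothesis ("no difference in performance") when p < alpha.
    return p_value < alpha

print(significantly_different(0.0000))  # True  -> statistically different
print(significantly_different(0.2719))  # False -> treat the benchmarks as similar
```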

The result of said statistical test is additionally communicated through color coding:
```diff
+ Green:
```
  The benchmarks are _**statistically different**_. This could mean the
  performance has either **significantly improved** or **significantly
  deteriorated**. You should look at the actual performance numbers to see which
  is the case.
```diff
- Red:
```
  The benchmarks are _**statistically similar**_. This means the performance
  **hasn't significantly changed**.

In statistical terms, **'green'** means we reject the null hypothesis that
there's no difference in performance, and **'red'** means we fail to reject the
null hypothesis. This might seem counter-intuitive if you're expecting 'green'
to mean 'improved performance' and 'red' to mean 'worsened performance'.
```
  But remember, in this context:

    'Success' means 'successfully finding a difference'.
    'Failure' means 'failing to find a difference'.
```

Also, please note that **even if** we determine that there **is** a
statistically-significant difference between the two measurements, it does not
_necessarily_ mean that the actual benchmarks that were measured **are**
different. And vice versa: even if we determine that there is **no**
statistically-significant difference between the two measurements, it does not
necessarily mean that the actual benchmarks that were measured **are not**
different.

### U test

If the benchmarks were run with a sufficient number of repetitions, the tool can
perform a [U Test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) of
the null hypothesis that it is equally likely that a randomly selected value
from one sample will be less than or greater than a randomly selected value from
the second sample.

If the calculated p-value is lower than the significance level alpha, then the
result is said to be statistically significant and the null hypothesis is
rejected, which in other words means that the two benchmarks aren't identical.

**WARNING**: this requires a **LARGE** number of repetitions (no fewer than 9)
to be meaningful!
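
For intuition, here is a minimal, hypothetical sketch of such a test using the scipy dependency listed above; the per-repetition timings are made up, and a real comparison would take them from the repetition results of the two benchmark runs.

```python
from scipy.stats import mannwhitneyu

# Made-up per-repetition timings (ns) from two runs of the same benchmark;
# the U test needs at least 9 repetitions per side to be meaningful.
baseline  = [90, 91, 89, 90, 92, 90, 91, 90, 89]
contender = [77, 78, 77, 76, 77, 78, 77, 76, 77]

# Two-sided test of the null hypothesis described above.
statistic, p_value = mannwhitneyu(baseline, contender, alternative="two-sided")

alpha = 0.05  # illustrative significance level
if p_value < alpha:
    print(f"p={p_value:.4f}: reject the null hypothesis -- the runs differ")
else:
    print(f"p={p_value:.4f}: no statistically significant difference detected")
```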