I wrote a small program named bench that lets you benchmark other programs from the command line. Think of this as a much nicer alternative to the time command.
The best way to illustrate how this works is to show a few example uses of the program:
$ bench 'ls /usr/bin | wc -l' # You can benchmark shell pipelines
benchmarking ls /usr/bin | wc -l
time 6.756 ms (6.409 ms .. 7.059 ms)
0.988 R² (0.980 R² .. 0.995 R²)
mean 7.590 ms (7.173 ms .. 8.526 ms)
std dev 1.685 ms (859.0 μs .. 2.582 ms)
variance introduced by outliers: 88% (severely inflated)
$ bench 'sleep 1' # You have to quote multiple tokens
benchmarking sleep 1
time 1.003 s (1.003 s .. 1.003 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.003 s (1.003 s .. 1.003 s)
std dev 65.86 μs (0.0 s .. 68.91 μs)
variance introduced by outliers: 19% (moderately inflated)
$ bench true # The benchmark overhead is below 1 ms
benchmarking true
time 383.9 μs (368.6 μs .. 403.4 μs)
0.982 R² (0.971 R² .. 0.991 R²)
mean 401.1 μs (386.9 μs .. 418.9 μs)
std dev 54.39 μs (41.70 μs .. 67.62 μs)
variance introduced by outliers: 87% (severely inflated)
This utility just provides a command-line API for Haskell's criterion benchmarking library. The bench tool wraps any shell command you provide in a subprocess and benchmarks that subprocess through repeated runs using criterion. The number of runs varies between 10 and 10000, depending on how expensive the command is.
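To make the "repeated runs of a subprocess" idea concrete, here is a minimal sketch in Python. It is not how bench is implemented: bench delegates to criterion, which also chooses the number of runs and performs regression and bootstrap analysis. The names time_command and summarize are made up for this illustration, and the sketch only reports a plain mean and sample standard deviation.

```python
import statistics
import subprocess
import time


def time_command(command, runs=10):
    """Run a shell command `runs` times and return each wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # shell=True passes the string to the system shell, so pipelines work.
        subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the mean and sample standard deviation of the timings."""
    mean = statistics.mean(samples)
    # stdev needs at least two samples; report 0.0 for a single run.
    std_dev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, std_dev


if __name__ == "__main__":
    mean, std_dev = summarize(time_command("ls /usr/bin | wc -l", runs=20))
    print(f"mean {mean * 1e3:.3f} ms, std dev {std_dev * 1e3:.3f} ms")
```

Unlike this fixed-count loop, criterion adapts the number of runs to the cost of the command, which is why the real tool reports confidence intervals and an R² value rather than a single mean.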
This tool also threads through the same command-line options that criterion accepts for benchmark suites. You can see the full set of options using the --help flag:
$ bench --help
Command-line tool to benchmark other programs
Usage: bench COMMAND ([-I|--ci CI] [-G|--no-gc] [-L|--time-limit SECS]
[--resamples COUNT] [--regress RESP:PRED..] [--raw FILE]
[-o|--output FILE] [--csv FILE] [--junit FILE]
[-v|--verbosity LEVEL] [-t|--template FILE] [-m|--match MATCH]
[NAME...] | [-n|--iters ITERS] [-m|--match MATCH] [NAME...] |
[-l|--list] | [--version])
Available options:
-h,--help Show this help text
COMMAND The command line to benchmark
-I,--ci CI Confidence interval
-G,--no-gc Do not collect garbage between iterations
-L,--time-limit SECS Time limit to run a benchmark
--resamples COUNT Number of bootstrap resamples to perform
--regress RESP:PRED.. Regressions to perform
--raw FILE File to write raw data to
-o,--output FILE File to write report to
--csv FILE File to write CSV summary to
--junit FILE File to write JUnit summary to
-v,--verbosity LEVEL Verbosity level
-t,--template FILE Template to use for report
-m,--match MATCH How to match benchmark names ("prefix" or "glob")
-n,--iters ITERS Run benchmarks, don't analyse
-m,--match MATCH How to match benchmark names ("prefix" or "glob")
-l,--list List benchmarks
--version Show version info
The --output option is really useful: it outputs an HTML page with a chart showing the distribution of run times. For example, the following command:
$ bench 'ls /usr/bin | wc -l' --output example.html
benchmarking ls /usr/bin | wc -l
time 6.716 ms (6.645 ms .. 6.807 ms)
0.999 R² (0.999 R² .. 0.999 R²)
mean 7.005 ms (6.897 ms .. 7.251 ms)
std dev 462.0 μs (199.3 μs .. 809.2 μs)
variance introduced by outliers: 37% (moderately inflated)
... also produces a chart, which you can view by opening example.html.
You can also increase the time limit using the --time-limit option, which will in turn increase the number of runs for better statistics. For example, criterion warned me that I had too many outliers for my benchmarks, so I can increase the time limit for the above benchmark to 30 seconds:
$ bench 'ls /usr/bin | wc -l' --time-limit 30 --output example.html
benchmarking ls /usr/bin | wc -l
time 6.937 ms (6.898 ms .. 7.002 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 6.905 ms (6.878 ms .. 6.935 ms)
std dev 114.9 μs (86.59 μs .. 156.1 μs)
... which dials up the number of runs to the ~4000 range, reduces the number of outliers, and brings down the standard deviation by a factor of four (from 462.0 μs to 114.9 μs).
Keep in mind that there are a few limitations to this tool:
- This tool cannot accurately benchmark code that requires a warm-up phase (such as JVM programs that depend on JIT compilation for performance)
- This tool cannot measure performance below about half a millisecond due to the overhead of launching a subprocess and bash interpreter
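If you want a rough feel for the second limitation on your own machine, the sketch below times a do-nothing command launched directly and launched through a shell. This is only an illustration in Python, so the absolute numbers will differ from what bench and criterion report; the helper names median_seconds, spawn_direct, and spawn_via_shell are made up here, and Python's shell=True uses the system's default shell rather than bash specifically.

```python
import statistics
import subprocess
import time


def median_seconds(run_once, runs=50):
    """Call run_once() `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def spawn_direct():
    # Launch the `true` executable directly, with no shell in between.
    subprocess.run(["true"], check=False)


def spawn_via_shell():
    # Launch a shell first and let it evaluate `true`.
    subprocess.run("true", shell=True, check=False)


if __name__ == "__main__":
    direct = median_seconds(spawn_direct)
    via_shell = median_seconds(spawn_via_shell)
    print(f"direct spawn: {direct * 1e6:.1f} us")
    print(f"via shell:    {via_shell * 1e6:.1f} us")
```

Whatever floor this prints is the cost of starting a process that does nothing, so any command whose real work takes less time than that cannot be distinguished from the launch overhead.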
Despite those limitations, I find that this tool comes in handy in a few scenarios:
- Preliminary benchmarking in the prototyping phase of program development
- Benchmarking program pipelines written in multiple languages
You can install this tool by following the instructions on the GitHub repo.
Or, if you have the Haskell stack tool installed, you can just run:
$ stack update
$ stack install bench
Very nice! I've recently been benchmarking a load of programs, which plug together as pipelines in this way.
I ended up making something which looks like a clunky version of this 'bench' command (although using environment variables rather than arguments), and a wrapper which allows toggling between that and 'time' (which allows much faster feedback for long running commands).
With this, it looks like I can enjoy the most satisfying part of programming: deleting code which is no longer needed!
Very nice, thank you. I use https://github.com/simonmichael/hledger/blob/master/tools/simplebench.hs for displaying comparative benchmarks (and for obtaining rough measurements quickly, when criterion's many iterations would be overkill). I wonder if it would fit in bench.
Yeah, I think it makes a lot of sense to support multiple benchmarks.
This page has some issues for mobile users; the code blocks are too wide for mobile phone screens, and the page does not allow you to scroll sidewise. Just a heads up. :)