Monday, March 27, 2023

How I run sysbench

I have begun to rely on sysbench to test for performance regressions in MySQL and Postgres, and this post explains how I run it. Sysbench is a framework for running database benchmarks plus a set of built-in system benchmarks for testing file system, memory, threading, synchronization and CPU performance.

I use it as a framework for running database benchmarks, or really, database microbenchmarks. It embeds a Lua interpreter and the microbenchmarks are implemented in Lua. The Lua microbenchmarks are here and to them I have added a few more (see here). I call these microbenchmarks rather than benchmarks because each one is simple and most use a single statement per transaction. It is possible to write more complex benchmarks in Lua and Percona has done that for tpcc-mysql. The microbenchmarks I use are very synthetic and far from a real workload, but that is intentional: it helps document where CPU regressions occur as a DBMS ages. A less synthetic benchmark would make it harder to distinguish whether the new overheads occur for point queries vs range queries vs writes.

Sysbench includes drivers for Postgres and MySQL. Drivers can be implemented for other DBMS, but the primary usage has been for the popular open source DBMS. While it would be great to get drivers for non-OSS DBMS, sharing results for them would frequently be illegal courtesy of the DeWitt Clause.
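
To make the driver selection concrete, here is a rough sketch of a standalone sysbench invocation rather than the exact commands my scripts generate. The connection options, table count and table size are placeholders:

    sysbench oltp_point_select \
        --db-driver=mysql --mysql-user=root --mysql-socket=/tmp/mysql.sock \
        --tables=1 --table-size=10000 --threads=1 --time=60 \
        prepare
    # then repeat the command with "run" to measure and "cleanup" to drop the tables
    # for Postgres use --db-driver=pgsql with --pgsql-user, --pgsql-db, etc.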

My usage

I have been using it for Postgres, MySQL/InnoDB and MySQL/RocksDB (MyRocks). As always, my usage is via layers on layers of shell scripts that are messy but effective for me. The top level script is r.sh which invokes the per-DBMS scripts (cmp_in.sh for InnoDB, cmp_rx.sh for MyRocks, cmp_pg.sh for Postgres). The per-DBMS scripts depend on the existence of startup (ini.sh) and shutdown (down.sh) scripts that I store in the database install directories. The per-DBMS scripts also hardcode the database versions (for example Postgres 11.19) and configuration files that will be used.

The benchmark is run for W seconds per microbenchmark, with X rows per table, Y tables and Z threads. The values for W, X, Y and Z are inputs to the r.sh script.
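
As a hypothetical example, W=300, X=10000000, Y=8 and Z=20 map onto the standard sysbench options shown below; the argument order that r.sh itself expects is defined in the script:

    # hypothetical values, shown only to explain what W, X, Y and Z control
    #   W=300       -> --time=300            (seconds per microbenchmark)
    #   X=10000000  -> --table-size=10000000 (rows per table)
    #   Y=8         -> --tables=8
    #   Z=20        -> --threads=20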

The sysbench clients share the server with the DBMS. This makes my setup simpler and avoids variance from the client/server network, but there are some downsides to this approach.

Sequence of microbenchmarks

The per-DBMS scripts then call another script to run the sysbench microbenchmarks in a sequence that is useful to me. That script is named all_small.sh and it calls run.sh to run each microbenchmark.
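
The call graph is roughly the following. This is a sketch of how the pieces relate, not the actual contents of the scripts:

    r.sh                                   # top level
      cmp_pg.sh | cmp_in.sh | cmp_rx.sh    # per-DBMS: hardcode version + config,
                                           # use ini.sh / down.sh from the install directory
        all_small.sh                       # runs the microbenchmarks in sequence
          run.sh                           # runs one microbenchmark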

There are three classes of microbenchmarks based on the dominant work done: point, range and writes. The dominant work is point queries for point, range queries for range and insert/update/delete for writes.
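
The statement shapes are roughly the following, using the sbtest tables that the standard Lua scripts create. This is a sketch; the real statements are generated by the Lua scripts:

    # point  : SELECT c FROM sbtest1 WHERE id = ?
    # range  : SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?
    # writes : INSERT INTO sbtest1 (id, k, c, pad) VALUES (...)
    #          UPDATE sbtest1 SET ... WHERE id = ?
    #          DELETE FROM sbtest1 WHERE id = ?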

The microbenchmark sequence is:

  • point-query.warm - load the DBMS, create secondary indexes and then do point queries to warm up the DBMS. The extra work (load, create indexes) is done when $setup is non-zero and the code for it is here in run.sh. I try to ignore the results from this microbenchmark. The post-write work is also done at the end of this microbenchmark (see below).
  • More read-only tests for point and range queries from line 27 to line 62
    • These are called the pre tests and all have pre in their name. That indicates that they were run prior to the write-heavy tests that follow. 
  • Writes from line 65 to line 85.
    • The post-write work is done at the end of this microbenchmark (see below).
  • Read-only tests from line 87 to line 128
    • These repeat many of the read-only microbenchmarks that were run prior to the write-heavy microbenchmarks. While the ones run prior have pre in their names, these do not have post in their names. However, their performance is likely to be worse than the results for the pre tests because the index structures (B-Tree, LSM) have been fragmented (by fragmented I just mean subject to random updates).
    • The scan microbenchmark on line 128 does a full-table scan which takes much longer than the SQL statements used by the other microbenchmarks. Thus, reporting for it in my benchmark scripts is via Mrps (million rows read per second) while for the others it is QPS (queries per second, although statements per second is more truthy). Grep for scan in run.sh to see the special treatment it gets in several places. One limitation is that there can only be one scan per table, so if the benchmark is run with 8 tables and 20 threads then all of the microbenchmarks except scan use 20 threads while scan uses 8 (see the worked example after this list).
  • Write-only tests (delete, insert) on lines 132 and 134
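
A worked example for the scan reporting and the scan thread limit, using made-up numbers:

    # 8 tables with 10M rows each, fully scanned in 25 seconds
    #   rows read = 8 * 10,000,000 = 80,000,000
    #   Mrps      = 80,000,000 / 25 / 1,000,000 = 3.2
    # and because there can be at most one scan per table, scan uses 8 threads
    # here even when the other microbenchmarks use 20
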
The post-write work is done to get the database structures into a more deterministic state (yes, more deterministic is an oxymoron) to reduce variance for the read-only tests that follow:
  • For two of the microbenchmarks (point-query.warm and read-write.range=100) the 15th arg to run.sh is 1 rather than $pwr (the value of $pwr is 0). Thus, the value of the $postwrite variable in run.sh is 1 and the work triggered when postwrite=1 is done.
  • The work done when postwrite=1 varies by storage engine. For InnoDB it forces dirty pages to be written back. For MyRocks it flushes the memtable and L0. For Postgres it does vacuum analyze and then checkpoint. For all DBMS it sleeps at least 60 seconds (the sleep time is a function of the database size) after requesting the background work.
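
A sketch of what the post-write work can look like per engine. These are standard commands, but run.sh might not issue them exactly this way, and it computes the sleep time from the database size:

    # InnoDB: one common way to force writeback of dirty pages
    mysql -e "SET GLOBAL innodb_max_dirty_pages_pct=0"
    # MyRocks: flush the memtable and L0 (the variable name can differ by version)
    mysql -e "SET GLOBAL rocksdb_force_flush_memtable_and_lzero_now=1"
    # Postgres
    psql -c "VACUUM ANALYZE"; psql -c "CHECKPOINT"
    # all engines: wait for the background work to finish, at least 60 seconds
    sleep 60
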
To map a microbenchmark to the Lua script that implements it:
  • Start with the microbenchmark name, for example point-query.pre from all_small.sh
  • Find that name in the big if/elif sequence in all_small.sh (point-query.pre is here)
  • The Lua script name is on the lua= line that follows (line 56 has lua=oltp_point_select.lua)
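
The dispatch in all_small.sh looks roughly like the sketch below. Only the point-query.pre mapping comes from the script; the other branch and the variable name $mb are placeholders to show the pattern:

    if [[ $mb == "point-query.pre" ]]; then
        lua="oltp_point_select.lua"
    elif [[ $mb == "another-microbenchmark" ]]; then
        lua="another_script.lua"
    fi
    # run.sh is then invoked with the chosen Lua script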