This has results for the new insert benchmark (with deletes enabled) for MyRocks on a medium server. I ended up repeating the benchmark 3 times (round 2 after realizing I needed to log the response time for potentially slow queries, round 3 because I needed to use a new MyRocks build). Results for the benchmark on a small server are here and here.
- MyRocks in 8.0.28 has better perf than in 5.6.35 for most of the benchmark steps. This wasn't true on the small server. I suspect that one reason for the change is that a server CPU was used here while the small server uses a mobile CPU -- the Beelink uses AMD Ryzen 7 4700u and /proc/cpuinfo from the c2 server shows Intel(R) Xeon(R) CPU @ 3.10GHz.
- there is a ~5 second write stall for the l.i1 benchmark step in one of the configurations. I have more work to do to explain it.
The medium server is c2-standard-30 from GCP with 15 cores, hyperthreads disabled, 120G of RAM, and 1.5T of XFS via SW RAID 0 over 4 local NVMe devices.
An overview of the insert benchmark is here, here and here. The insert benchmark was run for 8 clients. The read+write steps (q100, q500, q1000) were run for 3600 seconds each. The delete per insert option was set for l.i1, q100, q500 and q1000.
- cached by RocksDB - all data fits in the 80G RocksDB block cache. The benchmark tables have 160M rows and the database size is ~12G.
- cached by OS - all data fits in the OS page cache but not the 4G RocksDB block cache. The benchmark tables have 160M rows and the database size is ~12G.
- IO-bound - the database is larger than memory. The benchmark tables have 4000M rows and the database size is ~281G.
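The three configurations above differ only in which cache layer can hold the whole database. A minimal sketch of that rule follows. It assumes sizes can be compared directly (ignoring compression and memory used by anything other than the caches), and the function name `classify` is hypothetical. The block cache size for IO-bound is not stated above, so the example passes the RAM size as an upper bound.

```python
# Classify a workload by the smallest cache layer that can hold the database.
# Simplification (an assumption of this sketch): sizes are compared directly,
# ignoring compression and memory used outside of the caches.

RAM_GB = 120  # c2-standard-30, from the text


def classify(db_gb, block_cache_gb, ram_gb=RAM_GB):
    """Return the configuration name for a database of db_gb gigabytes."""
    if db_gb <= block_cache_gb:
        return "cached by RocksDB"  # fits in the RocksDB block cache
    if db_gb <= ram_gb:
        return "cached by OS"       # fits in RAM, so the OS page cache can hold it
    return "IO-bound"               # larger than memory


if __name__ == "__main__":
    print(classify(12, 80))       # ~12G database, 80G block cache
    print(classify(12, 4))        # ~12G database, 4G block cache
    print(classify(281, RAM_GB))  # ~281G database, block cache bounded by RAM
```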
- l.i0 - insert X million rows per table without secondary indexes, where X is 20 for cached and 500 for IO-bound
- l.x - create 3 secondary indexes. I usually ignore performance from this step.
- l.i1 - insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start. The inserts are done to the table head and the deletes are done from the tail.
- q100 - do queries as fast as possible with 100 inserts/s/client and the same rate of deletes/s done in the background
- q500 - do queries as fast as possible with 500 inserts/s/client and the same rate of deletes/s done in the background
- q1000 - do queries as fast as possible with 1000 inserts/s/client and the same rate of deletes/s done in the background
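The numbers in the steps above can be cross-checked with a little arithmetic. The sketch below assumes one table per client (8 tables) and that the initial load size X is per table. Neither is stated outright, but both are consistent with the 160M and 4000M row totals quoted for the configurations. The helper names are hypothetical.

```python
# Cross-check the benchmark sizing under the stated assumptions.

CLIENTS = 8              # from the text
TABLES = CLIENTS         # assumption: one table per client
READ_WRITE_SECONDS = 3600  # per read+write step, from the text


def total_rows_millions(rows_per_table_millions, tables=TABLES):
    """Total rows (in millions) after the initial load."""
    return rows_per_table_millions * tables


def background_rates(inserts_per_sec_per_client, clients=CLIENTS):
    """Return (inserts/s, deletes/s) summed over all clients.

    Deletes run at the same rate as inserts, so table size stays constant.
    """
    total = inserts_per_sec_per_client * clients
    return total, total


if __name__ == "__main__":
    print("cached load:", total_rows_millions(20), "M rows")
    print("IO-bound load:", total_rows_millions(500), "M rows")
    for rate in (100, 500, 1000):
        ins, dels = background_rates(rate)
        print(f"{rate}/s/client: {ins} inserts/s, {dels} deletes/s, "
              f"{ins * READ_WRITE_SECONDS} rows inserted per step")
```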
I used old and new versions of MyRocks source code. The old versions were built from HEAD in February 2023. The new versions were built from HEAD in June 2023. The details are:
- old versions
  - 5.6.35 - RocksDB 7.10.2, FB MySQL git hash 205c31dd
  - 8.0.28 - RocksDB 7.10.0, FB MySQL git hash unknown
- new versions
  - 5.6.35 - RocksDB 8.2.1, FB MySQL git hash 7e40af67
  - 8.0.28 - RocksDB 8.3.1, FB MySQL git hash ef5b9b101
- more throughput mostly implies a better response time histogram
- worst-case response times are bad (~5 seconds) in one case: l.i1 and cached by OS (see here). Note that the worst-case response time for l.i1 is <= 1 second for IO-bound and for cached by RocksDB. The throughput vs time charts show per-second insert rates and per-second max response times for cached by RocksDB, cached by OS and IO-bound. There is one blip on the cached by RocksDB chart.
- there are no write stalls at the end of the step (l.x) that precedes l.i1 (see here)
- the write stalls in l.i1 are from too many files in L0 (see here)
- the average time for an L0->L1 compaction is in the Avg(sec) column here. I don't know if the median time is close to the average time. Again from here, based on the values of the Rn(GB), Rn+1(GB) and Comp(cnt) columns, the average L0->L1 compaction reads ~842M from L0 and ~1075M from L1 and then writes ~1884M. This is done by a single thread which processes the compaction input at ~64M/s. Note that L0->L1->L2 is the choke point for compaction because L0->L1 is usually single-threaded and L0->L1 usually cannot run concurrently with L1->L2. From results for l.i1 with cached by RocksDB and IO-bound the stats aren't that different, but somehow these don't get ~5 second write stalls. Based on the configuration, compaction should be triggered with 4 SSTs in L0 (~32M each) and 4 SSTs in L1 (~64M each), which would be ~384M of input. But when compaction gets behind, the input gets larger: the ~1917M average input here is about 5x that.
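The compaction arithmetic above can be worked through in a few lines. This is only a sketch that uses the approximate figures quoted in the text. The duration it prints is derived from those figures rather than read from the Avg(sec) column, and the variable names are hypothetical.

```python
# Back-of-the-envelope math for the average L0->L1 compaction in l.i1.
# All inputs are the approximate values quoted in the text (in MB).

L0_READ_MB = 842    # average bytes read from L0 per compaction
L1_READ_MB = 1075   # average bytes read from L1 per compaction
WRITE_MB = 1884     # average bytes written per compaction
RATE_MB_PER_SEC = 64  # single-threaded processing rate for compaction input

# Expected input when compaction keeps up: 4 SSTs at ~32M plus 4 SSTs at ~64M.
expected_input_mb = 4 * 32 + 4 * 64

actual_input_mb = L0_READ_MB + L1_READ_MB
implied_seconds = actual_input_mb / RATE_MB_PER_SEC
backlog_factor = actual_input_mb / expected_input_mb

if __name__ == "__main__":
    print(f"expected input: {expected_input_mb} MB")
    print(f"actual input:   {actual_input_mb} MB")
    print(f"implied time:   {implied_seconds:.1f} s per compaction")
    print(f"backlog factor: {backlog_factor:.1f}x the expected input")
    print(f"output/input:   {WRITE_MB / actual_input_mb:.2f}")
```

With these inputs a single L0->L1 compaction takes on the order of 30 seconds, which gives L0 plenty of time to accumulate more files while it runs.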