Monday, December 8, 2025

RocksDB performance over time on a small Arm server

This post has results for RocksDB on an Arm server. I previously shared results for RocksDB performance using gcc and clang. Here I share results using clang with LTO.

RocksDB is boring: there are few performance regressions.

tl;dr

  • for cached workloads throughput with RocksDB 10.8 is as good as or better than with 6.29
  • for not-cached workloads throughput with RocksDB 10.8 is similar to 6.29 except for the overwrite test where it is 7% less, probably from correctness checks added in 7.x and 8.x.

Software

I used RocksDB versions 6.29, 7.0, 7.10, 8.0, 8.4, 8.8, 8.11, 9.0, 9.4, 9.8, 9.11 and 10.0 through 10.8.

I compiled each version with clang version 18.3.1 and link-time optimization (LTO) enabled. The build command line was:

flags=( DISABLE_WARNING_AS_ERROR=1 DEBUG_LEVEL=0 V=1 VERBOSE=1 )

# for clang+LTO
AR=llvm-ar-18 RANLIB=llvm-ranlib-18 CC=clang CXX=clang++ \
    make USE_LTO=1 "${flags[@]}" static_lib db_bench

Hardware

I used a small Arm server from Google Cloud running Ubuntu 22.04. The server type was c4a-standard-8-lssd with 8 cores and 32G of RAM. Storage was 2 local SSDs with RAID 0 and ext4.

Benchmark

Overviews on how I use db_bench are here and here.

The benchmark was run with 1 thread and used the LRU block cache.

Tests were run for three workloads:

  • byrx - database cached by RocksDB
  • iobuf - database is larger than RAM and RocksDB used buffered IO
  • iodir - database is larger than RAM and RocksDB used O_DIRECT
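The iobuf and iodir workloads differ only in IO mode. As a minimal sketch, and not the script used for these results, the helper below maps a workload name to the db_bench IO flags it would likely need. The function name io_flags is hypothetical. The two flags are db_bench options. Treating byrx as buffered IO is an assumption, and the cache and database sizes that decide whether a workload is cached are omitted because they are not listed here.

```shell
# Hypothetical helper: print db_bench IO flags for a workload name.
# A sketch under stated assumptions, not the script used for these results.
io_flags() {
  case "$1" in
    byrx|iobuf)
      # Buffered IO: reads and writes go through the OS page cache.
      # Assumption: byrx also uses buffered IO.
      echo "--use_direct_reads=false --use_direct_io_for_flush_and_compaction=false"
      ;;
    iodir)
      # O_DIRECT for user reads and for flush and compaction IO.
      echo "--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true"
      ;;
    *)
      echo "unknown workload: $1" >&2
      return 1
      ;;
  esac
}
```

The output can be appended to a db_bench command line, for example: ./db_bench --benchmarks=overwrite --threads=1 $(io_flags iodir)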

The benchmark steps that I focus on are:
  • fillseq
    • load RocksDB in key order with 1 thread
  • revrangeww, fwdrangeww
    • do reverse or forward range queries with a rate-limited writer. Report performance for the range queries.
  • readww
    • do point queries with a rate-limited writer. Report performance for the point queries.
  • overwrite
    • overwrite (via Put) random keys

Relative QPS

Many of the tables below (inlined and via URL) show the relative QPS which is:
    (QPS for my version / QPS for RocksDB 6.29)

The base version is RocksDB 6.29. When the relative QPS is > 1.0 then my version is faster than RocksDB 6.29. When it is < 1.0 then there might be a performance regression or there might just be noise.
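The relative QPS arithmetic can be sketched in a few lines of shell. The function name rel_qps is hypothetical and the QPS values are made-up inputs, not measurements from these tests.

```shell
# Hypothetical helper: relative QPS of a version against the base version.
# Usage: rel_qps <qps_for_my_version> <qps_for_base_version>
rel_qps() {
  awk -v mine="$1" -v base="$2" 'BEGIN {
    if (base <= 0) { exit 1 }   # refuse to divide by zero
    printf "%.2f\n", mine / base
  }'
}

# Made-up numbers, not measurements from this post.
rel_qps 93000 100000    # prints 0.93: my version gets 7% less QPS than the base
rel_qps 105000 100000   # prints 1.05: my version gets 5% more QPS than the base
```

A result of 0.93 is what "7% less" means in the summaries: the version sustains 93% of the throughput of the base version.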

The spreadsheet with numbers and charts is here. Performance summaries are here.

Results: byrx

This has results for the byrx workload where the database is cached by RocksDB.

RocksDB 10.x is faster than 6.29 for all tests.

Results: iobuf

This has results for the iobuf workload where the database is larger than RAM and RocksDB used buffered IO.

Performance in RocksDB 10.x is about the same as in 6.29 except for overwrite, where throughput in 10.8 is 7% less than in 6.29. I think the decreases in overwrite performance that arrived in versions 7.x and 8.x are from new correctness checks. The big drop for fillseq in 10.6.2 was from bug 13996.

Results: iodir

This has results for the iodir workload where the database is larger than RAM and RocksDB used O_DIRECT.

Performance in RocksDB 10.x is about the same as in 6.29 except for overwrite, where throughput in 10.8 is 7% less than in 6.29. I think the decreases in overwrite performance that arrived in versions 7.x and 8.x are from new correctness checks. The big drop for fillseq in 10.6.2 was from bug 13996.
