This post has results from db_bench on the same instance type configured with and without hyperthreading, to determine the impact of hyperthreading on RocksDB performance.
tl;dr
- I used the LRU block cache and will soon repeat this experiment with Hyper Clock Cache
- When the CPU is oversubscribed, hyperthreading improves QPS on read-heavy benchmark steps but hurts it on the high-concurrency, write-heavy step (overwrite)
Builds
I used RocksDB 8.5.0 compiled from source.
Benchmark
The benchmark was run with the LRU block cache and an in-memory workload. The test database was ~15GB. The RocksDB benchmark scripts were used (here and here).
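The exact invocations are in the linked scripts. As a rough sketch of what one step looks like, the snippet below launches a single read-heavy db_bench step; the flags shown (--benchmarks, --use_existing_db, --db, --num, --threads, --cache_size, --duration) are standard db_bench options, but the values are illustrative assumptions, not the settings used for these results:

```python
import subprocess

# Hypothetical settings -- the real values come from the linked
# benchmark scripts, not from this sketch.
DB_DIR = "/data/rocksdb"      # assumed database directory
CACHE_SIZE = 100 * 1024**3    # assumed LRU block cache size in bytes
NUM_KEYS = 100_000_000        # assumed key count
THREADS = 40                  # one of the client thread counts tested

cmd = [
    "./db_bench",
    "--benchmarks=readrandom",
    "--use_existing_db=1",    # reuse the previously loaded database
    f"--db={DB_DIR}",
    f"--num={NUM_KEYS}",
    f"--threads={THREADS}",
    f"--cache_size={CACHE_SIZE}",
    "--duration=300",         # run the step for a fixed number of seconds
]
subprocess.run(cmd, check=True)
```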
The test server is a c2-standard-60 server from GCP with 120GB of RAM. The OS is Ubuntu 22.04. I repeated the tests with and without hyperthreading and name the servers ht0 and ht1:
- ht0 - hyperthreads disabled, 30 HW threads and 30 cores
- ht1 - hyperthreads enabled, 60 HW threads and 30 cores
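For reference, on a recent Linux kernel SMT can be checked and toggled at runtime via sysfs. This is a minimal sketch, assuming the kernel exposes /sys/devices/system/cpu/smt and that the guest allows toggling it (some cloud platforms instead control SMT at the instance level):

```python
# Minimal sketch: check and toggle SMT via sysfs. Requires root to
# write the control file; on some cloud guests SMT is fixed by the
# instance configuration and cannot be changed here.
SMT_DIR = "/sys/devices/system/cpu/smt"

def smt_active() -> bool:
    # "1" when sibling hyperthreads are online, "0" when they are not
    with open(f"{SMT_DIR}/active") as f:
        return f.read().strip() == "1"

def set_smt(enabled: bool) -> None:
    # Writing "on"/"off" onlines/offlines all SMT sibling CPUs
    with open(f"{SMT_DIR}/control", "w") as f:
        f.write("on" if enabled else "off")

if __name__ == "__main__":
    set_smt(False)  # the ht0 configuration: hyperthreads disabled
    print("SMT active:", smt_active())
```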
The benchmark was repeated for 10, 20, 30, 40 and 50 threads. At 10 threads the CPU is undersubscribed for both ht0 and ht1; at 50 threads it is oversubscribed for both. I want to see how performance changes as the workload moves from an undersubscribed to an oversubscribed CPU.
Results
Results are here and charts for the results are below. The y-axis for the charts starts at 0.9 rather than 0 to improve readability. The charts show the relative QPS, which is (QPS for ht1 / QPS for ht0). Hyperthreading helps when the relative QPS is greater than 1 (a sketch of this computation follows the list below).
- At 10 and 20 threads hyperthreading has a small, negative impact on QPS
- At 30 threads hyperthreading has no impact on QPS
- At 40 and 50 threads hyperthreading helps performance for read-heavy tests and hurts it for the concurrent, write-heavy test (overwrite)
- Note that fillseq always runs with 1 thread regardless of what the other tests use
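The relative QPS metric is a simple per-step ratio. A small sketch of the computation, with made-up QPS numbers standing in for the real results:

```python
# Compute relative QPS (ht1 / ht0) per benchmark step. A ratio > 1
# means hyperthreading helped. The numbers below are placeholders,
# not the measured results.
qps_ht0 = {"readrandom": 1000.0, "fwdrangeww": 500.0, "overwrite": 800.0}
qps_ht1 = {"readrandom": 1200.0, "fwdrangeww": 600.0, "overwrite": 700.0}

for step, ht0 in qps_ht0.items():
    rel = qps_ht1[step] / ht0
    verdict = "helps" if rel > 1 else "hurts" if rel < 1 else "no impact"
    print(f"{step:12s} relative QPS = {rel:.2f} ({verdict})")
```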
For both tests, the configuration with a much larger context switch rate (more than 2X larger) gets more QPS: ~1.2X more QPS for ht1 on fwdrangeww and ~1.1X more for ht0 on overwrite.
ht0   112    62   175
ht1   175   102   277
Next are the average values for context switches (cs), user CPU time (us) and system CPU time (sy) from vmstat, one table per client thread count; a sketch of how such averages can be collected follows the tables. The average CPU utilization sustained is higher with hyperthreads disabled, but that can be misleading: utilization is reported relative to the number of HW threads, and ht0 has half as many. The context switch rate is much higher when hyperthreads are disabled for 20, 30, 40 and 50 threads. That can mean there is more mutex contention.
10 threads
      cs       us     sy
ht0   15878    21.3   8.4
ht1   16753    10.9   4.4

20 threads
      cs       us     sy
ht0   50776    30.9   12.8
ht1   28250    15.7   6.8

30 threads
      cs       us     sy
ht0   526622   31.3   14.5
ht1   102929   20.1   9.9

40 threads
      cs       us     sy
ht0   833918   29.1   15.9
ht1   248478   22.3   12.7

50 threads
      cs       us     sy
ht0   1107510  27.7   15.8
ht1   461239   19.6   11.7
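The averages above can be produced by sampling vmstat while each benchmark step runs. A sketch, assuming the default vmstat column layout where cs, us and sy are the 12th, 13th and 14th fields:

```python
# Average the cs, us and sy columns from "vmstat <interval>" output.
# Assumes the default layout (... in cs us sy id wa st). The first
# sample reports averages since boot, so it is dropped.
import subprocess

def vmstat_averages(interval: int = 1, samples: int = 60):
    out = subprocess.run(
        ["vmstat", str(interval), str(samples + 1)],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():  # skip the two header lines
            rows.append(fields)
    rows = rows[1:]                         # drop the since-boot sample
    cs = sum(int(r[11]) for r in rows) / len(rows)
    us = sum(int(r[12]) for r in rows) / len(rows)
    sy = sum(int(r[13]) for r in rows) / len(rows)
    return cs, us, sy

if __name__ == "__main__":
    cs, us, sy = vmstat_averages()
    print(f"cs={cs:.0f} us={us:.1f} sy={sy:.1f}")
```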