Tuesday, June 30, 2020

Java GC and Linkbench

I filed a bug for the Sun JDK a long time ago, sometime around 1998. At the time growing the heap didn't work and the workaround was to set -Xms and -Xmx to the same value. That worked as long as you knew a reasonable value -- too small and your process dies, too large and you waste memory.

Now I get to revisit Java GC. I am running Linkbench and the bin/linkbench script that starts the bench client uses -Xmx=1000 to limit the heap to 1000MB of RAM. Someone else wrote that script and I didn't know it was done until a test failed when the heap was too small.

I use Linkbench on large and small servers and need to be careful about memory usage on the small servers so I ran a few tests to understand the impact on performance and memory usage for Linkbench run with -Xmx=1000, -Xmx=2000 and -Xmx not set. I used a test with a ~10G database (maxid1=10M) and two levels of concurrency -- 16 clients, 64 clients.

For the tests I report metrics from the Linkbench client process: max value for VSZ, max value for RSS and number of CPU seconds. This is collected via "ps aux" run at 30-second intervals and now I wish I reduced that to 1-second intervals but I don't want to repeat the tests. The tests used MySQL 8.0.18 and JDBC via Connector/J 8.0.20.

Now I get to revisit Java GC on Amazon Linux 2 and Java is:
$ java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
Disclaimers -- I am far from a Java expert and the JDK I used might not be the latest and greatest. Feedback is welcome.

Results

I noticed two things:
  1. VSZ is 5X to 10X larger than RSS. I get scared when I see a large VSZ, even when RSS is small. But some good things (jemalloc) do that so eventually I tolerate it.
  2. Unlimited heap size doesn't save much CPU time. I expected it to help more by reducing the GC overhead. It helped a bit with 64 clients, but hurt with 16 clients.
Based on these results I will update my test scripts to use -Xmx=1000 on small servers and -Xmx=2000 on large servers. Were I to use more than 64 concurrent clients then I might use a value larger than 2000.

The data

Legend:
  • hMax - value for -Xmx
  • vsz - Linkbench client max VSZ in GB
  • rss.l, rss.r - Linkbench client max RSS in GB during the load (rss.l) and run (rss.r)
  • cpuSec - number of CPU seconds used by the Linkbench client

---16 clients
hMax    vsz     rss.l   rss.r   cpuSec
1000     5.8    0.5     0.9     2795
2000     6.9    0.8     1.3     2856
none    20.8    0.8     4.0     2853

--- 64 clients
hMax    vsz     rss.l   rss.r   cpuSec
1000     9.0    0.9     1.3     8104
2000    10.1    1.2     1.8     7800
none    24.0    5.2     5.8     7773


No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...