Tuesday, June 30, 2020

Java GC and Linkbench

I filed a bug for the Sun JDK a long time ago, sometime around 1998. At the time growing the heap didn't work and the workaround was to set -Xms and -Xmx to the same value. That worked as long as you knew a reasonable value -- too small and your process dies, too large and you waste memory.

Now I get to revisit Java GC. I am running Linkbench, and the bin/linkbench script that starts the benchmark client uses -Xmx1000m to limit the heap to 1000MB. Someone else wrote that script, and I didn't know about the limit until a test failed because the heap was too small.

I use Linkbench on both large and small servers and need to be careful about memory usage on the small ones. So I ran a few tests to understand how performance and memory usage change when Linkbench runs with -Xmx1000m, with -Xmx2000m and without -Xmx. The tests used a ~10GB database (maxid1=10M) and two levels of concurrency: 16 and 64 clients.
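Because the third configuration leaves the heap limit up to the JVM, it helps to check which limit a JVM actually ends up with. This is a minimal sketch, assuming a standalone class with the hypothetical name HeapLimit, that asks Runtime.maxMemory() for the limit. Run it once with -Xmx1000m and once without -Xmx on the same server to compare. The reported value can be a bit below the -Xmx value because some collectors exclude part of the young generation from it.

```java
// HeapLimit: hypothetical helper that prints the max heap the JVM will use.
// Assumed usage:
//   java -Xmx1000m HeapLimit   -> limit set explicitly
//   java HeapLimit             -> limit chosen by the JVM's defaults
public class HeapLimit {
    // Runtime.maxMemory() is the max number of bytes the JVM will try to use
    // for the heap; it is Long.MAX_VALUE only if there is no inherent limit.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

Without -Xmx the printed value depends on the server's physical memory, which is why the same script can behave differently on small and large servers.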

For the tests I report three metrics from the Linkbench client process: max VSZ, max RSS and CPU seconds. These were collected by running "ps aux" at 30-second intervals, so the max values can miss short-lived peaks. In hindsight I wish I had used 1-second intervals, but I don't want to repeat the tests. The tests used MySQL 8.0.18 and JDBC via Connector/J 8.0.20.
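Sampling the max VSZ and RSS of one process can also be done from Java. This is not the method used for these results (that was ps aux); it is a minimal sketch with a hypothetical class name, MaxMemSampler, that polls "ps -o vsz=,rss= -p PID" for a single process and keeps the largest values seen. It assumes Linux procps ps, which reports both columns in KiB, and converts them to GiB. It defaults to the 1-second interval mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// MaxMemSampler: hypothetical helper that polls ps for one process and keeps
// the largest VSZ and RSS seen. Assumes Linux procps ps, which reports KiB.
public class MaxMemSampler {
    private long maxVszKb = 0;
    private long maxRssKb = 0;

    // Fold one line of "ps -o vsz=,rss= -p PID" output (e.g. "6082560 917504")
    // into the running maxima.
    void update(String psLine) {
        String[] parts = psLine.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected ps output: " + psLine);
        }
        maxVszKb = Math.max(maxVszKb, Long.parseLong(parts[0]));
        maxRssKb = Math.max(maxRssKb, Long.parseLong(parts[1]));
    }

    // Run ps once for pid; the trailing "=" on each column suppresses the header.
    void sample(long pid) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ps", "-o", "vsz=,rss=", "-p", Long.toString(pid))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            if (line != null) { // no output means the process has exited
                update(line);
            }
        }
        p.waitFor();
    }

    // KiB -> GiB (the tables label their columns GB).
    static double kbToGb(long kb) {
        return kb / (1024.0 * 1024.0);
    }

    double maxVszGb() { return kbToGb(maxVszKb); }
    double maxRssGb() { return kbToGb(maxRssKb); }

    // Usage: java MaxMemSampler PID [intervalMs] [samples]
    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]);
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 1000L;
        int samples = args.length > 2 ? Integer.parseInt(args[2]) : 60;
        MaxMemSampler s = new MaxMemSampler();
        for (int i = 0; i < samples; i++) {
            s.sample(pid);
            Thread.sleep(intervalMs);
        }
        System.out.printf("max vsz %.1f GB, max rss %.1f GB%n", s.maxVszGb(), s.maxRssGb());
    }
}
```

Keeping the parsing in update() separate from the ps call makes it easy to swap in another source, such as /proc/PID/status, without touching the max tracking.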

The tests ran on Amazon Linux 2 with this Java version:
$ java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
Disclaimers -- I am far from a Java expert and the JDK I used might not be the latest and greatest. Feedback is welcome.

Results

I noticed two things:
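  1. VSZ is roughly 4X to 7X larger than the max RSS during the run phase (rss.r), and the gap is even larger relative to the load phase. A large VSZ makes me nervous even when RSS is small, but some good things (jemalloc, for example) also have a large VSZ, so I have learned to tolerate it.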
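  2. Not setting -Xmx doesn't save much CPU time. The heap isn't unlimited in that case; the JVM picks a default max heap based on physical memory. I expected the larger heap to help more by reducing GC overhead. Compared with -Xmx1000m it used about 4% fewer CPU seconds with 64 clients but about 2% more with 16 clients.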
Based on these results I will update my test scripts to use -Xmx1000m on small servers and -Xmx2000m on large servers. If I use more than 64 concurrent clients, then I might use a value larger than 2000MB.

The data

Legend:
  • hMax - value for -Xmx in MB (none means -Xmx was not set)
  • vsz - Linkbench client max VSZ in GB
  • rss.l, rss.r - Linkbench client max RSS in GB during the load (rss.l) and run (rss.r)
  • cpuSec - number of CPU seconds used by the Linkbench client

--- 16 clients
hMax    vsz     rss.l   rss.r   cpuSec
1000     5.8    0.5     0.9     2795
2000     6.9    0.8     1.3     2856
none    20.8    0.8     4.0     2853

--- 64 clients
hMax    vsz     rss.l   rss.r   cpuSec
1000     9.0    0.9     1.3     8104
2000    10.1    1.2     1.8     7800
none    24.0    5.2     5.8     7773
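The derived numbers behind the two observations can be recomputed from these tables. This small sketch, with the hypothetical class name LinkbenchRatios, copies the rows above, divides max VSZ by the run-phase max RSS and reports each row's CPU seconds as a percent change from the -Xmx1000m row with the same client count.

```java
import java.util.Arrays;
import java.util.List;

// LinkbenchRatios: hypothetical helper that recomputes derived numbers
// from the two result tables (values copied from the tables above).
public class LinkbenchRatios {
    // One table row: -Xmx setting, max VSZ (GB), max RSS load/run (GB), CPU seconds.
    static final class Row {
        final String hMax;
        final double vsz;
        final double rssL;
        final double rssR;
        final long cpuSec;

        Row(String hMax, double vsz, double rssL, double rssR, long cpuSec) {
            this.hMax = hMax;
            this.vsz = vsz;
            this.rssL = rssL;
            this.rssR = rssR;
            this.cpuSec = cpuSec;
        }
    }

    static final List<Row> CLIENTS_16 = Arrays.asList(
            new Row("1000", 5.8, 0.5, 0.9, 2795),
            new Row("2000", 6.9, 0.8, 1.3, 2856),
            new Row("none", 20.8, 0.8, 4.0, 2853));

    static final List<Row> CLIENTS_64 = Arrays.asList(
            new Row("1000", 9.0, 0.9, 1.3, 8104),
            new Row("2000", 10.1, 1.2, 1.8, 7800),
            new Row("none", 24.0, 5.2, 5.8, 7773));

    // Max VSZ as a multiple of the run-phase max RSS (rss.r).
    static double vszToRunRss(Row r) {
        return r.vsz / r.rssR;
    }

    // Percent change in CPU seconds versus a baseline row; negative means less CPU.
    static double cpuChangePct(Row r, Row base) {
        return 100.0 * (r.cpuSec - base.cpuSec) / base.cpuSec;
    }

    static void report(String label, List<Row> rows) {
        Row base = rows.get(0); // the -Xmx1000m row is the baseline
        System.out.println(label);
        for (Row r : rows) {
            System.out.printf("  %-5s vsz/rss.r=%.1fX  cpuSec vs 1000: %+.1f%%%n",
                    r.hMax, vszToRunRss(r), cpuChangePct(r, base));
        }
    }

    public static void main(String[] args) {
        report("16 clients", CLIENTS_16);
        report("64 clients", CLIENTS_64);
    }
}
```

Using the -Xmx1000m row as the baseline matches the default in the bin/linkbench script, so each percentage reads as the cost or savings of moving away from that default.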

