Posts

Showing posts from June, 2022

Setting up a server on GCP

This is mostly a note to myself to explain what I do to set up a server on GCP for database benchmarks. A scripted sketch of these steps follows the list.

Create the instance
- Confirm that quota limits have not been reached on the Quotas page.
- Go to the VM instances page and click on Create Instance.
- Edit the instance name.
- Edit the region (us-west1 for me).
- Choose the instance type: click on Compute Optimized, select the c2 series, select the Machine Type and then c2-standard-60.
- Disable hyperthreading to reduce benchmark variance: click on CPU Platform and GPU, click on vCPUs to core ratio and choose 1 vCPU per core.
- Scroll down to Boot disk and click on Change. Click on Operating System and select Ubuntu, then click on Version and select Ubuntu 22.04 LTS. Don't change Boot disk type (the default is Balanced persistent disk). Change Size (GB) to 100, then click on Select.
- Scroll down to Identity and API access and select Allow full access to all Cloud APIs. This enables read and write access to Cloud Object Storage buckets …
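For anyone who prefers to script the same setup, here is a rough sketch using the google-cloud-compute Python client. The project ID, zone, instance name, and image family string are assumptions and should be verified against the console; the console steps above remain the source of truth.

```python
# Rough sketch: create a c2-standard-60 instance with hyperthreading disabled,
# a 100 GB balanced persistent disk running Ubuntu 22.04, and full Cloud API access.
# Assumes: pip install google-cloud-compute, and that PROJECT/ZONE are replaced
# with real values.
from google.cloud import compute_v1

PROJECT = "my-project"   # assumption: replace with your project ID
ZONE = "us-west1-a"      # a zone in us-west1

instance = compute_v1.Instance()
instance.name = "benchmark-server"
instance.machine_type = f"zones/{ZONE}/machineTypes/c2-standard-60"

# Boot disk: Ubuntu 22.04 LTS on a 100 GB balanced persistent disk
instance.disks = [
    compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts",
            disk_size_gb=100,
            disk_type=f"zones/{ZONE}/diskTypes/pd-balanced",
        ),
    )
]

# Default VPC network
instance.network_interfaces = [
    compute_v1.NetworkInterface(network="global/networks/default")
]

# 1 vCPU per core == hyperthreading disabled, to reduce benchmark variance
instance.advanced_machine_features = compute_v1.AdvancedMachineFeatures(threads_per_core=1)

# "Allow full access to all Cloud APIs" == cloud-platform scope on the default service account
instance.service_accounts = [
    compute_v1.ServiceAccount(
        email="default",
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
]

operation = compute_v1.InstancesClient().insert(
    project=PROJECT, zone=ZONE, instance_resource=instance
)
operation.result()  # block until the instance is created
```

The quota check still has to happen up front: the insert call fails if the c2 vCPU quota for the region has already been reached.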

Fixing mmap performance for RocksDB

RocksDB inherited support for mmap from LevelDB. Performance was worse than expected because filesystem readahead fetched more data than needed, as I explained in a previous post. I am not a fan of the standard workaround, which is to tune kernel settings to reduce readahead, because that has an impact on everything running on that server. The DBMS knows more about the IO patterns and can use madvise to provide hints to the OS, just as RocksDB uses fadvise for POSIX IO (a minimal sketch of these hints appears after the benchmark description). Good news: issue 9931 has been fixed and the results are impressive.

Benchmark

I used db_bench with an IO-bound workload, the same as was used for my previous post. Two binaries were tested:
- old - compiled at git hash ce419c0f, does not have the fix for issue 9931
- fix - compiled at git hash 69a32ee, has the fix for issue 9931

Note that git hashes ce419c0f and 69a32ee are adjacent in the commit log. The verify_checksums option was false for all tests. The CPU overhead would be much …
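To make the madvise-vs-readahead point concrete, here is a minimal sketch of the two per-file hints at the syscall level. This is not RocksDB code (RocksDB does this internally in C++); it only shows MADV_RANDOM for mmap reads and POSIX_FADV_RANDOM for buffered reads, which is the alternative to tuning kernel-wide readahead. The file path is a placeholder.

```python
# Minimal sketch of per-file readahead hints (Linux, Python 3.8+).
# Not RocksDB code; it illustrates the madvise/fadvise calls a DBMS can use
# instead of changing kernel-wide readahead settings.
import mmap
import os

PATH = "/data/some_sst_file"   # placeholder path; assumes the file is at least 8 KB

# mmap reads: tell the kernel access is random so it skips readahead
fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
mm = mmap.mmap(fd, size, prot=mmap.PROT_READ)
mm.madvise(mmap.MADV_RANDOM)           # madvise(addr, len, MADV_RANDOM)
page = mm[4096:8192]                   # a random 4 KB read, no extra prefetch

# POSIX (pread) reads: the analogous hint is posix_fadvise with POSIX_FADV_RANDOM
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
buf = os.pread(fd, 4096, 4096)         # read 4 KB at offset 4 KB

mm.close()
os.close(fd)
```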

Insert Benchmark for Postgres 12, 13, 14 and 15: part 2

This post has graphs of throughput vs time for three of the Insert Benchmark steps. The goal is to determine whether there is too much variance. A common source of variance is checkpoint stalls when using a B-Tree. This is a follow-up to my blog post on the Insert Benchmark for Postgres versions 12.11, 13.7, 14.3 and 15b1.

The benchmark steps for which graphs are provided are:
- l.i0 - load in PK order without secondary indexes
- l.i1 - load in PK order with 3 secondary indexes

The benchmark is repeated for two workloads -- cached and IO-bound.

Cached

The database fits in memory for the cached workload. There isn't much variance for the l.i0 workload. The graph for the l.i1 workload is more exciting, which is expected. For the l.i0 workload the inserts are in PK order and there are no secondary indexes, so each insert makes R/P pages dirty, where R is the row size, P is the page size and R/P is much less than 1. But for l.i1 each insert is likely to make (3 + R/P) pages dirty, so there is more …
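As a back-of-the-envelope check on the R/P argument, the sketch below compares dirty pages per insert for l.i0 vs l.i1. The row size R = 150 bytes is an assumed value; P = 8 KB is the default Postgres page size.

```python
# Back-of-the-envelope estimate of dirty pages per insert for l.i0 vs l.i1.
# R (row size) is an assumed value; P is the default Postgres page size.
R = 150          # bytes per row (assumption)
P = 8 * 1024     # bytes per page (Postgres default is 8 KB)

dirty_l_i0 = R / P          # PK-order insert, no secondary indexes
dirty_l_i1 = 3 + R / P      # PK-order insert plus 3 secondary index pages

print(f"l.i0: ~{dirty_l_i0:.3f} dirty pages/insert")   # ~0.018
print(f"l.i1: ~{dirty_l_i1:.3f} dirty pages/insert")   # ~3.018
print(f"ratio: ~{dirty_l_i1 / dirty_l_i0:.0f}x")       # ~165x more write-back per insert
```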

Insert Benchmark for Postgres 12, 13, 14 and 15

I ran the Insert Benchmark for Postgres versions 12.11, 13.7, 14.3 and 15b1. Reports are provided for cached and IO-bound workloads. The benchmark is run on Intel NUC servers at low concurrency. The goal is to determine how performance changes over time. A description of the Insert Benchmark is here.

tl;dr
- I am not a Postgres expert
- regressions are small in most cases
- the l.i0 benchmark step has regressions that are worth investigating in versions 14 and 15b1
- these regressions might have been fixed, see the perf report for the patch (15b1p1)

Updates:
- regressions have been fixed, scroll to the end
- added links to the configuration files
- part 2 with throughput vs time graphs is here
- provided data on the percentage of time in parse, analyze, optimize, execute
- added command lines
- added the index + relation sizes after each test step
- added links to performance reports when prepared statements were used for queries
- added links to performance reports with a patch to improve version 15b1 …
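For readers unfamiliar with the l.i0 and l.i1 steps, the sketch below approximates their shape with psycopg2: PK-order inserts first without, then with, 3 secondary indexes. This is not the real Insert Benchmark client; the DSN, table name, columns, and row counts are placeholders.

```python
# Simplified sketch of the l.i0 and l.i1 step shapes: PK-order inserts first
# without, then with, 3 secondary indexes. Not the real Insert Benchmark client;
# DSN, table name, columns and row counts are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=ib user=postgres")  # placeholder DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS pi1 (
        id bigint PRIMARY KEY,
        k1 bigint, k2 bigint, k3 bigint,
        pad text
    )
""")

# l.i0: load in PK order with no secondary indexes
for i in range(100_000):
    cur.execute(
        "INSERT INTO pi1 VALUES (%s, %s, %s, %s, %s)",
        (i, i % 997, i % 991, i % 983, "x" * 100),
    )
conn.commit()

# create the 3 secondary indexes before the next load step
for col in ("k1", "k2", "k3"):
    cur.execute(f"CREATE INDEX ON pi1 ({col})")
conn.commit()

# l.i1: continue loading in PK order, now with 3 secondary indexes to maintain
for i in range(100_000, 200_000):
    cur.execute(
        "INSERT INTO pi1 VALUES (%s, %s, %s, %s, %s)",
        (i, i % 997, i % 991, i % 983, "x" * 100),
    )
conn.commit()
cur.close()
conn.close()
```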