Saturday, September 14, 2024

MySQL and Postgres vs the Insert Benchmark on a large server

This post has benchmark results for MySQL and Postgres vs the Insert Benchmark on a large server. My intent is to document how performance changes over time rather than to start a Postgres vs MySQL argument.

MySQL has accumulated large regressions from 5.6 to 8.0 that are obvious on low-concurrency benchmarks. While they are less obvious on high-concurrency benchmarks, and there have been significant improvements to make MySQL 8 better at high-concurrency, the regressions that hurt low-concurrency results also reduce throughput at high-concurrency.

tl;dr

  • For Postgres 17rc1 vs 15.8
    • Performance is mostly the same
  • For MySQL 8.0.39 vs 5.6.51
    • Writes are much faster in 8.0.39
    • Read throughput is mixed: 8.0.39 is slower than, similar to, or faster than 5.6.51 depending on the context
  • For MySQL vs Postgres
    • MySQL is faster on point queries
    • Postgres is faster on writes and range queries
Builds, configuration and hardware

I compiled from source:
  • Postgres versions 15.8 and 17rc1 using -O2 -fno-omit-frame-pointer
  • MySQL versions 5.6.51, 5.7.44 and 8.0.39 using -DCMAKE_BUILD_TYPE=Release and -O2 -fno-omit-frame-pointer
The server is an ax162-s from Hetzner with 48 AMD cores (SMT disabled) and 128G of RAM. It uses Ubuntu 22.04 and storage is ext4 using SW RAID 1 over 2 locally attached NVMe devices. More details on it are here. At list prices a similar server from Google Cloud costs 10X more than from Hetzner.

The Postgres configuration file is here.
The MySQL configuration files are here for 5.6.51, 5.7.44 and 8.0.39.

The Benchmark

The benchmark is explained here and is run with 20 clients and a table per client with three workloads:
  • cached - database fits in the Postgres buffer pool
  • less IO-bound - database is larger than memory with 64M rows per table
  • more IO-bound - database is larger than memory with 200M rows per table
The benchmark steps are:

  • l.i0
    • insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. The value of X is 20 for cached, 64 for less IO-bound and 200 for more IO-bound.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts X rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. The value of X is 40M for cached and 4M for less/more IO-bound.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions) and X rows are inserted and deleted per table. The value of X is 10M for cached and 1M for less/more IO-bound.
    • Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of X is a function of the table size.
  • qr100
    • use 3 connections/client. One does range queries and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1800 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
Results: overview

The performance reports are here:
The summary section in each performance report has 3 tables. The first shows absolute throughput for each (DBMS tested, benchmark step) pair. The second has throughput relative to the version from the first row of the table. The third shows the background insert rate for benchmark steps that have background inserts; here all systems sustained the target rates. The second table makes it easy to see how performance changes over time. The third table makes it easy to see which DBMS+configs failed to meet the SLA.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. The base case here is MySQL 5.6.51 for MySQL vs Postgres, MySQL 5.6.51 for only MySQL and Postgres 15.8 for only Postgres.

When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
  • insert/s for l.i0, l.i1, l.i2
  • indexed rows/s for l.x
  • range queries/s for qr100, qr500, qr1000
  • point queries/s for qp100, qp500, qp1000
Below I use colors to highlight the relative QPS values with red for <= 0.95, green for >= 1.05 and grey for values between 0.95 and 1.05.
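The relative QPS metric and the color buckets above can be expressed directly. A minimal sketch; the function names are mine, not from the benchmark tooling:

```python
# Relative QPS: throughput of my version divided by the base case.
# > 1.0 means performance improved over time, < 1.0 means a regression.
def relative_qps(qps_me: float, qps_base: float) -> float:
    return qps_me / qps_base

# Color buckets used in the summaries: red for regressions (<= 0.95),
# green for improvements (>= 1.05), grey for values in between.
def color(rel: float) -> str:
    if rel <= 0.95:
        return "red"
    if rel >= 1.05:
        return "green"
    return "grey"

print(color(relative_qps(90.0, 100.0)))   # red
print(color(relative_qps(110.0, 100.0)))  # green
```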

Results: cached

From the summary for only Postgres:
  • Write-heavy steps are up to 11% faster in 17rc1 vs 15.8
  • Read-heavy steps are unchanged in 17rc1 vs 15.8
From the summary for only MySQL:
  • Write-heavy steps are ~2.5X to ~4X faster in 8.0.39 vs 5.6.51
  • Read-heavy steps are between 12% and 27% slower in 8.0.39 vs 5.6.51 
From the summary for MySQL vs Postgres:
  • Postgres 17rc1 is faster on write-heavy and range queries
  • MySQL 8.0.39 and Postgres 17rc1 get similar QPS on point queries
Results: less IO-bound

From the summary for only Postgres:
  • Write-heavy steps are up to 13% faster in 17rc1 vs 15.8 except for l.i2 which suffers from an intermittently high CPU overhead from get_actual_variable_range
  • Read-heavy steps are mostly similar between 17rc1 and 15.8
From the summary for only MySQL:
  • Write-heavy steps are ~1.9X to ~4X faster in 8.0.39 vs 5.6.51
  • Read-heavy steps are up to 24% slower in 8.0.39 vs 5.6.51
From the summary for MySQL vs Postgres:
  • Postgres 17rc1 is faster on write-heavy and range queries
    • The range queries are covering and the secondary indexes are mostly cached so the tests are CPU bound for both Postgres and MySQL (see rpq, reads per query, here)
  • MySQL 8.0.39 is faster on point queries
    • MySQL does ~0.1 reads/query while Postgres does ~3 (see rpq here). The table is index organized with MySQL and heap organized with Postgres. So with InnoDB the point queries just access the PK index while with Postgres they first access the PK index and then the heap table to get all of the columns.
Results: more IO-bound

From the summary for only Postgres:
  • Write-heavy steps have similar throughput between 17rc1 and 15.8 ignoring index create (l.x) and variance from l.i2 (problems with get_actual_variable_range)
  • Read-heavy steps are up to 6% faster in 17rc1 vs 15.8
From the summary for only MySQL:
  • Write-heavy steps are up to ~4X faster in 8.0.39 vs 5.6.51
  • Point queries are up to ~3X faster in 8.0.39 vs 5.6.51
  • Range queries are up to 27% slower in 8.0.39 vs 5.6.51
From the summary for MySQL vs Postgres:
  • Postgres 17rc1 is faster on write-heavy and range queries
    • See the less IO-bound section above
    • MySQL does more read IO per range query (see rpq here)
  • MySQL 8.0.39 is faster on point queries
    • See the less IO-bound section above
    • Postgres does ~2X more read IO per point query (see rpq here)